Using SRI to protect from malicious JavaScript

by Saptak Sengupta published on

At some point of developing a website, there might come a time where we need to progressively enhance using JavaScript. There are few different options of how you add JavaScript. Firstly, we can write our own script using vanilla JS only, and self host the JavaScript file. However, often we find ourselves needing to use a JavaScript framework or library to make our life a little easier. We can definitely download all the JavaScript framework and libraries we are using and self host them in our server, but most framework websites recommend using a CDN.

CDNs for JavaScript frameworks and libraries

CDN or Content Delivery Networks are a group of servers which stores duplicate copies of data and serves them to the user from the server closest to them to improve performance. CDNs are often used to serve static files like CSS and JS files by frameworks and libraries. Developers used CDNs for serving JS files not only because CDNs are usually faster, but also since a lot of commonly used frameworks served via CDN used to already be cached in your browser. For example, let’s say there are 3 websites - websiteA.com, websiteB.com. Both of these websites use jQuery. To use CDN, both of these websites can use the following code in their HTML file:

<script src="https://code.jquery.com/jquery-3.6.1.js"></script>

Now, when a user visits, websiteA.com for the first time, the browsers requests for the jQuery file from https://code.jquery.com CDN and used to cache it. So, when the user would visit websiteB.com, the jQuery file were already cached in the browser, thus improving performance a lot. However, most browsers have stoppped resource caching to prevent tracking users across websites.

Bar chart which provides a view of CDN usage for mobile sites broken up for top 1,000, 10,000, 100,000, 1 million and 10 million popular sites as per Google CRUX data. For Top 1,000 sites the CDN adoption is 64%. For the top 10,000 sites, it's 63%. For the top 100,000 sites, it's 52%. For the top 1 million sites, it's 37%. For the top 10 million sites, it's 29%.
CDN usage by site popularity on mobile as reported in Web Almanac 2022

Even though this change removes the performance improvements from resource caching, CDNs are still usually faster and simpler for serving CSS and JS files and does continue to be used widely. According to Web Almanac 2022, 64% of top 1,000 websites use CDN to serve various content.

Security issue

Using CDN to serve JavaScript comes with certain risks. One of the reasons privacy and security focused users are bit squirmish about JS is because a malicious JS can really harm your user. It can range from being misused for tracking and surveillance to compromising users private data. So, if all websites use the same CDN to serve a JS file, an attacker now needs to gain control of only the CDN server instead of all individual servers. Once they gain control, they can inject malicious code into the JS file, and it gets served to all end users of websites that use the CDN.

How does SRI help?

At this point, the solution to this seems to be to self host all your JavaScript codes instead of using a CDN. But that would be a step backward in terms of performance and not the favourable solution in many situations. This is where Subresource Integrity (or SRI) helps.

Subresource integrity is a way of telling the browser how to verify that the JS it has fetched from the CDN is an untampered JS file. SRI uses the very popular concept of file hashing to verify that the file is still the same. Let’s look into what a <srcipt> code with Subresource integrity to fetch jQuery would look like.

<script
src="https://code.jquery.com/jquery-3.6.1.js"
integrity="sha256-3zlB5s2uwoUzrXK3BT7AX3FyvojsraNFxCc2vC/7pNI="
crossorigin="anonymous">
</script>

The above code is what you get if you go to https://releases.jquery.com/ and click on the release you want to use. Now let’s break it down to see what it means.

integrity

The integrity attribute contains the file hash. The integrity attribute has 2 parts - the hashing algorithm and the hash of the file. In the above code, sha256 is the hashing algorithm which tells the browser which algorithm to use while computing hash of the received file. The part after sha256- is the base64 encoding of the expected hash of the untampered file provided by the code author. So when the browser comes across the above code, it fetches the jQuery file from the CDN server, generates the hash for the file it received and compares it with the hash mentioned in the integrity attribute. If the hashes don’t match, then the browser will refuse to execute the script and throw a network error saying fetching of the resources failed.

crossorigin

Since the CDN resource is not present in the same origin, the crossorigin attribute must be present to make the response eligible for integrity validation. Without the crossorigin attribute, the browser will load the script as if the integrity attribute was not present. We set the value of crossorigin attribute to "anonymous" to ensure that no user credentials or cookie informations are sent to the CDN server.

How to generate SRI?

In most cases, the framework or library we are importing using <script> tag should have the SRI code already mentioned in their documentation. For example, as we saw before, if we go to jQuery’s release page, and click on the release we want, the example code already has the integrity and crossorigin attribute values mentioned. Bootstrap also has the integrity and crossorigin attributes mentioned in their quickstart page. But there are many other frameworks that don’t in which case we will need to generate the hash by ourselves. Also, it’s recommended to use SRI for the scripts written by the developer for the website specifically. There can always be a scenario where the attacker gets access to only certain files, so SRI will ensure that the resources served by the server are untampered.

Bar chart showing percent of HTML elements with SRI using various hash functions. SHA-384 is used in 58.4% of elements with SRI in desktop
websites and 60.7% of elements with SRI in mobile websites. SHA-256 is used in 32.4% of elements with SRI in desktop websites and
30.8% of elements with SRI in mobile websites. SHA-512 is used in 10.9% of elements with SRI in desktop websites and 9.9% of
elements with SRI in mobile websites.
Percentage usage of SRI hash function as mentioned in Web Almanac 2022

To create a hash function, there are 3 different hash functions that are supported for SRI - sha256, sha384 and sha512. We need to use one of these 3 functions to compute the hash for the .js files and .css files and then use those values. The easiest way of doing this is using https://www.srihash.org/. All we need to do is paste the link to the resource we want to generate a hash for, and the website generates the HTML that we need to add. For example, if we wanted to get the HTML for the script tag to import react, all we need to do is paste https://unpkg.com/react@18/umd/react.production.min.js from React’s CDN page in the input field, chose the hash function we want, and it generates the following output:

<script src="https://unpkg.com/react@18/umd/react.production.min.js" 
integrity="sha384-tMH8h3BGESGckSAVGZ82T9n90ztNXxvdwvdM6UoR56cYcf+0iGXBliJ29D+wZ/x8"
crossorigin="anonymous">
</script>

But if you fancy doing it manually, especially for your local JavaScript files and CSS files, we can do that as well using openssl in a command line.

openssl dgst -sha384 -binary FILENAME.js | openssl base64 -A

The above code will generate the base64 encoded hash to be put in the integrity attribute. For example, if we wanted to create the hash for the same react file manually, firstly we will need to download the JS file from https://unpkg.com/react@18/umd/react.production.min.js, then open a terminal and enter openssl dgst -sha384 -binary react.production.min.js | openssl base64 -A. This gives me the output tMH8h3BGESGckSAVGZ82T9n90ztNXxvdwvdM6UoR56cYcf+0iGXBliJ29D+wZ/x8 for React 18.2.0. Since I used the hash function sha384 and with the above output, the value of my integrity attribute will be sha384-tMH8h3BGESGckSAVGZ82T9n90ztNXxvdwvdM6UoR56cYcf+0iGXBliJ29D+wZ/x8. Hence, we get a HTML same as the one we get from srihash.org

The reason I really love Subresource Integrity is because it’s such an important feature that is baked using HTML attributes. It shows that a lot can be achieved, even in making a website secure just by learning HTML better. Just add an integrity attribute and crossorigin attribute to ensure that untampered JS framework codes are fetched. Also, SRI is supported by <link> tags as well so that one can ensure that the CSS files are also fetched untampered. In episode 52 of Darknet Diaries, Jack Rhysider and Jonathan Klijnsma discuss about how a tampered modernizr JavaScript file might have caused the data breach in British Airways, that could have been avoided using SRI

About Saptak Sengupta

Saptak S. is a human rights centered web developer, focusing on usability, security, privacy and accessibility topics in web development. He works as a web development contractor. He is a contributor and maintainer of various different open source projects like The A11Y Project, OnionShare and Wagtail. He is also the author of the Security and Accessibility chapter of Web Almanac 2022. One can find him blogging at saptaks.blog.

Website/blog: saptaks.website
Mastodon: toots.dgplug.org/@saptaks
Twitter: @Saptak013/