Lin Clark: Well, thank you all. Thank you all very much for that welcome.
Why talk about the future of the browser? Because browsers are facing a challenge. Whether or not browsers meet this challenge could change the Web as we know it.
What is this challenge? Browsers need to get faster. Let’s look at the trajectory of the speed of the browser over the past two decades.
In the early days of the Web, speed really didn’t matter that much. We were just looking at static documents. So, as soon as that document was rendered to the screen, the browser was pretty much done with it. It might need to do a little bit more work if you scrolled the page up and down, but that work wasn’t too complicated.
Then people started pushing the boundaries. They started thinking, “What can we do with this Web besides just delivering static documents?” Pages started getting interactive. They started having animations.
Do you remember back when everybody went drop-down crazy, and we had drop-down menus everywhere? People built these fancy ones with jQuery that slid up and down and in and out. Once those were part of the page, the Web page wasn’t just being painted to the screen once. With every change, it needed to be repainted. And for motion like this, it needed to be repainted many times for that one change to give you the sense of movement.
For every one of those changes, there were multiple repaints of the screen. If you wanted interactions and animations to look smooth, those repaints needed to happen at a steady rate: 60 of them every second. That meant you had only about 16 milliseconds to figure out exactly what the next version of the page should look like.
It’s not just the content authors that are pushing these boundaries, either. It’s also hardware vendors. For example, the new iPad is going from 60 frames per second up to 120 frames per second. That means the browser has half as much time, about 8 milliseconds per frame, to do just as much work.
New kinds of content are coming to the Web and pushing this even further. For example, with VR, you have two different displays, one for each eye, and they both need to be going at least 90 frames per second to avoid motion sickness. On top of that, a lot of these are at up to 4K resolution, which means you have a whole lot more pixels that you actually have to paint.
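The frame budgets in the last few paragraphs are just the refresh rate turned upside down: 1000 milliseconds divided by the number of frames per second. The short Rust sketch below (Rust because the talk comes back to it later) spells out that arithmetic; `frame_budget_ms` is a hypothetical helper, not part of any browser.

```rust
/// Milliseconds available to produce one frame at a given refresh rate.
fn frame_budget_ms(frames_per_second: f64) -> f64 {
    1000.0 / frames_per_second
}

fn main() {
    // 60 Hz: typical displays; 90 Hz: the VR minimum; 120 Hz: the new iPad.
    for fps in [60.0, 90.0, 120.0] {
        println!("{fps} fps: {:.1} ms per frame", frame_budget_ms(fps));
    }
}
```

At 60, 90, and 120 frames per second that comes to roughly 16.7, 11.1, and 8.3 milliseconds. For VR, both displays have to be ready inside each 11.1 ms window.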
What happens if browsers don’t keep up? Well, as more and more people buy these new devices, and as more and more content moves towards these heavier applications, if browsers don’t keep up, people will stop seeing the Web as the default place where they should put their content. This could mean that the Web, as we know it, withers, which is a pretty scary thought.
But, to be honest, I’m actually not too worried about this. I’m confident that browsers can make this leap. The reason that I’m confident is that, at Mozilla, we’ve been prepping for this change for the past ten years. We’ve been looking at the direction that computer hardware is going. We’ve been figuring out the new way that we need to program to keep up with these changes.
The answer is parallelism. The future of the browser is parallel. We’ve only just started taking full advantage of this in Firefox, but we’re already seeing big wins from it, and every indication is that this new way of doing things can get the browser where it needs to be.
In this talk, I want to explain exactly what browsers need to change in order to keep up. But before I do that, let’s talk about what the browser actually does. I’m going to start with the rendering engine. This is the thing that takes your HTML and CSS and turns it into pixels on the screen.
It does this in five steps, but to make it simpler, I can split those five steps into two groups. The first group takes the HTML and CSS and figures out a plan for what the page should look like. This is kind of like a blueprint. It specifies where everything will go on the screen, and it specifies things like the widths, heights, and colors of elements. The second group takes this plan and turns it into the pixels that you see on your screen.
Now, let’s look more closely at each step in this process. The first step is called parsing. What the parser does is it turns the HTML into something that the browser can understand because, when this HTML comes into your browser, it’s just one big, long string of text. It’s kind of like a big, long paper ribbon that has a lot of characters all in a row. But, what we need is something different, something that the browser can actually use.
We need a data structure that tells us what the different elements on the page are and how they’re related to each other, like parent/child relationships. I think of this kind of like a family tree. We need to turn this long paper ribbon into a family tree of the page. What the parser does is go along this paper ribbon with a pair of scissors. When it sees the opening tag for an HTML element, it cuts out that element and puts it on the wall in the family tree.
For example, if it came across a div, it would cut that out and put that into the family tree. Then the next element it comes across goes under that div. It draws a line to represent that parent/child relationship.
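Here’s a toy version of the scissors-and-family-tree idea in Rust. It’s a sketch under heavy assumptions: the input contains only bare tags like `<div>` and `</div>` (no attributes, comments, or self-closing tags), text between tags is skipped, and badly nested markup is simply rejected. Real HTML parsers follow the far more forgiving tree-construction rules in the HTML standard. `Node` and `parse` are hypothetical names.

```rust
/// One node of the "family tree" (a toy stand-in for a DOM element).
#[derive(Debug, PartialEq)]
struct Node {
    tag: String,
    children: Vec<Node>,
}

/// Walk the markup "ribbon" left to right. An opening tag starts a new node
/// under whichever element is currently open; a closing tag finishes that
/// element and attaches it to its parent. Returns `None` for bad nesting.
fn parse(markup: &str) -> Option<Node> {
    // Elements that are open but not yet closed. The bottom entry is a
    // synthetic root that collects the top-level element.
    let mut stack = vec![Node { tag: "#root".to_string(), children: Vec::new() }];
    let mut rest = markup;
    while let Some(start) = rest.find('<') {
        let end = start + rest[start..].find('>')?;
        let inside = &rest[start + 1..end];
        if let Some(name) = inside.strip_prefix('/') {
            // Closing tag: the element on top of the stack must be the one closing.
            let node = stack.pop()?;
            if node.tag != name || stack.is_empty() {
                return None;
            }
            stack.last_mut()?.children.push(node);
        } else {
            // Opening tag: this element is now the parent of what follows.
            stack.push(Node { tag: inside.to_string(), children: Vec::new() });
        }
        rest = &rest[end + 1..];
    }
    // Only the synthetic root may remain, holding exactly one top-level element.
    let mut root = stack.pop()?;
    if !stack.is_empty() || root.children.len() != 1 {
        return None;
    }
    root.children.pop()
}

fn main() {
    let tree = parse("<div><p>Hello</p><ul><li></li></ul></div>").expect("well-nested markup");
    // Shows a div containing a p and a ul, with an li under the ul.
    println!("{tree:#?}");
}
```

The stack holds the elements whose opening tag has been cut out but whose closing tag hasn’t; whatever is on top is the parent of the next element.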
At the end of parsing, we have this family tree, the DOM tree. That tells us about the structure of the page and those parent/child relationships. What it doesn’t tell us is what these things should look like. In order to figure that out, we take the CSS that we’ve downloaded and figure out which styles apply to which elements in this tree. That’s the next step: CSS style computation.
I think of CSS style computation like a person filling in a form for each one of these elements in the tree. This form is going to tell us exactly what this element should look like. For example, it’s going to tell us what the color should be and what the font size should be. It has more than 200 form fields on it, one for every single CSS property. This form needs to be completely filled out.
For every element in this tree, we need to fill in all of the form fields on this form. But, for every element, there are going to be some properties that aren’t set in our CSS files. You aren’t going to type in 200 different declarations for everything that you have on your page, so we’re going to have to figure out values for those missing CSS properties. This means going through multiple steps for each element.
First, we figure out which rules in the CSS apply to the element. A rule is a selector plus the declarations between its braces, all of those properties that you declare. For example, if you have a rule for paragraphs that sets margin and padding, that rule contains both the margin declaration and the padding declaration. When we’re going through this tree and we reach a paragraph element, that rule matches that DOM element. Multiple rules can match a single DOM element.
This process of figuring out which rules apply to the DOM element that you’re on is called selector matching. We get a list of all of the rules that match this particular element, and then we sort those rules by specificity. Starting with the most specific rule, we take whatever properties it has and fill those into the form.
After that, we go to the next most specific. If it has values for anything that we haven’t already filled in, we fill those in. We keep going down the list until we get to the end of this list of rules. But, we’re still going to have lots of empty form fields. If none of the matching rules contained a declaration for this property, then that property is going to be empty.
Some properties get a default value in this case. Others inherit from their parents; for those, we just look at the parent and use whatever value it has. That process is called the cascade.
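The matching, sorting, and filling-in steps can be sketched in Rust. Everything here is a hypothetical simplification: selectors match on tag name only, specificity is given as a number instead of being computed from the selector, there are four properties instead of 200-plus, and origins and `!important` are ignored. When two rules have the same specificity, the one that appears later wins, as in CSS.

```rust
use std::collections::HashMap;

/// A toy CSS rule: a selector that matches by tag name only, a precomputed
/// specificity, and the declarations between its braces.
struct Rule {
    tag: &'static str,
    specificity: u32,
    declarations: Vec<(&'static str, &'static str)>,
}

/// The filled-in "form" for one element: property name -> value.
type Style = HashMap<&'static str, String>;

/// A hypothetical four-property table: (name, inherited?, default value).
const PROPERTIES: [(&str, bool, &str); 4] = [
    ("color", true, "black"),
    ("font-size", true, "16px"),
    ("margin", false, "0"),
    ("padding", false, "0"),
];

fn compute_style(tag: &str, rules: &[Rule], parent: Option<&Style>) -> Style {
    // Selector matching: keep the rules that apply, remembering source order.
    let mut matching: Vec<(usize, &Rule)> =
        rules.iter().enumerate().filter(|(_, r)| r.tag == tag).collect();
    // Most specific first; on a tie, the rule that appears later wins.
    matching.sort_by(|(ia, a), (ib, b)| b.specificity.cmp(&a.specificity).then(ib.cmp(ia)));

    let mut style = Style::new();
    // Fill in the form, never overwriting a value from a more specific rule.
    for (_, rule) in &matching {
        for &(prop, value) in &rule.declarations {
            style.entry(prop).or_insert_with(|| value.to_string());
        }
    }
    // Whatever is still empty is inherited from the parent or gets its default.
    for (prop, inherited, default) in PROPERTIES {
        if !style.contains_key(prop) {
            let value = match (inherited, parent) {
                (true, Some(p)) => p[prop].clone(),
                _ => default.to_string(),
            };
            style.insert(prop, value);
        }
    }
    style
}

fn main() {
    let rules = vec![
        Rule { tag: "div", specificity: 1, declarations: vec![("color", "navy")] },
        Rule { tag: "p", specificity: 1, declarations: vec![("margin", "8px"), ("color", "gray")] },
        Rule { tag: "p", specificity: 11, declarations: vec![("color", "red")] },
    ];
    let div = compute_style("div", &rules, None);
    let p = compute_style("p", &rules, Some(&div));
    // p: color from the more specific rule, margin from the less specific one,
    // font-size inherited from div (which got the default), padding defaulted.
    println!("{p:?}");
}
```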
At the end of CSS style computation, each element has its form completely filled out with all of the CSS property values, but there’s still a little bit more that we need to figure out. We need to figure out how wide and how high things are going to be and where they’re going to be on the page. The next step, Layout, is what takes care of this.
It looks at the dimensions of the browser window, and from that, it calculates those relative values. If a div is 50% of its parent’s width, layout figures out exactly how many pixels that is. It will also do things like break up a paragraph element into multiple pieces, one for each line in the paragraph. That way, it knows how many lines high the box needs to be to accommodate that whole paragraph.
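A rough Rust sketch of those two layout jobs: resolving a percentage width and breaking a paragraph into lines. It assumes a monospace font and ASCII text, so a line’s width is just its character count; real layout measures actual glyphs and follows the CSS line-breaking rules. The function names and the numbers in `main` are made up for illustration.

```rust
/// Resolve a percentage length against the parent's size, in CSS pixels.
fn resolve_percent(percent: f32, parent_px: f32) -> f32 {
    parent_px * percent / 100.0
}

/// Greedy line breaking: keep adding words to the current line until the next
/// word would overflow `max_chars`, then start a new line. A single word longer
/// than a line is left on its own (overflowing) line.
/// Assumption: monospace font and ASCII text, so width = number of bytes.
fn break_into_lines(text: &str, max_chars: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let width_if_added = if current.is_empty() {
            word.len()
        } else {
            current.len() + 1 + word.len() // +1 for the space
        };
        if width_if_added > max_chars && !current.is_empty() {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

fn main() {
    // Hypothetical numbers: an 800px-wide window, a div at 50% of it,
    // 8px-wide characters, and 20px-tall lines.
    let div_width = resolve_percent(50.0, 800.0);
    let max_chars = (div_width / 8.0) as usize;
    let text = "Layout turns relative values into exact positions and sizes, \
                and breaks each paragraph into lines so it knows how tall the box must be.";
    let lines = break_into_lines(text, max_chars);
    let height = lines.len() as f32 * 20.0;
    println!("div is {div_width}px wide; {} lines, {height}px tall", lines.len());
}
```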
The output of this step is another tree. In Firefox we call it the Frame Tree, but in other browsers, it’s called the Render Tree or the Box Tree. This is the ultimate plan. This is that blueprint. This is what tells us exactly what the page should look like.
Now we move on to the next part of rendering, which is turning that plan into pixels on the screen. Before I get into the details, I want to talk about what it means to put pixels on the screen. You can think of the screen as basically a piece of graph paper, with lots of tiny boxes arranged in rows and columns. When you’re rendering, what you’re doing is coloring in each one of these boxes with a single color.
Of course, there’s not actually graph paper in your computer. Instead, there’s a part of memory called the frame buffer. In that part of memory, there’s a box that corresponds to every pixel, and it says what color that pixel should be. The screen checks that part of memory every 16 milliseconds or so and reads the color for each pixel. Whatever colors are in the frame buffer when it checks are what get shown on the screen.
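In code, the graph paper is just a flat array with one entry per box. This Rust sketch assumes a hypothetical `FrameBuffer` type that stores one 0xRRGGBB color per pixel, row by row; real frame buffers are managed by the operating system and graphics hardware, with their own pixel formats.

```rust
/// A toy frame buffer: one 0xRRGGBB color per pixel, stored row by row.
struct FrameBuffer {
    width: usize,
    height: usize,
    pixels: Vec<u32>,
}

impl FrameBuffer {
    fn new(width: usize, height: usize, background: u32) -> Self {
        FrameBuffer { width, height, pixels: vec![background; width * height] }
    }

    /// The box in row `y`, column `x` of the "graph paper" lives at y * width + x.
    fn index(&self, x: usize, y: usize) -> usize {
        y * self.width + x
    }

    fn get(&self, x: usize, y: usize) -> u32 {
        self.pixels[self.index(x, y)]
    }

    /// Color in every box of a rectangle, clipped to the edges of the screen.
    fn fill_rect(&mut self, x0: usize, y0: usize, w: usize, h: usize, color: u32) {
        for y in y0..(y0 + h).min(self.height) {
            for x in x0..(x0 + w).min(self.width) {
                let i = self.index(x, y);
                self.pixels[i] = color;
            }
        }
    }
}

fn main() {
    // A 1920x1080 screen is already about two million boxes to color in.
    let mut fb = FrameBuffer::new(1920, 1080, 0xFFFFFF);
    fb.fill_rect(100, 100, 300, 200, 0x3366CC); // e.g. a drop-down menu
    println!("{} pixels; pixel (150, 150) = {:06X}", fb.pixels.len(), fb.get(150, 150));
}
```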
But we don’t just fill in this frame buffer once. We end up having to fill it in over and over and over again. Any time you have an interaction or animation, any time there’s a change to the page, you have to update that frame buffer. And even if your site has no interactivity of its own, a user highlighting something on the page means that the browser needs to refill the frame buffer, too.
This frame buffer can be pretty big. Depending on how big your screen is, you can have millions of pixels. That’s a huge piece of graph paper. It means that filling in pixels can take a really long time, especially when there are a lot of things that are rapidly changing on the page. Because of this, browsers have tried to figure out ways to reduce the amount of work that they have and make it faster.
One way that they’ve done this is by splitting what gets rendered into multiple layers. These layers are like the ones you have in Photoshop. I think of them like the layers you would have had if you were doing animation in the old days, like the transparent sheets, called cels, that Bugs Bunny cartoons were drawn on.
You have the background on one layer, and then the characters on another layer or multiple layers. Then if the characters move, you don’t need to repaint the background. The background just stays the same. It’s only that top layer that needs to change.
The first step in this process is called painting. That’s where you actually create these layers. The next step is called compositing. That’s where you take these layers and put them together. Then you basically take a picture of it. That is what goes to the screen. That’s how the page gets rendered.
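Here’s a minimal Rust sketch of painting and compositing. To keep it short, the screen is a single row of pixels, every pixel is either fully opaque or transparent, and `Layer`, `paint_character`, and `composite` are hypothetical names. The point is that the background layer is painted once, and each animation frame only repaints the character’s layer before compositing.

```rust
/// A layer covers the whole (here, one-row) screen; `None` means transparent.
type Layer = Vec<Option<u32>>;

/// Painting: draw a one-pixel "character" at position `x` on a transparent layer.
fn paint_character(width: usize, x: usize, color: u32) -> Layer {
    let mut layer = vec![None; width];
    layer[x] = Some(color);
    layer
}

/// Compositing: stack the layers bottom to top and flatten them into one frame.
/// For each pixel, the topmost layer with paint there wins. (Assumption: pixels
/// are fully opaque or fully transparent; real compositors also blend.)
fn composite(layers: &[&Layer], width: usize, clear: u32) -> Vec<u32> {
    let mut frame = vec![clear; width];
    for layer in layers {
        for (out, px) in frame.iter_mut().zip(layer.iter()) {
            if let Some(color) = px {
                *out = *color;
            }
        }
    }
    frame
}

fn main() {
    const WIDTH: usize = 8;
    // The background is painted once and reused for every frame.
    let background: Layer = vec![Some(0x87CEEB); WIDTH];
    for x in 0..3 {
        // Only the character's layer is repainted as it moves.
        let character = paint_character(WIDTH, x, 0xFF0000);
        let frame = composite(&[&background, &character], WIDTH, 0x000000);
        println!("{:x?}", frame);
    }
}
```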
Now, I framed this as the way that a Web page’s content goes from HTML and CSS to pixels. But what you might not know is that there’s another part of the browser: the tabs, the URL bar, all of that stuff. That part is actually separate. It’s called the browser chrome.
In some browsers, like Firefox, the browser chrome is also rendered by the rendering engine. So this rendering engine has two different tasks. It has to render the inside of the window, which is called the content, and the outside, which is called the chrome.
This is one of the earliest places where parallelism was introduced to the browser. In 2008, you started seeing a browser take advantage of new hardware to run the chrome and the different content windows all at the same time, independently of each other. But it wasn’t us that did that. It was Chrome.
When Chrome launched, its architecture was already using parallelism like this. It’s called a multi-process architecture. That’s one of the reasons why Chrome was faster and more responsive than Firefox.
Now, I feel like I should take a step back here and explain what this all really means, what hardware changed specifically that Chrome was taking advantage of, and what that change made possible. So, let’s do a little crash-course in computers and how they work. You can think of a computer kind of like you think of a brain.
There are different parts of this brain. There’s a part that does the thinking, so that’s addition, subtraction, any logical operations like “and” or “or.” Then there’s some short-term memory. These two are pretty close together in the same part of that brain. Then there’s long-term memory.
These parts all have names. The part that does the thinking is called the arithmetic logic unit, or ALU. The short-term memory is made up of registers. Those two are grouped together on the central processing unit, or CPU. The long-term memory is called random access memory, or RAM.
Now, in order to get this brain to do anything, we need to give it an instruction. This instruction is going to tell us what we need to do with some bit of the short-term memory. Each box of the short-term memory has a label so that we can refer to it. Then we can use these labels in the instruction to say which value the instruction should act upon. For example, we could add a number to a value that’s in short-term memory to get a result, so we could add one to the value that’s in R4.
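The brain-and-instructions model fits in a few lines of Rust. This is a hypothetical toy, not a real instruction set: a core with eight registers labelled R0 to R7 and two instructions, one of which is “add one to the value in R4.”

```rust
/// A toy core with eight labelled registers, R0 to R7 (its short-term memory).
/// The register count and the instruction set are assumptions for illustration.
struct Core {
    registers: [i64; 8],
}

/// Instructions refer to registers by their label (here, an index).
enum Instruction {
    /// Put a constant into a register.
    LoadImmediate { reg: usize, value: i64 },
    /// Add a constant to a register, e.g. "add one to the value in R4".
    AddImmediate { reg: usize, value: i64 },
}

impl Core {
    /// Run one instruction. Like the brain in the talk, it does one thing at a time.
    fn execute(&mut self, instruction: &Instruction) {
        match *instruction {
            Instruction::LoadImmediate { reg, value } => self.registers[reg] = value,
            Instruction::AddImmediate { reg, value } => self.registers[reg] += value,
        }
    }
}

fn main() {
    let mut core = Core { registers: [0; 8] };
    let program = [
        Instruction::LoadImmediate { reg: 4, value: 41 },
        Instruction::AddImmediate { reg: 4, value: 1 }, // add one to R4
    ];
    for instruction in &program {
        core.execute(instruction);
    }
    println!("R4 = {}", core.registers[4]);
}
```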
Now, one thing that you may have figured out from this is that we can only do one thing at a time. This brain can only really think one thought at a time. That was true for most computers, from the earliest computers up to the mid-2000s. But even though computers had this limitation for all those years, they were still getting faster.
They were still able to do more. Every 18 months to 2 years or so, they were getting about twice as fast. We could run twice as many instructions in the same amount of time.
What made it possible for these computers to get faster was something called Moore’s Law. The tiny electrical switches used to build all these components, like the CPU, called transistors, were getting smaller and smaller. That meant that more and more of them could fit on a chip. With more of these building blocks, you can make more powerful CPUs. Also, there was less distance for electricity to travel between them, so they were able to work together faster. But, of course, you can only make things so small, and there’s only so much power you can push through a circuit before it starts to overheat and burn up.
In the early 2000s, these limitations were starting to become apparent. Chip manufacturers had to think, how are they going to make faster and faster chips? The answer that they came up with was splitting up this chip into more than one brain, basically making it possible to think more than one thought at a time, in parallel. These separate brains that the CPU has are called cores. When you hear people talk about a multicore architecture, that’s what they’re talking about.
Even though each one of these cores, each one of these brains, is limited in how fast it can think, if you add more of them, they can do more in the same amount of time. But in order to take advantage of this, you need to be able to split up the work you have to do across these different cores. Before, the speedups happened automatically and programmers didn’t need to do anything. These speedups actually require programmers to change the way that they code, and this is harder to do than you might think.
Imagine that two of these cores need to work with the same bit of long-term memory. They both need to add something to it. Well, what number is going to end up in long-term memory at the end of this calculation? Who knows? It depends on the timing of when the instructions run on the different cores.
Let’s walk through an example. We start with the number eight in long-term memory. Both cores need to add one to it. Our end result should be ten.
Instructions have to use things that are in short-term memory. They can’t act on long-term memory directly. Each core has its own short-term memory.
Let’s say that the first core pulls eight from long-term memory into its short-term memory, and then it adds one to get nine, and then puts the value back into long-term memory. That means that the other cores can now access the result of this operation. The long-term memory holds nine now, and the second core is going to pull nine into its short-term memory, add one to get ten, and then put ten back in long-term memory. That means our end result is ten, so all is well. But, it wasn’t guaranteed to end up this way.
Let’s see what happens when we change the order that the instructions happen on the different cores. The first core pulls eight from long-term memory. Then the second core pulls eight from long-term memory. You may already see where the problem is here.
Then the first core adds one to get nine and puts nine back in long-term memory. Then the second core adds one to get nine and puts nine back in long-term memory. We end up with a result of nine, which is not what we wanted. This kind of bug is called a data race. When you have parallel code with shared memory, meaning two different cores working with the same part of long-term memory at the same time, you’re very likely to have these data races. One way to get around this is to split the work into tasks that are independent enough that they don’t need to share memory.
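The two orderings can be replayed deterministically in Rust. This sketch doesn’t use real threads; it simulates two cores, each with its own register, running load, add, and store against one shared memory cell in whatever order a `schedule` dictates. The names are hypothetical.

```rust
/// The three steps each core runs to add one to a value in long-term memory.
#[derive(Clone, Copy)]
enum Step {
    Load,  // copy long-term memory into this core's register
    Add,   // add one to the register
    Store, // copy the register back to long-term memory
}

/// Replay one interleaving of two cores' steps against a single shared memory
/// cell. `schedule` says which core (0 or 1) runs its next step; each core may
/// appear at most three times. Returns the final value in memory.
fn run(schedule: &[usize], initial: i64) -> i64 {
    let program = [Step::Load, Step::Add, Step::Store];
    let mut memory = initial;      // long-term memory, shared by both cores
    let mut registers = [0i64; 2]; // each core's own short-term memory
    let mut next = [0usize; 2];    // which step each core runs next
    for &core in schedule {
        match program[next[core]] {
            Step::Load => registers[core] = memory,
            Step::Add => registers[core] += 1,
            Step::Store => memory = registers[core],
        }
        next[core] += 1;
    }
    memory
}

fn main() {
    // First core finishes before the second starts: 8 + 1 + 1.
    println!("one after the other: {}", run(&[0, 0, 0, 1, 1, 1], 8));
    // Both cores load 8 before either stores: one update is lost.
    println!("interleaved:         {}", run(&[0, 1, 0, 0, 1, 1], 8));
}
```

Only the order of the steps differs between the two runs, which is exactly why data races are so hard to reproduce and debug.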
Now, let’s go back to the chrome and content example that we were looking at before. You may remember that I said these are fairly independent of each other. That means they’re perfect for this kind of parallelism, where you don’t have to share memory between the cores. It’s called coarse-grained parallelism. That’s where you split your program into a few pretty large tasks that can run at the same time but are independent enough that they don’t have to share memory.
It’s actually pretty straightforward to do this. You just need to figure out those--
[Loud alarm ringing]
Lin: You just need to figure out--
Male: You’re all awake again.
Lin: [Laughter] Yeah, woke you all up. This coarse-grained parallelism is pretty straightforward. You just need to figure out those large, independent tasks. Chrome had this from the beginning. The Chrome engineers saw that they were going to need some level of parallelism to be fast on these new architectures.
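A minimal Rust sketch of coarse-grained parallelism. Threads stand in for the separate processes a multi-process browser actually uses, and `render_window` is a hypothetical placeholder for the real work. Each task owns its own inputs, so the threads never share memory.

```rust
use std::thread;

/// Hypothetical stand-in for "render one window": the work only touches data
/// the task owns, so nothing is shared with the other tasks.
fn render_window(name: &str, element_count: u64) -> String {
    let work: u64 = (0..element_count).sum(); // pretend work
    format!("{name}: {work}")
}

/// Coarse-grained parallelism: one large, independent task per thread. Each
/// closure takes ownership (`move`) of its own inputs, so there is no shared
/// memory for the threads to race on. Results come back through `join`.
fn render_all(windows: Vec<(String, u64)>) -> Vec<String> {
    let handles: Vec<thread::JoinHandle<String>> = windows
        .into_iter()
        .map(|(name, count)| thread::spawn(move || render_window(&name, count)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("render thread panicked"))
        .collect()
}

fn main() {
    let windows = vec![
        ("chrome".to_string(), 10),
        ("tab: news site".to_string(), 1_000),
        ("tab: video".to_string(), 50),
    ];
    for line in render_all(windows) {
        println!("{line}");
    }
}
```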
Around the same time that Chrome saw this change in hardware, and saw that they would need to take advantage of parallelism to have a fast browser, we were seeing the same thing. We knew that we were going to need coarse-grained parallelism in our browser too if we were going to keep up. We do have it now, although it took us a while to get there. It was a multiyear effort.
We started testing our multi-process architecture in Firefox 48 with a small group of users. But it wasn’t until this past summer, with Firefox 54, that we turned it on for all users.
It took us a while to get there because we weren’t starting fresh like Chrome was. We had a bumpier road. We were starting with an existing code base, which was developed before multicore architectures were common, and we needed to figure out how to break apart this code base without breaking anything for our users while we were doing it.
We needed to plan for that. But we didn’t just stop at making plans for coarse-grained parallelism. We saw that we were going to need to take it further, because when you have this kind of coarse-grained parallelism, there’s a good chance that you’re still not making the best use of all of your cores.
[Microphone audio feedback]
Lin: Thank you. Thank you for reminding me where I was. [Laughter]
We saw we were going to need to take it further because, when you have this kind of coarse-grained parallelism, there’s a good chance you’re still not making good use of all of your cores, of all of the hardware that’s in your machine.
Male: The audio was on.
Lin: [Laughter] For example, you might have one tab that’s doing a whole lot of work while the others aren’t doing much work at all. That means those other cores are sitting idle, so you’re not getting the kind of speedup that you could get from a parallel architecture.
We saw that if we wanted to make a browser that was really fast, we couldn’t just add this coarse-grained parallelism. We needed to add fine-grained parallelism, too. What is fine-grained parallelism? Well, that’s when you take one of these big tasks and split it up into smaller tasks. These can be spread across your different cores more easily, but that usually means you’re going to have to share memory between the cores. This opens you up to those data races that I was talking about before. These data races are nearly impossible to avoid when you’re sharing memory, and they’re incredibly tricky to debug.
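A minimal Rust sketch of fine-grained parallelism, assuming a hypothetical per-element job (multiplying numbers stands in for real work like style computation). One buffer is shared by all the threads, yet this is safe: `chunks_mut` gives each thread a disjoint slice, and `std::thread::scope` guarantees every thread is finished before the buffer can be used again.

```rust
use std::thread;

/// Fine-grained parallelism: split ONE big job into many small pieces.
/// Every thread works inside the same buffer, but `chunks_mut` hands each one
/// a disjoint slice, so the compiler can verify no two threads touch the same
/// element. `chunk_size` must be greater than zero. A real engine would reuse
/// a pool of worker threads instead of spawning one thread per chunk.
fn parallel_map_in_place(data: &mut [u64], chunk_size: usize, f: fn(u64) -> u64) {
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_size) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x = f(*x);
                }
            });
        }
    }); // every spawned thread has finished by the time `scope` returns
}

fn main() {
    // Hypothetical: one number per element of a single page.
    let mut values: Vec<u64> = (1..=12).collect();
    parallel_map_in_place(&mut values, 4, |v| v * 10);
    println!("{values:?}");
}
```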
The thinking at the time was basically that, to safely program in parallel, you had to have a kind of wizard-level understanding of the language that you were working in. One of the distinguished engineers at Mozilla actually put up a sign, about eight feet high, that said, “You must be this tall in order to write multithreaded code.”
Now, of course, when you have a project like an open source browser, where thousands of contributors are adding code to the code base, you can’t code in a way that requires a wizard-level understanding of the language. If you do, you’re going to run into these data races for sure. These data races and other memory issues cause some of the worst security vulnerabilities in browsers. If we wanted to take advantage of fine-grained parallelism without the peril that these data races introduce, we couldn’t just start hacking on a parallel browser. We had to find a way to make it safe to do that.
Rather than starting a project to rewrite the browser, we started sponsoring a project to create a new language to write that browser in. This language is the Rust programming language. As part of its design, it makes sure that these kinds of data races can’t happen. If you have code that would introduce these data races to your code base, it just won’t compile.
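What “it just won’t compile” looks like for the two-cores example, sketched with Rust’s standard library. The commented-out version that lets two threads mutate the same counter is rejected by the borrow checker; the two functions after it (hypothetical names) compile because they coordinate the sharing with a `Mutex` or an atomic integer.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::thread;

// The racy version of the two-cores example is rejected at compile time:
//
//     let mut total = 8;
//     thread::scope(|s| {
//         s.spawn(|| total += 1);
//         s.spawn(|| total += 1); // error: cannot borrow `total` as mutable
//     });                         //        more than once at a time
//
// Two threads may not hold mutable access to the same memory at once, so the
// code has to say how the sharing is coordinated. Two ways that do compile:

/// With a Mutex, each thread's load-add-store happens while it holds the lock,
/// so the interleaving that produced 9 can't happen.
fn add_twice_with_mutex(start: u64) -> u64 {
    let total = Mutex::new(start);
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| *total.lock().unwrap() += 1);
        }
    });
    total.into_inner().unwrap()
}

/// With an atomic integer, `fetch_add` does the load, add, and store as one
/// indivisible step.
fn add_twice_with_atomic(start: u64) -> u64 {
    let total = AtomicU64::new(start);
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| total.fetch_add(1, Ordering::SeqCst));
        }
    });
    total.into_inner()
}

fn main() {
    println!("mutex: {}", add_twice_with_mutex(8));   // always 10
    println!("atomic: {}", add_twice_with_atomic(8)); // always 10
}
```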
We started sponsoring work on Rust around 2009 or 2010. It wasn’t until 2013 that we actually started putting it to use in a browser, though. We started seeing whether or not it could create the browser that we wanted.
I don’t know if any of you have heard of the term “yak shaving” where you have to do one seemingly unrelated task before you get to the thing that you really meant to do. But at Mozilla, we have some pretty big yaks to shave.
In 2013, with this language in hand that allowed us to code in parallel without fear, we started looking at how we could really introduce fine-grained parallelism into the browser. The project that we started to do this is called Servo. We started by looking at the rendering engine pipeline and asking, “What happens if we parallelize all of the things?” This means we’re not just sending different content windows with different pages to different cores.
We’re taking a single content window and splitting up the different parts of that page. That means if you have a site like Pinterest, each different pinned item can be processed separately from the others.
For example, for CSS, you could send each pinned item to a different core to have its CSS filled out. This means that you can speed up different parts of the rendering pipeline by up to the number of cores you have, which means that as chip manufacturers add more cores in the future, these pages can keep getting faster automatically.
This is the key. This is why fine-grained parallelism is so important, and this is why we spent so much time and risked so much in pursuing it, because it wasn’t clear at the start of this project that it was actually going to work. Coarse-grained parallelism is pretty straightforward, but fine-grained parallelism meant first creating a language that made it safe, and then implementing it in a browser, which was a tough research problem.
That time and effort have really paid off. We found out that these ideas work and that they work really well. Over the past year, we’ve started bringing pieces from Servo into Firefox. We’ve been doing this as part of our Project Quantum, which is a major speedup of Firefox that we’ve been working on for the past year. It’s kind of like replacing the different parts of a jet engine mid-flight.
One thing that we brought over is our parallel style engine, called Stylo. It splits up all of the CSS processing across the different CPU cores, as I mentioned before. It uses a technique called work stealing to divide up that work: whenever a core runs out of work, it can steal work from the other cores. This makes splitting up the work efficient, and it makes it possible to speed up CSS style computation by up to the number of CPU cores you have.
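A simplified Rust sketch of work stealing, not Stylo’s code. Each worker thread has its own queue of tasks and pops from the front; when its queue is empty, it steals from the back of another worker’s queue. Assumptions: the queues are plain `Mutex`-protected `VecDeque`s rather than the lock-free deques production schedulers typically use, no new tasks are created while running, and `map_with_work_stealing` is a hypothetical name.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Apply `work` to every item using `n_workers` threads with simple work stealing.
fn map_with_work_stealing(items: Vec<u64>, n_workers: usize, work: fn(u64) -> u64) -> Vec<u64> {
    assert!(n_workers > 0, "need at least one worker");
    let len = items.len();
    // One deque of (original index, item) tasks per worker.
    let deques: Vec<Mutex<VecDeque<(usize, u64)>>> =
        (0..n_workers).map(|_| Mutex::new(VecDeque::new())).collect();
    // Deliberately lopsided start: worker 0 gets every task, so the others
    // only make progress by stealing.
    deques[0].lock().unwrap().extend(items.into_iter().enumerate());

    let finished: Vec<Vec<(usize, u64)>> = thread::scope(|s| {
        let handles: Vec<_> = (0..n_workers)
            .map(|me| {
                let deques = &deques;
                s.spawn(move || {
                    let mut done = Vec::new();
                    loop {
                        // Own work first; the lock is released at the end of this statement.
                        let own = deques[me].lock().unwrap().pop_front();
                        // Out of work: visit the other deques and steal from the back.
                        let task = own.or_else(|| {
                            (1..n_workers)
                                .map(|k| (me + k) % n_workers)
                                .find_map(|victim| deques[victim].lock().unwrap().pop_back())
                        });
                        match task {
                            Some((i, x)) => done.push((i, work(x))),
                            // Every deque was empty and tasks are never added, so we're done.
                            None => break,
                        }
                    }
                    done
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<_>>()
    });

    // Put the results back in their original order.
    let mut out = vec![0; len];
    for (i, y) in finished.into_iter().flatten() {
        out[i] = y;
    }
    out
}

fn main() {
    let items: Vec<u64> = (1..=20).collect();
    // Hypothetical per-item job, e.g. "compute the style for one pinned item".
    let squares = map_with_work_stealing(items, 4, |x| x * x);
    println!("{squares:?}");
}
```

Each worker holds at most one lock at a time, so the workers can’t deadlock; stealing from the back leaves the front of each deque to its owner.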
Another piece that we’re bringing over is called WebRender. WebRender takes the paint and composite stages and combines them into a single rendering stage. It uses the hardware more effectively to paint pixels faster, which means it can keep up with those larger displays. To do this, it uses another highly parallel part of the computer, one that’s specifically meant for graphics: the GPU, or graphics processing unit.
The cores on the GPU are a little different from the cores on the CPU. Instead of having two, four, or six of them like you have on the CPU, there are hundreds or thousands of them. But they can’t do things independently. They all have to be working on the same thing at the same time. You need to do a lot of planning if you want to maximize the amount of work that they can do at once, and that’s what WebRender does.
With WebRender, we can get rid of performance cliffs that trip up Web developers. For example, if you animate background color right now, your animation can stutter and look janky because the paint and composite phases have too much work to do. Because of this, there are currently a lot of rules about what you should and shouldn’t animate.
That’s the browser’s own code. But what if application code could run in parallel, too? What if application code could take advantage of these multiple cores in the same way that the browser’s own code is doing?
Over the past few years, browsers have been adding features that make this possible. One that you may have heard of is Web Workers. Those have been in browsers for a few years now. They let you run JavaScript code on background threads, which can run on different cores.
You may have also heard of SharedArrayBuffer, which started landing in browsers this past summer. It gives you the shared memory that I was talking about, which you often need for fine-grained parallelism. But, like I talked about before, it’s pretty tricky to manage shared memory on your own, and Web Workers can be really hard to use. That’s why so few sites and applications are using them today, even though workers have been around for a number of years.
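An analogy in Rust, not the Web Worker API: message passing with standard-library channels. The worker owns its state and talks to the main thread only by sending values back and forth, roughly the way a page and a Web Worker exchange messages with `postMessage`, so no memory is shared. `spawn_squaring_worker` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Start a worker thread that squares every number it receives and sends the
/// result back. The two sides share nothing; they only exchange messages.
fn spawn_squaring_worker() -> (mpsc::Sender<u64>, mpsc::Receiver<u64>, thread::JoinHandle<()>) {
    let (to_worker, worker_inbox) = mpsc::channel::<u64>();
    let (worker_outbox, from_worker) = mpsc::channel::<u64>();
    let handle = thread::spawn(move || {
        // The loop ends once the main thread drops its Sender.
        for n in worker_inbox {
            if worker_outbox.send(n * n).is_err() {
                break; // the main thread stopped listening
            }
        }
    });
    (to_worker, from_worker, handle)
}

fn main() {
    let (to_worker, from_worker, handle) = spawn_squaring_worker();
    for n in 1..=3 {
        to_worker.send(n).expect("worker is running");
    }
    drop(to_worker); // closing our end tells the worker there is no more work
    let replies: Vec<u64> = from_worker.iter().collect();
    println!("{replies:?}");
    handle.join().expect("worker panicked");
}
```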
It would be nice to have a language like Rust, which guarantees that you’re not going to have data races and makes it easy to work across different cores, across these different workers, without having to do too much on your own. Well, there’s actually another standard that landed in browsers this past year that can help with this: WebAssembly.
There are also other things about WebAssembly that help with application performance. It was designed for machines to run quickly, not to be easy for humans to write. That’s because people usually aren’t writing WebAssembly by hand. Instead, they’re writing in a language like Rust, C, or C++ that compiles to WebAssembly, so WebAssembly doesn’t have to be easily readable by programmers. And because it’s a low-level format where types are spelled out ahead of time, the engine doesn’t have to do the kind of guesswork it does with JavaScript to figure out where it can take shortcuts with your code. This speed of WebAssembly, even without threading, even without multiple cores, is what’s making it possible to run PC games in the browser today.
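A sketch of the “write Rust, compile to WebAssembly” workflow. It assumes a library crate built with `crate-type = ["cdylib"]` in `Cargo.toml` and `cargo build --release --target wasm32-unknown-unknown`; `sum_of_squares` is a hypothetical example function, and `main` exists only so the same file also runs as a native program.

```rust
/// A hypothetical function meant to be compiled to WebAssembly.
/// `#[no_mangle]` keeps the exported name readable from JavaScript
/// (in the Rust 2024 edition the attribute is spelled `#[unsafe(no_mangle)]`).
/// The signature uses fixed-width integers, which map to Wasm's i32 and i64,
/// so the engine knows every type before the code runs.
#[no_mangle]
pub extern "C" fn sum_of_squares(n: u32) -> u64 {
    (1..=u64::from(n)).map(|i| i * i).sum()
}

fn main() {
    // The same source also builds as a normal native program.
    println!("{}", sum_of_squares(10));
}
```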
Together, Web Workers, SharedArrayBuffer, and WebAssembly make it possible for applications to take advantage of parallelism, too. For example, in a framework like React, you could rewrite the core algorithm to run in parallel. That way, the work of figuring out the changes that it needs to make to the DOM could be happening across the different cores.
Ember is already starting to experiment with WebAssembly for its Glimmer VM. They may be able to start introducing some of this parallelism, too, and take advantage of it. I think we’re going to see a big shift towards using these standards in frameworks over the next few years.
Let’s get back to this challenge that we had. Here are the pieces of the puzzle. Here’s how we address this challenge. This is what browsers need to do to support the new devices and the new types of applications that are coming to the Web.
There’s the coarse-grained parallelism of splitting the different content windows and the chrome across different processes. There’s the fine-grained parallelism of splitting up the work of a single Web page so that it can be distributed across different cores. Then there’s enabling application code to be parallelized as well.
The coarse-grained parallelism is already there in all of the browsers. Chrome was the first to do this, but pretty much all of the browsers have caught up by now. Enabling application code to go parallel is something that’s happening in standards bodies, and it’s being adopted by all of the browsers.
It’s this fine-grained parallelism that’s still a question mark. This is where most browsers have done the least so far, and it’s actually not clear how to do it in most browsers, because the C++ that most browsers are written in is pretty hard to parallelize in this way. But I think that all of the browsers are going to need to do this. We may be the first browser to actually get this fine-grained parallelism in there and deliver the speedups, but we really want all of the browsers to get there, too. We want them all to get there with us, because that’s the way that we’re going to keep the Web going. That’s the way we’re going to keep it healthy and vibrant, no matter how much the browser’s limits are pushed.
Thank you to beyond tellerrand for having me, and thank you, all, for listening.