Linux Format

Browser Wars 2020

As Google’s Chrome crushes all its browser competition, Neil Mohr takes an in-depth look at what makes a modern browser tick.


We suspect most readers remember with bitterness and rolling of eyes the Browser Wars of the year 2000 (okay, perhaps it’s more like 1995, but we like round numbers). Back then, websites were websites, adorned with user-unfriendly “Compatible with Netscape” logos and “Under Construction” animated GIFs that took an age to load over crawling 56K modems. Entire websites only worked with a flashy plug-in, and Microsoft was breaking standards left, right and centre to gain market share. Great days, if by great you mean awful.

You have to hand it to Microsoft – and, indeed, Bill Gates – who foresaw the dominant role the web browser would play in the future, and yet still managed to throw away that market-dominating position to some underdog called Google.

Why does it even matter which web browser we choose? Why has the browser become so powerful? What makes a web browser tick, and is there really any difference between them? All of these questions and more will be answered as we dive inside the web browser, benchmark a bunch of them, and ask Jonni, “Should we be sticking with the browser shoved in front of us by globe-spanning corporations?” Hint: No.

We’re not about to take you back to 1993 and explain the history of the world wide web, aka Web 1.0. That’s done and dusted – thanks, Tim Berners-Lee. We’re jumping straight into the “today” to explore what makes a modern web browser tick, because the differences are vast. The important question to ask is why? What has changed so much over the past 27 years or so that makes modern browsers so complex?

To kick things off, and to perhaps whet your appetite, just considering the basic high-level functions of a web browser reveals a corresponding high level of complexity. Part of this is the network connectivity to fetch data via HTTP and associated protocols, before you can even consider displaying anything.

Even at this stage in the explanations, what we need to understand is that the world wide web is a precarious stack of standards, piled on top of each other, and transmitted over an international-scale network. If any corporation or nation state decides that it wants to interfere with them, things quickly begin to fall apart. Just take DDoS attacks, or certain countries rerouting traffic through Border Gateway Protocol (BGP) hijacking. On a more relevant level, if a major browser provider wants to undermine open standards, it certainly can – and definitely has done.

Inside a browser

The basic overview of a browser hasn’t changed much since the first ones launched in the mid-1990s, the main additions being support for processing JavaScript and local data persistence. Check out the diagram (see bottom right) to see how a browser is built.

Networking: There’s a lot of fetching and carrying with a web browser. HTTP(S) is the core, but there’s also FTP for file transfers, SMTP (largely unused) for basic email, and DNS to resolve the host names in URLs so that pages can be requested from the right web server. Not to mention TCP/IP connections and packet transfers.

User interface: You probably take it for granted, but the interactive decorations around the browser and additional features it may offer – such as bookmarks, history, password storage, and more – are all part of the interface.

Browser engine: This is less obvious than the rest, and refers to the intersection between the user interface element and the rendering and JavaScript engines, while also linking to the data storage element. For maximum confusion, some projects refer to the browser engine, while others talk about the rendering engine.

Data storage: While this started with cookies, local data storage is far more important in modern browsers for use in local applications. Web Storage provides basic key-value storage, but Web SQL offers full local database features, with IndexedDB being a compromise between the two.

JavaScript engine: The programming language of the web, JavaScript enables interactive websites and dynamic content. While it’s designed to be interpreted, modern browsers use a Just In Time (JIT) compiler that converts the script into machine code on demand as it executes. Each major browser uses its own engine, which can lead to differences in performance.

Rendering engine: The core block of any modern browser – we’ll take the majority of our time digging into how this works, which will involve another block diagram. Effectively, this is two parsers: one processing the HTML and document object model (DOM) elements, and the other parsing the cascading style sheet (CSS) data. From this, a rendering tree is generated, laid out, and painted to the display.

Same but different

We’re going to largely ignore parts of this model, such as networking, the user interface, browser engine, and data storage. It’s not that they’re unimportant – that’s absolutely not the case – but they’re more openly duplicated between systems. Accessing the TCP/IP networking stack and requesting/sending HTTP is donkey work done by standard libraries. Finesses of a user interface are better left for a critical review or group test. And while we’ll mention browser storage, we’re not going into any deep analysis of it.
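To give a flavour of that donkey work, here is a minimal sketch of how a URL might be broken into the pieces each networking step needs: a host name for DNS, a port for the TCP connection, and the text of an HTTP request. The function name build_get_request and the example.org address are our own placeholders, and nothing here is taken from any real browser’s code.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, raw HTTP/1.1 request bytes) for a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Default ports: 443 for https, 80 for plain http.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


host, port, request = build_get_request("https://example.org/news?id=7")
print(host, port)
print(request.decode("ascii"))
```

The host would go to a DNS lookup, the port to the socket, and the bytes down the wire once the (TLS-wrapped, in this case) connection is open. A real browser adds many more headers, connection reuse, caching, and so on.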

This leaves us with the two main elements that dictate performance and compliance: the JavaScript engine and the rendering engine.

We’re going to focus on the rendering engine because it’s big and complex. But why all the fuss in the first place – isn’t HTML just HTML? As we alluded to, the web and online applications are built on standards; in the case of HTML, it’s the World Wide Web Consortium, aka W3C, that defines the guidelines on what each HTML tag should do.

The problem is, as with so many aspects of life, guidelines and rules are open to interpretation, and what one browser might do with a certain set of tags, another does not, and dumb humans do a whole other set of things, too. As the rendering engine is in charge of interpreting and displaying content, and as different browsers use different engines, that content can end up being displayed differently from browser to browser. Usually, this is minimal, but sometimes it can lead to positional changes or, at the extreme end, entire pages failing to display.

Oddly, many quirks of rendering are down to how engines handle error conditions, because this behaviour isn’t standardised. HTML editors and humans can output all manner of crazy, non-compliant code that the poor browser engine then has to parse and interpret as best as possible, as we’ll now examine.

Skinning cats

The network engine is doing its thing and fetching web page content, then passing it to the rendering engine. At this point, there are two main ways of handling the content. We’re going to look at how WebKit and Blink deal with the process, but be aware that Gecko, used by Firefox (and derivatives), approaches things in a slightly different order.

Both, however, split the website data into HTML and CSS data – these will be processed separately by their own parsers. What’s that, then? Roughly speaking, a parser takes in the incoming bitstream and translates the data into a node tree; the structure of that tree is defined by the syntax (rules) of the language (HTML or CSS). If you’re aware of basic HTML tags, it should make sense to say the parsing process is split into:

A lexer: This breaks the input into known tokens (tags) based on the vocabulary rules.

A parser: This constructs the document tree following the grammar rules. A token is requested from the lexer; if it matches a known rule, it’s added to the tree, otherwise it’s stored and another token is requested. If no match is found for a stored token, an error is raised.
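The lexer/parser split can be sketched in a few lines. This is a toy under heavy assumptions: the regular expression, the lex and parse names, and the dictionary-based tree are all ours, attributes are thrown away, and the parser is deliberately strict, raising an error on a mismatched tag just as a textbook parser would.

```python
import re

# A tag (optionally closing) or a run of text between tags.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


def lex(markup):
    """Lexer: turn the character stream into (kind, value) tokens."""
    tokens = []
    for closing, name, text in TOKEN_RE.findall(markup):
        if name:
            tokens.append(("end" if closing else "start", name.lower()))
        elif text.strip():
            tokens.append(("text", text.strip()))
    return tokens


def parse(tokens):
    """Parser: nest tokens into a tree using a stack of open elements."""
    root = {"tag": "#document", "children": []}
    stack = [root]
    for kind, value in tokens:
        if kind == "start":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "end":
            if stack[-1]["tag"] != value:
                raise ValueError(f"unexpected </{value}>")
            stack.pop()
        else:
            stack[-1]["children"].append({"tag": "#text", "text": value})
    return root


tree = parse(lex("<html><body><p>Hello</p></body></html>"))
print(tree["children"][0]["tag"])
```

Feed it well-formed markup and you get a neat tree back. Feed it the sort of markup found in the wild and it falls over, which is exactly why real HTML parsers can’t work this way.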

HTML is interesting in terms of languages. It has a loosely defined grammar, because it has to be backward compatible and fault tolerant, while it also has to deal with dynamic code (via scripts) that can add tokens back in while it’s being parsed. This means its parser can’t use a systematic top-down or bottom-up approach to parsing. In the language world, people say its grammar isn’t context-free – we might call it other things, given half a chance.

Under the microscope

To give you a taste of what a web parser has to deal with, let’s quickly look at a very average day in the life of an HTML lexer. It starts in its default “data state” mode; when a < is encountered, it switches to “tag open state” mode. A character in the range a–z encountered next creates a “start tag” token and moves the lexer into the “tag name state.” This continues until a closing > is hit and the lexer is back in “data state” mode. If a / is encountered after a <, then an “end tag” token is created instead, which runs until the > is met.
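The state hopping described above maps neatly onto a small state machine. The following is a sketch only: it covers just the three states mentioned, ignores attributes, comments and every other state a real tokenizer has, and the tokenize function and its token names are our own invention.

```python
def tokenize(markup):
    """Walk the input one character at a time, switching states as we go."""
    tokens = []
    state = "data"
    text = ""        # characters collected in the data state
    name = ""        # tag name collected in the tag name state
    is_end = False   # True once a "/" follows "<"
    for ch in markup:
        if state == "data":
            if ch == "<":
                if text:
                    tokens.append(("text", text))
                    text = ""
                state = "tag open"
            else:
                text += ch
        elif state == "tag open":
            if ch == "/":
                is_end = True
            elif ch.isalpha():
                name = ch.lower()
                state = "tag name"
            # Anything else is ignored in this sketch.
        elif state == "tag name":
            if ch == ">":
                tokens.append(("end tag" if is_end else "start tag", name))
                name, is_end = "", False
                state = "data"
            else:
                name += ch.lower()
    if text:
        tokens.append(("text", text))
    return tokens


print(tokenize("<p>Hi</p>"))
```

Each token would be handed to the tree builder as soon as it’s emitted, rather than collected in a list as it is here.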

These tokens are passed to the HTML parser to be constructed into the document tree, as and when each suitable HTML tag is encountered.

What we find more interesting than this is what the heck browsers do when they encounter not just badly formatted HTML documents, but downright illegally formatted documents. A browser parser has to be “fault tolerant,” otherwise web pages would just fall over and fail to load. At a minimum, a browser needs to know what to do if a tag isn’t closed correctly, which happens all the time. Beyond this trivial example, it needs to know what to do if it encounters an unknown tag, an out-of-date tag, or tags that are used in a non-compliant manner.

There’s no official definition of how to handle erroneous HTML code, but Apple’s WebKit code has a number of interesting comments that explain its approach to various classic mistakes, including unclosed or incorrectly closed tags, badly nested tables, highly nested tags, and incorrectly terminated tags.
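To make “fault tolerant” a little more concrete, here is one simple recovery strategy sketched out. We should stress that this is not WebKit’s actual rule set: the build_tolerant_tree function, the token format, and the three recovery choices (close everything left open inside a matched element, ignore stray end tags, implicitly close whatever is open at the end) are assumptions made for illustration.

```python
def build_tolerant_tree(tokens):
    """Build a tree from (kind, value) tokens without ever raising an error."""
    root = {"tag": "#document", "children": []}
    stack = [root]
    for kind, value in tokens:
        if kind == "start":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "end":
            open_tags = [node["tag"] for node in stack[1:]]
            if value in open_tags:
                # Close the nearest matching element and anything
                # that was left open inside it.
                while stack[-1]["tag"] != value:
                    stack.pop()
                stack.pop()
            # A stray end tag with no matching open element is ignored.
        else:
            stack[-1]["children"].append({"tag": "#text", "text": value})
    # End of input: anything still open is implicitly closed.
    return root


tokens = [
    ("start", "div"), ("start", "b"), ("text", "bold"),
    ("end", "div"),            # <b> was never closed
    ("end", "b"),              # stray end tag
    ("start", "p"), ("text", "next"),   # <p> never closed
]
tree = build_tolerant_tree(tokens)
print([child["tag"] for child in tree["children"]])
```

The page still loads, and the result is probably what the author meant. Different engines making different choices at this step is one source of the rendering quirks mentioned earlier.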

Ultimately, the HTML processed by the parser will result in a document object model, aka a DOM tree. Separately from the HTML, the CSS element of the page will also be parsed into a CSS object model tree. Unlike HTML, CSS has a context-free grammar, which makes it more difficult for silly humans to break. The parser has to process the CSS to determine the style of each element. This isn’t a CSS tutorial, so we’re not going to look into the details of the language here.
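As a rough idea of what comes out of the CSS side, this sketch turns a flat style sheet into a list of selectors and their declarations. It’s a toy: parse_css is a name we’ve made up, and the regular expression only copes with simple, un-nested rules, so comments, at-rules such as @media, and anything else with nested braces are beyond it.

```python
import re

# One rule: a selector followed by a brace-delimited block.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(source):
    """Return a list of (selector, {property: value}) pairs."""
    rules = []
    for selector, body in RULE_RE.findall(source):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip().lower()] = value.strip()
        rules.append((selector.strip(), declarations))
    return rules


print(parse_css("p { color: red; margin: 0 } h1{font-size:2em;}"))
```

A real engine uses a proper grammar-driven parser here, which the regular nature of CSS makes far more practical than it is for HTML.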

We’ve alluded to dynamic content and scripts. Really, scripts don’t change much, but browsers are supposed to handle them synchronously: parsing stops until the script has been executed. If networked resources are required, these need to be loaded, and everything is halted until they have been. Script authors can add a defer attribute so that the script waits until the document is parsed before it’s executed.
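The difference in ordering is easy to simulate. In this sketch, load_page and its tuple-based page description are our own stand-ins, and no real script is fetched or run; it simply logs what would happen and when.

```python
def load_page(items):
    """Simulate parse order.

    Each item is ("html", name) or ("script", name, deferred).
    """
    log = []
    deferred = []
    for item in items:
        if item[0] == "html":
            log.append(f"parse {item[1]}")
        elif item[2]:
            # Deferred script: remember it and keep parsing.
            deferred.append(item[1])
        else:
            # Synchronous script: parsing pauses here until it has run.
            log.append(f"run {item[1]}")
    # Document fully parsed: deferred scripts now run in document order.
    log.extend(f"run {name}" for name in deferred)
    return log


page = [
    ("html", "header"),
    ("script", "a.js", False),
    ("script", "b.js", True),
    ("html", "body"),
]
print(load_page(page))
```

Here a.js holds up the body, while b.js politely waits its turn until the parser has finished.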

However, WebKit and Gecko utilise speculative parsing to read ahead and load any network-based resources, such as scripts, images and CSS, to avoid stalls in page loading. This is a smart approach to take, because scripts that request network-loaded style sheets cause problems if they can’t be reached.
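The read-ahead idea can be sketched with Python’s standard html.parser module: skim the markup for anything that will need fetching, without building a tree or running a thing. The PreloadScanner class and scan function are hypothetical names of ours, and which tags and attributes a real engine looks for is a good deal more involved than this.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collect URLs of external resources without building a tree."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif (tag == "link" and attributes.get("rel") == "stylesheet"
              and attributes.get("href")):
            self.urls.append(attributes["href"])


def scan(markup):
    """Return resource URLs in the order they appear in the markup."""
    scanner = PreloadScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.urls


markup = (
    '<link rel="stylesheet" href="site.css">'
    '<script src="app.js"></script>'
    '<p>Hi<img src="logo.png"></p>'
)
print(scan(markup))
```

While the main parser sits waiting on a blocking script, a scanner like this can hand the network layer a list of downloads to be getting on with.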

Render tree

Things are starting to heat up. We know where things are in the page from the DOM tree created from the HTML. We also know how things should be styled from the CSS object model tree, created from the CSS. Combining the two produces the render tree, which is made up of render objects. Each render object is a literal rectangular area of specified size and position, with an attached style. Many page elements are actually constructed from several rendered rectangles; the important thing to remember is that the render tree is constructed from the DOM tree.
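Here’s a minimal sketch of deriving a render tree from a DOM tree. It assumes each DOM node is a dictionary that already carries its matched style, and it shows one well-known difference between the two trees: anything styled display: none gets no render object at all. The build_render_tree name and the node layout are ours.

```python
def build_render_tree(node):
    """Return a render object for a DOM node, or None if it is not displayed."""
    style = node.get("style", {})
    if style.get("display") == "none":
        return None  # hidden subtrees produce no render objects
    children = []
    for child in node.get("children", []):
        rendered = build_render_tree(child)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node["tag"], "style": style, "children": children}


dom = {
    "tag": "body",
    "children": [
        {"tag": "h1", "style": {"color": "red"}},
        {"tag": "div", "style": {"display": "none"},
         "children": [{"tag": "p"}]},
        {"tag": "p"},
    ],
}
render_tree = build_render_tree(dom)
print([child["tag"] for child in render_tree["children"]])
```

The hidden div and the paragraph inside it are still in the DOM, ready for a script to reveal, but the render tree never hears about them.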

We should point out that matching a style to a render object isn’t as straightforward as you might think, because it depends on how rules are inherited. Processing inherited style rules and matching them to objects can take the browser a great deal of tree traversal.
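A sketch of where that traversal comes from: to find the value of an inherited property, the engine may have to climb through every ancestor of an element. The computed_value function, the short INHERITED set, and the dictionary nodes are simplifications of ours; real engines also deal with selector matching, specificity, and default values, and cache heavily to avoid repeating the climb.

```python
# A few properties that CSS passes down from parent to child by default.
INHERITED = {"color", "font-family", "font-size"}


def computed_value(node, prop, parents):
    """Look up a property on a node, falling back to ancestors if it inherits."""
    if prop in node.get("style", {}):
        return node["style"][prop]
    if prop in INHERITED:
        for ancestor in reversed(parents):  # nearest ancestor first
            if prop in ancestor.get("style", {}):
                return ancestor["style"][prop]
    return None


body = {"style": {"color": "navy", "margin": "8px"}}
div = {"style": {}}
p = {"style": {"font-size": "12px"}}

print(computed_value(p, "color", [body, div]))
print(computed_value(p, "margin", [body, div]))
```

The paragraph picks up its colour from the body two levels up, but not the margin, because margin isn’t an inherited property.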

Now layout can begin. This is the process of calculating exactly where those rectangles will go with the applied style. HTML is devised so layout can be done in a single pass, moving left to right and top to bottom. Layout can be recalculated on a global level (such as a global style change or window resize), or if “child” objects flag that it needs recalculating. Finally, the render tree can be “painted,” a job handled by the UI elements of the browser, because it’s reliant on the OS.
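Single-pass layout can be sketched for the simplest case: block boxes that each take the full width and stack top to bottom. Everything here is an assumption for illustration, including the layout function, the content_height field standing in for measured content, and the 800-pixel default width. Inline content, floats, and margins are ignored.

```python
def layout(box, x=0, y=0, width=800):
    """Assign x, y, width and height to a box and its children.

    Returns the height of the box.
    """
    box["x"], box["y"], box["width"] = x, y, width
    cursor = y
    for child in box.get("children", []):
        # Each child starts where the previous one ended.
        cursor += layout(child, x, cursor, width)
    if box.get("children"):
        box["height"] = cursor - y   # a parent is as tall as its children
    else:
        box["height"] = box.get("content_height", 0)
    return box["height"]


page = {
    "children": [
        {"content_height": 40},
        {"children": [{"content_height": 20}, {"content_height": 30}]},
        {"content_height": 10},
    ]
}
print(layout(page))
print(page["children"][2]["y"])
```

Each box is visited once, and a parent’s height falls out of its children’s heights on the way back up. Change one child’s height, and everything below it needs its position recalculated, which is why a small change can trigger a fresh layout.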

Powering the web

We’ve already mentioned that all modern browsers run JavaScript, with a JIT compiler for maximum speed. Each browser has its own JavaScript engine, and this enables you to “program the web” and create all the advanced interactive online applications. But JavaScript has its origins back in the 1990s, when no one really knew what it was going to be used for – so, say hello to WebAssembly.

Launched in 2015, it was available in most browsers by 2017, and was standardised at the end of 2019. It enables a low-level, cross-platform language that runs natively on the hardware via the browser. You can compile C/C++ and Rust to Wasm (WebAssembly). It runs in the same sandbox as JavaScript, so can be leveraged by JavaScript libraries for effectively free speed-ups. WebAssembly is available in all mainstream browsers, and the fact that it runs on the native hardware should offer an indication of the speed-ups developers can expect over interpreted JavaScript.
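What the browser receives is a compact binary module rather than source text. As a small illustration of that binary format, this sketch checks the eight-byte preamble every module starts with: the four magic bytes \0asm followed by a little-endian 32-bit version number. The wasm_version helper is our own, and it does nothing more than read the header.

```python
import struct

WASM_MAGIC = b"\x00asm"


def wasm_version(module_bytes):
    """Return the binary-format version if the bytes start like a
    Wasm module, else None."""
    if len(module_bytes) < 8 or module_bytes[:4] != WASM_MAGIC:
        return None
    (version,) = struct.unpack("<I", module_bytes[4:8])
    return version


empty_module = b"\x00asm\x01\x00\x00\x00"
print(wasm_version(empty_module))
print(wasm_version(b"<html>"))
```

Everything after the preamble is a series of typed sections (functions, memory, exports and so on) that the browser validates before compiling the module to machine code.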

At this point, you have an HTML5-compliant web page, with all the dynamic Web 2.0 spinning wheels that you could ask for. The world (Google, Apple, Microsoft) appears to be settling on a WebKit/Blink-based browser world, which is good for compatibility, and doesn’t stop others from offering spin-offs. We dearly hope that Mozilla retains the independence of Firefox, but it feels like it’s now fighting an uphill battle. The browser wars have returned.

Mozilla champions an open internet and Firefox is at the heart of its campaign to get just that. Use it.

The de-googled build of Chrome that’s available to most distros.

A rough overview of the Blink/WebKit render engine that parses HTML and style sheets separately.

A single browser always dominates the desktop landscape.

Your at-a-glance guide to the technology behind each browser.
