How-Browsers-Work

 

##### Main features of the browser

The main function of a browser is to display the resource you choose, which is usually an HTML document but can also be a PDF, an image, or another format. The location of each resource is specified by a Uniform Resource Identifier (URI).

The way a browser interprets and displays HTML documents is defined by the HTML and CSS specifications. These specifications are maintained by the W3C (World Wide Web Consortium).

 

##### High-level structure of the browser

1. **The user interface**
2. **The browser engine**
3. **The rendering engine**
4. **Networking**
5. **UI backend**
6. **JavaScript interpreter**
7. **Data storage**

###### The rendering engine

By default, the rendering engine can display HTML and XML documents and images. Other formats, such as PDF, can be displayed through plugins or extensions.

The rendering engine is also known as the browser kernel. Different browsers use different rendering engines:

- Firefox --> Gecko
- Safari --> WebKit
- Chrome --> Blink

##### The main flow

* Parse the HTML to construct the DOM tree (content tree)
* Construct the render tree (attach style information to the content)
* Lay out the render tree (give each node its exact coordinates on screen)
* Paint the render tree (draw each node using the UI backend layer)

This is a gradual process. For a better user experience, the rendering engine tries to display content as soon as possible: it does not wait until all of the HTML has been parsed before building and laying out the render tree. Parts of the content are parsed and displayed while the rest is still being transmitted over the network.
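The flow above can be sketched as a toy pipeline. This is only an illustration under strong assumptions: the `Node` class, the `display`/`height` style keys, and the "stack children vertically" layout rule are all hypothetical simplifications, not how any real engine is implemented.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical content-tree node: a tag, a style dict, and children."""
    tag: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def build_render_tree(node):
    """Copy the content tree, dropping nodes styled with display: none."""
    if node.style.get("display") == "none":
        return None
    kept = [r for r in map(build_render_tree, node.children) if r is not None]
    return {"tag": node.tag, "style": node.style, "children": kept}

def layout(render_node, x=0, y=0):
    """Assign x, y and height; children are simply stacked vertically."""
    render_node["x"], render_node["y"] = x, y
    cursor = y
    for child in render_node["children"]:
        cursor += layout(child, x, cursor)
    # A leaf takes its height from its style; a container wraps its children.
    own = 0 if render_node["children"] else render_node["style"].get("height", 0)
    render_node["height"] = (cursor - y) + own
    return render_node["height"]

def paint(render_node, out=None):
    """Walk the laid-out tree and emit one (tag, x, y, height) command per node."""
    out = [] if out is None else out
    out.append((render_node["tag"], render_node["x"],
                render_node["y"], render_node["height"]))
    for child in render_node["children"]:
        paint(child, out)
    return out

dom = Node("body", children=[
    Node("p", {"height": 20}),
    Node("script", {"display": "none"}),
    Node("div", children=[Node("img", {"height": 50})]),
])
render_tree = build_render_tree(dom)
layout(render_tree)
commands = paint(render_tree)
```

Here the `script` node is in the content tree but never reaches the render tree, and the `div` is positioned below the `p` because layout runs before painting.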

 

###### Parsing-general

Parsing a document means translating it into a **structure** that the code can use. The result of parsing is usually a tree of nodes representing the document structure, also called a parse tree or syntax tree.

###### Grammars

Parsing is based on the grammar rules that the document follows. Any format that can be parsed must have a **deterministic grammar** consisting of a **vocabulary** and **syntax rules**. Such a grammar is called a **context-free grammar**.

###### Parser-Lexer combination

Parsing can be divided into lexical analysis and syntax analysis.

* lexical analysis: break the input into tokens (the language vocabulary), which are the valid building blocks of the language
* syntax analysis: apply the syntax rules of the language

The **lexer** breaks the input into valid tokens, and the **parser** then builds the parse tree by matching the token sequence against the syntax rules.
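The lexer/parser split can be sketched for a tiny arithmetic language. This is a minimal illustration, assuming a vocabulary of integers, `+` and `-`; the function names `lex` and `parse` and the nested-tuple tree format are choices made for this sketch only.

```python
import re

# Token vocabulary: integers, plus and minus signs (leading whitespace is skipped).
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([+-]))")

def lex(text):
    """Lexical analysis: turn the raw input into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if match.group(1) is not None:
            tokens.append(("INT", int(match.group(1))))
        else:
            tokens.append(("OP", match.group(2)))
        pos = match.end()
    return tokens

def parse(tokens):
    """Syntax analysis: build a nested-tuple parse tree from the tokens.

    Rules used here: an expression is a term followed by any number of
    (operator, term) pairs, and a term is an integer.
    """
    def term(i):
        if i >= len(tokens) or tokens[i][0] != "INT":
            raise SyntaxError("expected an integer")
        return tokens[i][1], i + 1

    tree, i = term(0)
    while i < len(tokens):
        kind, op = tokens[i]
        if kind != "OP":
            raise SyntaxError("expected an operator")
        right, i = term(i + 1)
        tree = (op, tree, right)
    return tree

tree = parse(lex("2 + 3 - 1"))  # ('-', ('+', 2, 3), 1)
```

An input such as `2 + +` lexes into valid tokens but matches no syntax rule, so `parse` raises `SyntaxError`.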

###### Translation

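In some cases, the parse tree is not the final product but an intermediate format used in translation. Compilation is an example: the compiler first parses the source code into a parse tree and then translates that tree into machine code.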

###### Formal definitions for vocabulary and syntax

* vocabulary is usually expressed by **regular expressions**
* syntax is usually defined in a format called **BNF**

The grammar of a language must be context-free for the language to be parsed by conventional parsers. An intuitive definition of a context-free grammar is a grammar that can be fully expressed in BNF.
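As an illustration, here is how the two formal definitions might look for a small arithmetic language. The token names, the regular expressions, and the `classify` helper are assumptions made for this sketch; the BNF is kept as a plain string for reference and is not used by the code.

```python
import re

# Vocabulary, written as regular expressions (one per token kind).
VOCABULARY = {
    "INTEGER": re.compile(r"0|[1-9][0-9]*"),
    "PLUS": re.compile(r"\+"),
    "MINUS": re.compile(r"-"),
}

# Syntax, written in BNF.
SYNTAX_BNF = """
expression ::= term operation term
operation  ::= PLUS | MINUS
term       ::= INTEGER | expression
"""

def classify(lexeme):
    """Return the token kind whose regular expression matches the whole lexeme."""
    for kind, pattern in VOCABULARY.items():
        if pattern.fullmatch(lexeme):
            return kind
    return None
```

With these definitions, `classify("42")` is `"INTEGER"`, while `classify("007")` is `None` because the `INTEGER` pattern does not allow leading zeros.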

###### Types of parsers

There are two types of parsers:

* top-down parsers: start from the high-level structure of the grammar and try to find a rule that matches the input.
* bottom-up parsers: start from the input and gradually transform it into grammar rules, working from low-level rules up to high-level rules.
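A bottom-up parser is often called a shift-reduce parser: input is shifted onto a stack and reduced whenever the top of the stack matches a rule. The sketch below is a hand-written toy for inputs of the form `INT (OP INT)*`, with tokens given as plain Python ints and operator strings; the `shift_reduce` name and the step log are hypothetical.

```python
def shift_reduce(tokens):
    """Bottom-up parse, recording each shift and reduce step.

    Tokens are shifted onto a stack one at a time. Whenever the top of the
    stack looks like `expr OP expr`, it is reduced to a single expression.
    """
    stack, steps = [], []
    for token in tokens:
        stack.append(token)                      # shift
        steps.append(("shift", token))
        if (len(stack) >= 3
                and stack[-2] in ("+", "-")
                and not isinstance(stack[-1], str)
                and not isinstance(stack[-3], str)):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))      # reduce: expr OP expr -> expr
            steps.append(("reduce", op))
    if len(stack) != 1 or isinstance(stack[0], str):
        raise SyntaxError("input does not reduce to a single expression")
    return stack[0], steps

tree, steps = shift_reduce([2, "+", 3, "-", 1])
```

For `2 + 3 - 1` the parser shifts `2`, `+` and `3`, reduces them to one expression, then shifts `-` and `1` and reduces again, ending with a single tree on the stack.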

###### Generating parsers automatically

Writing efficient parsers by hand is hard, but there are tools out there to automatically generate parsers from input grammars (vocabulary & syntax rules).

WebKit uses two well-known parser generators:

* Flex for creating a lexer
* Bison for creating a parser

###### HTML Parser

The job of the HTML parser is to parse HTML markup into a parse tree.

###### The HTML grammar definition

The vocabulary and syntax of HTML are defined in **specifications** created by the W3C organization.

###### Not a context free grammar

Compared with the strict syntax of XML, HTML has a looser, more forgiving syntax. As a result, HTML cannot be defined by a context-free grammar and therefore cannot be parsed by conventional parsers.

###### DOM

The output tree (the parse tree) is a tree of DOM element and attribute nodes. The DOM (Document Object Model) is the object representation of the HTML document and the interface of HTML elements to the outside world, such as JavaScript.
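To make the markup-to-tree relationship concrete, the sketch below builds a nested-dict tree from markup. It uses Python's standard `html.parser` module as a stand-in tokenizer, which is an assumption of convenience: it is not the HTML5 parsing algorithm and does not produce real DOM objects. The `TreeBuilder` class and the dict layout are hypothetical.

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Builds a nested-dict tree from the tokens HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.open_elements = [self.root]   # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.open_elements[-1]["children"].append(node)
        self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; stray end tags are ignored.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i]["tag"] == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.open_elements[-1]["children"].append(data.strip())

def build_tree(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root

root = build_tree(
    "<html><body><p>Hello World</p>"
    "<div><img src='example.png'></div></body></html>"
)
```

The resulting tree has `html` under the document root, `body` under `html`, and `p` and `div` as siblings under `body`, with the `img` element (and its `src` attribute) nested inside the `div`.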

###### The parsing algorithm

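HTML cannot be parsed with conventional top-down or bottom-up parsers, for the following reasons:

1. The language is forgiving by nature.
2. Browsers have traditionally tolerated errors in order to support well-known cases of invalid HTML.
3. The parsing process is re-entrant: a script that calls document.write() can add content during parsing, so the input itself changes while it is being parsed.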

The HTML parsing algorithm is given by the HTML5 specification, and the algorithm consists of two phases: tokenization (lexical analysis) and tree construction.

###### The tokenization algorithm

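The output of this algorithm is a stream of HTML tokens. The algorithm is expressed as a state machine: each state consumes one or more characters of the input and uses them to decide the next state.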
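A heavily reduced version of such a state machine can be sketched as follows. It assumes only three states (`data`, `tag_open`, `tag_name`) and handles only plain start tags, end tags and text; the real algorithm in the specification has many more states and covers attributes, comments, character references and error recovery.

```python
def tokenize(markup):
    """Toy tokenizer with three states: data, tag_open and tag_name."""
    tokens = []
    state = "data"
    text = ""        # pending character data
    name = ""        # tag name being collected
    kind = "start"   # kind of the tag token being built
    for ch in markup:
        if state == "data":
            if ch == "<":
                # Leaving the data state: emit any text collected so far.
                if text:
                    tokens.append(("text", text))
                    text = ""
                state = "tag_open"
            else:
                text += ch
        elif state == "tag_open":
            if ch == "/":
                kind, name = "end", ""
            else:
                kind, name = "start", ch
            state = "tag_name"
        elif state == "tag_name":
            if ch == ">":
                tokens.append((kind, name))   # emit the finished tag token
                state = "data"
            else:
                name += ch
    if text:
        tokens.append(("text", text))
    return tokens

tokens = tokenize("<html><body>Hello world</body></html>")
```

For this input the tokenizer emits start tags for `html` and `body`, one text token for `Hello world`, and then the two end tags, returning to the `data` state after each `>`.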

###### Actions when the parsing is finished

After HTML parsing is complete, the browser starts parsing the scripts that are in "deferred" mode, that is, scripts meant to be executed once the document has been parsed. The document state is then set to "complete" and the "load" event is fired.

###### CSS parsing

Unlike HTML, CSS has a context-free grammar and can be parsed directly with a conventional parser. The CSS lexical and syntax grammars are defined in the CSS specification.
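Because the structure of a ruleset is regular (selectors, then a block of `property: value` declarations), even a very small routine can pick it apart. The sketch below is a simplification that assumes flat rulesets only: no comments, at-rules or nested blocks. The `parse_css` name and output shape are hypothetical, and a real engine would use a generated or hand-written parser driven by the grammar in the specification.

```python
import re

# One ruleset: selector text, then a brace-delimited declaration block.
RULESET_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")

def parse_css(source):
    """Parse flat rulesets into (selectors, declarations) pairs."""
    rules = []
    for selector_text, body in RULESET_RE.findall(source):
        selectors = [s.strip() for s in selector_text.split(",") if s.strip()]
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules.append((selectors, declarations))
    return rules

rules = parse_css("div.error, a.error { color: red; font-weight: bold; }")
```

The single ruleset above yields two selectors, `div.error` and `a.error`, sharing the declarations `color: red` and `font-weight: bold`.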

 
