Do you understand how browsers work? The process by which a browser parses HTML

Before any HTML tags are written, an .html file is no different in nature from a .txt file: it is plain text with no styling, and the browser simply previews it as text. Once HTML tags are added, the content carries semantics, and the browser's rendering engine parses the document according to the meaning of each tag.

WebKit engine workflow

The rendering engine processes the page's code (HTML, CSS and JS) from top to bottom. It parses the HTML to build a DOM tree, parses the CSS to build a CSS rule tree, and merges the two to produce a render tree. Because other resources have to be requested along the way, the engine does many things at once and does not wait for everything to arrive before it starts rendering. If later code changes a style that has already been applied, the change triggers a reflow and a repaint.

After the render tree is generated, layout is computed: this step works out the position and size of every element. The page's layers are then converted into pixels, and finally all layers are combined to produce the page.

To summarize, the basic workflow of the rendering engine is:

  • Parse the HTML to build the DOM tree
  • Build the render tree
  • Lay out the render tree
  • Paint the render tree

The rendering engine parses the HTML document and converts tags into DOM nodes in a tree called the content tree. It parses style data both from style elements and from external files. The style data, together with the visual instructions in the HTML, is used to create another tree, the render tree. The rendering engine tries to display content as quickly as possible: it does not wait until all the HTML has been parsed before it builds and lays out the render tree, but displays the part it has already processed while it continues with the rest.
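The merge of the DOM tree with style data can be sketched roughly as follows. This is a toy model, not how WebKit is implemented: the `dom` object, the `styles` table keyed by tag name, and the `buildRenderTree` function are all made up for illustration, and real engines match full CSS selectors. It relies on one well-known rule, that elements with `display: none` do not appear in the render tree.

```javascript
// A toy DOM: each node has a tag name and a list of children (hypothetical data).
const dom = {
  tag: "body",
  children: [
    { tag: "h1", children: [] },
    { tag: "script", children: [] },
    { tag: "p", children: [{ tag: "span", children: [] }] },
  ],
};

// A toy style table keyed by tag name (an assumption; real engines match selectors).
const styles = {
  h1: { display: "block", color: "red" },
  script: { display: "none" },
  p: { display: "block" },
  span: { display: "inline" },
};

// Combine a DOM node with its style. Nodes styled display:none yield no render object.
function buildRenderTree(node, styleTable) {
  const style = styleTable[node.tag] ?? { display: "block" };
  if (style.display === "none") return null;
  const children = node.children
    .map((child) => buildRenderTree(child, styleTable))
    .filter((child) => child !== null);
  return { tag: node.tag, style, children };
}

const renderTree = buildRenderTree(dom, styles);
console.log(renderTree.children.map((child) => child.tag)); // [ 'h1', 'p' ]
```

The script node exists in the DOM but is missing from the render tree, which is why the two trees are related but not identical.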

Parsing

Parsing a document means translating it into a meaningful structure for use by code. The result of parsing is usually a tree of nodes representing the document, called a parse tree or a syntax tree.

Parsers usually divide the work between two components: the lexer (sometimes called the tokenizer) and the parser. The lexer is responsible for breaking the input into a sequence of valid tokens, and the parser is responsible for analyzing the document structure and building the syntax tree according to the syntax rules of the language.

  • The lexer knows how to filter out irrelevant characters such as spaces and newlines.
  • The tree output by the parser is made up of DOM element and attribute nodes. DOM stands for Document Object Model. It is the object representation of the HTML document, and also the interface between HTML elements and the outside world (such as JavaScript).
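The split between lexer and parser is easier to see on a language much simpler than HTML. The sketch below uses a made-up grammar of whole numbers joined by + and -; the function names `lex` and `parse` and the node shapes are assumptions chosen for illustration only.

```javascript
// Lexer: breaks the input into tokens and drops whitespace.
function lex(input) {
  const tokens = [];
  // Match a number, an operator, or any other non-space character.
  for (const text of input.match(/\d+|[+-]|\S/g) ?? []) {
    if (/^\d+$/.test(text)) {
      tokens.push({ type: "number", value: Number(text) });
    } else if (text === "+" || text === "-") {
      tokens.push({ type: "operator", value: text });
    } else {
      throw new SyntaxError(`Unexpected character: ${text}`);
    }
  }
  return tokens;
}

// Parser: applies the rule  expression := number (operator number)*
// and builds a left-associative syntax tree.
function parse(tokens) {
  let position = 0;

  function expectNumber() {
    const token = tokens[position++];
    if (!token || token.type !== "number") {
      throw new SyntaxError("Expected a number");
    }
    return { type: "number", value: token.value };
  }

  let tree = expectNumber();
  while (position < tokens.length) {
    const operator = tokens[position++];
    if (operator.type !== "operator") {
      throw new SyntaxError("Expected an operator");
    }
    const right = expectNumber();
    tree = { type: "binary", operator: operator.value, left: tree, right };
  }
  return tree;
}

const tree = parse(lex("2 + 3 - 1"));
console.log(JSON.stringify(tree, null, 2));
```

The lexer never sees structure and the parser never sees whitespace, which is the division of labor described above.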

There is almost a one-to-one relationship between the DOM and the markup: each tag becomes an element node, and the tags nested inside it become its child nodes.
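As an illustration of this correspondence, here is a small hypothetical document and the tree it maps to, written out by hand. The markup, the `domTree` object and the `outline` helper are invented for this sketch, and the tree is simplified: a real parser would also insert a head element and whitespace text nodes.

```javascript
// Hypothetical markup, shown only for comparison with the tree below:
//
//   <html>
//     <body>
//       <p>Hello</p>
//       <div><img src="example.png"></div>
//     </body>
//   </html>

// The corresponding tree: one node per tag, nested the same way the tags are.
const domTree = {
  name: "html",
  children: [
    {
      name: "body",
      children: [
        { name: "p", children: [{ name: "#text", children: [] }] },
        { name: "div", children: [{ name: "img", children: [] }] },
      ],
    },
  ],
};

// Return the tree as a list of indented lines, one per node.
function outline(node, depth = 0) {
  const line = "  ".repeat(depth) + node.name;
  return [line, ...node.children.flatMap((child) => outline(child, depth + 1))];
}

console.log(outline(domTree).join("\n"));
```

The printed outline has the same nesting as the markup in the comment, which is the near one-to-one relationship in question.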

Parsing algorithm

HTML cannot be parsed with the usual top-down or bottom-up parsers. The main reasons are:

  • The forgiving nature of the language itself.
  • Real-world HTML is often malformed, and browsers have traditional error-tolerance mechanisms to support the common cases of invalid markup.
  • The parsing process is reentrant. In other languages the source does not change during parsing, but in HTML dynamic code, such as a document.write() call inside a script element, can add content to the source, so the parsing process actually modifies its own input.
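The last point, input that changes while it is being parsed, can be imitated with a small simulation. This is not how a browser implements document.write(); the `parseWithScripts` function and the idea of representing a script as a JavaScript callback are assumptions made purely to show the effect.

```javascript
// Simulated parser loop. The input is a list of chunks: strings stand for
// markup, functions stand for scripts that may write more markup.
function parseWithScripts(chunks) {
  const input = [...chunks]; // copy, because the loop modifies its own input
  const parsed = [];
  for (let i = 0; i < input.length; i++) {
    const chunk = input[i];
    if (typeof chunk === "function") {
      // A script runs here. Whatever it writes is inserted into the input
      // right after the script, so the parser sees it next.
      const written = [];
      chunk((text) => written.push(text));
      input.splice(i + 1, 0, ...written);
    } else {
      parsed.push(chunk);
    }
  }
  return parsed;
}

const result = parseWithScripts([
  "<p>before</p>",
  (write) => write("<p>written by script</p>"),
  "<p>after</p>",
]);
console.log(result);
// [ '<p>before</p>', '<p>written by script</p>', '<p>after</p>' ]
```

A conventional parser assumes a fixed input, so it has no place for the splice step in the middle of the loop.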

Because the usual parsing techniques cannot be used, browsers implement parsers written specifically for HTML. The parsing algorithm is described in detail in the HTML5 specification. It consists of two stages: tokenization and tree construction.

Tokenization parses the input into a sequence of tokens. HTML tokens include start tags, end tags, attribute names and attribute values. The tokenizer recognizes each token and hands it to the tree builder, then goes on to recognize the next token, and so on until the end of the input.
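The two stages can be sketched on a heavily simplified subset of HTML. This is not the algorithm from the specification, which is a large state machine with error recovery. The sketch assumes well-formed input with no attributes, comments or void elements, and the names `tokenize` and `buildTree` are made up for illustration.

```javascript
// Stage 1, tokenization: split the input into start tags, end tags and text.
function tokenize(html) {
  const tokens = [];
  const pattern = /<\/([a-z0-9]+)>|<([a-z0-9]+)>|([^<]+)/gi;
  for (const match of html.matchAll(pattern)) {
    if (match[1]) {
      tokens.push({ type: "endTag", name: match[1].toLowerCase() });
    } else if (match[2]) {
      tokens.push({ type: "startTag", name: match[2].toLowerCase() });
    } else if (match[3].trim() !== "") {
      tokens.push({ type: "text", data: match[3].trim() });
    }
  }
  return tokens;
}

// Stage 2, tree construction: consume tokens one by one, keeping a stack of
// elements that have been opened but not yet closed.
function buildTree(tokens) {
  const root = { name: "#document", children: [] };
  const openElements = [root];
  for (const token of tokens) {
    const current = openElements[openElements.length - 1];
    if (token.type === "startTag") {
      const element = { name: token.name, children: [] };
      current.children.push(element);
      openElements.push(element);
    } else if (token.type === "endTag") {
      if (openElements.length > 1) openElements.pop();
    } else {
      current.children.push({ name: "#text", data: token.data, children: [] });
    }
  }
  return root;
}

const tokens = tokenize("<html><body><p>Hello</p></body></html>");
const documentNode = buildTree(tokens);
console.log(tokens.map((token) => token.type).join(", "));
console.log(JSON.stringify(documentNode, null, 2));
```

In a browser the two stages are interleaved: the tree builder handles each token as soon as the tokenizer emits it, instead of waiting for the complete list as this sketch does.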

Parser blocking

While parsing the HTML document, the browser will come across external CSS files referenced by link tags. Each CSS file is requested and parsed at that point, but this does not block the parsing of the HTML.

However, when the parser encounters an external JS file referenced by a script tag, or inline JS code inside a script tag, the parsing of the HTML stops.

The process of handling a JS file:

  1. The browser requests the JS file and waits for it to be returned. HTML parsing is paused during this time.
  2. CSS parsing does not stop, so the CSSOM tree continues to be built.
  3. The returned JS file is not executed while the CSSOM tree is still being built. It runs only after the CSSOM tree is complete.

After the JS file has been executed, HTML parsing resumes and the DOM tree continues to be built.
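The timing rules above can be put into numbers with a small simulation. All times are hypothetical milliseconds on an imaginary clock, and `simulateBlockingScript` and its parameter names are invented for this sketch; it models only the ordering described in the steps, not real network or engine behavior.

```javascript
// Compute when a blocking script runs and when HTML parsing resumes.
function simulateBlockingScript({ scriptSeenAt, scriptDownloadMs, cssomReadyAt, scriptRunMs }) {
  // Step 1: the parser pauses when it reaches the script; the download starts.
  const scriptDownloadedAt = scriptSeenAt + scriptDownloadMs;
  // Steps 2 and 3: the script runs only once it has arrived AND the CSSOM is complete.
  const scriptStartsAt = Math.max(scriptDownloadedAt, cssomReadyAt);
  // Parsing resumes after the script has finished running.
  const parsingResumesAt = scriptStartsAt + scriptRunMs;
  return { parsingPausedAt: scriptSeenAt, scriptDownloadedAt, scriptStartsAt, parsingResumesAt };
}

const timeline = simulateBlockingScript({
  scriptSeenAt: 10, // parser reaches the script tag
  scriptDownloadMs: 40, // time to fetch the JS file
  cssomReadyAt: 80, // moment the CSSOM tree is complete
  scriptRunMs: 5, // time the script takes to run
});
console.log(timeline);
// { parsingPausedAt: 10, scriptDownloadedAt: 50, scriptStartsAt: 80, parsingResumesAt: 85 }
```

In this made-up case the script arrives at 50 but has to wait until 80 for the CSSOM, so a slow stylesheet delays HTML parsing indirectly even though CSS does not block the parser by itself.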

Types of external JS

External JS can be divided into three types: ordinary scripts, scripts with the defer attribute, and scripts with the async attribute.

1. Ordinary type

   External JS files are generally included with script tags and are loaded synchronously by default: the HTML that follows is parsed only after the JS file has been downloaded and executed. When a script tag has a src attribute, the JS engine executes the external file, and any code written between <script> and </script> is ignored.

<script src='abc'>console.log('1111')</script>
<!-- Does not print 1111: the inline code is ignored because src is set -->

An ordinary script can be placed in one of two positions on the page: inside the head tag, or at the bottom of the body tag, just before </body>.

   In the head tag: all JS files in the head tag are downloaded and executed before the HTML in the body tag is parsed. When there are many JS files, the page stays blank for a short time, which makes for a poor user experience.

   At the bottom of the body tag: this does not hold up the parsing of the HTML, but pages that depend heavily on JS will be slow to become usable.

   So the best approach is to download the script while the HTML is being parsed.

2. With the defer attribute

   defer attribute: the JS file is downloaded while the HTML is being parsed, placed in a queue once downloaded, and executed after the DOM tree has been built.

    If there are multiple external defer scripts, they are executed in the order in which they appear in the document, and all of them run before the DOMContentLoaded event fires, so pay attention to the order in which they are included.

3. With the async attribute

   async attribute: the JS file is downloaded while the HTML is being parsed and is executed as soon as the download completes. HTML parsing is still blocked while the script executes.

    If there are multiple external async scripts, whichever finishes downloading first is executed first, which is not necessarily the order in which they appear in the document.
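The difference in execution order between the three types can be shown with a rough simulation. It makes several simplifying assumptions: scripts take zero time to run, a download starts when the parser reaches the script tag, and all numbers and file names are hypothetical. The `simulateScripts` function and the "ordinary", "defer" and "async" mode labels are invented for this sketch.

```javascript
// Walk through the document items on a simulated clock (ms) and return the
// script names in the order they would execute.
function simulateScripts(items) {
  let clock = 0; // simulated parser time
  const executions = [];
  const deferred = []; // kept in document order

  for (const item of items) {
    if (item.kind === "markup") {
      clock += item.parseMs;
      continue;
    }
    const downloadedAt = clock + item.downloadMs;
    if (item.mode === "ordinary") {
      // The parser waits for the download, then the script runs.
      clock = downloadedAt;
      executions.push({ name: item.name, at: clock });
    } else if (item.mode === "async") {
      // Runs as soon as it arrives, wherever the parser happens to be.
      executions.push({ name: item.name, at: downloadedAt });
    } else {
      deferred.push({ name: item.name, downloadedAt });
    }
  }

  // Deferred scripts run after parsing ends, in document order.
  let deferClock = clock;
  for (const script of deferred) {
    deferClock = Math.max(deferClock, script.downloadedAt);
    executions.push({ name: script.name, at: deferClock });
  }

  // Array.prototype.sort is stable, so ties keep their insertion order.
  return executions.sort((a, b) => a.at - b.at).map((entry) => entry.name);
}

const order = simulateScripts([
  { kind: "script", name: "a-defer.js", mode: "defer", downloadMs: 30 },
  { kind: "script", name: "b-defer.js", mode: "defer", downloadMs: 5 },
  { kind: "script", name: "c-async.js", mode: "async", downloadMs: 50 },
  { kind: "script", name: "d-async.js", mode: "async", downloadMs: 10 },
  { kind: "markup", parseMs: 20 },
  { kind: "script", name: "e-ordinary.js", mode: "ordinary", downloadMs: 15 },
  { kind: "markup", parseMs: 40 },
]);
console.log(order);
// [ 'd-async.js', 'e-ordinary.js', 'c-async.js', 'a-defer.js', 'b-defer.js' ]
```

The two async scripts run in download order (d before c), while the two defer scripts run in document order (a before b) even though b finished downloading first.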

Origin blog.csdn.net/m0_65335111/article/details/127400627