An in-depth look at the browser's HTML parsing process

HTML

HTML parsing

HTML parsing proceeds in stages: bytes are decoded into characters, the characters are tokenized, the tokens are turned into nodes, and the nodes are built into a DOM tree.
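The first stage, decoding bytes into characters, can be sketched in Python. This minimal sketch covers only the first check of the spec's encoding sniffing: a byte order mark, if present, decides the encoding. Otherwise it uses a caller-supplied fallback. `decode_html_bytes` is a hypothetical helper. Real browsers also consult the HTTP `Content-Type` header and `<meta charset>` before falling back to a default.

```python
import codecs

# Byte order marks are checked first; a BOM decides the encoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_html_bytes(data: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: turn raw network bytes into a string of characters."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # Strip the BOM and decode the rest with the encoding it names;
            # undecodable bytes become U+FFFD instead of raising.
            return data[len(bom):].decode(encoding, errors="replace")
    # No BOM: use the fallback (a stand-in for header/meta/default detection).
    return data.decode(fallback, errors="replace")


raw = codecs.BOM_UTF8 + b"<html><body>Hello world</body></html>"
print(decode_html_bytes(raw))
```

The decoded string is what the tokenizer consumes one character at a time.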

Tokenization algorithm

Tokenization is lexical analysis: it splits the input into tokens. HTML tokens include start tags, end tags, attribute names and attribute values. The tokenizer recognizes a token and passes it to the tree constructor, then consumes the next character to recognize the next token, and so on until the end of the input.
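To see a token stream, Python's standard `html.parser.HTMLParser` can be used as a stand-in. It does not implement the specification's tokenizer exactly. It reports text as runs of characters rather than one token per character, and it has no explicit end-of-file token. Still, it shows start tags carrying their attribute names and values, end tags, and text arriving in document order. The `TokenCollector` class and the token labels are illustrative names.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Records the tokens the stdlib tokenizer reports, in order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the start tag.
        self.tokens.append(("StartTag", tag, attrs))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        # Text arrives as a run of characters, not one token per character.
        self.tokens.append(("Characters", data))


collector = TokenCollector()
collector.feed('<html><body class="main">Hello world</body></html>')
collector.close()
for token in collector.tokens:
    print(token)
```

Each callback fires as soon as its token is recognized, which mirrors how the tokenizer hands tokens on one at a time.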

The algorithm outputs HTML tokens, and it is expressed as a state machine. Each state consumes one or more characters from the input stream and decides the next state based on those characters. Both the current tokenization state and the state of tree construction influence which state comes next.
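The state machine can be sketched in Python with four of the spec's states: data, tag open, end tag open, and tag name. This is a simplified sketch, not the full algorithm. It assumes tags contain only a name, with no attributes, whitespace, comments, or DOCTYPE. It also leaves out feedback from tree construction, such as the tree builder switching the tokenizer into a different state inside `<script>`. The `State` enum and `tokenize` function are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_OPEN = auto()
    END_TAG_OPEN = auto()
    TAG_NAME = auto()


def tokenize(text):
    """Simplified tokenizer: tag names and text only, one token per character."""
    state = State.DATA
    tokens = []
    tag = None  # [kind, name] of the tag token being built
    i = 0
    while i < len(text):
        ch = text[i]
        if state is State.DATA:
            if ch == "<":
                state = State.TAG_OPEN
            else:
                tokens.append(("Character", ch))
        elif state is State.TAG_OPEN:
            if ch == "/":
                state = State.END_TAG_OPEN
            elif ch.isascii() and ch.isalpha():
                tag = ["StartTag", ""]
                state = State.TAG_NAME
                continue  # reconsume this letter in the tag name state
            else:
                tokens.append(("Character", "<"))  # "<" was plain text
                state = State.DATA
                continue  # reconsume ch in the data state
        elif state is State.END_TAG_OPEN:
            if ch.isascii() and ch.isalpha():
                tag = ["EndTag", ""]
                state = State.TAG_NAME
                continue  # reconsume this letter in the tag name state
            # Simplification: a malformed "</" sequence is dropped.
            state = State.DATA
        elif state is State.TAG_NAME:
            if ch == ">":
                tokens.append((tag[0], tag[1]))  # emit the finished tag token
                tag = None
                state = State.DATA
            else:
                tag[1] += ch.lower()  # append to the token's tag name
        i += 1
    # Simplification: an incomplete tag at the end of input is discarded.
    tokens.append(("EOF",))
    return tokens


print(tokenize("<html><body>Hi</body></html>"))
```

The `continue` statements without `i += 1` model the spec's "reconsume" step: the same character is processed again in the new state.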

Tree construction algorithm

During tree construction, the DOM tree rooted at the Document node is continually modified as elements are added to it.

Every token emitted by the tokenizer is processed by the tree builder. The HTML specification defines which DOM element corresponds to each token, and that element is created when the token is received. Each element is added to the DOM tree and also pushed onto the stack of open elements. This stack is used to correct nesting errors and to handle tags that are never closed. The tree construction algorithm is also described as a state machine, whose states are called "insertion modes".
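The interplay between the tree and the stack of open elements can be sketched in Python. This is a simplified sketch under stated assumptions. The `Node` class, `build_tree` and `serialize` are hypothetical, and tokens are assumed to be `(kind, value)` pairs. Insertion modes, implied end tags and the spec's adoption agency algorithm for misnested formatting elements are left out. The sketch does show three behaviors: a new element is attached to the current node and pushed on the stack, an end tag pops up to its matching element, and an end tag with nothing to close is ignored.

```python
class Node:
    """Minimal DOM-like node: a document, an element, or a text node."""

    def __init__(self, name, data=""):
        self.name = name    # "#document", "#text", or a tag name
        self.data = data    # character data, used only by "#text" nodes
        self.children = []

    def append(self, child):
        self.children.append(child)
        return child


def serialize(node):
    """Render the tree back to markup so the result is easy to inspect."""
    if node.name == "#text":
        return node.data
    inner = "".join(serialize(child) for child in node.children)
    if node.name == "#document":
        return inner
    return f"<{node.name}>{inner}</{node.name}>"


def build_tree(tokens):
    """Hypothetical tree builder driven by a stack of open elements."""
    document = Node("#document")
    open_elements = []  # the stack of open elements

    def current_node():
        return open_elements[-1] if open_elements else document

    for kind, value in tokens:
        if kind == "StartTag":
            element = current_node().append(Node(value))
            open_elements.append(element)  # added to the tree and pushed on the stack
        elif kind == "Character":
            parent = current_node()
            last = parent.children[-1] if parent.children else None
            if last is not None and last.name == "#text":
                last.data += value  # extend the existing Text node
            else:
                parent.append(Node("#text", value))
        elif kind == "EndTag":
            if any(element.name == value for element in open_elements):
                # Pop up to and including the matching element; anything opened
                # inside it but never closed is implicitly closed as well.
                while open_elements.pop().name != value:
                    pass
            # Otherwise there is nothing to close: the stray end tag is ignored.
    return document


tokens = [
    ("StartTag", "div"), ("StartTag", "span"),
    ("Character", "h"), ("Character", "i"),
    ("EndTag", "div"),                       # closes div while span is still open
    ("StartTag", "p"), ("Character", "!"),
    ("EndTag", "b"),                         # stray end tag with no open b
    ("EndTag", "p"),
]
tree = build_tree(tokens)
print(serialize(tree))  # prints <div><span>hi</span></div><p>!</p>
```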


Take the following document as an example:

<html>
  <body>Hello world</body>
</html>
  • Tokenization

    • The initial state is the "data state".

    • When the < character is encountered, the state changes to the "tag open state". Receiving an a-z character then creates a "start tag token" and changes the state to the "tag name state". The tokenizer stays in this state until it receives the > character, and each character received in the meantime is appended to the new token's tag name. In this example, the token created is an html tag token.

    • When the > character is reached, the current token is emitted and the state changes back to the "data state". The <body> tag goes through the same process. At this point both the html and body tokens have been emitted, and we are back in the "data state". Receiving the H of "Hello world" creates and emits a character token, and this continues until the < of </body> is received. A character token is emitted for each character of "Hello world".

    • On receiving the < of </body>, we go back to the "tag open state". The next input character, /, causes an end tag token to be created, and the state changes to the "tag name state". We stay in this state until > is received. The new token is then emitted and we return to the "data state". The </html> input goes through the same process.

  • Tree construction

    • The input to the tree construction stage is the sequence of tokens produced by the tokenization stage.

    • The first mode is the "initial mode".

    • Receiving the html token switches to the "before html" mode, and the token is reprocessed in that mode. This creates an HTMLHtmlElement element and attaches it to the root Document object.

    • The mode then changes to "before head". Next, the "body" token is received. Even though our example has no "head" tag, an HTMLHeadElement is created implicitly and added to the tree.

    • We now enter the "in head" mode and then move to the "after head" mode. The body token is reprocessed, and an HTMLBodyElement is created and inserted.

    • The mode then changes to "in body". Next, the character tokens generated for the string "Hello world" are received. The first character creates and inserts a "Text" node, and the remaining characters are appended to that node.

    • Receiving the body end tag switches to the "after body" mode. We then receive the html end tag.

    • The html end tag switches to the "after after body" mode. Receiving the end-of-file token ends the parsing process.

  • Operations after parsing ends

    • When parsing is complete, the browser marks the document as "interactive" and starts executing the scripts in "deferred" mode, that is, scripts that should run only after the document has been parsed. The document state is then set to "complete" and a "load" event is fired.
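The insertion-mode walk-through above can be traced in Python for the example document. This is a hypothetical sketch: `trace_insertion_modes` handles only the token and mode combinations that occur in this example and raises `ValueError` for anything else. It assumes tokens are `(kind, value)` pairs and that there is no whitespace between tags, which the real algorithm would also have to process. Elements are plain dicts rather than real HTMLHtmlElement, HTMLHeadElement and HTMLBodyElement objects.

```python
def trace_insertion_modes(tokens):
    """Walk the example's tokens through the insertion modes it visits.

    Hypothetical sketch: unsupported token/mode combinations raise ValueError.
    """
    mode = "initial"
    document = {"name": "#document", "children": []}
    open_elements = []  # stack of open elements
    log = []            # (mode, token kind, token value) for every step

    def insert(name):
        # Append a new element to the current node and push it on the stack.
        node = {"name": name, "children": []}
        parent = open_elements[-1] if open_elements else document
        parent["children"].append(node)
        open_elements.append(node)

    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        log.append((mode, kind, value))
        reprocess = False
        if mode == "initial":
            # No DOCTYPE in the example: switch modes and reprocess the token.
            mode, reprocess = "before html", True
        elif mode == "before html" and (kind, value) == ("StartTag", "html"):
            insert("html")  # stands in for HTMLHtmlElement
            mode = "before head"
        elif mode == "before head" and (kind, value) == ("StartTag", "body"):
            insert("head")  # no head token, so a head element is created implicitly
            mode, reprocess = "in head", True
        elif mode == "in head" and (kind, value) == ("StartTag", "body"):
            open_elements.pop()  # pop the head element off the stack
            mode, reprocess = "after head", True
        elif mode == "after head" and (kind, value) == ("StartTag", "body"):
            insert("body")  # stands in for HTMLBodyElement
            mode = "in body"
        elif mode == "in body" and kind == "Character":
            children = open_elements[-1]["children"]
            if children and children[-1]["name"] == "#text":
                children[-1]["data"] += value  # later characters join the Text node
            else:
                children.append({"name": "#text", "data": value})  # first character
        elif mode == "in body" and (kind, value) == ("EndTag", "body"):
            mode = "after body"
        elif mode == "after body" and (kind, value) == ("EndTag", "html"):
            mode = "after after body"
        elif mode == "after after body" and kind == "EOF":
            break  # end of file: stop parsing
        else:
            raise ValueError(f"{kind} {value!r} in {mode!r} is not handled by this sketch")
        if not reprocess:
            i += 1
    return mode, document, log


tokens = (
    [("StartTag", "html"), ("StartTag", "body")]
    + [("Character", ch) for ch in "Hello world"]
    + [("EndTag", "body"), ("EndTag", "html"), ("EOF", None)]
)
mode, document, log = trace_insertion_modes(tokens)

# Collapse consecutive steps in the same mode to show the sequence of modes.
visited = []
for step_mode, _, _ in log:
    if not visited or visited[-1] != step_mode:
        visited.append(step_mode)
print(" -> ".join(visited))
```

Reprocessing is modeled by not advancing `i`, so the same token is handled again under the new mode.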

[Figure: the complete parsing process]

Reference https://mp.weixin.qq.com/s/WtRxcyBbZQRcfFhfVJLBQA

 


Origin www.cnblogs.com/yiyi17/p/11031305.html