Once the network request and response are complete, if the Content-Type response header is text/html, the browser moves on to parsing and rendering.
First, let's look at the parsing part, which is divided into the following steps:
- Building the DOM tree
- Style computation
- Generating the layout tree (Layout Tree)
Build a DOM tree
Browsers cannot understand an HTML string directly, so the byte stream is converted into a meaningful, easy-to-manipulate data structure: the DOM tree. The DOM tree is essentially a multi-way tree rooted at document.
So how is it parsed?
The nature of HTML grammar
First, one thing should be clear: the grammar of HTML is not a context-free grammar.
Here it is worth discussing what a context-free grammar is.
Compiler theory in computer science gives a precise definition:
If every production rule of a formal grammar G = (N, Σ, P, S) has the form V -> w, where V∈N and w∈(N∪Σ)*, then G is called a context-free grammar.
The parameters of G = (N, Σ, P, S) mean the following:
- N is the set of nonterminals (as the name implies, symbols that are not final; terminals are the converse).
- Σ is the set of terminals.
- P is the set of productions, for example S -> aSb.
- S is the start symbol, which must belong to N, i.e. it is a nonterminal.
Put plainly, a context-free grammar is one in which the left side of every production is a single nonterminal.
If that still seems a bit confusing, an example will make it clear.
For example:
A -> B
In this grammar, the left side of every production is a single nonterminal, so it is a context-free grammar. In this case, xBy can always be reduced to xAy.
Now let's look at a counterexample:
aA -> B
Aa -> B
This is not a context-free grammar. When we encounter B, we cannot tell whether it can be reduced to A; that depends on whether an a exists to its left or right, which makes the grammar context-sensitive.
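The rule above (every production has a single nonterminal on its left side) can be checked mechanically. Here is a minimal sketch in Python, assuming each production is written as a pair of symbol tuples; the function name is_context_free and the data layout are hypothetical and chosen only for illustration.

```python
def is_context_free(productions, nonterminals):
    """Return True if every production has exactly one nonterminal on its left side."""
    for left, _right in productions:
        # A context-free production looks like V -> w with V a single nonterminal.
        if len(left) != 1 or left[0] not in nonterminals:
            return False
    return True


NONTERMINALS = {"A", "B"}

# A -> B
example = [(("A",), ("B",))]

# aA -> B and Aa -> B: the left side carries context (the terminal a).
counterexample = [(("a", "A"), ("B",)), (("A", "a"), ("B",))]

print(is_context_free(example, NONTERMINALS))
print(is_context_free(counterexample, NONTERMINALS))
```

The check only looks at the shape of each left side, which is all the definition asks for.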
As for why HTML is not context-free: first, note that standard HTML syntax does conform to a context-free grammar. What makes it non-context-free is non-standard syntax. One counterexample is enough to show this.
For example, when the parser scans a form tag, a context-free approach would directly create the corresponding form DOM object. Real HTML5 parsing does not work that way: the parser looks at the context of the form tag, and if its parent tag is also a form, it skips the current form tag; otherwise it creates the DOM object.
Conventional programming languages are context-free, but HTML is the opposite. It is precisely this non-context-free nature that means an HTML parser cannot be built like a conventional programming-language parser and requires a different approach.
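The form example can be sketched as a decision that depends on what is already open. This is a simplified illustration in Python, not the algorithm from the HTML specification, which tracks more state than this; handle_form_start_tag and the list-of-names stack are hypothetical.

```python
def handle_form_start_tag(open_elements):
    """Decide what to do with a <form> start tag, given the tags that are still open."""
    if "form" in open_elements:
        # Already inside a form: the same token is skipped because of its context.
        return "ignore"
    open_elements.append("form")
    return "create"


stack = ["html", "body"]
first = handle_form_start_tag(stack)   # no form is open yet
second = handle_form_start_tag(stack)  # a form is already open
print(first, second, stack)
```

The same input token produces two different results, which is exactly what a context-free rule cannot express.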
Parsing algorithm
The HTML5 specification describes the parsing algorithm in detail. The algorithm has two stages:
- Tokenization.
- Tree construction.
These two stages correspond to lexical analysis and syntax analysis.
Tokenization algorithm
The algorithm takes HTML text as input and outputs HTML tokens, so it is also called the tokenizer. It is implemented as a finite state machine: in the current state, receiving one or more characters moves it to the next state.
<html>
  <body>
    Hello sanyuan
  </body>
</html>
This simple example is enough to walk through the tokenization process.
On encountering <, the state becomes tag open.
On receiving an [a-z] character, it enters the tag name state.
It stays in this state until it encounters >, which means the tag name has been fully recorded, and then it switches to the data state.
The body tag that follows is handled the same way.
At this point the html and body tokens have both been recorded.
Now, at the > of <body>, it enters the data state and stays there while receiving the characters that follow: Hello sanyuan.
Then it receives the < of </body> and returns to tag open; on receiving the next /, it creates an end tag token.
It then enters the tag name state and, on encountering >, returns to the data state.
</html> is then handled in the same way.
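The walkthrough above can be sketched as a small state machine. This is a minimal illustration in Python covering only the three states mentioned (data, tag open, tag name); it ignores attributes, comments, and every other state of a real tokenizer, and it groups consecutive characters into one token for brevity. The function name tokenize and the token tuples are hypothetical.

```python
def tokenize(html):
    """Split a small HTML string into start-tag, end-tag, and character tokens."""
    tokens = []
    state = "data"
    name = ""
    text = ""
    is_end_tag = False
    for ch in html:
        if state == "data":
            if ch == "<":
                # Flush any pending text before switching to the tag open state.
                if text.strip():
                    tokens.append(("characters", text.strip()))
                text = ""
                state = "tag open"
            else:
                text += ch
        elif state == "tag open":
            if ch == "/":
                is_end_tag = True  # this will become an end tag token
            elif ch.isalpha():
                name = ch.lower()
                state = "tag name"
            # Any other character is silently skipped in this sketch.
        elif state == "tag name":
            if ch == ">":
                kind = "end tag" if is_end_tag else "start tag"
                tokens.append((kind, name))
                name = ""
                is_end_tag = False
                state = "data"
            else:
                name += ch.lower()
    if text.strip():
        tokens.append(("characters", text.strip()))
    return tokens


for token in tokenize("<html>\n<body>\nHello sanyuan\n</body> </html>"):
    print(token)
```

Each branch corresponds to one state, and every transition is driven by a single input character, which is the defining property of the finite state machine described above.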
Tree construction algorithm
As mentioned earlier, the DOM tree is a multi-way tree rooted at document. The parser therefore first creates a document object. The tokenizer sends the information for each token to the tree builder. On receiving a token, the tree builder creates the corresponding DOM object. After creating the DOM object, it does two things:
- Adds the DOM object to the DOM tree.
- Pushes the corresponding tag onto the stack of open elements (open meaning that its closing tag has not been seen yet).
Let's use the same example again:
<html>
  <body>
    Hello sanyuan
  </body>
</html>
At first, the state is the initial state.
On receiving the html tag sent by the tokenizer, the state becomes before html. An HTMLHtmlElement DOM element is created, added to the document root object, and pushed onto the stack.
The state then automatically becomes before head. At this point the tokenizer sends body, which means there is no head, so the tree builder automatically creates an HTMLHeadElement and adds it to the DOM tree.
The state now becomes in head and then jumps straight to after head.
Next the tokenizer sends the body token. An HTMLBodyElement is created, inserted into the DOM tree, and pushed onto the stack of open elements.
The state then changes to in body, and the characters that follow are received: Hello sanyuan. On receiving the first character, a Text node is created with that character in it and inserted under the body element in the DOM tree. The characters that follow are appended to this Text node.
Now the tokenizer sends a body closing tag, and the state becomes after body.
Finally the tokenizer sends an html closing tag, and the state becomes after after body, which marks the end of the parsing process.
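The two actions of the tree builder (append to the tree, push onto the stack of open elements) can be sketched as follows. This is a simplified illustration in Python that models only the stack and the implied head element; it does not implement the insertion-mode states named above. Node, build_tree, and outline are hypothetical names, and the token tuples are an assumed format.

```python
class Node:
    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []


def build_tree(tokens):
    """Build a tiny DOM-like tree from (kind, value) tokens using a stack of open elements."""
    document = Node("document")
    open_elements = [document]
    for kind, value in tokens:
        parent = open_elements[-1]
        if kind == "start tag":
            if value == "body" and parent.name == "html":
                # body arrived without a head: create the implied head element.
                if not any(child.name == "head" for child in parent.children):
                    parent.children.append(Node("head"))
            node = Node(value)
            parent.children.append(node)   # 1. add to the tree
            open_elements.append(node)     # 2. push onto the stack
        elif kind == "characters":
            last = parent.children[-1] if parent.children else None
            if last is not None and last.name == "#text":
                last.text += value         # append to the existing Text node
            else:
                parent.children.append(Node("#text", value))
        elif kind == "end tag":
            # Pop the matching open element and everything above it.
            for i in range(len(open_elements) - 1, 0, -1):
                if open_elements[i].name == value:
                    del open_elements[i:]
                    break
    return document


def outline(node):
    """Render the tree as a compact one-line string."""
    if node.name == "#text":
        return repr(node.text)
    if not node.children:
        return node.name
    return node.name + "(" + ", ".join(outline(c) for c in node.children) + ")"


tokens = [
    ("start tag", "html"),
    ("start tag", "body"),
    ("characters", "Hello sanyuan"),
    ("end tag", "body"),
    ("end tag", "html"),
]
print(outline(build_tree(tokens)))
```

The top of the stack is always the element that new nodes get attached to, which is why closing a tag is just a pop.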
Fault Tolerance
Any discussion of the HTML5 specification has to mention its powerful tolerance strategy: its fault tolerance is very strong. Opinions on this are mixed, but I think a senior front-end engineer should know what the HTML parser does in terms of fault tolerance.
Below are some classic fault-tolerance examples from WebKit. If you find others, you are welcome to add them.
- Using </br> instead of <br>
if (t->isCloseTag(brTag) && m_document->inCompatMode()) {
  reportError(MalformedBRError);
  t->beginTag = true;
}
All of them are converted to the <br> form.
- Stray tables
<table>
  <table>
    <tr><td>inner table</td></tr>
  </table>
  <tr><td>outer table</td></tr>
</table>
WebKit automatically converts this to:
<table>
  <tr><td>outer table</td></tr>
</table>
<table>
  <tr><td>inner table</td></tr>
</table>
- Nested form elements
In this case the inner form is simply ignored.
Style computing
CSS styles generally come from three sources:
- Stylesheets referenced by link tags
- Styles inside style tags
- Inline style attributes on elements
Formatting the stylesheet
First, the browser cannot directly understand CSS text, so the first thing the rendering engine does on receiving CSS text is convert it into a structured object, namely styleSheets.
This formatting process is quite complicated, and different browsers apply different optimization strategies, so it is not covered here.
In the browser console you can inspect the final structure through document.styleSheets. This structure includes the CSS from all three sources and is the basis for the style operations that follow.
Standardizing style properties
Some CSS values are not easy for the rendering engine to understand, so they need to be standardized before computation, for example em -> px, red -> #ff0000, bold -> 700, and so on.
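The three conversions mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not how any rendering engine implements it; standardize, the two lookup tables, and the 16px reference font size are assumptions made for the example.

```python
import re

# Small illustrative lookup tables; real engines know far more keywords.
NAMED_COLORS = {"red": "#ff0000", "blue": "#0000ff"}
FONT_WEIGHTS = {"normal": "400", "bold": "700"}


def standardize(prop, value, font_size_px=16.0):
    """Convert a few human-friendly CSS values into their standardized form."""
    if prop == "color":
        return NAMED_COLORS.get(value, value)
    if prop == "font-weight":
        return FONT_WEIGHTS.get(value, value)
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)em", value)
    if match:
        # em is relative to a reference font size (assumed to be 16px here).
        return f"{float(match.group(1)) * font_size_px:g}px"
    return value


print(standardize("font-size", "2em"))
print(standardize("color", "red"))
print(standardize("font-weight", "bold"))
```

Values the sketch does not recognize are passed through unchanged, so it is safe to call on every declaration.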
Computing the specific style of each node
Once the styles have been formatted and standardized, the specific style of each node can be computed.
The computation itself is not complicated. There are two main rules: inheritance and cascading.
Each child node inherits style properties from its parent by default; if nothing is found on the parent, the browser's default style, also called the User Agent style, is used. This is the inheritance rule, and it is easy to understand.
Then there is the cascading rule. The defining feature of CSS is its cascading nature: the final result depends on how the individual properties interact, which sometimes produces strange cascading behavior. Readers of the book "CSS World" will know this well. The specific cascading rules belong to an in-depth study of the language and are not covered here.
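The two rules can be sketched together. This is a heavily simplified illustration in Python: the cascade is reduced to a single specificity number plus source order, and the set of inherited properties and the User Agent defaults are small assumed tables. The names compute_style, INHERITED, and UA_DEFAULTS are hypothetical.

```python
# Assumed tables for illustration only.
INHERITED = {"color", "font-size"}
UA_DEFAULTS = {"color": "#000000", "font-size": "16px", "display": "inline"}


def compute_style(declarations, parent_style=None):
    """Resolve each property from (specificity, order, prop, value) declarations."""
    winners = {}
    for specificity, order, prop, value in declarations:
        key = (specificity, order)
        # Cascading: higher specificity wins; later source order breaks ties.
        if prop not in winners or key >= winners[prop][0]:
            winners[prop] = (key, value)

    style = {}
    for prop, default in UA_DEFAULTS.items():
        if prop in winners:
            style[prop] = winners[prop][1]
        elif parent_style is not None and prop in INHERITED:
            style[prop] = parent_style[prop]   # inheritance
        else:
            style[prop] = default              # User Agent default
    return style


parent = compute_style([(1, 0, "color", "#ff0000")])
child = compute_style(
    [(1, 0, "display", "block"), (10, 1, "display", "flex"), (1, 2, "display", "none")],
    parent,
)
print(parent)
print(child)
```

In the child, display resolves to the declaration with the highest specificity even though a later one exists, while color comes from the parent because nothing was declared for it.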
It is worth noting that after style computation, all computed style values are exposed through window.getComputedStyle, so the computed styles can be read from JS, which is very convenient.
Create a layout tree
Now that the DOM tree and the DOM styles have been generated, the next step is for the browser's layout system to determine the position of each element, that is, to generate a layout tree (Layout Tree).
Generating the layout tree roughly works as follows:
- Traverse the nodes of the generated DOM tree and add them to the layout tree.
- Compute the coordinates of the layout tree nodes.
Note that the layout tree contains only visible elements: the head tag and elements with display: none are not put into it.
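The first step, including the filtering rule just described, can be sketched as a recursive traversal. This is a minimal illustration in Python that covers only which nodes enter the layout tree; computing coordinates is left out. The dictionary node format and the names build_layout_tree and tags are hypothetical.

```python
def build_layout_tree(node):
    """Copy a DOM-like node into a layout tree, skipping head and display: none subtrees."""
    if node["tag"] == "head" or node.get("style", {}).get("display") == "none":
        return None  # the whole subtree is left out
    children = []
    for child in node.get("children", []):
        laid_out = build_layout_tree(child)
        if laid_out is not None:
            children.append(laid_out)
    return {"tag": node["tag"], "children": children}


def tags(layout_node):
    """List the tags of a layout tree in traversal order."""
    result = [layout_node["tag"]]
    for child in layout_node["children"]:
        result.extend(tags(child))
    return result


dom = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title"}]},
        {
            "tag": "body",
            "children": [
                {"tag": "p"},
                {"tag": "div", "style": {"display": "none"}, "children": [{"tag": "span"}]},
            ],
        },
    ],
}
print(tags(build_layout_tree(dom)))
```

Because a skipped node returns early, its descendants never reach the layout tree either, which matches how display: none hides an entire subtree.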
Some people say that a Render Tree, that is, a rendering tree, is generated first. In fact that was the case 16 years ago; the Chrome team has since done a lot of refactoring, and there is no longer a step that generates a Render Tree. The information in the layout tree is already very complete and fully covers what the Render Tree did.
The details of layout are not covered here because they are too complicated, and explaining them would make this article bloated. In most cases it is enough to know what work is done. If you want to dig into the underlying principles and see how it is actually done, I highly recommend the FED team's article on how the browser does layout, as seen from the Chrome source code.
Summary
To recap the main thread of this section: