[Interview question] What happens between entering a URL and the browser finishing rendering the page?

Preface:
    Many developers have probably run into this question in interviews. It is a genuinely hard one, because the answer can go very deep and very wide, which makes it a good test of a front-end developer's overall knowledge. I am writing it down here to reinforce my own memory.

1. General process

Suppose you answer like this:

  • The user enters the URL, and the browser queries DNS to find the IP address of the requested host;
  • A TCP connection is established;
  • The browser sends an HTTP request to the server. If the server returns a redirect such as 301, the browser sends the request again to the URL given in the Location response header;
  • The server receives and processes the request, generates the HTML, and returns it to the browser. The HTML may be compressed at this point;
  • The browser receives the server's response; if it is compressed, the browser first decompresses it, then parses and renders the page;
  • Parsing and rendering consist of: parsing HTML to build the DOM tree, combining the DOM tree with CSS styles to build the render tree, layout, and painting.
    This outline is correct, but it lacks detail and depth.
    The look on the interviewer's face will say: "Unfortunately, this is not the answer we want!"
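
The redirect step in this outline can be sketched as a loop. The loop re-requests the URL taken from the Location header until a response arrives that is not a 3xx. This is a minimal sketch, not a browser implementation. `fakeServer`, `fakeRequest` and `navigate` are hypothetical names, and an in-memory table stands in for the network. The redirect cap of 5 is an arbitrary assumption. The standard `URL` constructor resolves relative Location values.

```typescript
// Response shape for this sketch (hypothetical, not the Fetch API's Response).
interface FakeResponse {
  status: number;
  headers: Record<string, string>; // header names stored in lower case
  body: string;
}

// Hypothetical in-memory "server" standing in for the network: URL -> response.
const fakeServer: Record<string, FakeResponse> = {
  "http://example.com/": { status: 301, headers: { location: "https://example.com/" }, body: "" },
  "https://example.com/": { status: 200, headers: { "content-type": "text/html" }, body: "<p>Hello</p>" },
};

function fakeRequest(url: string): FakeResponse {
  const hit = fakeServer[url];
  return hit !== undefined ? hit : { status: 404, headers: {}, body: "Not Found" };
}

// Keep re-requesting while the server answers 3xx with a Location header.
function navigate(startUrl: string, maxRedirects = 5): { finalUrl: string; response: FakeResponse } {
  let url = startUrl;
  for (let hops = 0; hops <= maxRedirects; hops++) {
    const response = fakeRequest(url);
    const target = response.headers["location"];
    const isRedirect = response.status >= 300 && response.status < 400 && target !== undefined;
    if (!isRedirect) return { finalUrl: url, response };
    // Location may be relative, so resolve it against the current URL.
    url = new URL(target, url).href;
  }
  throw new Error("too many redirects");
}

const result = navigate("http://example.com/");
console.log(result.finalUrl, result.response.status); // https://example.com/ 200
```

The cap guards against a misconfigured server that redirects in a loop.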

2. Detailed process

Next, let's peel back each step and look at the details together.

2.1 Enter address

The browser uses DNS prefetching: it relies on the existing DNS mechanism to resolve, ahead of time, the domain names a page is likely to connect to.
As soon as we start typing a URL, the browser starts matching it. It searches history, bookmarks, and so on for URLs that might correspond to the typed string, and offers suggestions so you can complete the address. Before the user even presses Enter, the browser has already started resolving the domain name through DNS prefetching.
In Chrome, if a cached copy exists for the domain, the page can be displayed straight from the cache, i.e. it appears before you press Enter. If there is no cache, the resource is requested as usual.
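
To make the matching idea concrete, here is a sketch that suggests previously visited URLs whose address starts with what the user has typed. The `VisitedEntry` shape and the `suggest` helper are assumptions for illustration. So is the ranking rule (bookmarks first, then visit count). Real browsers use their own, more elaborate scoring.

```typescript
// A visited or bookmarked URL (hypothetical shape for this sketch).
interface VisitedEntry {
  url: string;
  visits: number;
  bookmarked: boolean;
}

// Drop the scheme and a leading "www." so typing "bai" can match "https://www.baidu.com/".
function normalizeUrl(url: string): string {
  return url.replace(/^[a-z]+:\/\//i, "").replace(/^www\./i, "").toLowerCase();
}

// Return up to `limit` URLs whose normalized form starts with what was typed.
function suggest(input: string, entries: VisitedEntry[], limit = 3): string[] {
  const typed = normalizeUrl(input.trim());
  if (typed === "") return [];
  return entries
    .filter((e) => normalizeUrl(e.url).startsWith(typed))
    // Assumed ranking: bookmarks first, then by visit count.
    .sort((a, b) => Number(b.bookmarked) - Number(a.bookmarked) || b.visits - a.visits)
    .slice(0, limit)
    .map((e) => e.url);
}

const visited: VisitedEntry[] = [
  { url: "https://www.baidu.com/", visits: 42, bookmarked: false },
  { url: "https://baike.baidu.com/", visits: 5, bookmarked: true },
  { url: "https://www.bing.com/", visits: 10, bookmarked: false },
];

console.log(suggest("ba", visited)); // baike.baidu.com (bookmarked) first, then www.baidu.com
```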

2.2 Query DNS for the IP address of the requested domain

Suppose you enter www.baidu.com. The process is roughly:

  • The browser searches its own DNS cache;
  • If the name is not in the browser cache, the operating system's cache is searched. This step also checks the machine's hosts file for a matching domain mapping;
  • If the operating system does not have it either, the query goes to your router, since routers generally keep their own DNS cache;
  • If it is still not found, the operating system sends the domain name to the local DNS server (a recursive query). The local DNS server checks its own DNS cache and returns the result if it finds one. Otherwise it uses iterative queries (root server, then top-level domain server, then the domain's authoritative server). The local DNS server is usually provided by your Internet service provider, such as China Telecom or China Mobile;
  • The local DNS server returns the IP address to the operating system and also caches it, so later queries from any of its users can be answered directly;
  • The operating system returns the IP address to the browser and also caches it, so the next lookup is faster;
  • At this point, the browser has the IP address corresponding to the domain name.
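
The lookup order above can be sketched as a chain of caches consulted in turn, with the answer cached at each level on the way back. This is a simplified model under a few assumptions. Each level is a plain `Map`, and TTL expiry is ignored. `iterativeLookup` is a hypothetical stand-in for the local DNS server's iterative queries, not a real resolver. The address 203.0.113.10 is a documentation placeholder, not Baidu's real IP.

```typescript
type DnsCache = Map<string, string>;

interface ResolverChain {
  browserCache: DnsCache;
  osCache: DnsCache; // also holds entries loaded from the hosts file
  routerCache: DnsCache;
  localDnsCache: DnsCache; // the ISP's local DNS server
  // Hypothetical stand-in for the local DNS server's iterative queries.
  iterativeLookup: (domain: string) => string | undefined;
}

function resolveDomain(chain: ResolverChain, domain: string): { ip: string | undefined; source: string } {
  const levels: Array<[string, DnsCache]> = [
    ["browser", chain.browserCache],
    ["os", chain.osCache],
    ["router", chain.routerCache],
    ["local-dns", chain.localDnsCache],
  ];
  for (let i = 0; i < levels.length; i++) {
    const [source, cache] = levels[i];
    const ip = cache.get(domain);
    if (ip !== undefined) {
      // Cache the answer at every level that missed, so the next lookup stops sooner.
      for (let j = 0; j < i; j++) levels[j][1].set(domain, ip);
      return { ip, source };
    }
  }
  const resolved = chain.iterativeLookup(domain);
  if (resolved !== undefined) {
    for (const [, cache] of levels) cache.set(domain, resolved);
  }
  return { ip: resolved, source: "iterative" };
}

const chain: ResolverChain = {
  browserCache: new Map(),
  osCache: new Map([["localhost", "127.0.0.1"]]),
  routerCache: new Map(),
  localDnsCache: new Map(),
  iterativeLookup: (domain) => (domain === "www.baidu.com" ? "203.0.113.10" : undefined),
};

console.log(resolveDomain(chain, "www.baidu.com").source); // iterative
console.log(resolveDomain(chain, "www.baidu.com").source); // browser
```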

2.3 Establish a TCP connection

TCP is a connection-oriented transport-layer protocol.
It ensures reliable communication between the two endpoints (the sender and the receiver).
It handles abnormal situations such as packet loss and out-of-order delivery. In addition, it makes efficient use of bandwidth and relieves network congestion.

The steps of the three-way handshake (simplified):

Client: hello, are you the server? (SYN)
Server: hello, I am the server; are you the client? (SYN + ACK)
Client: yes, I am the client (ACK)
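
In TCP terms, the three lines correspond to SYN, SYN+ACK and ACK segments. The toy model below shows how the sequence and acknowledgment numbers line up, without any real networking. Each side acknowledges the other's initial sequence number plus one, because a SYN consumes one sequence number. The `Segment` type and the fixed initial sequence numbers are illustrative assumptions; real stacks choose them randomly.

```typescript
// Simplified TCP segment: only the fields the handshake needs.
interface Segment {
  syn: boolean;
  ack: boolean;
  seq: number;
  ackNum?: number;
}

// Toy model of the exchange; no real networking. ISNs are passed in for readability.
function threeWayHandshake(clientIsn: number, serverIsn: number): Segment[] {
  // 1. Client -> server: SYN ("hello, are you the server?")
  const syn: Segment = { syn: true, ack: false, seq: clientIsn };
  // 2. Server -> client: SYN+ACK; a SYN consumes one sequence number, hence +1
  const synAck: Segment = { syn: true, ack: true, seq: serverIsn, ackNum: syn.seq + 1 };
  // 3. Client -> server: ACK of the server's SYN; the connection is now established
  const ack: Segment = { syn: false, ack: true, seq: clientIsn + 1, ackNum: synAck.seq + 1 };
  return [syn, synAck, ack];
}

function describeSegment(s: Segment): string {
  const flags = [s.syn ? "SYN" : "", s.ack ? "ACK" : ""].filter((f) => f !== "").join("+");
  return s.ackNum === undefined ? `${flags} seq=${s.seq}` : `${flags} seq=${s.seq} ack=${s.ackNum}`;
}

console.log(threeWayHandshake(100, 300).map(describeSegment).join(" | "));
// SYN seq=100 | SYN+ACK seq=300 ack=101 | ACK seq=101 ack=301
```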

After the TCP connection is established, HTTP requests can be sent.
Closing the connection takes a "four-way wave". TCP is full-duplex, so each direction has to be closed separately, and that takes four steps.
Steps of the four-way wave (simplified):

Active side: I have closed my sending channel to you; from now on I can only receive (FIN)
Passive side: Got your notice that the channel is closed (ACK)
Passive side: Then let me tell you, my sending channel to you is now closed too (FIN)
Active side: Acknowledged; after this, the two sides can no longer communicate (ACK)
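
The four waves can also be read as a small state machine on each side. This sketch uses the TCP state names from RFC 793 but models only the closing transitions. `nextState` and `simulateClose` are hypothetical helpers. The TIME_WAIT timer is left out.

```typescript
// RFC 793 state names; only the states involved in closing a connection.
type TcpState =
  | "ESTABLISHED" | "FIN_WAIT_1" | "FIN_WAIT_2" | "TIME_WAIT"
  | "CLOSE_WAIT" | "LAST_ACK" | "CLOSED";
type TcpEvent = "close" | "recvFIN" | "recvACK";

const closeTransitions: Record<string, TcpState> = {
  "ESTABLISHED:close": "FIN_WAIT_1",   // active side sends FIN
  "FIN_WAIT_1:recvACK": "FIN_WAIT_2",  // its FIN has been acknowledged
  "FIN_WAIT_2:recvFIN": "TIME_WAIT",   // peer's FIN arrives; reply with the final ACK
  "ESTABLISHED:recvFIN": "CLOSE_WAIT", // passive side ACKs the FIN, may still send data
  "CLOSE_WAIT:close": "LAST_ACK",      // passive side sends its own FIN
  "LAST_ACK:recvACK": "CLOSED",        // final ACK received
};

function nextState(state: TcpState, event: TcpEvent): TcpState {
  const to = closeTransitions[`${state}:${event}`];
  if (to === undefined) throw new Error(`unexpected ${event} in ${state}`);
  return to;
}

// Replay the four waves, recording "segment: active state / passive state" after each.
function simulateClose(): string[] {
  let active: TcpState = "ESTABLISHED";
  let passive: TcpState = "ESTABLISHED";
  const log: string[] = [];
  active = nextState(active, "close");     // wave 1: active -> passive, FIN
  passive = nextState(passive, "recvFIN");
  log.push(`FIN: ${active} / ${passive}`);
  active = nextState(active, "recvACK");   // wave 2: passive -> active, ACK
  log.push(`ACK: ${active} / ${passive}`);
  passive = nextState(passive, "close");   // wave 3: passive -> active, FIN
  active = nextState(active, "recvFIN");
  log.push(`FIN: ${active} / ${passive}`);
  passive = nextState(passive, "recvACK"); // wave 4: active -> passive, ACK
  log.push(`ACK: ${active} / ${passive}`);
  return log; // TIME_WAIT ends in CLOSED after a 2*MSL timer, not modeled here
}

simulateClose().forEach((line) => console.log(line));
```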

2.4 The server receives and responds to the HTTP request

After receiving and interpreting the request message, the server returns an HTTP response message.
The HTTP response consists of three parts: the status line, the headers, and the response body.
The status code in the status line consists of three digits. The first digit defines the class of the response, and there are five possible values:

  • 1xx: Informational – the request has been received and processing continues;
  • 2xx: Success – the request was successfully received, understood, and accepted;
  • 3xx: Redirection – further action is needed to complete the request;
  • 4xx: Client error – the request has a syntax error or cannot be fulfilled;
  • 5xx: Server error – the server failed to fulfill an apparently valid request.
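
Since the class is determined by the first digit, classifying a code takes one calculation. A small sketch; the `statusClass` name and the 100 to 599 range check are assumptions:

```typescript
type StatusClass = "Informational" | "Success" | "Redirection" | "Client error" | "Server error";

// The first digit of a three-digit status code selects its class.
function statusClass(code: number): StatusClass {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new RangeError(`not a valid HTTP status code: ${code}`);
  }
  const classes: StatusClass[] = ["Informational", "Success", "Redirection", "Client error", "Server error"];
  return classes[Math.floor(code / 100) - 1];
}

console.log(statusClass(200)); // Success
console.log(statusClass(301)); // Redirection
console.log(statusClass(503)); // Server error
```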

Common status codes and what they mean:

  • 200 OK: the client's request succeeded;
  • 400 Bad Request: the request has a syntax error and the server cannot understand it;
  • 401 Unauthorized: the request lacks valid authorization; this status code must be sent together with the WWW-Authenticate header field;
  • 403 Forbidden: the server received the request but refuses to fulfill it;
  • 404 Not Found: the requested resource does not exist, e.g. a wrong URL was entered;
  • 500 Internal Server Error: an unexpected error occurred on the server;
  • 503 Service Unavailable: the server cannot handle the client's request right now and may recover after some time.

HTTP headers include general headers, request headers, response headers, and entity headers. They are not covered in detail here.

Response body: the content of the resource returned by the server.
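
Putting the three parts together, an HTTP/1.1 response on the wire is plain text. It consists of the status line, one line per header, an empty line, and then the body, with lines separated by CRLF. The sketch below serializes such a message. `serializeResponse` is a hypothetical helper. It assumes HTTP/1.1; HTTP/2 and HTTP/3 encode the same information in binary frames instead.

```typescript
interface HttpResponseMessage {
  status: number;
  reason: string;
  headers: Record<string, string>;
  body: string;
}

// Build an HTTP/1.1 response: status line, header lines, an empty line, then the body.
function serializeResponse(res: HttpResponseMessage): string {
  const statusLine = `HTTP/1.1 ${res.status} ${res.reason}`;
  // Content-Length counts bytes, so measure the UTF-8 encoding rather than string length.
  const bodyBytes = new TextEncoder().encode(res.body).length;
  const headers: Record<string, string> = { ...res.headers, "Content-Length": String(bodyBytes) };
  const headerLines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`);
  // HTTP/1.1 separates lines with CRLF.
  return [statusLine, ...headerLines, "", res.body].join("\r\n");
}

const raw = serializeResponse({
  status: 200,
  reason: "OK",
  headers: { "Content-Type": "text/html; charset=utf-8" },
  body: "<h1>Hi</h1>",
});
console.log(JSON.stringify(raw));
// "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: 11\r\n\r\n<h1>Hi</h1>"
```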

2.5 The browser receives the response from the server and processes it

The browser starts displaying the page before it has received the whole HTML document. Different browsers may parse pages differently; here we only describe WebKit's rendering process.

The rendering process can be roughly divided into the following steps:

  1. Parse HTML and build the DOM tree;
  2. Parse CSS and generate the CSS rule tree;
  3. Combine the DOM tree and the CSS rules to generate the render tree;
  4. Lay out the render tree (layout / reflow), calculating the size and position of each element;
  5. Paint the render tree, producing the page's pixel information;
  6. The browser sends each layer's information to the GPU, and the GPU composites the layers and displays them on the screen.

For each of these steps, WebKit provides many classes that implement the corresponding internal modules. They are not described in detail here.
Below, each step of the general process above is explained in turn.

2.5.1 Constructing the DOM tree

When the browser parses an HTML file, WebKit's HTML parser takes the web page and its resources, obtained from the network or local disk as a byte stream, and converts them into a DOM tree. In WebKit the process works as follows. First the byte stream is decoded into a character stream. A lexical analyzer (tokenizer) then splits the characters into tokens (words), and a syntax analyzer builds nodes from the tokens. Finally, these nodes are organized into a DOM tree.
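
The four stages (bytes, characters, tokens, nodes) can be imitated in a few dozen lines. This is a toy, not WebKit's parser or the HTML standard's algorithm. The lexer ignores attributes, comments and doctypes, and drops whitespace-only text. The tree builder only uses a stack of open elements. `tokenize`, `buildDom` and the other names are hypothetical.

```typescript
type HtmlToken =
  | { kind: "start"; tag: string }
  | { kind: "end"; tag: string }
  | { kind: "text"; text: string };

interface DomNode {
  tag: string; // "#text" for text nodes, "#document" for the root
  text?: string;
  children: DomNode[];
}

// Elements that never have children, so they are not pushed on the open-element stack.
const VOID_TAGS = new Set(["br", "img", "meta", "link", "input", "hr"]);

// 1. Byte stream -> character stream.
function decodeBytes(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// 2. Characters -> tokens. Toy lexer: no attributes, comments, or doctype.
function tokenize(html: string): HtmlToken[] {
  const tokens: HtmlToken[] = [];
  for (const m of html.matchAll(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\s*\/?>|([^<]+)/g)) {
    if (m[3] !== undefined) {
      const text = m[3].trim(); // whitespace-only text is dropped
      if (text !== "") tokens.push({ kind: "text", text });
    } else {
      const tag = m[2].toLowerCase();
      tokens.push(m[1] === "/" ? { kind: "end", tag } : { kind: "start", tag });
    }
  }
  return tokens;
}

// 3 + 4. Tokens -> nodes -> tree, using a stack of currently open elements.
function buildDom(tokens: HtmlToken[]): DomNode {
  const root: DomNode = { tag: "#document", children: [] };
  const stack: DomNode[] = [root];
  for (const t of tokens) {
    const current = stack[stack.length - 1];
    if (t.kind === "text") {
      current.children.push({ tag: "#text", text: t.text, children: [] });
    } else if (t.kind === "start") {
      const node: DomNode = { tag: t.tag, children: [] };
      current.children.push(node);
      if (!VOID_TAGS.has(t.tag)) stack.push(node);
    } else {
      // Close back to the most recent matching open element; ignore stray end tags.
      const idx = stack.map((n) => n.tag).lastIndexOf(t.tag);
      if (idx > 0) stack.length = idx;
    }
  }
  return root;
}

function printDom(node: DomNode): string {
  if (node.tag === "#text") return JSON.stringify(node.text);
  return `${node.tag}(${node.children.map(printDom).join(",")})`;
}

const bytes = new TextEncoder().encode("<html><body><p>Hello</p><br/><p>World</p></body></html>");
console.log(printDom(buildDom(tokenize(decodeBytes(bytes)))));
// #document(html(body(p("Hello"),br(),p("World"))))
```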

The browser loads the HTML file "top-down" and parses and renders it while loading. When it encounters external resources such as images, external CSS or iconfonts, they are requested asynchronously and do not block loading of the HTML document. These requests are handled centrally by the Browser process, which makes it easy to share resources between different web pages.

HTML interpretation, layout and painting are mostly done on the rendering thread, though not exclusively. The DOM tree can only be created and accessed on the rendering thread, so building the DOM tree must happen there. However, the conversion from characters to tokens can be handed off to a separate thread.

Thanks to DNS prefetching, Chromium extracts the hyperlinks in a page while the user is browsing it. It pulls out their domain names and resolves them to IP addresses using relatively little CPU and network bandwidth. Users do not notice this at all. When they later click one of these links, it can save a lot of time, especially when domain resolution is slow.
During parsing, the browser first parses the HTML file to build the DOM tree. It then parses the CSS and combines it with the DOM tree to build the Render tree. Once the Render tree is built, the browser lays it out and paints it on the screen.
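
The link-harvesting half of DNS prefetching can be sketched as collecting the distinct hostnames of links that point to other hosts. For brevity, this sketch uses a regular expression over raw HTML; a browser would work from the parsed DOM. `hostsToPrefetch` and `safeUrl` are hypothetical names. The sketch only picks candidates and does not perform any DNS lookups.

```typescript
// Parse an href against the page URL; return null for values that are not valid URLs.
function safeUrl(href: string, base: string): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Collect distinct hostnames from <a href="..."> links that point to other hosts.
function hostsToPrefetch(html: string, pageUrl: string): string[] {
  const pageHost = new URL(pageUrl).hostname;
  const hosts = new Set<string>();
  for (const m of html.matchAll(/<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi)) {
    const link = safeUrl(m[1], pageUrl); // relative links resolve against the page URL
    if (link === null) continue;
    // Only http(s) links to other hosts would need a new DNS lookup.
    if ((link.protocol === "http:" || link.protocol === "https:") && link.hostname !== pageHost) {
      hosts.add(link.hostname);
    }
  }
  return Array.from(hosts);
}

const page = `
  <a href="https://news.example.org/today">News</a>
  <a href="/about">About</a>
  <a href='https://cdn.example.net/docs/'>Docs</a>
  <a href="https://news.example.org/archive">Archive</a>
  <a href="mailto:me@example.com">Mail</a>`;
console.log(hostsToPrefetch(page, "https://www.example.com/"));
// news.example.org and cdn.example.net, each listed once
```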

2.5.2 Interpreting CSS

CSS interpretation is the process in which the CSS interpreter turns a CSS string into the rendering engine's internal representation of style rules.

Once the style rules are generated, WebKit matches them and selects the appropriate style information for the nodes that need it (only visible nodes). Rule matching is done by the ElementRuleCollector class. Based on an element's attributes and similar information, it obtains rule sets from the DocumentRuleSets class. It then matches them one by one against selector information such as ID, class and tag name to determine the element's style.

Finally, WebKit sorts the matched rules. For each style property the element needs, WebKit takes the value from the highest-priority rule and returns it.
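
The matching and priority steps can be illustrated with the CSS specificity rules: an ID beats a class, a class beats a tag, and later rules win ties. This sketch is not WebKit's ElementRuleCollector. It assumes single simple selectors only and ignores `!important`, inline styles, inheritance and combinators. `computeStyle` and the rule shapes are hypothetical.

```typescript
interface StyleRule {
  selector: string; // only "#id", ".class", or a tag name in this sketch
  declarations: Record<string, string>;
}

interface StyledElement {
  tag: string;
  id?: string;
  classes: string[];
}

// CSS specificity as (ids, classes, tags) for a single simple selector.
function specificity(selector: string): [number, number, number] {
  if (selector.startsWith("#")) return [1, 0, 0];
  if (selector.startsWith(".")) return [0, 1, 0];
  return [0, 0, 1];
}

function matchesSelector(selector: string, el: StyledElement): boolean {
  if (selector.startsWith("#")) return el.id === selector.slice(1);
  if (selector.startsWith(".")) return el.classes.includes(selector.slice(1));
  return el.tag === selector.toLowerCase();
}

function computeStyle(el: StyledElement, rules: StyleRule[]): Record<string, string> {
  const matched = rules
    .map((rule, order) => ({ rule, order, spec: specificity(rule.selector) }))
    .filter((m) => matchesSelector(m.rule.selector, el))
    // Sort from lowest to highest priority; on equal specificity, later rules rank higher.
    .sort((a, b) =>
      a.spec[0] - b.spec[0] || a.spec[1] - b.spec[1] || a.spec[2] - b.spec[2] || a.order - b.order);
  const style: Record<string, string> = {};
  // Applying in ascending priority lets the highest-priority value of each property win.
  for (const m of matched) Object.assign(style, m.rule.declarations);
  return style;
}

const sheet: StyleRule[] = [
  { selector: "p", declarations: { color: "black", "font-size": "16px" } },
  { selector: ".note", declarations: { color: "blue" } },
  { selector: "#intro", declarations: { color: "red" } },
  { selector: "p", declarations: { "font-size": "14px" } },
];

console.log(computeStyle({ tag: "p", id: "intro", classes: ["note"] }, sheet));
// color: red (from #intro), font-size: 14px (the later "p" rule wins the tie)
```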

In terms of the overall loading and rendering process, CSS interpretation and rule matching happen after the DOM tree is built and before the RenderObject tree is built. The CSS interpreter's results are saved, and the RenderObject tree then uses them for style matching and layout calculation. When the page changes through user interaction or animation, JavaScript can easily modify styles through CSSOM. WebKit then has to recompute the styles and repeat the process above.

2.5.3 The rendering process encounters JavaScript

When a JS file is encountered while the document is loading, the HTML document suspends the rendering thread, because loading, parsing and rendering happen synchronously here. It waits not only for the JS file to download but also for it to be parsed and executed before resuming. JS may modify the DOM (the classic example is document.write), so the resources after the script may turn out to be unnecessary until the script has run. This is the root cause of JS blocking the download of subsequent resources, and it is why we usually put scripts at the end of the HTML document.

When WebKit is about to execute JavaScript, it first holds off on running it and uses the pre-scanner (the HTMLPreloadScanner class) to scan the tokens that follow. If it finds that they reference other resources, the preload resource loader (the HTMLResourcePreloader class) requests them in advance. Only then is the JavaScript code executed. The pre-scanner does not create node objects or build a DOM tree, so it is fast.

When the DOM tree is complete, WebKit fires the "DOMContentLoaded" event and calls the JavaScript functions registered for it. When all resources have finished loading, WebKit fires the "load" (onload) event.

During DOM tree construction, WebKit hands the JavaScript code that needs to run to the HTMLScriptRunner class. It works very simply: it uses the JavaScript engine to execute the code contained in the script node.

JS is parsed and executed by the browser's JavaScript engine. JS runs on a single thread, which means only one thing happens at a time. All tasks are queued, and the next task starts only after the previous one finishes. Some tasks, such as I/O, are time-consuming, so a mechanism is needed to keep them from holding up everything queued behind them. That mechanism is the split into synchronous and asynchronous tasks.
JS execution can be viewed as a main thread plus a task queue. Synchronous tasks run on the main thread and form an execution stack. An asynchronous task places an event in the task queue once its result is ready. While a script runs, the execution stack is processed first; then events are taken from the task queue and their tasks are run. This process repeats over and over, which is why it is called the event loop.
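
The main-thread-plus-queue model can be simulated directly. The `ToyEventLoop` class below is a hypothetical illustration, not how a JavaScript engine is built. `scheduleAsync` stands in for an asynchronous API whose callback is queued instead of run immediately. It models only the single task queue described above.

```typescript
type QueuedTask = () => void;

class ToyEventLoop {
  private queue: QueuedTask[] = [];
  readonly log: string[] = [];

  // Stand-in for an async API (timer, I/O): the callback is queued, not run now.
  scheduleAsync(task: QueuedTask): void {
    this.queue.push(task);
  }

  // Run the synchronous script to completion, then drain the queue in FIFO order.
  run(script: QueuedTask): void {
    script(); // the execution stack empties first
    while (this.queue.length > 0) {
      const next = this.queue.shift()!;
      next(); // a task may schedule more tasks; they go to the back of the queue
    }
  }
}

const loop = new ToyEventLoop();
loop.run(() => {
  loop.log.push("sync 1");
  loop.scheduleAsync(() => {
    loop.log.push("async A");
    loop.scheduleAsync(() => loop.log.push("async C"));
  });
  loop.scheduleAsync(() => loop.log.push("async B"));
  loop.log.push("sync 2");
});
console.log(loop.log.join(" -> "));
// sync 1 -> sync 2 -> async A -> async B -> async C
```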

2.5.4 Building the Render tree

After WebKit interprets the HTML, a DOM tree is produced. Once the DOM tree is built, WebKit builds the RenderObject tree from the DOM nodes, and then builds the RenderLayer tree from the RenderObject tree.
The RenderObject tree is a new tree based on the DOM tree: an internal representation built for layout calculation and rendering. RenderObject nodes and DOM nodes do not correspond one-to-one. There are visible nodes (such as the commonly used div and img tags) and invisible nodes (such as head and meta tags), and invisible nodes do not produce RenderObject nodes.
Web pages are layered. Layering makes the page's stacking hierarchy easy to express, and it also makes WebKit's processing easier and simplifies the rendering logic.
Likewise, RenderLayer nodes and RenderObject nodes do not correspond one-to-one: the relationship is one-to-many, and one RenderLayer can contain many RenderObjects.
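
The split between visible and invisible nodes can be sketched as a recursive filter over the DOM tree. The sketch makes several assumptions. The set of invisible tags is illustrative and incomplete, and `display: none` is the only style considered. `buildRenderTree` is a hypothetical name that returns plain objects rather than WebKit's RenderObject classes.

```typescript
interface SimpleDomNode {
  tag: string;
  display?: string; // computed CSS display value; "none" means not rendered
  children: SimpleDomNode[];
}

interface RenderObjectNode {
  tag: string;
  children: RenderObjectNode[];
}

// Elements treated as invisible in this sketch (not an exhaustive list).
const INVISIBLE_TAGS = new Set(["head", "meta", "title", "script", "style", "link"]);

// Invisible nodes, and everything beneath them, get no render object.
function buildRenderTree(node: SimpleDomNode): RenderObjectNode | null {
  if (INVISIBLE_TAGS.has(node.tag) || node.display === "none") return null;
  const children: RenderObjectNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: node.tag, children };
}

function printTree(node: RenderObjectNode): string {
  return `${node.tag}(${node.children.map(printTree).join(",")})`;
}

const dom: SimpleDomNode = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "meta", children: [] }, { tag: "title", children: [] }] },
    {
      tag: "body",
      children: [
        { tag: "div", children: [{ tag: "img", children: [] }] },
        { tag: "p", display: "none", children: [] },
      ],
    },
  ],
};

const renderRoot = buildRenderTree(dom);
console.log(renderRoot === null ? "(nothing to render)" : printTree(renderRoot));
// html(body(div(img())))
```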

2.5.5 Layout

After WebKit creates the RenderObject objects, none of them yet knows its position, size, and so on. The process in which WebKit computes this information based on the box model is called layout calculation.
Layout calculation is a recursive process, because computing a node's size usually requires first computing the positions and sizes of its child nodes.
Re-layout happens when the page is animated, when the user scrolls the page, when JavaScript modifies styles through CSSOM, and so on.
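
The recursive nature of layout shows up even in a very simplified block layout. In this model, children stack vertically and a parent's height is the sum of its children's. The sketch assumes block boxes only, with no padding, borders, margin collapsing or inline content. `layoutBox` and `LayoutBox` are hypothetical names.

```typescript
interface LayoutBox {
  name: string;
  margin: number;
  height?: number; // explicit height; if absent, the height comes from the children
  children: LayoutBox[];
  // Results written by layoutBox():
  x?: number;
  y?: number;
  width?: number;
  usedHeight?: number;
}

// Simplified block layout: children stack vertically and fill the parent's width.
function layoutBox(box: LayoutBox, x: number, y: number, availableWidth: number): number {
  box.x = x + box.margin;
  box.y = y + box.margin;
  box.width = availableWidth - 2 * box.margin;
  let cursorY = box.y;
  for (const child of box.children) {
    // Recursion: children are laid out before the parent's own height is known.
    cursorY += layoutBox(child, box.x, cursorY, box.width);
  }
  box.usedHeight = box.height ?? cursorY - box.y;
  return box.usedHeight + 2 * box.margin; // vertical space taken, including margins
}

const p1Box: LayoutBox = { name: "p1", margin: 5, height: 20, children: [] };
const p2Box: LayoutBox = { name: "p2", margin: 5, height: 30, children: [] };
const mainBox: LayoutBox = { name: "main", margin: 10, children: [p1Box, p2Box] };
const headerBox: LayoutBox = { name: "header", margin: 10, height: 50, children: [] };
const bodyBox: LayoutBox = { name: "body", margin: 0, children: [headerBox, mainBox] };

layoutBox(bodyBox, 0, 0, 800);
for (const b of [bodyBox, headerBox, mainBox, p1Box, p2Box]) {
  console.log(`${b.name}: x=${b.x} y=${b.y} w=${b.width} h=${b.usedHeight}`);
}
// body: x=0 y=0 w=800 h=160 ... p2: x=15 y=115 w=770 h=30
```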

2.5.6 Drawing

In WebKit, drawing is done through a drawing context, and all drawing operations are performed within this context.

Drawing contexts fall into two types:

The first is the 2D graphics context (GraphicsContext), used to draw 2D graphics;

The second is the 3D graphics context, used to draw 3D graphics.

The 2D drawing context provides the drawing interface for basic drawing primitives and sets the drawing style. The drawing interface covers drawing points, lines, images, polygons, text, and so on. The drawing style covers color, line width, font size, gradients, and so on.

The 3D drawing context is mainly used to support CSS 3D, WebGL, and similar features.

There are three ways to render web pages: software rendering, hardware-accelerated rendering, and a hybrid of the two.

Ideally, each layer has its own storage area for its drawing results. Finally, the contents of these layers must be merged into a single image. This is called compositing, and rendering that uses compositing is called composited rendering.
So, once the DOM tree (and the trees derived from it) has been built, WebKit performs the drawing with software rendering, hardware-accelerated rendering, or both, and presents the result on the screen.
At this point, browser rendering is complete.
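
Compositing can be pictured as stacking each layer's stored drawing results into one image in z-order. This is a toy sketch. It assumes each layer's storage is a small grid of characters, with `null` for transparent pixels and no alpha blending. `composite` and `PaintLayer` are hypothetical names, and real compositors typically do this work on the GPU.

```typescript
type Pixel = string | null; // null = transparent in this sketch

interface PaintLayer {
  name: string;
  zIndex: number;
  x: number; // offset of the layer on the page
  y: number;
  pixels: Pixel[][]; // the layer's own stored drawing result, row by row
}

// Merge layers into one image, bottom to top (painter's order).
function composite(layers: PaintLayer[], width: number, height: number, background: string): string[][] {
  const out: string[][] = Array.from({ length: height }, () => Array<string>(width).fill(background));
  const ordered = [...layers].sort((a, b) => a.zIndex - b.zIndex);
  for (const layer of ordered) {
    layer.pixels.forEach((row, dy) => {
      row.forEach((px, dx) => {
        const tx = layer.x + dx;
        const ty = layer.y + dy;
        // Skip transparent pixels and anything outside the viewport.
        if (px !== null && tx >= 0 && tx < width && ty >= 0 && ty < height) {
          out[ty][tx] = px;
        }
      });
    });
  }
  return out;
}

const layers: PaintLayer[] = [
  { name: "popup", zIndex: 1, x: 2, y: 1, pixels: [["b", null, "b"]] },
  { name: "base", zIndex: 0, x: 0, y: 0, pixels: [["a", "a", "a", "a"], ["a", "a", "a", "a"]] },
];
const frame = composite(layers, 4, 2, ".");
console.log(frame.map((row) => row.join("")).join("\n"));
// aaaa
// aaba
```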

Finally

Now, when the interviewer asks you what happens "from typing the URL to the browser finishing rendering", is this what comes to mind?
(Inner monologue: sit back, I'm about to start showing off.)

Reference:

Interview question: from typing a URL to the browser finishing rendering, https://github.com/biaochenxuying/blog/issues/3


Origin blog.csdn.net/qq_26780317/article/details/126009526