What happens between entering a URL and the page finishing loading (interview question)

This is a classic interview question with no standard answer. It touches on many areas of knowledge, so the interviewer can use it to find out which areas you are strong in and then ask follow-up questions to see how deep your understanding goes. What follows is just my own simple understanding. From a front-end perspective, I think the answer should first cover a few basic points and then go deeper based on what you know.

1. Enter the URL in the address bar of the browser and press Enter.

2. The browser checks whether it has a cached copy for the current URL and whether that copy has expired.

3. DNS resolves the domain name in the URL to its IP address.

4. Establish a TCP connection based on IP (three-way handshake).

5. HTTP initiates a request.

6. The server processes the request, and the browser receives the HTTP response.

7. Render the page and build the DOM tree.

8. Close the TCP connection (four waves).

After talking about several key points of the whole process, let's expand on it.

1. URLs

A typical URL looks like this: http://www.baidu.com. A URL consists of three main parts: the protocol name, the domain name, and the port number, where the port is usually omitted and the default is used. A URL may also contain a path, a query string, and a fragment, for example: http://www.tuicool.com/search?kw=%E4%. The most common protocol is HTTP; others include the encrypted HTTPS protocol, FTP, the file protocol, and so on. The middle part of the URL is the domain name or IP address, followed by the port number. The port number is rarely seen because most sites use the default port: 80 for HTTP and 443 for HTTPS. At this point some interviewers may move on to the same-origin policy and deeper cross-origin questions, which I won't expand on here.
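As a rough illustration of those parts, here is a minimal Python sketch that splits a URL with the standard library's urllib.parse. The helper name describe_url, the DEFAULT_PORTS table, and the www.example.com URL are made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports for the two schemes mentioned above.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url):
    """Split a URL into scheme, host, port, path, and query."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # parts.port is None when the URL does not spell out a port,
        # so fall back to the scheme's default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),
    }

info = describe_url("http://www.example.com/search?kw=url")
print(info)
```

The port comes back as 80 here even though it never appears in the URL, which is exactly the "hidden by default" behavior described above.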

2. Cache

After the URL, let's talk about browser caching. HTTP caching has various rules, which can be classified by whether the browser needs to send a new request to the server. I divide them into mandatory caching (often called strong caching) and comparison caching (often called negotiated caching).
Mandatory caching is decided by the HTTP header fields Cache-Control and Expires.

Expires is an absolute time set by the server. The browser compares it with the current time and uses the cached file directly if the expiration time has not yet passed. The problem with this method is that the server's clock may not agree with the client's, so this field is rarely relied on.

The max-age directive in Cache-Control stores a relative time. For example, Cache-Control: max-age=484200 means the cached copy is valid for 484200 seconds after the browser receives the file. If Cache-Control max-age and Expires are both present, max-age takes precedence.
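The freshness check can be sketched roughly like this in Python. It is a simplified model, not how any particular browser implements it: the function name is_fresh and the lowercase-key headers dict are assumptions for the example, and other directives such as no-cache or no-store are ignored.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_fresh(headers, stored_at, now):
    """Return True if a cached response may be reused without asking the server.

    headers: response headers as a dict with lowercase keys (assumed shape).
    stored_at, now: timezone-aware datetimes.
    """
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # max-age is relative to when the response was received,
            # and it takes precedence over Expires.
            return now - stored_at < timedelta(seconds=int(value))
    expires = headers.get("expires")
    if expires:
        # Expires is an absolute HTTP date set by the server.
        return now < parsedate_to_datetime(expires)
    return False

stored = datetime(2024, 1, 1, tzinfo=timezone.utc)
headers = {
    "cache-control": "max-age=484200",
    "expires": "Mon, 01 Jan 2024 00:00:00 GMT",  # already past, but max-age wins
}
print(is_fresh(headers, stored, stored + timedelta(days=5)))
print(is_fresh(headers, stored, stored + timedelta(days=6)))
```

484200 seconds is a little over five and a half days, so the copy is still fresh after five days and stale after six.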

Comparison caching is decided by the Last-Modified and ETag fields of HTTP.

Last-Modified is returned by the server when the resource is first requested and indicates when the resource was last updated. The next time the browser requests the resource, it sends that time back in the If-Modified-Since field. The server compares the resource's current Last-Modified time with the If-Modified-Since time. If the resource has been modified since then, the cache is considered stale and the new resource is returned to the browser; if the times match, a 304 status code is sent to tell the browser to keep using its cache.

ETag: an identifier for the resource's content (typically a hash string). When the content is updated, the ETag changes. The browser sends the ETag it holds in the If-None-Match field; the server checks whether the ETag has changed, returning the new resource if it has and 304 otherwise.
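The server side of the ETag check might look roughly like the sketch below. The names make_etag and respond are invented for the example, and hashing the body with SHA-256 is just one possible way to produce an ETag; real servers are free to generate it differently. The browser sends its stored ETag back in the If-None-Match request header.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Assumption for this sketch: the ETag is a quoted, truncated content hash.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(request_headers, body):
    """Return (status, headers, body) for a GET on a resource whose content is `body`.

    request_headers: dict with lowercase keys (assumed shape).
    """
    etag = make_etag(body)
    if request_headers.get("if-none-match") == etag:
        # Content unchanged: no body, the browser keeps using its cache.
        return 304, {"etag": etag}, b""
    return 200, {"etag": etag}, body

# First request: no cached copy yet.
status, resp_headers, payload = respond({}, b"hello")
# Second request: the browser sends back the ETag it stored.
revalidated = respond({"if-none-match": resp_headers["etag"]}, b"hello")
# Third request: the resource changed on the server in the meantime.
changed = respond({"if-none-match": resp_headers["etag"]}, b"hello, world")
print(status, revalidated[0], changed[0])
```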


3. DNS domain name resolution

We know that the domain name entered in the address bar is not the actual location of the resource; it is just a mapping to an IP address. There are far too many server IP addresses for anyone to remember as strings of numbers, which is why domain names exist, and domain name resolution is simply the process of translating a domain name back into an IP address.

First, the local hosts file is checked for a mapping for this domain name; if one exists, that IP address is used and resolution is complete.

If not found, the local DNS resolver cache is checked, and the result is returned if found.

If it is still not found, the query goes to the local DNS server, which returns the result if it has one.

Finally, an iterative query finds the IP address in this order: root name server -> top-level domain server (.cn) -> second-level domain server (hb.cn) -> subdomain (www.hb.cn).


In a recursive query, the request is instead passed to the upstream DNS server, then to that server's upstream server, and so on, until the IP address is found.
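The lookup order described above (hosts file, then resolver cache, then DNS server) can be modeled with a few dictionaries. This is only a toy simulation with no real network traffic: the function resolve, the fake_server stand-in, and the 203.0.113.10 address (from a range reserved for documentation) are all made up for the example.

```python
def resolve(name, hosts_file, resolver_cache, query_dns_server):
    """Look up `name` in the order described above; return (ip, source)."""
    if name in hosts_file:
        return hosts_file[name], "hosts file"
    if name in resolver_cache:
        return resolver_cache[name], "resolver cache"
    # Stands in for the local DNS server, which would query
    # root -> top-level -> second-level servers on our behalf.
    ip = query_dns_server(name)
    if ip is not None:
        resolver_cache[name] = ip  # remember the answer for next time
    return ip, "dns server"

hosts = {"localhost": "127.0.0.1"}
cache = {}
fake_server = {"www.hb.cn": "203.0.113.10"}.get  # hypothetical record

first = resolve("www.hb.cn", hosts, cache, fake_server)
second = resolve("www.hb.cn", hosts, cache, fake_server)
print(first, second)
```

The second lookup is answered from the resolver cache, which is why repeated visits to the same site skip most of the DNS work.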


4. TCP connection

Once DNS resolution in the previous step has produced the server's IP address, a connection is established. This is done by the TCP protocol, mainly through a three-way handshake.

First handshake: to open the connection, the client sends a SYN packet (seq=j) to the server and enters the SYN_SENT state, waiting for the server to confirm.

Second handshake: after receiving the SYN packet, the server must acknowledge the client's SYN (ack=j+1) and at the same time send its own SYN (seq=k), that is, a SYN+ACK packet. The server then enters the SYN_RECV state.

Third handshake: the client receives the server's SYN+ACK packet and sends an acknowledgment packet ACK (ack=k+1) to the server. Once this packet is sent, the client and server enter the ESTABLISHED state (the TCP connection has succeeded) and the three-way handshake is complete.

After completing the three-way handshake, the client and server begin to transmit data.
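To make the sequence and acknowledgment numbers concrete, here is a small simulation of the three segments. It does not open a real socket; the function name three_way_handshake and the initial sequence numbers 100 and 300 are arbitrary choices for the example (real TCP stacks pick them unpredictably).

```python
def three_way_handshake(client_isn, server_isn):
    """Return the handshake segments as (sender, flags, seq, ack) tuples."""
    j, k = client_isn, server_isn
    return [
        ("client", "SYN", j, None),       # client enters SYN_SENT
        ("server", "SYN+ACK", k, j + 1),  # server enters SYN_RECV
        ("client", "ACK", j + 1, k + 1),  # both sides reach ESTABLISHED
    ]

segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm that they can send and receive.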

5. The browser sends an HTTP request to the server

A complete HTTP request has three parts: the request line, the request headers, and the request body.
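As a sketch of what those three parts look like on the wire, the following Python function assembles a raw HTTP/1.1 request by hand. The helper build_request and the example host and path are assumptions for illustration; in practice a browser or an HTTP library does this for you.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    all_headers = {"Host": host}
    all_headers.update(headers or {})
    if body:
        all_headers["Content-Length"] = str(len(body))
    lines = [f"{method} {path} HTTP/1.1"]  # the request line
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    # A blank line separates the headers from the (possibly empty) body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

raw = build_request("GET", "/search?kw=url", "www.example.com", {"Accept": "text/html"})
print(raw.decode("ascii"))
```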


6. The browser receives the response

After receiving the HTTP request sent by the browser, the server wraps the received HTTP message in an HTTP Request object and hands it to the web server for processing. The result is returned as an HTTP Response object, which mainly consists of three parts: the status code, the response headers, and the response body.

Status codes fall into the following classes:

1xx: Informational – Indicates that the request has been received and processing continues.

2xx: Success – Indicates that the request has been successfully received, understood, and accepted.

3xx: Redirection – Further action is necessary to complete the request.

4xx: Client Error – The request has syntax errors or the request cannot be fulfilled.

5xx: Server-Side Error – The server failed to fulfill a legitimate request.

The response headers mainly include fields such as Cache-Control, Connection, Date, and Pragma.

The response body is the content returned by the server to the browser, mainly HTML, CSS, JS, and image files.
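The three parts of a response, and the status code classes listed above, can be illustrated with a small parser. This is a simplified sketch that assumes a complete, well-formed response held in memory; the names parse_response and status_class and the sample response are invented for the example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)  # e.g. ["HTTP/1.1", "200", "OK"]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body

def status_class(code):
    """Map a status code to its class by the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Cache-Control: max-age=60\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
    b"<html></html>"
)
code, reason, headers, body = parse_response(sample)
print(code, reason, status_class(code), headers, body)
```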

7. Page rendering

If the content of the response is an HTML document, the browser needs to parse and render it to the user. The whole process involves two aspects: parsing and rendering. Before rendering the page, the DOM tree and CSSOM tree need to be built.
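To give a feel for what "building the DOM tree" means, here is a toy tree builder based on Python's standard html.parser module. It is only a rough model: real browsers follow the much more involved HTML parsing algorithm with error recovery, and the TreeBuilder class, its short VOID list, and the nested-dict node shape are assumptions made for this sketch.

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    # Elements that never have children or a closing tag (partial list).
    VOID = {"br", "img", "input", "link", "meta", "hr"}

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append({"tag": "#text", "text": data.strip()})

def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

tree = build_tree("<html><body><p>Hi</p><img src='a.png'></body></html>")
print(tree)
```

The parser consumes the markup incrementally through feed, which loosely mirrors how a browser can start building the tree before the whole document has arrived.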


The browser starts rendering the page before it has received the complete HTML file. When it encounters an external script, stylesheet link, or image, it sends another HTTP request and repeats the steps above. After receiving a CSS file, it re-renders what it has already rendered with the proper styles applied, and an image is displayed in its place as soon as it has loaded. This process may trigger a repaint or reflow of the page, which brings in two important concepts: Reflow and Repaint.

Reflow, also known as Layout, generally means that the content, structure, position, or size of an element has changed, so the styles and the render tree need to be recalculated.

Repaint means that a change affects only some aspect of an element's appearance (for example, background color, border color, or text color), so the browser only needs to redraw the element with the new style.

The cost of Reflow is therefore much higher than the cost of Repaint. Each node in the DOM tree has a reflow method, and the reflow of one node is likely to cause reflow of its child nodes, and even of its parent and sibling nodes.

The following actions are likely to be relatively expensive:

Adding, deleting, or modifying DOM nodes, which causes Reflow or Repaint.

Moving a DOM node, changing its content, or running an animation.

Modifying CSS styles.

Resizing the window (mobile devices do not have this problem) or scrolling.

Changing the default font of the web page.

Basically, the reasons for reflow are as follows:

Initial: when the web page is initialized.

Incremental: when some JS operates on the DOM tree.

Resize: when the size of some elements changes.

StyleChange: when a CSS property changes.

Dirty: when several Incremental reflows occur on the subtree of the same frame.

8. Close the TCP connection or keep it alive

Close the connection with four waves (FIN ACK, ACK, FIN ACK, ACK).


The first wave: after it has finished sending data, the browser sends a FIN to request disconnection.

The second wave: the server sends an ACK to agree. In principle the server could send its own FIN at this point as well, but because it may still have data to send, it sends its FIN separately, as the third wave.

The fourth wave: the browser returns an ACK to agree to the server's FIN.
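The four waves can be simulated in the same spirit, including the case where the server still has data to send between its ACK and its FIN. This is a toy model rather than a real socket: the function four_way_close, the server_pending parameter, and the sequence numbers are invented for the example.

```python
def four_way_close(client_seq, server_seq, server_pending=0):
    """Return the closing segments as (sender, flags, seq, ack) tuples.

    server_pending: bytes the server still sends after acknowledging
    the client's FIN and before sending its own FIN.
    """
    u, v = client_seq, server_seq
    w = v + server_pending  # server's sequence number after the leftover data
    return [
        ("client", "FIN", u, None),       # first wave
        ("server", "ACK", v, u + 1),      # second wave
        ("server", "FIN", w, u + 1),      # third wave, after the pending data
        ("client", "ACK", u + 1, w + 1),  # fourth wave
    ]

steps = four_way_close(client_seq=500, server_seq=900, server_pending=20)
for step in steps:
    print(step)
```

With server_pending=0 the second and third waves carry the same sequence number, which shows why they could be merged when the server has nothing left to send.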

So far, the whole process from entering the URL in the browser address bar to presenting the page in front of you has been analyzed.


Origin blog.csdn.net/weixin_45127646/article/details/128419555