What happens after you enter a URL in the browser

What happens between entering a URL and the page being loaded and displayed?
The process mainly consists of the following basic steps:

  1. Enter the URL in the address bar of the browser and press Enter.
  2. The browser checks whether it has a cached copy for the URL and, if so, whether that copy has expired.
  3. DNS resolves the domain name in the URL to an IP address.
  4. A TCP connection is established to that IP address (three-way handshake).
  5. The browser sends the HTTP request.
  6. The server processes the request and the browser receives the HTTP response.
  7. The browser builds the DOM tree and renders the page.
  8. The TCP connection is closed (four-way wave).

The sections below look at these steps in more detail.

1. URL

After the URL is entered, the browser parses it (URL stands for Uniform Resource Locator).
A URL generally consists of the following parts:

  • protocol: the scheme, such as http, https (encrypted), or ftp
  • host: the host's domain name or IP address
  • port: the port number (it is usually omitted from the URL because the default port for the protocol is used, such as 80 for HTTP and 443 for HTTPS)
  • path: the directory path to the resource
  • query: the query parameters
  • fragment: the hash value after #, generally used to jump to a position within the page
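
The parts listed above can be pulled out of a URL with Python's standard `urllib.parse` module. This is only a small sketch: the example URL, the `split_url` helper, and the `DEFAULT_PORTS` table are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports used when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def split_url(url):
    """Break a URL into the parts described above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # Fall back to the protocol's default port when none is written.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }

example = split_url("https://www.example.com/docs/index.html?lang=en&page=2#intro")
```

Here `example["port"]` comes from the default table because the URL does not spell out a port, which is the "hidden" port case mentioned in the list.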

Related topics that interviewers may also ask about: the same-origin policy and cross-origin requests (to be added)

2. Cache

Following the logic in the figure below, the browser decides whether to use the cached content directly or to request the resource from the server again.
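
One common form of this decision can be sketched as follows: check freshness first, and fall back to revalidation only when the cached copy is stale. The `entry` dictionary and its field names are hypothetical, and a real browser cache handles many more cases.

```python
from datetime import datetime, timedelta, timezone

def cache_decision(entry, now):
    """Return "use-cache", "revalidate", or "fetch" for a cached entry.

    `entry` is a hypothetical dict with the keys stored_at, max_age (seconds),
    expires, etag and last_modified; any of them may be missing.
    """
    if entry is None:
        return "fetch"  # nothing cached, so request the resource
    if entry.get("max_age") is not None:
        # max-age takes precedence over Expires when both are present.
        fresh = now < entry["stored_at"] + timedelta(seconds=entry["max_age"])
    elif entry.get("expires") is not None:
        fresh = now < entry["expires"]
    else:
        fresh = False
    if fresh:
        return "use-cache"  # no request is sent at all
    if entry.get("etag") or entry.get("last_modified"):
        # Stale, but the server can confirm the copy with a conditional request.
        return "revalidate"
    return "fetch"
```

A "revalidate" result corresponds to sending If-None-Match or If-Modified-Since and reusing the cached copy if the server answers 304.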

 

3. DNS domain name resolution

The domain name entered in the address bar is not the real location of the resource; it is only a name mapped to an IP address. There are far too many server IP addresses for anyone to remember as strings of numbers, which is why domain names exist. Domain name resolution is the process of translating the domain name back into an IP address.
First, the browser checks whether the local hosts file contains a mapping for the domain name. If it does, that IP address is used and resolution is complete.
If no mapping is found, the local DNS resolver cache is checked, and a cached result is returned if there is one.
If there is still no result, the query goes to the local DNS server, which returns the answer if it has one.
Finally, the local DNS server queries iteratively, in the order root name server -> top-level domain server (.com) -> second-level domain server (baidu.com) -> subdomain (www.baidu.com), until the IP address is found.
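
The lookup order can be imitated with a few dictionaries. This is a toy model, not a real resolver: the server names, the `www.example.com` record, and the address 192.0.2.10 (from the documentation-only range) are all invented, and the local DNS server step is folded into the iterative query.

```python
# Toy data standing in for the hosts file, the resolver cache and the name servers.
HOSTS_FILE = {"localhost": "127.0.0.1"}
RESOLVER_CACHE = {}
SERVERS = {
    "root": {"referrals": {"com": "tld-com"}, "records": {}},
    "tld-com": {"referrals": {"example.com": "ns-example"}, "records": {}},
    "ns-example": {"referrals": {}, "records": {"www.example.com": "192.0.2.10"}},
}

def iterative_query(name):
    """Walk root -> top-level domain -> second-level domain until a record is found."""
    server, visited = "root", []
    while True:
        visited.append(server)
        zone = SERVERS[server]
        if name in zone["records"]:
            return zone["records"][name], visited
        # Follow the referral whose suffix matches the name being resolved.
        next_server = next(
            (target for suffix, target in zone["referrals"].items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if next_server is None:
            raise LookupError(f"cannot resolve {name}")
        server = next_server

def resolve(name):
    """Check the hosts file, then the cache, then fall back to an iterative query."""
    if name in HOSTS_FILE:
        return HOSTS_FILE[name], "hosts file"
    if name in RESOLVER_CACHE:
        return RESOLVER_CACHE[name], "resolver cache"
    address, _ = iterative_query(name)
    RESOLVER_CACHE[name] = address  # remember the answer for next time
    return address, "iterative query"
```

Calling `resolve("www.example.com")` twice shows the effect of the cache: the first call walks the servers, the second is answered locally. For a real lookup, Python's `socket.getaddrinfo` asks the operating system's resolver.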

4. TCP connection

The DNS resolution in the previous step yields the server's IP address. With that address, the browser establishes a connection using the TCP protocol, which sets up the connection through a three-way handshake.

  • The first handshake: to open the connection, the client sends a SYN packet (seq=x) to the server and enters the SYN_SENT state, waiting for the server to confirm;
  • The second handshake: when the server receives the SYN packet, it must acknowledge the client's SYN (ack=x+1) and send its own SYN (seq=y) in the same packet, that is, a SYN+ACK packet; the server then enters the SYN_RECV state;
  • The third handshake: the client receives the SYN+ACK packet from the server and sends an acknowledgment packet ACK (ack=y+1) back. The client enters the ESTABLISHED state once this packet is sent, and the server enters it once the packet arrives, which completes the three-way handshake.

After the three-way handshake is completed, the client and server begin to transmit data.
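
To make the sequence and acknowledgment numbers concrete, here is a small simulation of the three packets and the state changes. It does not touch the network; the function name and the trace format are invented for this sketch, and in a real program the operating system performs these steps inside `connect()` and `accept()`.

```python
def three_way_handshake(x, y):
    """Simulate the handshake for client sequence number x and server sequence number y."""
    client, server = "CLOSED", "LISTEN"
    trace = []

    # First handshake: the client sends SYN(seq=x) and waits.
    client = "SYN_SENT"
    trace.append(("client->server", "SYN", {"seq": x}, client, server))

    # Second handshake: the server acknowledges x and sends its own SYN(seq=y).
    server = "SYN_RECV"
    trace.append(("server->client", "SYN+ACK", {"seq": y, "ack": x + 1}, client, server))

    # Third handshake: the client acknowledges y and considers the connection open.
    client = "ESTABLISHED"
    trace.append(("client->server", "ACK", {"seq": x + 1, "ack": y + 1}, client, server))

    # The server reaches ESTABLISHED when that final ACK arrives.
    server = "ESTABLISHED"
    return trace, (client, server)

trace, final_states = three_way_handshake(100, 300)
```

With x=100 and y=300, the second packet carries ack=101 and the third carries ack=301, matching the ack=x+1 and ack=y+1 rules above.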

5. The browser sends an HTTP request to the server

A complete HTTP request consists of three parts: the request line (start line), the request headers, and the request body.
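
As a rough illustration of how these three parts sit together on the wire, the sketch below assembles an HTTP/1.1 request by hand. The `build_request` helper, the path, and the header values are made up for the example; real clients add more headers and handle encoding details.

```python
def build_request(method, path, headers, body=b""):
    """Assemble request line + headers + blank line + body (hypothetical helper)."""
    lines = [f"{method} {path} HTTP/1.1"]  # the request line
    headers = dict(headers)
    if body:
        # Tell the server how many body bytes follow the blank line.
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # an empty line ends the headers
    return head.encode("ascii") + body

raw_request = build_request(
    "POST",
    "/login",
    {"Host": "www.example.com", "Content-Type": "application/x-www-form-urlencoded"},
    b"user=alice",
)
```

Printing `raw_request.decode()` shows the request line first, one header per line, an empty line, and then the body.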

 

Common request headers (partial list)

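Accept: the MIME types the browser can accept (the counterpart of the Content-Type returned by the server)
Accept-Encoding: the compression formats the browser supports, such as gzip; content encoded in other formats cannot be handled
Content-Type: the type of the entity content sent by the client
Cache-Control: the caching rules that the request and response should follow, such as no-cache
If-Modified-Since: the counterpart of the server's Last-Modified, used to check whether the file has changed; accurate only to 1 second; from HTTP/1.0
Expires: cache control; until this time the browser makes no request and uses the cache directly; from HTTP/1.0, based on the server's clock (the server sets it in the response)
max-age: how many seconds the resource may be cached locally; within that time the browser makes no request and uses the cache; from HTTP/1.1, sent as a Cache-Control directive (for example Cache-Control: max-age=3600)
If-None-Match: the counterpart of the server's ETag, used to check whether the file content has changed (very precise); from HTTP/1.1
Cookie: sent automatically when a cookie exists for the domain being requested
Connection: how persistent connections are handled between the browser and the server, such as keep-alive
Host: the host name (and port, if not the default) of the server being requested
Origin: where the request was originally initiated (only down to the port, that is, scheme, host, and port); Origin respects privacy more than Referer
Referer: the URL of the page the request came from (applies to all request types and includes the full page address; often used in CSRF checks)
User-Agent: identifying information about the user's client, such as the UA string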
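
The two conditional headers in this list, If-None-Match and If-Modified-Since, can be illustrated from the server's side. The sketch below is a simplified check, assuming strong ETag comparison only; the `validate` function is hypothetical and not taken from any particular server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def validate(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still good, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match wins over If-Modified-Since when both are sent.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if current_etag in tags or "*" in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, hence the "accurate to 1 second" note.
        return 304 if last_modified <= since else 200
    return 200  # unconditional request: send the full resource

resource_modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
```

A 304 answer has no body, so the browser keeps using its cached copy; a 200 answer carries the new content.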

6. The browser receives the response from the server

After the server receives the HTTP request sent by the browser, it wraps the received HTTP message in an HTTP Request object, which is then processed by whichever web server software is in use. The result is returned as an HTTP Response object, which consists of three parts: the status code, the response headers, and the response body.
Status codes fall into the following classes:

  • 1xx: Informational. The request has been received and processing continues.
  • 2xx: Success. The request was successfully received, understood, and accepted.
  • 3xx: Redirection. Further action must be taken to complete the request.
  • 4xx: Client error. The request has a syntax error or cannot be fulfilled.
  • 5xx: Server error. The server failed to fulfill a valid request.

The response headers mainly include Cache-Control, Connection, Date, Pragma, and so on.
The response body is the content the server returns to the browser, mainly HTML, CSS, JS, and image files.

Common response headers (partial list):
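Access-Control-Allow-Headers: the request headers the server allows
Access-Control-Allow-Methods: the request methods the server allows
Access-Control-Allow-Origin: the request Origin values the server allows (for example *)
Content-Type: the type of the entity content returned by the server
Date: the time at which the server sent the data
Cache-Control: tells the browser or other clients under what conditions the document can safely be cached
Last-Modified: the time the requested resource was last modified
Expires: the time after which the document should be considered stale and no longer served from cache
max-age: how many seconds the client should cache the resource locally; it is a Cache-Control directive, so it only takes effect as part of Cache-Control
ETag: the current value of the entity tag for the requested resource
Set-Cookie: sets a cookie associated with the page; the server uses this header to pass cookies to the client
Keep-Alive: if the client sent keep-alive, the server also responds with it (for example timeout=38)
Server: some information about the server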
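
To show how the status code, headers, and body are laid out in a response, the sketch below splits a raw HTTP/1.1 response and maps the status code to its class. The `parse_response` helper and the sample response are made up, and the parsing is simplified (no chunked encoding, no repeated headers).

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_response(raw):
    """Split raw response bytes into status line, headers and body (simplified)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates head and body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "version": version,
        "status": code,
        "class": STATUS_CLASSES[code // 100],
        "reason": reason,
        "headers": headers,
        "body": body,
    }

sample = parse_response(
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"Cache-Control: max-age=60\r\n"
    b"\r\n"
    b"<h1>Not Found</h1>"
)
```

Here `sample["class"]` is "client error" because the first digit of 404 is 4.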

 

7. Page rendering

The HTTP exchange is covered above; the next step is for the browser to take the HTML it received, parse it, and render it.

  1. Parse the HTML and build the DOM tree
  2. Parse the CSS and generate the CSS rule tree
  3. Combine the DOM tree and the CSS rules to generate the render tree
  4. Lay out the render tree (layout/reflow), calculating the size and position of each element
  5. Paint the render tree (paint), drawing the page's pixel information
  6. The browser sends the information for each layer to the GPU, which composites the layers and displays them on the screen
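
The first three steps can be imitated in miniature with Python's standard `html.parser`. This is a heavily simplified sketch: the CSS "rule tree" is just a dictionary keyed by tag name, and the class and function names are invented for the example. It only shows that the render tree is the DOM tree minus the nodes that are not displayed.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag
NOT_RENDERED = {"head", "script", "style"}                # never part of the render tree

class Node:
    def __init__(self, tag, text=""):
        self.tag, self.text, self.children = tag, text, []

class DomBuilder(HTMLParser):
    """Step 1: turn the HTML text into a tree of Node objects."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)  # later content becomes a child of this node

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", data.strip()))

def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def build_render_tree(node, css_rules):
    """Step 3: attach styles and drop nodes that are not displayed."""
    style = css_rules.get(node.tag, {})
    if node.tag in NOT_RENDERED or style.get("display") == "none":
        return None
    children = []
    for child in node.children:
        rendered = build_render_tree(child, css_rules)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node.tag, "style": style, "children": children}

def flatten(tree):
    """List the tags of a render tree in document order."""
    return [tree["tag"]] + [tag for child in tree["children"] for tag in flatten(child)]
```

For example, with the rule `{"div": {"display": "none"}}`, a `<div>` in the body appears in the DOM tree but not in the render tree, so it takes no part in layout or painting.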

  • The browser starts rendering the page before it has received the complete HTML file. When it encounters an external script, stylesheet, or image, it sends another HTTP request and repeats the steps above. After a CSS file arrives, the page rendered so far is re-rendered with the proper styles, and each image is displayed in its place as soon as it finishes loading. Along the way the page may be laid out again or repainted, which involves two important concepts: Reflow and Repaint.
  • Reflow, also called Layout, happens when the content, structure, position, or size of an element changes, so that styles and the render tree have to be recalculated.
  • Repaint happens when a change affects only the appearance of an element (for example background color, border color, or text color); the browser only needs to redraw the element with the new style.
  • A reflow therefore costs much more than a repaint. Every node in the DOM tree can be reflowed, and reflowing one node is likely to trigger a reflow of its child nodes, and sometimes of its parent and sibling nodes as well.
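
The difference between the two can be summed up as a small lookup. The property sets below are illustrative rather than exhaustive, and the `required_work` function is invented for this sketch; real browsers decide this per property and per engine.

```python
# Changes to these properties affect geometry, so layout must be recalculated.
LAYOUT_PROPERTIES = {"width", "height", "margin", "padding", "top", "left", "display", "font-size"}
# Changes to these properties only affect appearance.
PAINT_ONLY_PROPERTIES = {"color", "background-color", "border-color"}

def required_work(changed_properties):
    """Return the rendering stages a style change triggers (simplified model)."""
    changed = set(changed_properties)
    if changed & LAYOUT_PROPERTIES:
        # A reflow is always followed by a repaint of the affected area.
        return ["reflow", "repaint"]
    if changed & PAINT_ONLY_PROPERTIES:
        return ["repaint"]
    return []
```

Changing only `background-color` gives `["repaint"]`, while changing `width` gives `["reflow", "repaint"]`, which is why size and position changes are the expensive ones.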

8. Close the TCP connection or keep it alive

The connection is closed with a four-way wave (FIN, ACK, FIN, ACK).

  • The first wave: after the browser has finished sending its data, it sends a FIN to request disconnection.
  • The second wave: the server sends an ACK to agree. The server could also send its own FIN at this point, but because it may still have data to send, its FIN goes out separately as the third wave.
  • The fourth wave: the browser returns an ACK to agree, and the connection is closed.
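
The reason the server's FIN is a separate step can be seen with two sockets on the local machine. In this sketch the client closes only its sending side, and the server still delivers data afterwards. It assumes a loopback interface is available; the function name and payloads are made up, and the FIN and ACK packets themselves are sent by the operating system, not by this code.

```python
import socket

def half_close_demo():
    """Show that one side can finish sending while the other keeps sending."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)

    client.sendall(b"request")
    # First wave: the client has nothing more to send, so it sends FIN.
    # The server's kernel acknowledges it (second wave).
    client.shutdown(socket.SHUT_WR)

    received = b""
    while True:
        chunk = server.recv(1024)
        if not chunk:  # an empty read means the client's FIN has arrived
            break
        received += chunk

    # The server may still have data to send before it closes its side.
    server.sendall(b"late data")
    server.close()  # third wave: the server's FIN; the client's kernel ACKs it (fourth wave)

    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        reply += chunk

    client.close()
    listener.close()
    return received, reply
```

The server receives the full request and then an end-of-stream signal, yet the client still gets "late data" afterwards, which is exactly the gap between the second and third waves.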

 


Origin blog.csdn.net/qq_43737121/article/details/114643926