What happens between entering a URL and the page being displayed (expanded explanation)

This is a very common interview question. Whether you work on the front end or the back end, you should know the answer by heart. The process can be described roughly or in great detail, and it touches many networking topics, so it is worth sorting out properly.

Generally speaking, the process is divided into the following steps:

  • URL input

  • DNS resolution

  • TCP connection (three-way handshake), connection reuse

  • Send an HTTP request (request line, request headers, request body)

  • Server sends HTTP response

  • Close the TCP connection (four-way wave)

  • The browser parses and renders the page

1. URL input

First, we enter a URL in the browser. URL stands for Uniform Resource Locator: a concise representation of where a resource on the Internet is located and how to access it. It is the standard address of a resource on the Internet. Every file on the Internet has a unique URL, which contains information that indicates the location of the file and how the browser should handle it.

Main components: protocol://hostname[:port]/path[;parameters]
Note the same-origin policy that browsers follow here. The cross-origin (cross-domain) problems that front-end code often runs into when calling an interface come from this policy. An origin (the "domain" in "cross-domain") is the combination of protocol, domain name, and port number. Two URLs are same-origin only if all three are identical; if any one of them differs, the request is cross-origin.
Cross-origin problems and their solutions are a topic of their own.
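The components and the same-origin rule above can be sketched with Python's standard urllib.parse module. This is only an illustration: the helper names origin and same_origin, the example.com URLs, and the default-port table are assumptions made for the example, not something a browser exposes.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when a URL does not spell the port out.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (protocol, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    """Two URLs are same-origin only if protocol, host and port all match."""
    return origin(a) == origin(b)

parts = urlsplit("https://www.example.com:8443/docs/index.html?lang=en")
print(parts.scheme, parts.hostname, parts.port, parts.path, parts.query)

print(same_origin("http://example.com/a", "http://example.com:80/b"))   # same origin
print(same_origin("http://example.com/a", "https://example.com/a"))     # protocol differs
print(same_origin("http://example.com/a", "http://api.example.com/a"))  # host differs
```

Note that the path plays no part in the comparison: only protocol, host, and port decide whether a request is cross-origin.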

2. DNS resolution

DNS resolution is the process of finding which machine holds the resources you need. When you enter an address such as www.google.com in the browser, that is not the real address of the Google website. Every computer on the Internet is uniquely identified by its IP address, but IP addresses are hard to remember. Users prefer to reach other computers through easy-to-remember names, such as the Google address mentioned above. Internet designers therefore had to make a trade-off between user convenience and usability. That trade-off is the conversion of a domain name to an IP address, and this conversion is DNS resolution. DNS acts as a translator from web addresses to IP addresses. How is the conversion performed?

2.1 DNS resolution process

DNS lookup sequence:
browser cache -> operating system cache -> local hosts file -> router cache -> ISP DNS cache -> root DNS server / top-level domain DNS server
Resolution order when looking up the IP address of www.google.com: . -> .com -> google.com. -> www.google.com.
You may wonder why there is an extra dot at the end. It is not a typo. The trailing . stands for the root name server. By default every domain name ends with ., and because it is the default it is usually omitted for the user's convenience; it is added back automatically when the DNS request is made. The real resolution order for every domain name is therefore . -> .com -> google.com. -> www.google.com. (with the trailing dot).
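The resolution order above, from the root down to the full name, can be reproduced with a few lines of Python. This is a minimal sketch; resolution_chain is a hypothetical helper written for this article and performs no network access.

```python
def resolution_chain(hostname):
    """List the zones visited when resolving hostname, starting at the root."""
    # Add the implicit trailing dot (the root) if the user omitted it.
    fqdn = hostname if hostname.endswith(".") else hostname + "."
    labels = fqdn.rstrip(".").split(".")
    chain = ["."]
    # Walk from the rightmost label (the top-level domain) to the full name.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(" -> ".join(resolution_chain("www.google.com")))
```

The function gives the same chain whether or not the input already ends with a dot, which mirrors the point that the dot is always there and merely hidden.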

2.2 Two ways of DNS query: recursive query and iterative query

  1. Recursive resolution
    When the local DNS server cannot answer the client's DNS query by itself, it has to query other DNS servers. There are two ways to do this, and the first is the recursive way. The local DNS server takes responsibility for querying the other DNS servers: generally it first queries the root name server for the domain, and then works down through the lower-level name servers one level at a time. The final result is returned to the local DNS server, which then returns it to the client.

  2. Iterative resolution
    When the local DNS server cannot answer the client's DNS query, the query can also be resolved iteratively. The local DNS server does not query other DNS servers itself. Instead, it returns to the client's DNS program the IP addresses of other DNS servers that can resolve the name, and the client's DNS program keeps querying those servers until it obtains the result. In other words, an iterative answer only points you to the relevant server; it does not do the lookup for you. It is like being told: "The baidu.com server is at 192.168.4.5, go and ask it yourself. I am busy, so this is as far as I can help."
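The difference between the two query styles can be sketched with a toy model. Everything here is hypothetical: the SERVERS table, the server names, and the address 203.0.113.10 are made up for illustration, and no real DNS traffic is involved. In the iterative version the caller follows each referral itself; in the recursive version the queried server follows the referral on the caller's behalf.

```python
# Toy data: each "server" either knows the answer or refers to another server.
SERVERS = {
    "root":       {"referrals": {"com.": "tld-com"}},
    "tld-com":    {"referrals": {"example.com.": "ns-example"}},
    "ns-example": {"answers": {"www.example.com.": "203.0.113.10"}},
}

def ask(server, name):
    """One query to one server: ("answer", ip) or ("referral", next_server)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for zone, next_server in data.get("referrals", {}).items():
        if name.endswith(zone):
            return "referral", next_server
    raise LookupError(name)

def iterative_resolve(name, start="root"):
    """The caller follows every referral itself until it gets an answer."""
    server, visited = start, []
    while True:
        visited.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, visited
        server = value  # go and ask the server we were referred to

def recursive_resolve(name, server="root"):
    """The queried server chases the referral and hands back the final answer."""
    kind, value = ask(server, name)
    return value if kind == "answer" else recursive_resolve(name, value)

print(iterative_resolve("www.example.com."))
print(recursive_resolve("www.example.com."))
```

Both functions reach the same address; what differs is who does the work of contacting the next server.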

2.3 DNS load balancing

When a website has enough users, serving every request from the same machine means that machine may go down at any time. The solution is DNS load balancing. The principle is to configure multiple IP addresses for the same host name on the DNS server. When answering queries, the DNS server goes through the IP addresses recorded for that host name and returns a different one, in order, for each query. This directs different clients to different machines and so balances the load. The choice can also take other factors into account, such as the load on each machine or the geographic distance between the machine and the user.
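The "return a different address in order for each query" behaviour is a round-robin rotation, which can be sketched as follows. The class name RoundRobinDNS, the host name, and the 203.0.113.x addresses are assumptions for the example; a real DNS server is configured rather than coded like this.

```python
from itertools import cycle

class RoundRobinDNS:
    """Toy resolver that rotates through the addresses recorded for a name."""

    def __init__(self, records):
        # One endless iterator per host name, so each name rotates independently.
        self._cycles = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name):
        return next(self._cycles[name])

dns = RoundRobinDNS({
    "www.example.com": ["203.0.113.1", "203.0.113.2", "203.0.113.3"],
})
for _ in range(4):
    print(dns.resolve("www.example.com"))
```

The fourth query wraps around to the first address. Load-aware or distance-aware balancing would replace the plain rotation with a different selection rule.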

2.4 DNS cache

To speed up access, computers cache domain names. Once a website has been visited and its IP obtained, the domain name and IP are cached. On the next visit there is no need to ask a name server for the IP: the cached IP is used directly, which improves response time. The cache has a validity period. Once it has expired, a new request for the website has to go through domain name resolution again.

But domain name caching can also cause trouble. For example, if the IP has changed and the cached IP is still used, access fails. Another example is a domain name that maps to different IPs on the internal and external networks, for instance where the external IP is mapped to an internal IP for access from outside. If the same computer accesses this domain name on the external network and then moves to the internal network and accesses it again, the DNS cache makes it use the external IP, and access fails. Depending on the situation, you can clear the DNS cache manually or disable DNS caching.
Enter chrome://dns/ in Chrome to see the browser's DNS cache (newer versions expose it at chrome://net-internals/#dns). On Linux, static host name mappings are kept in /etc/hosts, which is normally consulted before a DNS query is sent.
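The caching behaviour described in this section (reuse the IP while the entry is valid, resolve again once it has expired, clear manually when needed) can be sketched as a small cache with a time-to-live. DNSCache and its methods are hypothetical names for this illustration; the clock is injectable only so that expiry can be shown without actually waiting.

```python
import time

class DNSCache:
    """Toy domain-name cache in which every entry has a validity period."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expiry time)

    def put(self, name, ip, ttl):
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name):
        """Return the cached IP, or None if it is missing or has expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must resolve again
            return None
        return ip

    def clear(self):
        """Manual flush, for example after an IP change."""
        self._entries.clear()

now = [0.0]                                # fake clock for the demonstration
cache = DNSCache(clock=lambda: now[0])
cache.put("example.com", "203.0.113.7", ttl=60)
print(cache.get("example.com"))            # still valid
now[0] = 61.0
print(cache.get("example.com"))            # expired, resolve again
```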

3. TCP connection (three-way handshake)


DNS resolution yields the server's IP address. Once the IP address is known, a connection is established. This is done by the TCP protocol through a three-way handshake.

  1. First, the client sends a connection request: it sends a SYN packet (syn=j) to the server and enters the SYN_SENT state, waiting for the server to confirm.
  2. When the server receives the SYN packet, it must acknowledge the client's SYN (ack=j+1) and at the same time send its own SYN packet (syn=k), that is, a SYN+ACK packet. The server then enters the SYN_RECV state.
  3. The client receives the SYN+ACK packet from the server, sends an acknowledgment packet ACK (ack=k+1) to the server, and allocates resources. Once this packet has been sent, the client and server enter the ESTABLISHED state (TCP connection successful). The three-way handshake is complete and the TCP connection is established.
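The numbering in the three steps above (ack=j+1, then ack=k+1) can be traced with a small simulation. This is only a sketch of the bookkeeping, not a TCP implementation: three_way_handshake is a hypothetical function, and the initial states CLOSED and LISTEN come from the standard TCP state machine rather than from the text above.

```python
def three_way_handshake(j, k):
    """Simulate the three packets, given initial sequence numbers j and k."""
    states = {"client": "CLOSED", "server": "LISTEN"}
    packets = []

    # Step 1: the client sends SYN with its sequence number j.
    packets.append({"from": "client", "flags": "SYN", "seq": j})
    states["client"] = "SYN_SENT"

    # Step 2: the server acknowledges j (ack = j + 1) and sends its own SYN (k).
    packets.append({"from": "server", "flags": "SYN+ACK", "seq": k, "ack": j + 1})
    states["server"] = "SYN_RECV"

    # Step 3: the client acknowledges k (ack = k + 1); both sides are connected.
    packets.append({"from": "client", "flags": "ACK", "ack": k + 1})
    states["client"] = states["server"] = "ESTABLISHED"

    return packets, states

packets, states = three_way_handshake(j=100, k=300)
for packet in packets:
    print(packet)
print(states)
```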

4. Send HTTP request

After the TCP connection is established, the HTTP request is sent. A typical HTTP request names a request method such as GET or POST; the less common methods are PUT, DELETE, HEAD, OPTIONS, and TRACE. The request tells the server, in the form of a message, what we need. A complete HTTP request consists of three parts: the request line (start line), the request headers, and the request body.

Supplement (1) GET vs. POST

Reference answer: w3school: GET vs. POST
In fact, the names of the other request methods tell you when to use which one. This is a good example of semantics.
First, there is no substantial difference between the GET and POST methods; only the message format differs.
GET and POST are just two request methods of the HTTP protocol, and HTTP is an application-layer protocol built on TCP/IP. GET and POST use the same transport-layer protocol, so there is no difference in transmission.
In the message format, when there are no parameters the only difference is the method name on the first line, that is, the first few characters of the message.
The first line of a POST request looks like this: POST /uri HTTP/1.1 \r\n
The first line of a GET request looks like this: GET /uri HTTP/1.1 \r\n
The difference when there are parameters: by convention, the parameters of a GET request go in the URL and the parameters of a POST request go in the body.
For example, suppose the parameters are name=qiming.c and age=22.
The simplified version of the GET method message looks like this

GET /index.php?name=qiming.c&age=22 HTTP/1.1
Host: localhost

The simplified version of the POST method message looks like this

POST /index.php HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded

name=qiming.c&age=22
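The two messages above can be generated from the same parameters, which makes it easy to see that only the placement of the parameters changes. This is a sketch: build_get and build_post are hypothetical helpers, and the Content-Length header is an addition that does not appear in the simplified messages above.

```python
from urllib.parse import urlencode

def build_get(path, host, params):
    """GET: the parameters travel in the query string of the request line."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def build_post(path, host, params):
    """POST: the same parameters travel in the message body instead."""
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # added here; not in the text above
        "\r\n"
        f"{body}"
    )

params = {"name": "qiming.c", "age": 22}
print(build_get("/index.php", "localhost", params))
print(build_post("/index.php", "localhost", params))
```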

Summary

  • GET: requests data from the specified resource. It has no side effects, is idempotent, and can be cached.
  • POST: submits data to be processed by the specified resource. It has side effects, is not idempotent, and is generally not cached.
  • Parameters. GET parameters go in the query string of the URL; POST parameters (data) go in the request message body.
  • Safety. GET is "safe" in the HTTP sense that it only reads data. In terms of exposure, POST is relatively safer because its parameters do not appear in the URL, but only relatively: without HTTPS both are sent in plain text.
  • Length. The GET URL, and therefore the GET parameters, is limited in length; a figure of about 1024 characters is often quoted, though the actual limit depends on the browser and server. POST can carry much more data; the protocol sets no limit on POST parameters, although servers commonly cap the body at something like 4~10 MB.
  • GET is used to read data and POST is used to write data. POST is not idempotent (idempotent means that sending the same request once or many times leaves the same result).

Supplement (2) The difference between HTTPS and HTTP

For details, see the detailed analysis of the difference between HTTP and HTTPS.
The main differences are as follows:

  1. HTTPS requires applying to a CA for a certificate. Free certificates have traditionally been scarce, so this usually involves a fee.

  2. HTTP is the Hypertext Transfer Protocol and transmits information in plain text; HTTPS is a secure transfer protocol encrypted with SSL.

  3. HTTP and HTTPS use completely different connection methods and different ports: 80 for the former and 443 for the latter.

  4. An HTTP connection is simple and stateless. HTTPS is a protocol built from SSL plus HTTP that provides encrypted transmission and identity authentication, which makes it more secure than HTTP.

5. The server returns an HTTP response

After the server receives the HTTP request sent by the browser, it wraps the received HTTP message in an HTTP Request object and hands it to whichever Web server software is in use for processing. The result is returned as an HTTP Response object, which has three main parts: the status code, the response headers, and the response body.
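The three parts of a response can be pulled apart with plain string handling. The sketch below assumes a small, complete response held in a single string; parse_response is a hypothetical helper, and real clients also have to deal with chunked bodies, byte encodings, and repeated headers, which are ignored here.

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    # A blank line separates the head (status line + headers) from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "hello"
)
print(parse_response(raw))
```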

Supplement: common status codes


Most people know the common ones: 404 page not found, 500 server error, 301 permanent redirect, 302 temporary redirect, 200 OK, 401 unauthorized, and so on.

Here we focus on three status codes and the knowledge around them: 304 (negotiated cache), 101 (protocol upgrade), and 307 (HSTS redirect).

  • 304 negotiated cache
    Let's start with the 304 negotiated cache, which is fairly basic knowledge. As soon as you mention it, the interviewer will almost certainly ask: what is a negotiated cache?
    The short answer: the difference between the negotiated cache and the forced (strong) cache is that the forced cache does not need to contact the server and the result is 200, while the negotiated cache does contact the server. With the forced cache, the resource is read directly from the browser, and the status is 200. With the negotiated cache, the browser asks the server. On a hit, the server returns 304, the browser updates the cache header information it holds, and the resource is read from the browser cache. On a miss, the resource comes from the server with status 200.
    This is the moment to show what you know about browser caching. I usually answer like this: browser caching is divided into the forced cache and the negotiated cache, and the forced cache is checked first. The forced cache is controlled by Expires and Cache-Control. Expires is a specific point in time and is the older standard; Cache-Control usually gives a length of time, is newer, and has higher priority. The negotiated cache is controlled by ETag and Last-Modified. Last-Modified is the time the resource was last modified. ETag is a value computed from the content of the resource; it exists to handle cases a timestamp cannot capture, such as a resource that is modified very frequently, so it also has higher priority.

  • 101 protocol upgrade
    This is mainly used for WebSocket, and can also be used for upgrading to HTTP/2. The characteristics and uses of WebSocket are not covered here; most readers are familiar with them.
    HTTP/2 has many benefits. Interviewers are generally satisfied if you mention that it multiplexes many requests over a single connection, uses a binary format, compresses headers, and supports server push. The details can be looked up online and I will not go into them here. Be prepared for the follow-up question: what are the differences between HTTP, HTTPS, HTTP/2 and its predecessor SPDY, what are the advantages and disadvantages of each, and what steps does each take to establish a connection? Readers should research this themselves.

  • 307 HSTS redirect
    This one is a bit more advanced. The original use of 307 is to redirect a POST request to a new POST request, but it is also used for HSTS redirects. HSTS stands for HTTP Strict Transport Security. Its purpose is to make the browser use HTTPS directly on its next visits to the site, instead of first connecting over HTTP and then being switched to HTTPS. This prevents SSL stripping attacks, in which an attacker intercepts the user's HTTP access and impersonates the user to the server: HTTPS is used between the attacker and the server, while HTTP is used between the user and the attacker. To use HSTS, the server adds Strict-Transport-Security to its response headers, where max-age can be set. Having mentioned SSL stripping, you may wonder what other attacks exist against supposedly secure HTTPS. One I know of is SSL hijacking, which relies on the user trusting a third-party security certificate; proxy software uses this to monitor HTTPS traffic. If you know of more, additions are welcome.
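The forced-cache versus negotiated-cache decision described under 304 above can be condensed into one function. This is a deliberately simplified sketch: cache_decision and the field names stored_at, max_age, and etag are hypothetical, only a Cache-Control style lifetime and an ETag comparison are modelled, and real browsers consider many more headers.

```python
def cache_decision(cached, now, server_etag):
    """Return (status code, where the resource body comes from)."""
    if cached is None:
        return 200, "server"            # nothing cached yet: normal request
    if now - cached["stored_at"] < cached["max_age"]:
        return 200, "cache"             # forced cache: the server is not contacted
    if cached["etag"] == server_etag:
        return 304, "cache"             # negotiated cache hit: server says Not Modified
    return 200, "server"                # negotiated cache miss: fresh copy is sent

entry = {"stored_at": 0, "max_age": 60, "etag": "v1"}
print(cache_decision(entry, now=30, server_etag="v1"))   # still fresh
print(cache_decision(entry, now=90, server_etag="v1"))   # stale but unchanged
print(cache_decision(entry, now=90, server_etag="v2"))   # stale and changed
```

The order of the checks reflects the rule that the forced cache is consulted first and the server is asked only once the cached copy is no longer fresh.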

Just three status codes already lead to this much knowledge. We should not merely memorize status codes and their meanings in isolation; we should dig actively and in depth, and use HTTP status codes to build up our own understanding of networking.
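As an illustration of the HSTS mechanism mentioned above, the sketch below parses a Strict-Transport-Security header value and rewrites an http:// URL the way a browser holding the policy would. The helper names parse_hsts and upgrade_url are hypothetical; includeSubDomains is a standard directive of the header that the text above does not mention, and host matching and policy expiry are left out.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small policy dict."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy

def upgrade_url(url, policy):
    """With an active policy, switch to https before any request is sent."""
    if policy["max_age"] and url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url

policy = parse_hsts("max-age=31536000; includeSubDomains")
print(policy)
print(upgrade_url("http://example.com/login", policy))
```

Because the rewrite happens inside the browser, the plain HTTP request that an SSL stripping attack relies on is never sent.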

Common status codes: 200, 204, 301, 302, 303, 304, 400, 401, 403, 404, 500, 503 (must remember)

  • 200 OK: the client request succeeded
  • 204 No Content: the request succeeded, but the response carries no entity body
  • 301 Moved Permanently: permanent redirect; the Location header of the response should contain the new URL of the resource
  • 302 Found: temporary redirect; the URL given in the Location header of the response is used to locate the resource temporarily
  • 303 See Other: the requested resource is available at another URI, and the client should use the GET method to obtain it
  • 304 Not Modified: the content on the server has not changed, so the browser cache can be read directly
  • 400 Bad Request: the client request has a syntax error and cannot be understood by the server
  • 401 Unauthorized: the request is not authorized; this status code must be used together with the WWW-Authenticate header field
  • 403 Forbidden: the server received the request but refuses to serve it; the reason is usually given in the response body
  • 404 Not Found: the requested resource does not exist, for example because an incorrect URL was entered
  • 500 Internal Server Error: an unexpected error occurred on the server, so the client's request could not be completed
  • 503 Service Unavailable: the server is currently unable to process the client's request; it may return to normal after a period of time
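The codes and reason phrases in the list above are also available in Python's standard http module, which is a handy way to check a phrase or the class a code belongs to. The describe helper and the wording of the class labels are assumptions made for this sketch.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return 'code phrase (class)' using the standard library's table."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[status.value // 100]})"

for code in (200, 204, 301, 302, 303, 304, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```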

6. Close the TCP connection (four-way wave)

  1. The client initiates the disconnection by sending a FIN message. In effect it tells the server: "I, the client, have no more data to send to you. But if you still have data that has not been sent, you do not have to close the socket in a hurry; you can keep sending."

  2. The server sends an ACK, telling the client: "I have received your request, but I am not ready yet; please keep waiting for my message." At this point the client enters the FIN_WAIT state and continues to wait for the server's FIN message.

  3. When the server has finished sending its data, it sends a FIN message to the client, telling it: "OK, I have finished sending my data and I am ready to close the connection."

  4. When the client receives the FIN message, it knows the connection can be closed. But it still does not trust the network: the server might never learn that it may close. So after sending its ACK the client enters the TIME_WAIT state, so that the server can retransmit its FIN if the ACK does not arrive. When the server receives the ACK, it knows the connection can be closed. If the client waits for 2MSL and receives nothing more, the server has closed normally, and the client can close its side too. The TCP connection is now closed.
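The four steps above can be traced as a sequence of state changes. This is a bookkeeping sketch only: four_way_wave is a hypothetical function, and the finer-grained state names (FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, LAST_ACK) come from the standard TCP state machine, whereas the text above simply says FIN_WAIT.

```python
def four_way_wave():
    """Return (segment, client state, server state) after each segment is handled."""
    client = server = "ESTABLISHED"
    trace = []

    client = "FIN_WAIT_1"   # 1. the client sends FIN
    server = "CLOSE_WAIT"   #    the server receives it but may still send data
    trace.append(("client FIN", client, server))

    client = "FIN_WAIT_2"   # 2. the server's ACK reaches the client
    trace.append(("server ACK", client, server))

    server = "LAST_ACK"     # 3. the server has finished sending and sends FIN
    trace.append(("server FIN", client, server))

    client = "TIME_WAIT"    # 4. the client sends ACK and starts waiting 2MSL
    server = "CLOSED"       #    the server receives the ACK and closes
    trace.append(("client ACK", client, server))

    client = "CLOSED"       # 2MSL passed without a retransmitted FIN
    trace.append(("2MSL timeout", client, server))
    return trace

for step in four_way_wave():
    print(step)
```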

7. The browser parses and renders the page

After the browser receives the HTML, CSS, and JS files, it goes through the following steps:

  • Parse HTML
  • Download CSS (or read it from the cache)
  • Parse CSS
  • Download JS (or read it from the cache)
  • Parse JS
  • Download pictures
  • Parse the picture
  • Render the DOM tree
  • Render style tree
  • Execute JS

Specifically (using WebKit as an example):
The HTML document is parsed by the HTML parser to build the DOM tree, and the CSS in the HTML is parsed by the CSS parser to build the style rules. The two are combined in a step called attachment.
The render tree is built through attachment.
Once the render tree has been built, the layout phase (layout/reflow) begins, in which every node is assigned the exact coordinates where it should appear on the screen.
Finally, all the nodes are traversed and painted, and the page is displayed.

This process is fairly complicated and involves two concepts, reflow and repaint, which are covered in the supplement at the end.

When a JS file is encountered while the document is loading, the HTML document suspends the rendering thread (loading, parsing, and rendering are synchronous). The JS file must not only finish loading but also be parsed and executed before the rendering thread can resume. JS may modify the DOM, the classic example being document.write, which means that until the JS has finished executing, the downloads of all subsequent resources may turn out to be unnecessary. This is the fundamental reason why JS blocks the download of subsequent resources.

JS is parsed by the JS engine in the browser. JS runs in a single thread: it can do only one thing at a time, so all tasks queue up, and the next task starts only when the previous one has ended. Some tasks are time-consuming, such as IO reads and writes, so a mechanism is needed that lets the tasks queued behind them run first. For this, tasks are divided into synchronous tasks and asynchronous tasks.

The JS execution mechanism can be seen as a main thread plus a task queue. Synchronous tasks are executed on the main thread, and asynchronous tasks are placed in the task queue. All synchronous tasks run on the main thread and form an execution stack. An asynchronous task places an event in the task queue once it has a result. When a script runs, the execution stack is run first; then events are taken from the task queue and the corresponding tasks are run. This process repeats continuously, which is why it is called the event loop.

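The "main thread plus task queue" model described above can be imitated in a few lines. This sketch is written in Python purely to show the ordering, not how a JS engine is implemented: run, set_timeout, and script are hypothetical names, and the distinction between different kinds of queued tasks is ignored.

```python
from collections import deque

def run(script):
    """Run the synchronous code first, then drain the task queue in order."""
    task_queue = deque()
    output = []

    def set_timeout(callback):
        # An asynchronous task: its callback waits in the task queue.
        task_queue.append(callback)

    script(output, set_timeout)      # the execution stack (synchronous tasks)
    while task_queue:                # the event loop
        task_queue.popleft()()
    return output

def script(output, set_timeout):
    output.append("sync 1")
    set_timeout(lambda: output.append("async callback"))
    output.append("sync 2")

print(run(script))
```

The callback is registered between the two synchronous lines but runs after both, because queued tasks are only picked up once the synchronous code has finished.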

Supplement: reflow and repaint

Every element in the DOM exists in the form of a box model, and the browser has to calculate its position, size, and so on. This process is called reflow. Once the position, size, and other properties of the box model (such as color and font) have been determined, the browser starts to draw the content. This process is called repaint. A page inevitably goes through reflow and repaint when it is first loaded. Reflow and repaint are expensive, especially on mobile devices, where they hurt the user experience and can sometimes make the page freeze. So we should keep reflow and repaint to a minimum.

Reflow, also called layout, generally means that the content, structure, position, or size of an element has changed, so the styles and the render tree need to be recalculated.
Repaint means that a change affects only the appearance of an element (for example, background color, border color, or text color), so the element only needs to be redrawn with the new style.
The cost of reflow is therefore much higher than the cost of repaint. Every node in the DOM tree has a reflow method. The reflow of one node is likely to trigger the reflow of its child nodes, and even of its parent and sibling nodes.
The following actions are likely to be costly:
1. Adding, deleting, or modifying DOM nodes, which causes reflow or repaint.
2. Moving a DOM element, or running an animation.
3. Changing content.
4. Modifying CSS styles.
5. Resizing the window (mobile devices do not have this problem), or scrolling.
6. Modifying the default font of the web page.

Basically, reflow happens for the following reasons:
1. Initial: the web page is being initialized.
2. Incremental: some JS is manipulating the DOM tree.
3. Resize: the size of some elements has changed.
4. StyleChange: a CSS property has changed.
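The rule of thumb in this supplement (changes to position or size cause reflow, changes to appearance alone cause only repaint) can be written down as a small classifier. The property sets below are illustrative assumptions built from the examples in the text, not a complete list, and cost_of is a hypothetical helper.

```python
# Illustrative property sets only; real browsers track far more properties.
LAYOUT_PROPERTIES = {"width", "height", "top", "left", "font-size"}   # position or size
PAINT_ONLY_PROPERTIES = {"color", "background-color", "border-color"}  # appearance only

def cost_of(changed_properties):
    """Classify a batch of style changes by the most expensive work it triggers."""
    changed = set(changed_properties)
    if changed & LAYOUT_PROPERTIES:
        return "reflow + repaint"   # geometry changed: layout must be recalculated
    if changed & PAINT_ONLY_PROPERTIES:
        return "repaint"            # only the appearance changed
    return "none"

print(cost_of(["color"]))
print(cost_of(["color", "width"]))
```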

Tip
This article summarizes the process from entering a URL to displaying the page. The process involves many knowledge points, and I have summarized only some of them. If anything is missing, please point it out.


Origin blog.csdn.net/pz1021/article/details/105091624