What is the whole process of an http request?

A few points to know before looking at HTTP requests:

Principles of HTTP and browsers
1. What happens between entering a URL in the browser and the content being displayed
(1) The browser passes the domain name in the requested URL to DNS for resolution, obtains the real IP address, and sends a request to the server;
(2) The server hands the request to the backend; once processing is complete it returns the data, and the browser receives the files (HTML, JS, CSS, images, etc.);
(3) The browser parses the loaded resources (HTML, JS, CSS, etc.) and builds the corresponding internal data structures (such as the DOM for HTML);
(4) The browser uses the parsed resources to render the page, and the process is complete.
2. The browser rendering process
(1) The browser parses the HTML document it received into a DOM tree.
(2) It processes the CSS to build the CSSOM (CSS Object Model).
(3) It merges the DOM and the CSSOM into a render tree, which represents the objects to be rendered.
(4) It calculates the position and size of each element in the render tree; this step is called layout. The browser lays elements out as a flow, so a single pass is usually enough to lay out all the elements.
(5) It draws each node of the render tree to the screen; this step is called painting.

3. How to solve cross-domain problems
(1) CORS (Cross-Origin Resource Sharing)
The backend adds response headers:
header('Access-Control-Allow-Origin: *'); // origins allowed to access the resource
header('Access-Control-Allow-Methods: POST, GET'); // request methods allowed
(2) JSONP
(3) Proxy mechanism
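
As an illustration of the CORS option, here is a minimal Python sketch of how a backend might decide which CORS headers to attach to a response. The function name `cors_headers` and the allow-list parameters are assumptions made for this example, not part of any framework.

```python
from typing import Dict, Iterable


def cors_headers(origin: str, allowed_origins: Iterable[str],
                 allowed_methods: Iterable[str] = ("GET", "POST")) -> Dict[str, str]:
    """Return the CORS response headers for a request coming from `origin`.

    An empty dict means the origin is not allowed, so the browser
    will block the cross-origin response.
    """
    allowed = set(allowed_origins)
    if "*" in allowed:
        allow_origin = "*"            # any origin may read the response
    elif origin in allowed:
        allow_origin = origin         # echo back the single allowed origin
    else:
        return {}
    return {
        "Access-Control-Allow-Origin": allow_origin,
        "Access-Control-Allow-Methods": ", ".join(allowed_methods),
    }
```

Echoing back one specific origin is usually safer than `*`, since a wildcard lets any site read the response.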

4. Repaint (redrawing) and reflow (rearranging)
(1) Reflow updates the geometric properties of elements; repaint updates their visual style
(2) A reflow always triggers a repaint, but a repaint does not necessarily trigger a reflow
(3) To reduce reflows and repaints: reduce JS manipulation of the geometric properties of DOM elements, modify the DOM in batches, and cache layout information

5. How the browser implements caching (strong caching and negotiated caching)
(1) The browser first checks the HTTP headers of the resource to see whether the strong cache is hit. On a hit it takes the resource directly from the local cache; otherwise it requests the resource from the server.
(2) When the strong cache is missed, the client sends a request to the server, and the server uses other request headers to check whether the negotiated cache is hit. This process is called HTTP revalidation. On a hit, the server responds without the resource body and tells the client to use its cached copy; the client then loads the resource from its own cache.
(3) What strong and negotiated caching have in common: on a hit, the server does not return the resource. The difference: strong caching sends no request to the server, while negotiated caching does.
(4) When the negotiated cache is missed, the server returns the resource to the client.
(5) A forced refresh with Ctrl+F5 loads directly from the server, skipping both the strong cache and the negotiated cache.
(6) A refresh with F5 skips the strong cache but still checks the negotiated cache.
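
The two-stage decision above can be sketched in Python. This is a simplified model, not a browser implementation: it assumes freshness comes from `Cache-Control: max-age` and validation from an `ETag` compared against `If-None-Match`, which the text above does not name explicitly. `CachedEntry`, `browser_decision` and `server_status` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    etag: str            # validator stored with the cached copy
    stored_at: float     # time the copy was stored, in seconds
    max_age: float       # freshness lifetime (Cache-Control: max-age)


def browser_decision(entry: Optional[CachedEntry], now: float) -> str:
    """What the browser does before touching the network."""
    if entry is None:
        return "request"        # nothing cached: plain request
    if now - entry.stored_at < entry.max_age:
        return "use-cache"      # strong cache hit: no request is sent
    return "revalidate"         # stale copy: conditional request with If-None-Match


def server_status(if_none_match: Optional[str], current_etag: str) -> int:
    """Status code the server returns for a (possibly conditional) request."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304              # negotiated cache hit: no body is sent
    return 200                  # miss: the full resource is returned
```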

6. Front-end storage technology (cookie, session, localStorage, sessionStorage)

(Figure: HTTP request process diagram)

1. Introduction

  When we enter http://www.baidu.com in the browser's address bar, what exactly happens? How does the request reach the server, and how does the result come back?

● URG: urgent flag. URG=1 indicates that the urgent pointer field is valid. Its purpose is to let data jump the queue: data marked with URG=1 is moved to the front of the send buffer and transmitted first.

● ACK: acknowledgment flag. Indicates that the acknowledgment number field is valid. This flag is set in most segments. The acknowledgment number (w+1) in the TCP header is the next sequence number expected, and it also indicates that all data before that number has been received successfully.

● PSH: push flag. It is the receiving-side counterpart of URG: data with PSH=1 is delivered as soon as possible. When this flag is set, the receiving end does not leave the data waiting in its queue but passes it to the application as soon as possible. This flag is always set for interactive connections such as Telnet or rlogin.

● RST: reset flag. Used to reset the corresponding TCP connection.

● SYN: synchronization flag, used for connection request and connection acceptance. Indicates that the sequence number field carries an initial sequence number. This flag is only used while the three-way handshake is establishing a TCP connection. It tells the receiving end to note the sequence number, which is the initial sequence number of the initiating end of the connection (usually the client). The TCP sequence number can be thought of as a 32-bit counter ranging from 0 to 4,294,967,295. Every byte of data exchanged over a TCP connection is numbered. The sequence number field in the TCP header holds the sequence number of the first byte in the TCP segment.

● FIN: finish flag. The sender has finished sending its data and requests that the connection be released.
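
To make the flags concrete, the following Python sketch decodes and encodes the six flags as bits of the flags byte in the TCP header. The bit values are the standard ones; the helper names `decode_flags` and `encode_flags` are made up for this example.

```python
# Bit values of the six classic flags in the TCP header flags byte.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_flags(flags_byte: int) -> list:
    """Return the names of the flags set in `flags_byte`, lowest bit first."""
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]


def encode_flags(*names: str) -> int:
    """Combine flag names into a single flags byte."""
    value = 0
    for name in names:
        value |= TCP_FLAGS[name]
    return value
```

For example, the second segment of the three-way handshake has both SYN and ACK set, which is `0x12`.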

2. Overview

  1. The browser performs DNS resolution (converting the domain name to an IP address) to obtain the corresponding IP address
  2. Using this IP address, it finds the corresponding server and establishes a TCP connection (three-way handshake)
  3. Once the TCP connection is established, it sends an HTTP request (a complete HTTP request message)
  4. The server responds to the HTTP request, and the browser receives the HTML code (how the server responds)
  5. The browser parses the HTML code and requests the resources it references (such as JS, CSS, images, etc.)
  6. The browser renders the page and presents it to the user
  7. The server closes the TCP connection (four-way wave)

3. Detailed explanation of the process

1. DNS domain name resolution

  1. The browser first searches its own DNS cache (entries are kept only briefly, about 1 minute, and the cache holds only about 1000 entries);
  2. If the name is not in the browser's own cache, the browser searches the operating system's DNS cache;
  3. If it is still not found, the hosts file is consulted;
  4. If none of the previous three steps yields a result, the browser makes a DNS system call, which sends a domain name resolution request to the locally configured preferred DNS server (usually provided by the telecom operator; a public DNS server such as Google's can also be used). The request is sent over UDP to port 53 of the DNS server and is a recursive request, meaning the operator's DNS server must return the IP address of the domain name to us.
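
The lookup order can be modelled with a short Python sketch. It is a simulation only: the caches and the hosts file are plain dictionaries, and `query_dns_server` is a hypothetical callback standing in for the recursive query to the configured DNS server.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve(hostname: str,
            browser_cache: Dict[str, str],
            os_cache: Dict[str, str],
            hosts_file: Dict[str, str],
            query_dns_server: Callable[[str], Optional[str]]
            ) -> Tuple[Optional[str], str]:
    """Return (ip, source), trying each layer in the order described above."""
    layers = (
        ("browser cache", browser_cache),
        ("OS cache", os_cache),
        ("hosts file", hosts_file),
    )
    for source, table in layers:
        if hostname in table:
            return table[hostname], source
    # Last resort: ask the configured DNS server (recursive query).
    ip = query_dns_server(hostname)
    if ip is not None:
        browser_cache[hostname] = ip   # remember the answer for next time
    return ip, "DNS server"
```

A second lookup of the same name is then answered from the browser cache without contacting the DNS server.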

2. TCP three-way handshake

  Three-way handshake: The server creates a socket, binds its address information, starts listening, and enters the LISTEN state. After the client creates a socket and its address information is bound, it calls connect, sends a connection request (SYN), and enters the SYN_SENT state to wait for the server's acknowledgment. When the server sees the connection request, it places the connection in a kernel wait queue, sends a SYN together with an acknowledgment (ACK) to the client, and enters the SYN_RCVD state. After receiving the SYN+ACK segment, the client sends an ACK to the server, enters the ESTABLISHED state, and can start reading and writing data. Once the server receives the client's ACK, it also enters the ESTABLISHED state and can read and write data.
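
The socket calls mentioned above can be tried on the loopback interface. In this minimal Python sketch the kernel performs the actual SYN / SYN+ACK / ACK exchange; the code only shows which call triggers which step. The address `127.0.0.1` and the `ping` payload are arbitrary choices for the example.

```python
import socket

# Server side: socket() -> bind() -> listen() puts the socket in LISTEN.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server.listen(1)
host, port = server.getsockname()

# Client side: connecting sends SYN and returns once the handshake is done.
client = socket.create_connection((host, port), timeout=5)

# accept() takes the already established connection from the kernel queue.
conn, peer = server.accept()

client.sendall(b"ping")            # both ends are ESTABLISHED: data can flow
received = conn.recv(4)

for s in (conn, client, server):
    s.close()
```

Note that the connection is established before `accept()` is called: the kernel completes the handshake and queues the connection, which is why this works in a single thread.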

  1. Why does the handshake take three steps instead of two or four?

  Answer: Two steps are not reliable, and four are unnecessary. TCP needs to confirm that both parties can send and receive data. A party that receives an ACK knows that the other side can both receive and send. Therefore, both parties must send a SYN and have it acknowledged to be sure the other side can communicate.

  • First handshake: the client sends SYN and the server receives it. The server concludes that the client can send and the server can receive;

  • Second handshake: the server sends SYN+ACK and the client receives it. The client concludes that its own sending and receiving work and that the server's sending and receiving work as well, but at this point the server cannot yet confirm that the client can receive;

  • Third handshake: the client sends ACK and the server receives it. The server can now conclude that the client's sending and receiving work, and that its own sending and receiving work too.

  2. Can the three-way handshake carry data?

  Answer: The first and second handshakes cannot carry data, but the third can. Suppose the first handshake could carry data: an attacker could put a large amount of data in each SYN segment and send such segments repeatedly. The server would then spend a lot of memory buffering these segments, which would make it easier to attack.

  3. If the TCP three-way handshake fails, what will the server do?

  Answer: There are two kinds of handshake failure. In the first, the server never receives the SYN, so it does nothing. In the second, the server replies with SYN+ACK but receives no ACK for a long time. It retransmits the SYN+ACK, and once the wait times out it abandons the half-open connection and releases its resources (sending an RST segment to reset the connection, depending on the implementation).

  4. What does ISN stand for? What is it for? Is the ISN fixed? Why should the ISN be dynamic and random?

  Answer: ISN stands for Initial Sequence Number. It is the starting point of the sender's byte numbering and tells the other party the sequence number from which the sender will start sending data.

  If the ISN were fixed, an attacker could easily guess the subsequent acknowledgment numbers. For security, to prevent a third party from guessing it and sending a forged RST segment, the ISN is generated dynamically.

  5. What is the half-connection queue?

  Answer: After the server first receives the client's SYN, it is in the SYN_RCVD state, and the connection between the two parties is not yet fully established. The server puts connection requests in this state into a queue, which we call the half-connection queue (also known as the SYN queue). There is also a full-connection queue (the accept queue): connections that have completed the three-way handshake are placed there. If a queue is full, packets may be dropped.

3. Initiate an HTTP request

  HTTP is a standard for requests and responses between a client and a server, normally carried over TCP. The client is the end user and the server is the website. Using a web browser, a web crawler, or another tool, the client sends an HTTP request to a specified port on the server (port 80 by default).

  In layman's terms, it is a set of rules for computers communicating over the network. It is a stateless, request/response-based, application-layer protocol, usually transmitted over TCP/IP. A client and a server must both follow the HTTP protocol to exchange web content; otherwise they cannot communicate.

  HTTP request message:

  An HTTP request message consists of four parts: request line, request header, blank line, and request data.

 1) request line

  The request line is divided into three parts: request method, request address, and protocol version.

  request method

  HTTP/1.1 originally defined 8 request methods: GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, and CONNECT. PATCH was added later (RFC 5789).

  The two most commonly used are GET and POST. RESTful interfaces generally use GET, POST, DELETE, and PUT.

  request address

    URL: Uniform Resource Locator, an abstract and unique way of identifying the location of a resource.

    It has the form: <protocol>://<host>:<port>/<path>.

    The port and path can sometimes be omitted (the default HTTP port is 80).
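
The structure of a URL can be checked with Python's standard `urllib.parse` module. The helper `url_parts` and the `DEFAULT_PORTS` table are assumptions for this sketch; the default of 443 for https is a well-known value that the text above does not mention.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def url_parts(url: str) -> dict:
    """Split a URL into protocol, host, port and path, filling in defaults."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # An omitted port falls back to the protocol's default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # An omitted path means the root path "/".
        "path": parts.path or "/",
    }
```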

 2) request header

    The request header adds some additional information to the request message, consisting of "name/value" pairs, one pair per line, and the name and value are separated by a colon.

    There will be a blank line at the end of the request header, indicating the end of the request header, followed by the request data. This line is very important and essential.

 3) Request data

    Optional. For example, a GET request usually has no request data.
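
Putting the four parts together, a request message can be assembled by hand. This is a minimal Python sketch for illustration; `build_request` is a hypothetical helper, and real clients add more headers than shown here.

```python
from typing import Dict


def build_request(method: str, path: str, headers: Dict[str, str],
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Assemble request line + request headers + blank line + request data."""
    lines = [f"{method} {path} {version}"]             # request line
    headers = dict(headers)                             # do not modify the caller's dict
    if body:
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"              # blank line ends the headers
    return head.encode("ascii") + body
```

A GET request built this way ends right after the blank line, while a POST request carries its data after it.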
 

4. The server responds to the HTTP request

   After the HTTP request arrives, load balancing comes into play. The load balancer sits in front of the website and distributes bursts of high traffic across different machines for processing. Load balancing solutions come in two types, software and hardware. The most common software solution is Nginx.

   Nginx has two main functions: 1) serving static file requests, and 2) forwarding requests to the backend server. The backend server then queries the database and returns the data. The data is still returned to the client over HTTP.

   The HTTP response message is mainly composed of status line, response header, blank line and response data.

  1) Status line: It consists of 3 parts: protocol version, status code, and status code description.

   The protocol version is consistent with the request message, and the status code description is a simple description of the status code, so only the status code is introduced here. Some common status codes are as follows:

   2) Response header

    Similar to the request header, some additional information is added to the response message.

   3) Response data

    Used to store data information that needs to be returned to the client.

  In an HTTP response, the Content-Length response header indicates the number of bytes in the message body, and Content-Type indicates the type of the message body. When browsing a web page the type is usually HTML, though there are other types, such as images and videos.
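
The layout of a response message can be illustrated by parsing one by hand. The Python sketch below is deliberately simple and assumes a well-formed response with a reason phrase and a `Content-Length` header; `parse_response` and the sample message are made up for the example.

```python
from http import HTTPStatus
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    head, _, rest = raw.partition(b"\r\n\r\n")          # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)   # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of body belong to this message.
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), reason, headers, rest[:length]


sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello")
version, code, reason, headers, body = parse_response(sample)
standard_reason = HTTPStatus(code).phrase   # standard description for the code
```

The standard library's `http.HTTPStatus` enum lists the registered status codes together with their descriptions, for example 304 "Not Modified".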

5. Browser analysis

  After the browser gets the index.html file, it starts parsing the HTML code in it. When it encounters static resources such as JS, CSS, and images, it requests them from the server. Several downloads run in parallel (the number of download threads differs between browsers), and the keep-alive feature is used here: once an HTTP connection is established, multiple resources can be requested over it. Resources are requested in the order in which they appear in the code, but because they differ in size and the browser requests them in parallel, the order in which they are displayed is not necessarily the order in the code.

  When the browser requests a static resource whose cached copy has expired, it sends an HTTP request to the server asking whether the resource has been modified since its last modification time. If the server returns a 304 status code (telling the browser that the resource has not been modified on the server), the browser reads the resource directly from its local cache.

6. The browser renders the page

  Finally, the browser uses its internal rendering mechanism to render the requested static resources and HTML code, and presents the result to the user. The browser parses and renders as it goes.

7. The server closes the TCP connection

  Under normal circumstances, once the web server has sent the response data to the browser, it closes the TCP connection. Closing a TCP connection takes a four-way wave.

  Either the client or the server can be the side that initiates the close.

  Four-way wave:

  • When the client actively calls close, it sends a FIN segment to the server and enters the FIN_WAIT1 state;
  • The server receives the FIN segment, returns an ACK, and enters the CLOSE_WAIT state. If the server still has data to send at this point, the client must still receive it. After receiving the server's ACK of its FIN, the client enters the FIN_WAIT2 state and waits for the server's FIN;
  • After the server has finished sending its data and actually calls close to close the connection, it sends a FIN segment to the client. The server then enters the LAST_ACK state and waits for the final ACK;
  • The client receives the server's FIN, enters TIME_WAIT, and sends an ACK. The server receives this ACK, enters the CLOSED state, and the connection is closed. The client waits for 2MSL before entering the CLOSED state.
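
The state changes on both sides can be written down as two small transition tables. The Python sketch below is a model of the sequence described above with the client as the active closer; the event names such as "recv FIN" are labels invented for this example.

```python
# (state, event) -> next state, for the client (active close)
# and the server (passive close).
CLIENT = {
    ("ESTABLISHED", "close"): "FIN_WAIT1",      # 1st wave: client sends FIN
    ("FIN_WAIT1", "recv ACK"): "FIN_WAIT2",     # 2nd wave: server's ACK arrives
    ("FIN_WAIT2", "recv FIN"): "TIME_WAIT",     # 3rd wave arrives, client sends ACK
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}
SERVER = {
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",  # server ACKs the client's FIN
    ("CLOSE_WAIT", "close"): "LAST_ACK",        # 3rd wave: server sends FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",         # 4th wave: final ACK arrives
}


def run(table, events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        state = table[(state, event)]
        visited.append(state)
    return visited


client_states = run(CLIENT, ["close", "recv ACK", "recv FIN", "2MSL timeout"])
server_states = run(SERVER, ["recv FIN", "close", "recv ACK"])
```

An event that is not valid in the current state raises a `KeyError`, which makes out-of-order sequences easy to spot.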

    Note: SYN: synchronization bit; SYN=1 means a connection request;

      FIN: means closing the connection;

      ACK: acknowledgment bit; ACK=1 means the acknowledgment number is valid, ACK=0 means it is invalid;

      seq: sequence number; the initial value is randomly generated by the machine;

      ack: acknowledgment number, the sequence number sent by the other party + 1 (ack=seq+1);

      PSH: indicates that data is being transmitted;

      RST: means connection reset.

  ACK may be set together with SYN, FIN, and so on. For example, SYN and ACK may both be 1, which is the response to a connection request; a SYN on its own is only the connection request. SYN and FIN, however, are never both 1, because the former establishes a connection while the latter closes one. RST generally appears as 1 after FIN, indicating that the connection is reset.
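
The relation ack=seq+1 can be worked through for the three-way handshake. In this Python sketch the function name and the dictionary layout are assumptions; the arithmetic relies on the fact that a SYN consumes one sequence number and that sequence numbers wrap around at 2^32.

```python
import random

SEQ_MODULO = 2 ** 32   # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dicts of flags, seq and ack."""
    return [
        # 1. client -> server: SYN with the client's initial sequence number
        {"flags": "SYN", "seq": client_isn, "ack": None},
        # 2. server -> client: SYN+ACK, acknowledging client_isn + 1
        {"flags": "SYN+ACK", "seq": server_isn,
         "ack": (client_isn + 1) % SEQ_MODULO},
        # 3. client -> server: ACK, acknowledging server_isn + 1
        {"flags": "ACK", "seq": (client_isn + 1) % SEQ_MODULO,
         "ack": (server_isn + 1) % SEQ_MODULO},
    ]


# Random ISNs, as each side would choose for a new connection.
segments = three_way_handshake(random.randrange(SEQ_MODULO),
                               random.randrange(SEQ_MODULO))
```

For segments that carry data, the acknowledgment number advances by the number of bytes received rather than by 1.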

 1. Why does the handshake take three steps, but the wave four?

  Answer: During the TCP handshake, the receiving end combines its SYN and its ACK into a single segment, which saves one transmission. For the four-way wave, because TCP is full-duplex, a FIN sent by the actively closing party does not mean that the connection is completely closed; it only means that the actively closing party will send no more data.

  The receiver may still have data to send, so the data channel from the server to the client cannot be closed immediately. The server's FIN therefore cannot be combined with its ACK to the client: the server can only send the ACK first and send its FIN later, when it has no more data to send. That is why the four-way wave needs four segments.

 2. What is the purpose of the TIME_WAIT state? Why does the actively closing party not enter the CLOSED state directly and release its resources?

  Answer: Suppose the actively closing party entered the CLOSED state directly. If the passively closing party received no ACK after sending its FIN, it would retransmit the FIN after a timeout. If the client skipped TIME_WAIT and released its resources right away, a new client started afterwards might use the same address information as the previous one, which creates two hazards:

  • First, once the new client has bound the address, it may receive the retransmitted FIN, which affects the new connection.
  • Second, if the new client sends a SYN to the same server while the server is still in the LAST_ACK state, the server expects an ACK rather than a SYN, so it replies with an RST to reset the connection.

 3. Why does the TIME_WAIT state have to last 2MSL before entering the CLOSED state?

  Answer: MSL (Maximum Segment Lifetime) is the maximum time a segment can survive in the network. After the client sends the ACK for the server's FIN, that ACK may be lost. If the server does not receive the ACK, it retransmits the FIN.

  Therefore, after sending the ACK, the client sets aside 2MSL (the time for the ACK to reach the server plus the time for a retransmitted FIN to come back, one trip each way) to make sure the server really received the ACK. In other words, if the client receives no retransmitted FIN within 2MSL, it can conclude that the server received its ACK, and it then ends the TCP connection.

  However, if the browser or the server includes this line in its headers:

  Connection: keep-alive

   the TCP connection stays open after the response is sent, so the browser can keep sending requests over the same connection. Keeping the connection open saves the time needed to establish a new connection for each request and also saves network bandwidth.

 4. What causes a large number of TIME_WAIT connections on a host? How should it be handled?

   Answer: TIME_WAIT occurs on the actively closing party. A large number of TIME_WAIT connections on a host shows that the host has actively closed a large number of connections; this is common on crawler servers. In that case we can shorten the TIME_WAIT waiting time or enable the socket address reuse option.
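
One reading of "the socket address reuse option" is `SO_REUSEADDR`. The Python sketch below shows how it might be enabled on a listening socket; `make_listener` is a hypothetical helper, and whether this option is the right remedy depends on which side of the connection accumulates TIME_WAIT.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a listening TCP socket with address reuse enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(): allows binding to an address that still
    # has connections lingering in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


listener = make_listener()
reuse_enabled = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
listener.close()
```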

 5. What causes a large number of CLOSE_WAIT connections on a host? How should it be handled?

   Answer: CLOSE_WAIT is the state the passively closing party enters after it receives a FIN and acknowledges it, while it waits for the application to finish. A large number of CLOSE_WAIT connections usually means that the program on the passively closing side forgot the final step of calling close to release the connection. This is a bug; adding the missing close call solves the problem.

 6. The keep-alive mechanism in TCP connection management

   Answer: In TCP communication, if the two ends exchange no data for a long time, the server sends keep-alive probe segments to the client at regular intervals and expects a reply. If several consecutive probes go unanswered, the connection is considered broken. By default, the idle time before probing starts is 7200 s, the interval between probes is 75 s, and the number of unanswered probes allowed is 9. These values can be changed on the socket through the setsockopt interface.
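
As a sketch of the setsockopt interface, the following Python code enables keep-alive on a socket and sets the three timing values. `SO_KEEPALIVE` is widely available; `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are platform-specific (they exist on Linux), so the code applies them only where Python's `socket` module exposes them. The helper name `enable_keepalive` is an assumption.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 7200,
                     interval: int = 75, count: int = 9) -> None:
    """Turn on TCP keep-alive and, where supported, set its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    options = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes allowed
    )
    for name, value in options:
        option = getattr(socket, name, None)   # None if the platform lacks it
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```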


Origin blog.csdn.net/weixin_51225684/article/details/129147119