Learning remote communication protocols (1): network protocols

The entire process of an HTTP request

In a distributed architecture, one very important link is that the computer nodes in the distributed network need to communicate with each other. This article therefore studies network communication by following the entire process of an HTTP request.

Common terms explained

DNS (responsible for domain name resolution service)

First, when a user accesses a domain name, the name goes through DNS (Domain Name System) resolution. Like HTTP, DNS is an application-layer protocol; its main job is to resolve domain names to IP addresses. We can actually reach the target host by IP address without a domain name, but IP addresses are hard to remember, so domain names are used in their place.
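As a minimal sketch of the resolution step, the Python standard library can ask the operating system's resolver for the addresses behind a name. The helper name `resolve` and the default port 80 are assumptions made for this illustration; they are not part of the original article.

```python
import socket

def resolve(hostname, port=80):
    """Return the distinct IP addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling `resolve("example.com")` (a placeholder domain) returns whatever addresses your resolver currently reports, so the result varies by network. Passing a literal IP address simply returns that address, which matches the point above that a domain name is a convenience rather than a requirement.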

CDN (speed up static content access)

Many large websites introduce a CDN (Content Delivery Network) to speed up access to static content. A CDN is essentially a network caching technology that places relatively stable resources closer to the end user. On the one hand, it saves bandwidth across the WAN. On the other hand, it speeds up access and improves the user experience. Static files (images, scripts, static pages) are generally placed on the CDN.
With a CDN in place, the resolution process becomes somewhat more complicated; it is worth looking into when you have time. Alibaba Cloud, for example, offers a CDN service.

HTTP protocol communication principle

After the domain name has been resolved, how do the client and the server establish a connection and communicate? When it comes to communication, you have probably heard of the two transport protocols TCP and UDP, and of the handshake used to establish a connection. HTTP is an application-layer protocol built on top of TCP/IP. Other application-layer protocols besides HTTP include FTP, DNS, SMTP, and Telnet.
When discussing network protocols, we need to know the OSI seven-layer model and the TCP/IP four-layer conceptual model. The OSI model consists of the application, presentation, session, transport, network, data link, and physical layers. The TCP/IP model consists of the application, transport, network, and data link layers.
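To make "HTTP is an application-layer protocol on top of TCP" concrete, the sketch below builds the bytes of a minimal HTTP/1.1 GET request. The function name `build_get_request` and the `Connection: close` header are choices made for this illustration only.

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

These bytes are all that HTTP contributes. They would be written to a TCP socket connected to the server's port 80, and everything below the application layer (ports, IP addresses, MAC addresses) is added by the lower layers.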

Network layered model

The seven layers refer to the OSI seven-layer model: application layer (Application), presentation layer (Presentation), session layer (Session), transport layer (Transport), network layer (Network), data link layer (Data Link), and physical layer (Physical).
The specific protocols corresponding to each layer were shown in figures reproduced from: https://blog.csdn.net/buknow/article/details/81148684

What happens in the TCP/IP four-layer model when a request is initiated

When an application transmits data over TCP, the data is handed to the protocol stack and passes down through each layer until it is sent onto the network as a stream of bits. Each layer adds header information to the data it receives (and sometimes trailer information as well).
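The layer-by-layer wrapping can be sketched with toy headers. The header layouts below are deliberately simplified assumptions (ports only, addresses only, a checksum-like trailer) and are not the real TCP, IP, or Ethernet wire formats; only the EtherType value 0x0800 for IPv4 is a real constant.

```python
import socket
import struct

def encapsulate(payload, src_port, dst_port, src_ip, dst_ip, src_mac, dst_mac):
    """Wrap application data in toy transport, network, and link headers."""
    # Transport layer: toy header holding only the source and destination ports.
    segment = struct.pack("!HH", src_port, dst_port) + payload
    # Network layer: toy header holding only the source and destination IPv4 addresses.
    packet = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + segment
    # Link layer: destination MAC, source MAC, then EtherType 0x0800 (IPv4).
    dst = bytes.fromhex(dst_mac.replace(":", ""))
    src = bytes.fromhex(src_mac.replace(":", ""))
    frame = dst + src + struct.pack("!H", 0x0800) + packet
    # Trailer: a simple byte sum standing in for the real frame check sequence.
    trailer = struct.pack("!I", sum(frame) & 0xFFFFFFFF)
    return frame + trailer
```

The application payload ends up in the middle of the frame, surrounded by one header per layer plus the link-layer trailer, which is the "header (and sometimes trailer)" pattern described above.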

How does the client find the target service

  1. Client initiates a request

  2. Assemble the MAC address of the target machine at the data link layer

    How is the MAC address of the target machine obtained? This is the job of ARP (Address Resolution Protocol): the IP of the target machine is known, and its MAC address needs to be found. The sender broadcasts a message asking "who owns this IP?", and the machine that owns the IP replies with its MAC address. That reply gives the sender the target MAC address.

    To avoid sending an ARP request every time, each machine also keeps a local ARP cache. Because machines keep going online and offline and IPs may change, entries in the ARP cache expire after a period of time.

  3. The frame is sent onto the link, and the server's network card recognizes its own MAC address and receives the frame.

  4. The server's network card receives the frame, then unwraps the IP packet and finds that the destination IP address is its own. It then unwraps the TCP segment and finds that the destination port is one it is listening on, here port 80. On this machine, nginx is listening on port 80, so the request is handed to nginx, and nginx returns the data.

  5. Send data back to the requesting machine. The response is encapsulated layer by layer down to the MAC layer. Because the request carried the source MAC address, on the way back the source MAC becomes the destination MAC, and the response is returned to the requesting machine.
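The ARP cache from step 2 can be sketched as a small lookup table whose entries expire. The class name `ArpCache`, the 60-second lifetime, and the `broadcast_request` callback (standing in for the real "who owns this IP?" broadcast) are all assumptions for illustration; real operating systems use their own timeouts.

```python
import time

class ArpCache:
    """Toy IP-to-MAC cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # ip -> (mac, time the entry was stored)

    def lookup(self, ip, broadcast_request):
        """Return the MAC for ip, broadcasting only on a miss or an expired entry."""
        now = self.clock()
        cached = self.entries.get(ip)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]
        mac = broadcast_request(ip)  # hypothetical stand-in for the ARP broadcast
        self.entries[ip] = (mac, now)
        return mac
```

A second lookup for the same IP within the lifetime is answered from the cache without another broadcast. Once the entry is older than the lifetime, the next lookup broadcasts again, which is how a changed IP-to-MAC mapping eventually gets picked up.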

Processing at the receiving end after a packet arrives
When the destination host receives an Ethernet frame, the data passes up the protocol stack from the lowest layer, and the header added by each layer's protocol is removed along the way. Each layer checks the protocol identifier in its header to determine which upper-layer protocol should receive the data.
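The "check the protocol identifier, then hand the data upward" step can be sketched as a chain of table lookups. The EtherType values (0x0800 for IPv4, 0x0806 for ARP) and IP protocol numbers (6 for TCP, 17 for UDP) are the standard ones; the `LISTENERS` table and the function name `demultiplex` are hypothetical and exist only for this example.

```python
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP"}
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}
# Hypothetical table of which program listens on which (protocol, port).
LISTENERS = {("TCP", 80): "nginx"}

def demultiplex(ethertype, ip_protocol, dst_port):
    """Return the chain of handlers a received frame is passed through."""
    path = []
    network = ETHERTYPES.get(ethertype)
    if network is None:
        return path                  # unknown EtherType: the frame is dropped
    path.append(network)
    if network != "IPv4":
        return path                  # e.g. ARP is handled without going to TCP/UDP
    transport = IP_PROTOCOLS.get(ip_protocol)
    if transport is None:
        return path
    path.append(transport)
    application = LISTENERS.get((transport, dst_port))
    if application is not None:
        path.append(application)     # a program is listening on this port
    return path
```

For a frame carrying an HTTP request, `demultiplex(0x0800, 6, 80)` walks IPv4, then TCP, then the program listening on port 80, mirroring the upward path described above.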

Layered management of TCP/IP

The TCP/IP protocol suite is divided into four layers: application layer, transport layer, network layer, and data link layer. Layering is a familiar concept. A distributed system, for example, is typically divided into a business layer, a service layer, and a basic support layer, and Docker images are also built from layers. Complex programs need to be layered: it is a requirement of good software design that each layer focuses on its own concern. If something needs to change, we only replace the layer that changed. This limits the impact of the change and makes the whole architecture more flexible. Finally, layering makes the design of the whole architecture relatively simple.

Load balancing by layer

  • Layer 2 load balancing
    Layer 2 load balancing works on MAC addresses. The load balancing server exposes a VIP (virtual IP) externally. The machines in the cluster share the same IP address but have different MAC addresses. When the load balancing server receives a request, it forwards the request to the target machine by rewriting the destination MAC address of the frame.
    In short, layer 2 load balancing receives the request on a virtual MAC address and then assigns it to a real MAC address.
  • Layer 3 load balancing
    Layer 3 load balancing works on IP addresses. As with layer 2, the load balancing server exposes a VIP (virtual IP) externally, but the machines in the cluster use different IP addresses. After the load balancing server receives a request, it forwards the request by IP to one of the real servers, chosen by the load balancing algorithm in use.
    In short, layer 3 load balancing receives the request on a virtual IP address and then assigns it to a real IP address.
  • Layer 4 load balancing
    Layer 4 load balancing works at the transport layer of the OSI model, where the main protocols are TCP and UDP. In addition to the source and destination IP, these two protocols carry a source port number and a destination port number. After receiving the client's request, the layer 4 load balancing server forwards the traffic to the application server by modifying the address information (IP + port number) of the packet.
    In short, layer 4 load balancing receives the request on a virtual IP + port and then distributes it to a real server.
  • Layer 7 load balancing
    Layer 7 load balancing works at the application layer of the OSI model. There are many application-layer protocols, such as HTTP, RADIUS, and DNS, and layer 7 load balancing can make decisions based on them. These protocols carry a lot of meaningful content. For example, when balancing across the same group of web servers, in addition to using the IP and port, you can also decide how to balance based on layer 7 information such as the URL or the browser type.
    In short, layer 7 load balancing receives the request on a virtual URL or host name and then assigns it to a real server.
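The difference between layer 4 and layer 7 decisions can be sketched as follows. Round-robin selection, the class names, and the backend addresses are all assumptions chosen for illustration; the text above does not prescribe any particular algorithm.

```python
import itertools

class Layer4Balancer:
    """Pick a backend (ip, port) without looking at the request content."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

class Layer7Balancer:
    """Pick a backend pool from the URL path, then round-robin inside that pool."""

    def __init__(self, pools, default_pool):
        self._pools = {prefix: itertools.cycle(b) for prefix, b in pools.items()}
        self._default = itertools.cycle(default_pool)

    def pick(self, path):
        for prefix, cycle in self._pools.items():
            if path.startswith(prefix):
                return next(cycle)   # application-layer rule matched
        return next(self._default)
```

The layer 4 balancer only ever sees addresses and ports, so every request is treated alike. The layer 7 balancer can read the request itself, which is what allows rules such as sending `/static/` paths to a dedicated pool.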

In-depth analysis of TCP/IP protocol

With the previous case analyzed, the network communication process should be basically clear. Underneath HTTP, TCP is the communication protocol in use, so next we briefly introduce how TCP works.

To study network protocols in depth, we first need to be clear about what the basic protocols do and how they work. Network devices are not as intelligent as the human brain; they are built by people, so the way they work is designed around human communication habits, and the protocols are best understood in human terms. For example, when you phone someone, you do not start talking at length the moment the call connects. What if the other party is not yet ready to listen? That is not how people normally communicate. Usually, once the call connects, the two parties go through a short exchange: one says "Hello" and the other answers "Hello". Only when both are sure that the other is paying attention to the call does the real conversation begin. In computer networks, this process takes the form of a network protocol.

With TCP, before any data is sent, the two computers first exchange packets to establish a connection, and only then is the information transmitted. UDP, by contrast, is like a campus broadcast: the station broadcasts the content, and whether you hear it is not the station's concern. Normally you cannot tell the station that you were not paying attention and ask it to play the broadcast again. With these ideas in mind, let's first look at the handshake, the part of TCP that gets the most attention.

TCP handshake protocol (used when establishing a connection)

The reliability of TCP starts with establishing a valid connection, so before any data is transmitted, a connection must be established through a three-way handshake. The three-way handshake means that when a TCP connection is established, the client and server send a total of 3 packets to confirm it. In socket programming, this process is triggered when the client calls connect.

  1. The first handshake (SYN=1, seq=x): The client sends a TCP packet with the SYN flag set to 1, addressed to the server port it intends to connect to, with its initial sequence number x stored in the Sequence Number field. After sending, the client enters the SYN_SENT state.
  2. The second handshake (SYN=1, ACK=1, seq=y, ACKnum=x+1): The server sends back an acknowledgment in which both the SYN flag and the ACK flag are set to 1. The server picks its own initial sequence number (ISN) y, puts it in the Sequence Number field, and sets the Acknowledgment Number to the client's ISN plus 1, that is, x+1. After sending, the server enters the SYN_RCVD state.
  3. The third handshake (ACK=1, ACKnum=y+1): The client sends an acknowledgment packet again, with the SYN flag set to 0 and the ACK flag set to 1. It takes the sequence number from the server's SYN+ACK, adds 1, and puts the result in the Acknowledgment Number field; its own sequence number is x+1. After sending, the client enters the ESTABLISHED state. When the server receives this packet, it also enters the ESTABLISHED state, and the TCP handshake ends.
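The numbering in the three steps above can be checked with a small simulation. This is only a bookkeeping sketch: the function name and dictionary layout are made up for illustration, and a real handshake is carried out by the operating system's TCP stack, not by application code.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments and the state each sender enters."""
    x, y = client_isn % SEQ_SPACE, server_isn % SEQ_SPACE
    return [
        # 1. Client -> server: SYN carrying the client's initial sequence number.
        {"from": "client", "SYN": 1, "ACK": 0, "seq": x, "ack": None,
         "state": "SYN_SENT"},
        # 2. Server -> client: SYN+ACK acknowledging x+1 and carrying y.
        {"from": "server", "SYN": 1, "ACK": 1, "seq": y,
         "ack": (x + 1) % SEQ_SPACE, "state": "SYN_RCVD"},
        # 3. Client -> server: ACK acknowledging y+1.
        {"from": "client", "SYN": 0, "ACK": 1, "seq": (x + 1) % SEQ_SPACE,
         "ack": (y + 1) % SEQ_SPACE, "state": "ESTABLISHED"},
    ]
```

With `three_way_handshake(100, 300)`, the server acknowledges 101 and the client acknowledges 301, matching ACKnum=x+1 and ACKnum=y+1 in the steps above.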

Tips: SYN attack

During the three-way handshake, after the server has sent its SYN-ACK and before it receives the client's ACK, the TCP connection is called a half-open connection. The server is in the SYN_RCVD state at this point. When the ACK arrives, the server moves to the ESTABLISHED state.

In a SYN attack, the attacker forges a large number of non-existent source IP addresses in a short period of time and keeps sending SYN packets to the server. The server replies to each with an acknowledgment and waits for the client's confirmation. Because the source addresses do not exist, the server keeps retransmitting until it times out. These forged SYN packets occupy the half-open connection queue for a long time, so legitimate SYN requests are discarded because the queue is full, which can cause network congestion and even bring the system down.

The SYN attack is a typical DoS/DDoS attack. Detecting it is fairly simple: when the server holds a large number of half-open connections and their source IP addresses look random, a SYN attack is very likely under way.
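The queue exhaustion described above can be sketched with a toy backlog. The class `SynBacklog`, its capacity, and the addresses used are assumptions for illustration; real TCP stacks manage this queue in the kernel and have additional defenses that this sketch ignores.

```python
class SynBacklog:
    """Toy model of the queue of half-open (SYN_RCVD) connections."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.half_open = set()  # (ip, port) pairs still waiting for the final ACK

    def on_syn(self, source):
        """Handle an incoming SYN; return False if it is dropped because the queue is full."""
        if len(self.half_open) >= self.capacity:
            return False
        self.half_open.add(source)
        return True

    def on_ack(self, source):
        """Handle the final ACK; the connection leaves the half-open queue."""
        if source in self.half_open:
            self.half_open.remove(source)
            return True
        return False
```

Forged sources never send the final ACK, so their entries stay in the queue until they time out (not modeled here). Once the queue is full, a SYN from a genuine client is dropped, which is exactly the denial of service the attack aims for.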

TCP four-way wave (used when closing a connection)

The four-way wave means that when a TCP connection is closed, the client and server send a total of 4 packets to confirm the disconnection. Either the client or the server can initiate the wave, because TCP is a full-duplex protocol. In socket programming, the wave starts when either side performs a close() operation.
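As a hedged sketch of what this looks like in socket code, the example below runs a client and a server over the loopback interface in one process. It uses `shutdown(socket.SHUT_WR)` rather than `close()` for the client's FIN, because `close()` would also release the socket and the client could no longer read the server's remaining data. The function names and message contents are invented for the example.

```python
import socket

def read_until_eof(sock):
    """Read until recv() returns b"", which means the peer's FIN has arrived."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())   # the kernel performs the handshake
    server_side, _addr = listener.accept()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)          # client sends FIN: no more data from it
    request = read_until_eof(server_side)    # server reads the data, then sees the FIN

    server_side.sendall(b"late reply")       # server may still send after that FIN
    server_side.close()                      # server sends its own FIN
    reply = read_until_eof(client)           # client was still able to receive

    client.close()
    listener.close()
    return request, reply
```

The server's reply still reaches the client after the client's FIN, which is the full-duplex behavior described above: each direction of the connection is closed separately.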

Concept explanation:

  • Simplex: data can be transmitted in one direction only
  • Half-duplex: data can be transmitted in both directions, but only in one direction at any given moment; in effect it is simplex communication whose direction can be switched
  • Full-duplex: data can be transmitted in both directions at the same time, so full-duplex combines two simplex channels and requires both the sending and the receiving device to be able to send and receive independently

  1. The first wave (FIN=1, seq=x)
    Assume the client wants to close the connection. The client sends a packet with the FIN flag set to 1, indicating that it has no more data to send but can still receive data. After sending, the client enters the FIN_WAIT_1 state.
  2. The second wave (ACK=1, ACKnum=x+1)
    The server acknowledges the client's FIN by sending an ACK, indicating that it has received the client's request to close the connection but is not yet ready to close. After sending, the server enters the CLOSE_WAIT state. When the client receives this ACK, it enters the FIN_WAIT_2 state and waits for the server to close the connection.
  3. The third wave (FIN=1, seq=w)
    When the server is ready to close the connection, it sends the client a request to end the connection, with FIN set to 1. After sending, the server enters the LAST_ACK state and waits for the final ACK from the client.
  4. The fourth wave (ACK=1, ACKnum=w+1)
    The client receives the server's FIN, sends an ACK, and enters the TIME_WAIT state, where it waits in case its ACK was lost and the server retransmits its FIN.
    When the server receives this ACK, it closes the connection and enters the CLOSED state. After the client has waited for a fixed period (2MSL, twice the Maximum Segment Lifetime) without receiving a retransmitted FIN from the server, it concludes that the server has closed normally, so it closes the connection too and enters the CLOSED state.
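The state changes in the four steps above can be traced with a small transition table. This covers only the normal path in which the client closes first; the event names and the table layout are made up for this sketch, and cases such as both sides closing at the same time are left out.

```python
# (current state, event) -> (next state, segment sent in response, if any)
TRANSITIONS = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),
}

def simulate_close():
    """Trace a client-initiated close as (side, event, sent, client_state, server_state)."""
    events = [
        ("client", "close"),         # first wave: client sends FIN
        ("server", "recv FIN"),      # second wave: server sends ACK
        ("client", "recv ACK"),
        ("server", "close"),         # third wave: server sends FIN
        ("client", "recv FIN"),      # fourth wave: client sends ACK
        ("server", "recv ACK"),
        ("client", "2MSL timeout"),  # client leaves TIME_WAIT
    ]
    states = {"client": "ESTABLISHED", "server": "ESTABLISHED"}
    trace = []
    for side, event in events:
        states[side], sent = TRANSITIONS[(states[side], event)]
        trace.append((side, event, sent, states["client"], states["server"]))
    return trace
```

Printing each tuple of `simulate_close()` shows the four segments in order (FIN, ACK, FIN, ACK) and that the server reaches CLOSED before the client, which stays in TIME_WAIT until the 2MSL timer fires.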

Plain-language explanation:

Suppose the client initiates the close, that is, it sends a FIN message. When the server receives the FIN, it understands it as "the client has no more data to send to you". But if the server still has data that has not been sent, it does not need to close the socket in a hurry and can continue sending. So it first sends an ACK: "I have received your request, but I am not ready yet, please keep waiting for my message." The client then enters the FIN_WAIT_2 state and keeps waiting for the server's FIN. Once the server has finished sending its data, it sends a FIN to the client: "I have finished sending, and I am ready to close the connection." On receiving the FIN, the client knows the connection can be closed, but it still does not trust the network, because the server may never learn that it can close. So after sending the ACK, the client enters the TIME_WAIT state; if the server does not receive the ACK, it can retransmit its FIN. When the server receives the ACK, it knows it can disconnect. If the client waits 2MSL and receives nothing more, that shows the server has closed normally, so the client can close too. And that is how a TCP connection is closed.

Questions:

  1. Why is connecting a three-way handshake while closing takes four waves?
    Answer: When the server receives the client's SYN connection request, it can send a single SYN+ACK message right away: the ACK acknowledges the request and the SYN synchronizes the server's sequence number.
    When closing, however, a server that receives a FIN may not be able to close the socket immediately (it may still have data to send), so it can only reply with an ACK first to tell the client "I received your FIN". Only after the server has sent all of its remaining data can it send its own FIN, so the ACK and FIN cannot be sent together. That is why closing takes four steps.
  2. Why must the TIME_WAIT state last 2MSL (twice the maximum segment lifetime) before returning to the CLOSED state?
    Answer: In principle, once all four messages have been sent, the client could enter the CLOSED state directly. But we must assume the network is unreliable, and the last ACK may be lost. The TIME_WAIT state exists so that the client can resend that ACK if it is lost and the server retransmits its FIN.

Origin blog.csdn.net/nonage_bread/article/details/111193131