Ten questions you must know about TCP/IP!

This article sorts out ten questions about the TCP/IP protocol suite that need to be known and understood. They are not only high-frequency interview questions, but also essential fundamentals for programmers.


One, TCP/IP model

The TCP/IP protocol model (Transmission Control Protocol/Internet Protocol) comprises the family of network protocols that form the foundation of the Internet; it is the Internet's core protocol suite.

The TCP/IP reference model divides the protocols into four layers: the link layer, the network layer, the transport layer, and the application layer. The figure below contrasts the TCP/IP model with the OSI model.
The TCP/IP protocol family encapsulates data layer by layer from top to bottom. The top layer is the application layer, which contains HTTP, FTP, and other familiar protocols. The second layer is the transport layer, home of the well-known TCP and UDP protocols. The third layer is the network layer, where the IP protocol lives; it attaches the IP addresses and other data needed to determine the transmission's destination. The fourth layer is the data link layer, which adds an Ethernet protocol header to the data to be transmitted and computes a CRC code in preparation for the final transmission.

The above figure clearly shows the role of each layer in the TCP/IP protocol stack. TCP/IP communication is really a process of encapsulation and decapsulation: on the way down, the sender wraps the data with a header (and sometimes a trailer) at each layer, adding the transmission information needed to reach the destination; on the way up, the receiver strips the header and trailer at each layer to recover the transmitted data.
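This layering can be sketched in a few lines of Python. The headers below are simplified labels rather than real wire formats; the point is only to show each layer wrapping and unwrapping the data:

```python
# A toy illustration of TCP/IP encapsulation: each layer prepends its own
# header on the way down and strips it on the way up.

def encapsulate(payload: bytes) -> bytes:
    segment = b"TCP|" + payload          # transport layer adds a TCP header
    packet = b"IP|" + segment            # network layer adds an IP header
    frame = b"ETH|" + packet + b"|CRC"   # link layer adds header and CRC trailer
    return frame

def decapsulate(frame: bytes) -> bytes:
    packet = frame.removeprefix(b"ETH|").removesuffix(b"|CRC")
    segment = packet.removeprefix(b"IP|")
    return segment.removeprefix(b"TCP|")

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)               # b'ETH|IP|TCP|GET / HTTP/1.1|CRC'
print(decapsulate(frame))  # b'GET / HTTP/1.1'
```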

The above figure uses the HTTP protocol as a concrete example.

Two, the data link layer

The physical layer is responsible for exchanging streams of 0 and 1 bits as voltage levels on physical devices or as pulses of light. The data link layer groups this bit sequence into data frames that are transmitted from one node to an adjacent node. These nodes are uniquely identified by MAC addresses (a MAC address is the physical address; each network interface has one).


  • Encapsulation into frames: a header and trailer are added to the network-layer datagram to form a frame. The frame header includes the source and destination MAC addresses.
  • Transparent transmission: bit stuffing and escape characters.
  • Reliable transmission: rarely implemented on links with low error rates, but wireless links (WLAN) do ensure reliable transmission at this layer.
  • Error detection (CRC): the receiver checks each frame and discards any frame in which an error is found.
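The CRC check can be illustrated with Python's standard zlib.crc32. Real Ethernet computes CRC-32 over the whole frame; this toy version simply appends the checksum to the payload as a trailer:

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    # sender: append the 4-byte CRC-32 of the payload as the frame trailer
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    # receiver: recompute the CRC and compare with the received trailer
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer

frame = make_frame(b"hello, link layer")
print(check_frame(frame))                 # True: frame intact
print(check_frame(b"jello" + frame[5:]))  # False: corrupted frame is discarded
```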

Three, the network layer

1. IP protocol

The IP protocol is the core of the TCP/IP suite. All TCP, UDP, ICMP, and IGMP data are transmitted in IP datagrams. Note that IP is not a reliable protocol: it provides no mechanism for handling undelivered data. That responsibility falls to the upper-layer protocols, TCP or UDP.

1.1 IP address

In the data link layer we generally use MAC addresses to identify different nodes; the IP layer has an analogous identifier, the IP address.

The 32-bit IP address is divided into a network part and a host part. This reduces the number of entries in routers' routing tables: all terminals sharing a network address fall within the same range, so the routing table only needs to maintain one route for that network address to reach any of them.

Class A IP addresses: 0.0.0.0 ~ 127.255.255.255
Class B IP addresses: 128.0.0.0 ~ 191.255.255.255
Class C IP addresses: 192.0.0.0 ~ 223.255.255.255
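The class of an address is determined by its first octet alone, which a short Python helper (also covering the multicast and reserved classes D and E) makes concrete:

```python
def ip_class(addr: str) -> str:
    # the leading octet alone determines the classful address class
    first = int(addr.split(".")[0])
    if first <= 127:
        return "A"
    if first <= 191:
        return "B"
    if first <= 223:
        return "C"
    if first <= 239:
        return "D (multicast)"
    return "E (reserved)"

print(ip_class("10.0.0.1"))     # A
print(ip_class("172.16.0.1"))   # B
print(ip_class("192.168.1.1"))  # C
```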
1.2 IP protocol header

Here we introduce only the eight-bit TTL field. It specifies how many routers a packet may pass through before being discarded. Every time an IP packet passes through a router, its TTL decreases by 1; when the TTL reaches zero, the packet is discarded automatically. The field's maximum value is 255, meaning a packet is discarded after at most 255 router hops. The initial value depends on the operating system, commonly 64 or 128.

2. ARP and RARP protocol

ARP (Address Resolution Protocol) is a protocol for obtaining a MAC address from an IP address.

ARP is a resolution protocol: initially, a host does not know which host and interface an IP address corresponds to. When a host wants to send an IP packet, it first checks its own ARP cache (a cached table of IP-to-MAC address mappings).

If the queried IP-to-MAC pair does not exist, the host sends an ARP broadcast packet onto the network containing the IP address being queried. Every host that receives the broadcast checks its own IP address; the host that matches prepares an ARP reply containing its own MAC address and sends it back to the host that issued the broadcast.

After receiving the ARP reply, the broadcasting host updates its ARP cache (where the IP-to-MAC table is stored) and uses the new cache entry to fill in the data link layer header and send the packet.
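The lookup-then-broadcast logic can be simulated with a plain dictionary as the ARP cache. The addresses below are made up for illustration:

```python
class Host:
    def __init__(self, ip, mac):
        self.ip, self.mac = ip, mac
        self.arp_cache = {}  # IP -> MAC mapping table

def arp_resolve(sender, target_ip, network):
    # 1. check the local ARP cache first
    if target_ip in sender.arp_cache:
        return sender.arp_cache[target_ip]
    # 2. broadcast: every host on the segment sees the request,
    #    and only the owner of target_ip replies with its MAC
    for host in network:
        if host.ip == target_ip:
            sender.arp_cache[target_ip] = host.mac  # cache the reply
            return host.mac
    return None  # no host owns that IP

lan = [Host("10.0.0.1", "aa:bb:cc:00:00:01"), Host("10.0.0.2", "aa:bb:cc:00:00:02")]
print(arp_resolve(lan[0], "10.0.0.2", lan))  # resolved by broadcast, then cached
```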

The RARP (Reverse ARP) protocol works in the opposite direction, obtaining an IP address from a MAC address, so it will not be repeated here.

3. ICMP protocol

The IP protocol is not a reliable protocol: it does not guarantee that data is delivered, so that job naturally falls to other modules. One important module is the ICMP (Internet Control Message Protocol). ICMP is not a high-level protocol but an IP-layer protocol.

When an error occurs while transmitting an IP packet (for example, host unreachable or route unreachable), the ICMP protocol packages the error information and sends it back to the source host, giving it a chance to handle the error. This is why it is said that reliability can only be achieved by protocols built above the IP layer.

Four, ping

Ping can be said to be the best-known application of ICMP, and it is part of the TCP/IP suite. The "ping" command checks whether the network is connected and helps us analyze and locate network failures.

For example, when one of our websites becomes unreachable, we usually ping the site. Ping echoes some useful information, generally like this:
The word ping is derived from sonar positioning, and this program does exactly that: it uses ICMP protocol packets to probe whether another host is reachable. The principle is to send an ICMP echo request with type code 8; the probed host responds with an ICMP echo reply with type code 0.

The ping program measures the round-trip time and counts how many packets were delivered, from which the user can judge the general condition of the network. We can see that ping reports the transmission time and the TTL.
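What ping puts on the wire can be shown by building an ICMP echo request by hand: an 8-byte header (type 8, code 0, checksum, identifier, sequence number) plus a payload, protected by the standard Internet checksum. Actually sending it requires a raw socket and root privileges, so only packet construction is sketched here:

```python
import struct

def internet_checksum(data: bytes) -> int:
    # one's-complement sum of all 16-bit words, then complemented
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    # type 8 = echo request (the reply comes back as type 0)
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

pkt = build_echo_request(0x1234, 1, b"ping")
print(internet_checksum(pkt))  # 0: a valid packet checksums to zero
```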

Five, Traceroute

Traceroute is an important tool, and the most convenient one, for discovering the route between a host and a destination host.

The principle of traceroute is very interesting. Given the destination host's IP, it first sends a UDP packet with TTL=1 toward the destination. The first router on the path decrements the TTL by 1; since the TTL becomes 0, the router discards the packet and returns an ICMP Time Exceeded datagram to the source host. The host then sends a UDP datagram with TTL=2, prompting the second router to send back the same ICMP message. This repeats until a packet reaches the destination host, which returns an ICMP Port Unreachable message (the UDP probes target an unlikely port), telling traceroute that the destination has been reached. In this way, traceroute collects the IPs of all the routers along the path.
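The TTL-probing loop is easy to mimic without raw sockets by simulating the path as a list of hop IPs (made-up documentation addresses), with the destination as the last entry:

```python
def traceroute_sim(path):
    """Simulate traceroute over `path`, a list of hop IPs ending at the destination.
    A probe with TTL = n expires at hop n, which 'reveals' that hop's IP."""
    discovered = []
    for ttl in range(1, len(path) + 1):
        hop = path[ttl - 1]   # router n decrements TTL to 0 and replies
        discovered.append(hop)
        if hop == path[-1]:   # destination answers Port Unreachable: done
            break
    return discovered

route = ["192.0.2.1", "198.51.100.7", "203.0.113.9"]
print(traceroute_sim(route))  # each TTL value reveals one more hop
```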

Six, TCP/UDP

Both TCP and UDP are transport-layer protocols, but they have different characteristics and suit different application scenarios. The following is a comparative analysis.
Message-oriented

Message-oriented means that UDP sends whatever the application layer hands it as a single message, neither splitting nor merging it. The application must therefore choose a suitable message size: if the message is too long, the IP layer has to fragment it, reducing efficiency; if it is too short, the relative overhead of the IP header becomes large.

Byte stream oriented

Although the interaction between the application and TCP is one data block at a time (blocks of varying sizes), TCP treats the application's data as a series of unstructured bytes: a byte stream. TCP has a buffer; when a data block handed over by the application is too long, TCP can split it into shorter pieces and transmit them.
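The contrast between message-oriented and byte-stream delivery can be demonstrated locally with a socketpair on a Unix-like system (AF_UNIX here stands in for UDP/TCP, so no network access is needed):

```python
import socket

# Datagram sockets preserve message boundaries, like UDP:
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b"abc")
a.send(b"def")
first = b.recv(1024)   # exactly one datagram per recv
print(first)           # b'abc'

# Stream sockets are byte streams, like TCP: boundaries are not preserved
c, d = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
c.send(b"abc")
c.send(b"def")
merged = d.recv(1024)  # may return b'abcdef': the two sends merge in the stream
print(merged)
```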

Congestion control and flow control are the focus of TCP; they are explained later.
Some applications of TCP and UDP protocols

When should I use TCP?

When there are requirements on the quality of network communication, for example when the entire payload must be delivered to the other side accurately. TCP is therefore used by applications that require reliability, such as the file transfer protocols HTTP, HTTPS, and FTP, and the mail transfer protocols POP and SMTP.

When should I use UDP?

When the quality of communication is less critical but the communication should be as fast as possible, UDP can be used.

Seven, DNS

DNS (Domain Name System) is a distributed database on the Internet that stores the mapping between domain names and IP addresses, making the Internet easier to use: instead of remembering machine-readable IP number strings, users can reach hosts by name. The process of obtaining the IP address corresponding to a host name is called domain name resolution (or host name resolution). The DNS protocol runs on top of UDP and uses port 53.
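What a resolver sends over UDP port 53 is a small binary query. The sketch below builds a standard A-record query with struct (the query ID 0x1234 is arbitrary); actually sending it to a resolver is left commented out since it needs network access:

```python
import struct

def build_dns_query(domain: str, qid: int = 0x1234) -> bytes:
    # header: id, flags (0x0100 = recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte
    qname = b"".join(bytes([len(p)]) + p.encode() for p in domain.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

query = build_dns_query("example.com")
print(query.hex())
# to actually resolve:
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.sendto(query, ("8.8.8.8", 53)); answer = s.recv(512)
```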

Eight, TCP connection establishment and termination

1. Three-way handshake

TCP is connection-oriented: before either party sends data to the other, a connection must be established between them. In the TCP/IP suite, the TCP protocol provides a reliable connection service, and the connection is initialized through a three-way handshake. The purpose of the three-way handshake is to synchronize the sequence numbers and acknowledgment numbers of both parties and to exchange TCP window size information.

First handshake: establishing the connection. The client sends a connection-request segment with the SYN flag set to 1 and Sequence Number x; the client then enters the SYN_SENT state and waits for the server's confirmation.

Second handshake: the server receives the SYN segment. The server receives the client's SYN segment and must acknowledge it, setting the Acknowledgment Number to x+1 (the client's Sequence Number plus 1). At the same time, it sends its own SYN request, with the SYN flag set to 1 and Sequence Number y. The server puts all of this into one segment (the SYN+ACK segment), sends it to the client, and enters the SYN_RECV state.

Third handshake: the client receives the server's SYN+ACK segment. It sets the Acknowledgment Number to y+1 and sends an ACK segment to the server. Once this segment is sent, both client and server enter the ESTABLISHED state, completing the TCP three-way handshake.
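The three segments and the numbers they carry can be traced in a small simulation (x and y stand for the initial sequence numbers that the two sides would normally pick at random):

```python
def three_way_handshake(x, y):
    """Return the segment exchange of TCP connection setup."""
    log = []
    # 1st handshake: client sends SYN with seq=x, enters SYN_SENT
    log.append(("client->server", "SYN", {"seq": x}))
    # 2nd handshake: server replies SYN+ACK with seq=y, ack=x+1, enters SYN_RECV
    log.append(("server->client", "SYN+ACK", {"seq": y, "ack": x + 1}))
    # 3rd handshake: client sends ACK with ack=y+1; both sides are ESTABLISHED
    log.append(("client->server", "ACK", {"ack": y + 1}))
    return log

for sender, kind, fields in three_way_handshake(100, 300):
    print(sender, kind, fields)
```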

Why are three handshakes needed?

To prevent an invalid connection-request segment from suddenly arriving at the server and causing an error.
Specific example: an "invalid connection-request segment" arises like this. The first connection-request segment sent by the client is not lost but lingers at a network node for a long time, reaching the server only after the connection has already been released. It is a segment that expired long ago, but when the server receives it, it mistakes it for a fresh connection request from the client, so it sends a confirmation segment agreeing to establish a connection. If there were no three-way handshake, the new connection would be established as soon as the server sent its acknowledgment. Since the client never actually requested this connection, it ignores the server's confirmation and sends no data, while the server believes a new transport connection has been established and keeps waiting for the client's data, wasting server resources. The three-way handshake prevents this: in the scenario above, the client does not send a confirmation to the server, and since the server receives no acknowledgment, it knows the client did not request a connection.

2. The four-way wave

After the client and server establish a TCP connection through the three-way handshake, the connection must be torn down once data transmission is complete. For TCP teardown, there is the somewhat mysterious "four-way wave".
First wave: host 1 (which can be either the client or the server) sets a Sequence Number and sends a FIN segment to host 2; host 1 then enters the FIN_WAIT_1 state. This means host 1 has no more data to send to host 2.

Second wave: host 2 receives host 1's FIN segment and returns an ACK segment whose Acknowledgment Number is the Sequence Number plus 1; host 1 enters the FIN_WAIT_2 state. Host 2 is telling host 1: "I agree to your close request."

Third wave: host 2 sends a FIN segment to host 1, requesting to close the connection, and enters the LAST_ACK state.

Fourth wave: host 1 receives the FIN segment from host 2 and sends an ACK segment back, then enters the TIME_WAIT state. Host 2 closes the connection upon receiving host 1's ACK. If host 1 receives no further reply after waiting 2MSL, it concludes that the server side has closed normally, and host 1 can then close the connection as well.

Why wave four times?

The TCP protocol is a connection-oriented, reliable, byte-stream-based transport layer protocol, and it is full duplex. When host 1 sends a FIN segment, it only means that host 1 has no more data to send; host 1 is telling host 2 that all of its data has been sent. However, host 1 can still receive data from host 2. When host 2 returns its ACK segment, it means it knows host 1 has no data left to send, but host 2 may still have data to send to host 1. Only when host 2 also sends a FIN segment, indicating that it too has no data left to send, can the two sides happily terminate the TCP connection.

Why wait for 2MSL?

MSL: Maximum Segment Lifetime, the longest time any segment can exist in the network before being discarded. Waiting serves two purposes:

  • Ensure that the full-duplex TCP connection can be closed reliably.
  • Ensure that any duplicate segments from this connection disappear from the network.

First point: if host 1 went directly to CLOSED, then because of the unreliability of IP or other network problems, host 2 might never receive host 1's final ACK. Host 2 would then retransmit the FIN after a timeout, but host 1, already CLOSED, would find no connection matching the retransmitted FIN. By staying in TIME_WAIT instead of going straight to CLOSED, host 1 can acknowledge the FIN again if it arrives, ensuring the peer receives the ACK and the connection closes correctly.

Second point: if host 1 went directly to CLOSED and then initiated a new connection to host 2, we could not guarantee that the new connection's port numbers differ from those of the connection just closed; the new and old connections might use the same ports. Generally this causes no problems, but there is a special case: if some data from the previous connection is still stuck in the network, that delayed data could arrive at host 2 after the new connection is established. Since the port numbers match, TCP would treat the delayed data as belonging to the new connection, confusing it with the real data packets of the new connection. Therefore, the TCP connection must wait in the TIME_WAIT state for twice the MSL, guaranteeing that all data from the old connection has disappeared from the network.

Nine, TCP flow control

If the sender sends data too fast, the receiver may not be able to keep up, causing data loss. Flow control means ensuring the sender does not send faster than the receiver can receive.

Using the sliding window mechanism can easily achieve flow control on the sender on the TCP connection.

Suppose A sends data to B. When the connection is established, B tells A: "My receive window is rwnd = 400" (rwnd stands for receiver window). The sender's send window therefore cannot exceed the receive window value given by the receiver. Note that the unit of the TCP window is bytes, not segments. Assume each segment is 100 bytes long and the initial sequence number is 1. Uppercase ACK denotes the acknowledgment flag in the header; lowercase ack denotes the value of the acknowledgment field.

It can be seen from the figure that B performs flow control three times: first reducing the window to rwnd = 300, then to rwnd = 100, and finally to rwnd = 0, which forbids the sender from sending any more data. The sender stays paused until host B announces a new window value. All three segments B sends to A have ACK = 1; the acknowledgment-number field is only meaningful when ACK = 1.

TCP maintains a persist timer for each connection. Whenever one side of the connection receives a zero-window notification from the other, it starts the persist timer. When the timer expires, the sender transmits a zero-window probe segment (carrying 1 byte of data); the peer responds by re-announcing its current window size, and the cycle repeats if the window is still zero.
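The rwnd sequence from the example above (400, 300, 100, 0, then a reopened window) can be replayed in a small simulation; each announcement caps how much the sender may transmit next:

```python
def simulate_flow(total, announcements):
    """Send `total` bytes, honoring each rwnd value the receiver announces."""
    sent, log = 0, []
    for rwnd in announcements:
        chunk = min(rwnd, total - sent)  # never send more than the window allows
        sent += chunk
        log.append((rwnd, chunk))
        if sent == total:
            break
    return sent, log

# B shrinks the window 400 -> 300 -> 100 -> 0, then reopens it to 200
sent, log = simulate_flow(1000, [400, 300, 100, 0, 200])
print(log)  # rwnd = 0 forces a pause (0 bytes sent) until the window reopens
```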

Ten, TCP congestion control

1. Slow start and congestion avoidance

The sender maintains a congestion window cwnd (congestion window) state variable. The size of the congestion window depends on the degree of network congestion and changes dynamically. The sender makes its sending window equal to the congestion window.
The sender's principle for controlling the congestion window is: as long as the network is not congested, increase the congestion window somewhat so that more packets can be sent; as soon as the network becomes congested, reduce the congestion window to cut the number of packets injected into the network.

Slow start algorithm:

When a host starts to send data, injecting a large number of bytes into the network immediately may cause congestion, because the sender does not yet know the network's load. A better method is to probe first: gradually increase the send window from small to large, that is, gradually increase the congestion window value.

When transmission begins, the congestion window cwnd is usually initialized to one maximum segment size (MSS). Each time an acknowledgment for a new segment is received, the congestion window is increased by at most one MSS. Growing the sender's congestion window cwnd gradually in this way keeps the rate at which packets are injected into the network reasonable.

After each transmission round, the congestion window cwnd doubles. The time taken by one transmission round is essentially the round-trip time RTT, but "transmission round" emphasizes that all segments allowed by the congestion window cwnd are sent back to back and the acknowledgment for the last byte sent has been received.

In addition, the "slow" in slow start does not mean that cwnd grows slowly; it means that TCP starts with cwnd = 1, so the sender transmits only one segment at first (to probe the network's congestion level) and then gradually increases cwnd.

To prevent the congestion window cwnd from growing so large that it causes network congestion, a slow start threshold, the state variable ssthresh, is also needed. It is used as follows:

  • When cwnd < ssthresh, use the slow start algorithm described above.
  • When cwnd > ssthresh, stop using slow start and switch to the congestion avoidance algorithm.
  • When cwnd = ssthresh, either slow start or congestion avoidance may be used.

Congestion avoidance algorithm:

Congestion avoidance lets the congestion window cwnd grow slowly: each time a round-trip time RTT passes, the sender's congestion window cwnd increases by 1 instead of doubling. The congestion window thus grows slowly and linearly, much more slowly than under the slow start algorithm.

Whether in the slow start phase or the congestion avoidance phase, as soon as the sender judges that the network is congested (the signal being that no acknowledgment is received), it must set the slow start threshold ssthresh to half of the current send window value (but not less than 2), then reset the congestion window cwnd to 1 and run the slow start algorithm.

The purpose of this is to quickly reduce the number of packets sent by the host to the network, so that the congested router has enough time to process the backlog of packets in the queue.
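The two growth regimes can be traced numerically in a few lines of Python; with ssthresh = 16, cwnd doubles each round until the threshold, then grows by 1 per round:

```python
def cwnd_over_rounds(rounds, ssthresh=16):
    """Trace cwnd (in MSS units) across transmission rounds."""
    cwnd, trace = 1, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2   # slow start: double per RTT
        else:
            cwnd += 1   # congestion avoidance: additive increase, +1 per RTT
    return trace

print(cwnd_over_rounds(8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```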

The following figure illustrates the congestion control process above with concrete values. Here the send window is the same size as the congestion window.
2. Fast retransmission and fast recovery

Fast retransmission

The fast retransmit algorithm first requires the receiver to send a duplicate acknowledgment immediately upon receiving an out-of-order segment (so the sender learns early that a segment has not arrived), rather than waiting to piggyback the acknowledgment on its own outgoing data.
After receiving M1 and M2, the receiver acknowledges each of them. Now suppose the receiver does not receive M3 but then receives M4.

Obviously, the receiver cannot acknowledge M4, because M4 is an out-of-order segment. Under the basic reliable-transmission rules, the receiver could either do nothing or send an acknowledgment for M2 at a convenient time.

Under the fast retransmit algorithm, however, the receiver should promptly send a duplicate acknowledgment for M2, letting the sender learn early that segment M3 has not reached the receiver. When the sender then sends M5 and M6, the receiver acknowledges M2 yet again upon receiving each of them. The sender has now received four acknowledgments for M2 in total, the last three of which are duplicates.

The fast retransmit algorithm also stipulates that as soon as the sender receives three duplicate acknowledgments in a row, it should immediately retransmit the segment the other side has not yet received (M3 here), instead of waiting for M3's retransmission timer to expire.

Since the sender retransmits the unacknowledged segment as soon as possible, the use of fast retransmission can increase the overall network throughput by about 20%.

Fast recovery

In conjunction with fast retransmission, there is also a fast recovery algorithm. The process has the following two main points:

  • When the sender receives three duplicate acknowledgments in a row, it executes the "multiplicative decrease" algorithm, halving the slow start threshold ssthresh.
  • The difference from slow start is that the slow start algorithm is not executed now (that is, cwnd is not reset to 1); instead, cwnd is set to the halved ssthresh value, and then the congestion avoidance algorithm ("additive increase") runs so that the congestion window grows slowly and linearly.
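The two reactions (a retransmission timeout versus three duplicate ACKs) can be contrasted in a short sketch; the cwnd and ssthresh values are in MSS units:

```python
def on_triple_dup_ack(cwnd, ssthresh):
    """Fast retransmit + fast recovery: halve ssthresh, skip slow start."""
    ssthresh = max(cwnd // 2, 2)  # multiplicative decrease, but not below 2
    return ssthresh, ssthresh     # cwnd resumes at the new ssthresh

def on_timeout(cwnd, ssthresh):
    """A retransmission timeout is treated as heavier congestion."""
    ssthresh = max(cwnd // 2, 2)
    return 1, ssthresh            # cwnd falls back to 1: slow start again

print(on_triple_dup_ack(24, 16))  # (12, 12): congestion avoidance resumes at 12
print(on_timeout(24, 16))         # (1, 12): back to slow start
```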



Origin blog.csdn.net/lingshengxueyuan/article/details/108168174