About the TCP protocol, I think you should understand it!

What is TCP?

TCP (Transmission Control Protocol) is a connection-oriented, reliable transport layer protocol that runs on top of IP. In the IP header, TCP is identified by protocol number 6. TCP is a notoriously complex protocol, but it is the foundation of the Internet and something every programmer should know. First, let's take a look at the OSI seven-layer model:

OSI's seven-layer model

 

TCP works at the fourth layer of the OSI seven-layer model, the Transport layer; IP is at the third layer, the Network layer; ARP is at the second layer, the Data Link layer. Data at the second layer is called a Frame, data at the third layer is called a Packet, and data at the fourth layer is called a Segment. When data is sent from the application layer, each layer on the way down adds its own header (encapsulation), and the receiver strips those headers in reverse order on the way up (decapsulation). Every piece of data goes through this process of encapsulation and decapsulation. In the OSI seven-layer model, the functions and corresponding protocols of each layer are as follows:

The role of each layer in the OSI seven-layer model and the corresponding protocols
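The encapsulation and decapsulation process described above can be sketched as a toy model. This is only an illustration: the bracketed header strings and the function names `encapsulate` and `decapsulate` are made up for this example and are not real wire formats.

```python
# Toy model of encapsulation: each layer prepends its own header on the way
# down and strips it on the way up. The bracketed header strings are
# placeholders for illustration, not real wire formats.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, data link (top to bottom)


def encapsulate(payload: bytes) -> bytes:
    """Wrap application data in one header per layer, top layer first."""
    data = payload
    for layer in LAYERS:
        data = f"[{layer}]".encode() + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in reverse order, outermost first."""
    data = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not data.startswith(header):
            raise ValueError(f"missing {layer} header")
        data = data[len(header):]
    return data


frame = encapsulate(b"hello")
print(frame)               # b'[ETH][IP][TCP]hello'
print(decapsulate(frame))  # b'hello'
```

The outermost header belongs to the lowest layer, which is why the receiver has to peel the headers off in the opposite order from the one in which the sender added them.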

TCP is a protocol, so how is it defined, and what does its data format look like? To analyze it in depth, you need to understand, and even memorize, the meaning of each field in the TCP header. Let's go.

TCP header format

The above is the format of the TCP header. It matters because everything that follows builds on it, so each field is explained in detail below.

Source Port and Destination Port: 16 bits each, giving the source and destination port numbers. Port numbers distinguish different processes on a host, while IP addresses distinguish different hosts. The source and destination port numbers, together with the source and destination IP addresses in the IP header, uniquely identify a TCP connection;

Sequence Number: a 32-bit field that identifies the byte stream sent from the TCP sender to the TCP receiver. It is the sequence number, within that stream, of the first data byte in this segment. It is mainly used to solve the problem of segments arriving out of order;

Acknowledgment Number: a 32-bit field containing the next sequence number that the end sending the acknowledgment expects to receive. It is therefore the sequence number of the last successfully received data byte plus 1. The field is valid only when the ACK flag (described below) is 1. It is mainly used to detect and recover from packet loss;

Offset: the number of 32-bit words in the header. This value is required because the length of the options field is variable. The field occupies 4 bits, so it can represent at most 15 32-bit words, that is, 4*15=60 bytes, which is the maximum TCP header length. With no options, the normal length is 20 bytes;

TCP Flags: the original TCP header defines 6 flag bits, several of which can be set to 1 at the same time. They mainly drive the TCP state machine. In order they are URG, ACK, PSH, RST, SYN, FIN. The meaning of each flag bit is as follows:

URG: indicates that the Urgent Pointer field of the segment is valid, meaning the segment carries urgent data that the receiver should process as soon as possible;

ACK: indicates that the Acknowledgment Number field is valid. When it is 1, the acknowledgment number described above is meaningful; when it is 0, the field is ignored;

PSH: indicates the Push operation. When the segment arrives at the receiving end, its data should be passed to the application immediately instead of waiting in the buffer;

RST: indicates a connection reset. It is used to reset connections that are in an error state and to reject invalid or unexpected segments;

SYN: synchronizes sequence numbers and is used to establish a connection. It is used together with the ACK flag: in a connection request, SYN=1 and ACK=0; in the response to that request, SYN=1 and ACK=1. Segments with this flag are often used for port scanning. The scanner sends a segment with only SYN set; if the target replies with SYN+ACK, the port is open. Because this kind of scan performs only the first step of the TCP three-way handshake and never completes the connection, it is often called a half-open scan;

FIN: indicates that the sender has reached the end of its data and has nothing more to transmit. Sending a FIN closes only the sender's direction of the connection; the connection is fully closed once the other side has also sent its FIN and that FIN has been acknowledged. Segments with this flag are also often used for port scanning.

Window: the window size, a 16-bit field that advertises how many bytes the receiver is currently willing to accept. It is the basis of the sliding window used for flow control; this is a complex topic and is not covered in this blog post;
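To make the field layout concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with Python's standard `struct` module. The function name `parse_tcp_header` and the sample values (port 50000, sequence number 1000, and so on) are assumptions made for this example; options beyond the first 20 bytes are not decoded.

```python
import struct

# Masks for the six flag bits described above (low 6 bits of the flags byte).
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack, offset_reserved, flags,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", raw[:20])
    data_offset = offset_reserved >> 4  # upper 4 bits: length in 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": data_offset * 4,  # convert 32-bit words to bytes
        "flags": [name for name, bit in FLAG_BITS.items() if flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# A hand-built SYN segment: offset 5 words (20 bytes), only the SYN bit set.
sample = struct.pack("!HHIIBBHHH", 50000, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
header = parse_tcp_header(sample)
print(header["header_len"], header["flags"])  # 20 ['SYN']
```

The format string `"!HHIIBBHHH"` mirrors the header: two 16-bit ports, two 32-bit numbers, one byte holding the offset, one byte holding the flags, then the 16-bit window, checksum and urgent pointer.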

Well, the basic knowledge is ready, let's start the next journey.

What is the three-way handshake?

TCP is connection-oriented: before either party sends data to the other, a connection must be established between them. In TCP/IP, TCP provides a reliable connection service, and the connection is initialized through a three-way handshake. The purpose of the three-way handshake is to synchronize the sequence numbers and acknowledgment numbers of both parties and to exchange TCP window size information. This is the TCP three-way handshake that is so often asked about in interviews. Knowing only the concept will not get you the job; you need to understand some of the details. Let's look at the picture first.

TCP three-way handshake (figure)

It is a very clear picture. Of course, I did not draw it; I am only quoting it to illustrate the point.

1. The first handshake: the client requests a connection. The client sends a connection request segment with the SYN bit set to 1 and the Sequence Number set to x; the client then enters the SYN_SENT state and waits for the server's confirmation;

2. The second handshake: the server receives the SYN segment. The server must acknowledge the client's SYN, so it sets the Acknowledgment Number to x+1 (the client's Sequence Number plus 1). At the same time it sends its own SYN request, with the SYN bit set to 1 and the Sequence Number set to y. The server puts all of this into a single segment (the SYN+ACK segment) and sends it to the client, and the server enters the SYN_RECV state;

3. The third handshake: the client receives the SYN+ACK segment from the server. It sets the Acknowledgment Number to y+1 and the Sequence Number to x+1, and sends an ACK segment to the server. The client enters the ESTABLISHED state once this segment is sent, and the server enters it once the segment is received, which completes the TCP three-way handshake.

After completing the three-way handshake, the client and server can start transmitting data. The above is the general introduction of the TCP three-way handshake.
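The sequence and acknowledgment numbers in the three steps above can be traced with a small sketch. It only computes the numbers; it does not open a real connection. The function name `three_way_handshake` and the initial sequence numbers 100 and 300 are arbitrary choices for this example.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(x: int, y: int):
    """Return the three segments as (sender, flags, seq, ack, sender's new state).

    x and y are the initial sequence numbers chosen by client and server.
    """
    return [
        ("client", "SYN", x, None, "SYN_SENT"),
        ("server", "SYN+ACK", y, (x + 1) % MOD, "SYN_RECV"),
        ("client", "ACK", (x + 1) % MOD, (y + 1) % MOD, "ESTABLISHED"),
    ]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
# ('client', 'SYN', 100, None, 'SYN_SENT')
# ('server', 'SYN+ACK', 300, 101, 'SYN_RECV')
# ('client', 'ACK', 101, 301, 'ESTABLISHED')
```

The SYN itself consumes one sequence number even though it carries no data, which is why each side acknowledges the other's initial sequence number plus 1. The server reaches ESTABLISHED when the last segment arrives.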

What about the four breakups?

After the client and the server have established a TCP connection through the three-way handshake and the data transmission is complete, the TCP connection must be torn down. Tearing down a TCP connection takes the mysterious "four breakups".

1. The first breakup: host 1 (which can be either the client or the server) sets the Sequence Number and Acknowledgment Number and sends a FIN segment to host 2. Host 1 then enters the FIN_WAIT_1 state. This means that host 1 has no more data to send to host 2;

2. The second breakup: host 2 receives the FIN segment sent by host 1 and returns an ACK segment whose Acknowledgment Number is the received Sequence Number plus 1. Host 2 enters the CLOSE_WAIT state, and host 1 enters the FIN_WAIT_2 state when the ACK arrives. With this ACK, host 2 tells host 1 that it has received the close request; host 2 may still have data of its own to send;

3. The third breakup: when host 2 has no more data to send, it sends a FIN segment to host 1, requesting to close the connection, and host 2 enters the LAST_ACK state;

4. The fourth breakup: host 1 receives the FIN segment sent by host 2, sends an ACK segment to host 2, and enters the TIME_WAIT state. Host 2 closes the connection as soon as it receives this ACK. Host 1 waits for 2MSL (twice the maximum segment lifetime); if nothing further arrives from host 2 in that time, host 1 concludes that its ACK was received and that host 2 has closed normally, so host 1 can close the connection as well.
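The four segments above can be traced in the same way as the handshake. This is a sketch of the numbering only, under the assumption that host 1 has already received everything host 2 sent before the close starts. The function name `four_way_close` and the parameter `host2_extra_bytes` are made up for this example.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def four_way_close(u: int, v: int, host2_extra_bytes: int = 0):
    """Return the four segments as (sender, flags, seq, ack, sender's new state).

    u, v: the next sequence numbers of host 1 and host 2 when the close starts.
    host2_extra_bytes: data host 2 still sends between its ACK and its FIN.
    Real FIN segments normally carry the ACK flag too; only the flag that
    matters for each step is named here.
    """
    w = (v + host2_extra_bytes) % MOD  # host 2's sequence number at its FIN
    return [
        ("host1", "FIN", u, v, "FIN_WAIT_1"),
        ("host2", "ACK", v, (u + 1) % MOD, "CLOSE_WAIT"),
        ("host2", "FIN", w, (u + 1) % MOD, "LAST_ACK"),
        ("host1", "ACK", (u + 1) % MOD, (w + 1) % MOD, "TIME_WAIT"),
    ]


for segment in four_way_close(u=500, v=900):
    print(segment)
# ('host1', 'FIN', 500, 900, 'FIN_WAIT_1')
# ('host2', 'ACK', 900, 501, 'CLOSE_WAIT')
# ('host2', 'FIN', 900, 501, 'LAST_ACK')
# ('host1', 'ACK', 501, 901, 'TIME_WAIT')
```

Like SYN, a FIN consumes one sequence number, so each FIN is acknowledged with its sequence number plus 1. Passing a non-zero `host2_extra_bytes` shows how host 2's FIN moves forward when it keeps sending data after acknowledging host 1's FIN.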

With that, TCP's four breakups are happily complete. At this point you probably have a lot of questions and things that do not yet make sense, and it all feels rather messy. That is fine; let's keep going.

Why three handshakes?

Now that the three-way handshake has been summarized, why does it have to be three? It seems that two would be enough. So why does TCP insist on three? In "Computer Network", Xie Xiren says:

To prevent an invalid (stale) connection request segment from suddenly arriving at the server and causing an error.

An example is given in the book at the same time, as follows:

"An invalid connection request segment arises in a case like this: the first connection request segment sent by the client is not lost, but is held up at some network node for a long time, so that it only reaches the server some time after the connection has been released. By then it is a segment that expired long ago. However, when the server receives this invalid connection request segment, it mistakes it for a new connection request from the client. It therefore sends a confirmation segment to the client, agreeing to establish a connection. If the three-way handshake were not used, a new connection would be established as soon as the server issued its confirmation. Since the client has not actually requested a connection, it ignores the server's confirmation and sends no data to the server. The server, however, believes that a new transport connection has been established and keeps waiting for the client to send data, so a lot of server resources are wasted. The three-way handshake prevents this. In the case just described, the client does not acknowledge the server's confirmation, and because the server receives no acknowledgment, it knows that the client did not request a connection."

This makes it very clear: the third handshake prevents the server from waiting indefinitely and wasting resources.
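The book's argument can be reduced to a very small model. This is a deliberate simplification for illustration, not how a real TCP stack is written: the function name `server_outcome` and its parameters are invented here, and the "two-way handshake" is the hypothetical alternative discussed in the quote.

```python
def server_outcome(handshake_steps: int, syn_is_stale: bool) -> str:
    """Final state of the server-side connection after it answers a SYN.

    handshake_steps: 2 for the hypothetical two-way handshake, 3 for real TCP.
    syn_is_stale: True if the SYN is an old, delayed request the client
    no longer cares about.
    """
    # The client only acknowledges a SYN+ACK for a request it is still waiting on.
    client_sends_ack = not syn_is_stale
    if handshake_steps == 2:
        # Two-way handshake: the server commits as soon as it has replied.
        return "ESTABLISHED"
    # Three-way handshake: the server commits only when the final ACK arrives;
    # otherwise it gives up after a timeout and frees the resources.
    return "ESTABLISHED" if client_sends_ack else "CLOSED"


print(server_outcome(2, syn_is_stale=True))   # ESTABLISHED (resources wasted)
print(server_outcome(3, syn_is_stale=True))   # CLOSED
print(server_outcome(3, syn_is_stale=False))  # ESTABLISHED
```

With only two steps, the server has no way to tell a stale request from a live one; the third step is the client's chance to confirm that it really wants the connection.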

Why break up four times?

What is the reason for the four breakups? TCP is a connection-oriented, reliable, byte-stream-based transport layer protocol, and it is full-duplex. This means that when host 1 sends a FIN segment, it only says that host 1 has no more data to send, that is, all of its data has been sent; host 1 can still receive data from host 2. When host 2 returns an ACK segment, it means that host 2 knows host 1 has no more data to send, but host 2 can still send data to host 1. When host 2 also sends a FIN segment, it means that host 2 has no more data to send either; host 1 acknowledges it, and the two then happily terminate the TCP connection. To understand the four breakups correctly, you need to understand the state changes that take place along the way.
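The full-duplex behaviour described above can be observed with real sockets on the loopback interface. The following is a minimal sketch, assuming Python 3.8 or later (for `socket.create_server`) and a machine that allows connections to 127.0.0.1; the function name `half_close_demo` and the message contents are made up for this example. The client plays host 1 and the server thread plays host 2.

```python
import socket
import threading


def half_close_demo() -> bytes:
    """Client half-closes with shutdown(SHUT_WR), then still receives data."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = listener.getsockname()[1]

    def serve() -> None:
        conn, _ = listener.accept()
        with conn:
            # recv() returns b"" once the client's FIN has arrived.
            while conn.recv(1024):
                pass
            # The server-to-client direction is still open at this point.
            conn.sendall(b"reply sent after client FIN")
        # Leaving the with-block closes conn, which sends the server's FIN.

    thread = threading.Thread(target=serve)
    thread.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)  # send FIN: "no more data from me"
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:  # the server's FIN has arrived
                break
            chunks.append(chunk)

    thread.join()
    listener.close()
    return b"".join(chunks)


print(half_close_demo())  # b'reply sent after client FIN'
```

`shutdown(socket.SHUT_WR)` is what lets an application send its FIN while keeping the receiving direction open; calling `close()` instead would give up both directions at once.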

FIN_WAIT_1: this state needs a careful explanation. Both FIN_WAIT_1 and FIN_WAIT_2 mean waiting for the other party's FIN segment. The difference is this: a socket in the ESTABLISHED state that wants to actively close the connection sends a FIN segment to the other party and enters FIN_WAIT_1. When the other party's ACK for that FIN arrives, the socket enters FIN_WAIT_2. In practice the other party normally sends the ACK immediately, so FIN_WAIT_1 is rarely observed, whereas FIN_WAIT_2 can sometimes be seen with netstat. (active party)

FIN_WAIT_2: this state was explained above. A socket in FIN_WAIT_2 is on a half-closed connection: this side has asked to close and the other party has acknowledged the request (the ACK segment), but the other party may still have data to send and will close the connection later. (active party)

CLOSE_WAIT: this state means waiting to close. How should that be understood? When the other party closes its socket and sends you a FIN segment, your system responds with an ACK segment and then enters the CLOSE_WAIT state. What you need to consider next is whether you still have data to send to the other party. If not, you can close the socket, which sends a FIN segment to the other party and closes the connection. So in the CLOSE_WAIT state, the system is waiting for you to close the connection. (passive party)

LAST_ACK: this state is relatively easy to understand. The passive party has sent its FIN segment and is waiting for the other party's ACK. When the ACK is received, it enters the CLOSED state. (passive party)

TIME_WAIT: the FIN segment from the other party has been received and an ACK segment has been sent. After 2MSL the socket returns to the CLOSED state. If, in the FIN_WAIT_1 state, a segment carrying both the FIN flag and the ACK flag is received from the other party, the socket can enter TIME_WAIT directly without passing through FIN_WAIT_2. (active party)

CLOSED: the connection has ended and no longer exists.
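The states above can be collected into a small transition table. This sketch covers only the close sequence described in this post and leaves out the CLOSING state that occurs when both sides close at the same time. The event names (`"close"`, `"recv ACK"` and so on) and the helper `walk` are labels invented for this example.

```python
# (current state, event) -> next state, for the close sequence only.
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",        # active party sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv FIN+ACK"): "TIME_WAIT",   # shortcut past FIN_WAIT_2
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",       # active party sends last ACK
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",     # passive party sends ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",           # passive party sends FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(start: str, events: list) -> list:
    """Follow a list of events from a starting state and return every state visited."""
    state = start
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path


active = walk("ESTABLISHED", ["close", "recv ACK", "recv FIN", "2MSL timeout"])
passive = walk("ESTABLISHED", ["recv FIN", "close", "recv ACK"])
print(" -> ".join(active))
# ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
print(" -> ".join(passive))
# ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
```

An event that is not valid in the current state raises `KeyError`, which is a simple way to check a sequence of events against the table.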

I think you should understand

That is the end of this summary, but learning TCP is far from over. TCP is a very complex protocol, and this post only briefly summarizes what happens when a TCP connection is established and torn down. There are still many pitfalls, which we can fill in when there is time. Alright, done!

Reprinted from: http://network.51cto.com/art/201411/456783.htm
