Personal Summary of TCP/UDP (1)

Everyone who studies has had the experience of learning something and then forgetting it, especially with something as dense as the TCP/IP protocols. So I am summarizing what I have learned here, and I hope it helps others as well as me.

This blog mainly covers TCP and UDP in the transport layer; I will add the IP layer later when I have time.

 

TCP

The full name of TCP is Transmission Control Protocol, and the purpose of this protocol is to provide reliable transport services for network data.

Five features of TCP:

1) Connection-oriented. Before an application uses TCP, it must first establish a TCP connection (the three-way handshake). After the data transfer is complete, the connection must be released.

2) Each TCP connection has exactly two endpoints.

3) Reliable delivery. Data transmitted over a TCP connection arrives error-free, without loss, without duplication, and in order. (Many of our web services also need reliability, so could the design of an application-layer service borrow from the design of TCP?)

4) Full duplex communication. Both communicating parties can send data at any time.

A key point here is that both ends have a send buffer and a receive buffer (in the kernel), which temporarily hold the data for two-way communication. When sending, the application process hands the data to the send buffer in the kernel and then goes on to do other things (typically asynchronously); the kernel sends the data out at an appropriate time. When receiving, the kernel puts the received data into the receive buffer, and the application process reads the data from the buffer at an appropriate time. (These two buffers are very important; the various I/O models discussed later all work on them.)

5) Byte-stream oriented. A "stream" is a sequence of bytes flowing into or out of a process. Although the application process hands TCP one block of data at a time, and the blocks vary in size, TCP treats the application's data as a continuous, unstructured stream of bytes.

(Figure: conceptual diagram of TCP's byte-stream orientation)

TCP does not care how much data the application process writes to the TCP buffer at a time. It decides how many bytes a segment should contain according to the window value given by the other party and the current degree of network congestion. If the data block written to the TCP buffer is too long, TCP divides it into shorter pieces before sending; if the application process writes only one byte at a time, TCP can wait until enough bytes have accumulated to form a segment and then send it.
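
To make the "TCP ignores write boundaries" idea concrete, here is a rough sketch in Python. It is my own illustration, not how a real TCP stack works: the function name `segmentize` and the fixed `mss` limit are assumptions, and a real sender also takes the peer's window and congestion into account.

```python
def segmentize(writes, mss):
    """Split application writes into segments of at most `mss` bytes.

    `writes` is a list of byte strings, one per application write.
    Write boundaries are deliberately ignored: the buffer is one stream.
    """
    stream = b"".join(writes)          # the send buffer only sees bytes
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


# One large write is cut into shorter pieces.
print(segmentize([b"A" * 10], mss=4))            # [b'AAAA', b'AAAA', b'AA']
# Many one-byte writes are accumulated into a single segment.
print(segmentize([b"a", b"b", b"c", b"d"], 4))   # [b'abcd']
```

The receiver sees the same thing in reverse: it gets bytes in order, with no way to tell where one application write ended and the next began.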

How reliable transmission works

We say that TCP is reliable, but under what conditions is transmission reliable?

Ideal reliable transmission conditions have two characteristics:

1) The transmission channel does not generate errors.

2) No matter how fast the sending speed is, the receiver can always process the received data in time.

We can't achieve these ideal conditions, but we can get as close to them as possible.

To approach the first condition, the sender retransmits the data whenever an error occurs.

To approach the second condition, when the receiver cannot keep up with the incoming data, it promptly tells the sender to slow down.

 

How is the first point implemented? Let's start with the simple stop-and-wait protocol.

Stop-and-wait protocol (for understanding; it paves the way for what follows)

"Stop and wait" means that the sender stops after sending one packet and waits for the other party's acknowledgment, then sends the next packet once the acknowledgment arrives. (A packet here is a generic data unit: it is called a datagram at the IP layer and a segment at the TCP layer.)

There are three cases in this protocol.

Case 1. No errors

A sends packet M1, then pauses and waits for B's acknowledgment (ack). When B receives M1, it sends an acknowledgment to A. A then sends M2, and so on. There is no problem in this case.

Case 2. An error occurs

Consider packet M1 as shown in the figure. Either B receives M1, detects an error in it, discards it, and does nothing else (it does not tell A that it received a corrupted packet); or M1 is lost in transit, in which case B of course knows nothing about it. In both cases B sends nothing to A, not even an acknowledgment.

What A does in this case is called timeout retransmission: if A still has not received an acknowledgment after a certain period of time, it assumes that the packet it just sent was lost and retransmits it.

Therefore, every time a packet is sent, a timeout timer is set for it. If the acknowledgment arrives before the timer expires, the timer is cancelled.

There are three main points here:

1. A must temporarily keep a copy of each packet it has sent, and clear the copy after receiving the acknowledgment.

2. Both packets and acknowledgments must be numbered, so that it is clear which packet an acknowledgment refers to.

3. As the figure above shows, the retransmission timeout should be somewhat longer than the average round-trip time. (Exactly how much longer is hard to determine; it has to be estimated from the network conditions at the time.)

Case 3. Lost acknowledgment and late acknowledgment

This is easy to understand from the name. Suppose B received M1, but the acknowledgment B sent was lost or arrived late. A will then retransmit M1 after its timeout timer expires, and B receives M1 a second time. What should B do? (Lost acknowledgment)

1. Discard the duplicate M1.

2. Send A an acknowledgment of M1 again. This acknowledgment must be sent; B cannot skip it on the grounds that it already sent one before.

One detail here: B can tell that the second M1 is a duplicate because packets are numbered.

Late acknowledgment: because of network congestion or similar conditions, the acknowledgment of M1 that B sends to A may arrive late, after A's timer has already expired. A therefore retransmits M1. B receives the duplicate M1, discards it, and acknowledges it again, exactly as in the lost-acknowledgment case above. A receives this second acknowledgment in time and moves on. When the late first acknowledgment finally arrives, A sees that it is a duplicate. What does A do with it? Very simple: it discards it as well.

 

Reliable transmission protocols like stop-and-wait above are usually called ARQ (Automatic Repeat reQuest): retransmission happens automatically, without the receiver having to ask for it.
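
The three cases can be played through in a small simulation. This is only a sketch under my own assumptions: the function `stop_and_wait` and the way losses are specified (as `(packet_index, attempt)` pairs) are made up for illustration, a "timeout" is modeled simply as going around the loop again, and late acknowledgments are not modeled.

```python
def stop_and_wait(packets, lost_data=(), lost_acks=()):
    """Simulate stop-and-wait ARQ over a lossy channel.

    lost_data / lost_acks: (packet_index, attempt) pairs naming the
    transmissions that the channel drops. Returns (delivered, log).
    """
    lost_data, lost_acks = set(lost_data), set(lost_acks)
    delivered = []        # what B hands to its application, in order
    expected = 0          # B: number of the next new packet
    log = []
    for seq, payload in enumerate(packets):
        attempt = 0
        while True:                       # A keeps its copy until acked
            attempt += 1
            if (seq, attempt) in lost_data:
                log.append(f"M{seq + 1} lost, A times out")
                continue                  # timeout -> retransmit
            if seq == expected:           # B: a new packet, accept it
                delivered.append(payload)
                expected += 1
            else:                         # B: a duplicate, discard it
                log.append(f"B discards duplicate M{seq + 1}")
            # B always acknowledges, even for a duplicate.
            if (seq, attempt) in lost_acks:
                log.append(f"ack for M{seq + 1} lost, A times out")
                continue
            break                         # ack received: A drops its copy
    return delivered, log


# The first ack for M1 is lost, and the first copy of M2 is lost.
delivered, log = stop_and_wait(["M1", "M2"],
                               lost_data={(1, 1)}, lost_acks={(0, 1)})
print(delivered)   # ['M1', 'M2']
for line in log:
    print(line)
```

Note how numbering the packets is what lets B recognize the retransmitted M1 as a duplicate, and how B still acknowledges it so that A can move on.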

Shortcoming:

The stop-and-wait protocol has one notable characteristic. As the error-free figure shows, it is linear: after sending one packet, A must wait for its acknowledgment before sending the next, much like a synchronous call in a program. We can calculate the channel utilization this gives. TD is the time it takes A to send a data packet, RTT is the round-trip time on the channel, and TA is the time it takes B to send an acknowledgment (which is very short). Because useful data is being sent only during TD, the utilization is:

U = TD / (TD + RTT + TA)

It can be seen from this formula that when the channel is very long (so RTT is large, as it must be for links that span the world) and the transmission rate is very high (so TD is small, and rates only keep increasing as technology develops), the channel utilization is very low. In other words, the channel is idle most of the time.
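
To get a feel for the formula, here is a small calculation. The numbers are made up for illustration (a 1200-bit packet, a 20 ms round trip, and TA ignored); only the formula itself comes from the text above.

```python
def channel_utilization(td, rtt, ta=0.0):
    """U = TD / (TD + RTT + TA); all arguments in the same time unit."""
    return td / (td + rtt + ta)


# Assumed: a 1200-bit packet takes 1.2 ms to send at 1 Mbit/s.
print(channel_utilization(td=1.2, rtt=20.0))    # about 0.057
# Same packet at 10 Mbit/s: TD shrinks to 0.12 ms, utilization drops further.
print(channel_utilization(td=0.12, rtt=20.0))   # about 0.006
```

A faster link makes TD smaller while RTT stays the same, so stop-and-wait wastes an even larger share of the channel.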

So can we let A send packets continuously, without stopping to wait for an acknowledgment after every packet? This is similar to the asynchronous services we talk about in web development.

Of course we can. What follows is a protocol that works in this asynchronous way, the continuous ARQ protocol, and after it the big boss, the sliding window protocol.

 

Continuous ARQ protocol

The protocol is actually very simple. There is a sending window, and the 5 packets inside the window (the number can vary) can be sent one after another without waiting for individual acknowledgments. The receiver uses cumulative acknowledgment: it does not have to acknowledge the received packets one by one, but can wait until several packets have arrived and then acknowledge the last packet that arrived in order.

As shown in the figure:

In figure (a), the sender sends packets 1 to 5. After it receives the receiver's acknowledgment for the last of them (packet 5), the window moves forward by 5 packets, as in figure (b), and packets 6 to 10 are sent.

Even from this brief introduction, we can see that the continuous ARQ protocol has a big problem.

Cumulative acknowledgment has advantages and disadvantages. The advantage is that it is easy to implement, and a lost acknowledgment does not necessarily force a retransmission, because a later acknowledgment covers the earlier packets too. The disadvantage is that it cannot tell the sender about all the packets the receiver has received correctly.

For example, suppose the sender sent the first 5 packets and the 3rd one was lost. The receiver can only acknowledge the first two packets. The sender has no way of knowing what happened to the last three packets, so it has to retransmit all three. This is called Go-back-N: the sender has to go back and retransmit the N packets it has already sent. It can be seen that when the quality of the communication line is poor, the continuous ARQ protocol performs badly.
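
The example with the lost 3rd packet can be written out in a few lines. This is a simplified sketch: the function names `cumulative_ack` and `go_back_n_resend` are my own, and packets are numbered from 1 as in the text.

```python
def cumulative_ack(received):
    """Return the highest packet number n such that 1..n all arrived."""
    n = 0
    while n + 1 in received:
        n += 1
    return n


def go_back_n_resend(window, received):
    """Packets the sender must resend: everything after the cumulative ack."""
    acked = cumulative_ack(received)
    return [p for p in window if p > acked]


window = [1, 2, 3, 4, 5]
received = {1, 2, 4, 5}                      # packet 3 was lost
print(cumulative_ack(received))              # 2
print(go_back_n_resend(window, received))    # [3, 4, 5]
```

Packets 4 and 5 arrived safely, but the acknowledgment "2" carries no information about them, so they are sent again anyway.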

In short, cumulative acknowledgment is flawed. Before fixing this defect and discussing TCP's reliable transmission in depth, let's look at the format of the TCP segment header.

 

TCP segment header format

The unit of data transmitted by TCP is the segment. A segment consists of a TCP header and a data part. All the functions of TCP are reflected in the roles of the fields in its header. The header layout is shown in the figure:

The source port number and destination port number are easy to understand.

Sequence number: occupies 4 bytes, so its range is 0 to 2^32 - 1 (0 to 4294967295). After the sequence number reaches 4294967295, the next one wraps around to 0. Every byte in the byte stream transmitted by TCP is numbered sequentially. The value of the sequence number field in the header is the sequence number of the first byte of data carried in this segment.

For example, if the sequence number field of a segment is 301 and it carries 100 bytes of data, the data in the next segment starts at sequence number 401. (The sequence number is very important and will be used later when reassembling the data.)

There are 4294967296 sequence numbers in total. Will they run out? They are reused, but under normal circumstances, by the time a sequence number is reused, the data that carried the old number has long since reached its destination.

Acknowledgment number: the sequence number of the first data byte that the sender of this segment expects to receive in the other party's next segment.

For example, B correctly receives a segment from A whose sequence number is 501 and whose data length is 200 bytes (sequence numbers 501 to 700). B therefore expects the next data from A to start at 701, so B sets the acknowledgment number to 701 in the acknowledgment segment it sends to A.
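
Both examples are the same piece of arithmetic, done modulo 2^32. A minimal sketch (the helper name `next_seq` is mine):

```python
SEQ_SPACE = 2 ** 32     # sequence numbers run from 0 to 2**32 - 1


def next_seq(seq, data_len):
    """Sequence number of the byte that follows `data_len` bytes from `seq`."""
    return (seq + data_len) % SEQ_SPACE


print(next_seq(301, 100))          # 401: where the next segment starts
print(next_seq(501, 200))          # 701: the ack number B sends back
print(next_seq(4294967295, 1))     # 0: the sequence number wraps around
```

So the acknowledgment number B sends is simply the sequence number of the segment it received plus the length of its data, assuming everything before it has also arrived.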

Header length (data offset): the unit of the header length is 32 bits (4 bytes). For example, if the value of this field is 0110 (6), the header is 24 bytes long; since the fixed header is 20 bytes, the options take up only 4 bytes. The field occupies 4 bits, so its maximum value is 15 and the longest possible header is 15 * 4 = 60 bytes, which means options can be at most 40 bytes.

Reserved : Reserved for future use, but should be set to 0 for now.

The next six control bits are used to describe the nature of the segment.

Urgent URG: used for urgent data. Setting this bit to 1 tells the receiver that the segment carries urgent, high-priority data that should be processed first. How urgent data is handled will be described later.

Acknowledgment ACK: the acknowledgment number field is valid only when ACK=1. All segments transmitted after the connection is established must have ACK set to 1.

Push PSH: asks the receiver to hand the data to the application promptly instead of waiting in the buffer. The push operation is rarely used, so a general understanding is enough. (This is different from the "push" I talk about in my business work.)

Reset RST: when RST=1, a serious error has occurred in the TCP connection; the connection must be released and then re-established. RST=1 is also used to reject an illegal segment or to refuse to open a connection.

Synchronization SYN: used to synchronize sequence numbers when a connection is established. It is interpreted together with ACK.

When SYN=1 and ACK=0, this is a connection request segment. If the other party agrees to establish the connection, it sets SYN=1 and ACK=1 in its response segment.

Therefore, SYN=1 indicates that the segment is either a connection request or a connection accept.

Finish FIN: used to release a connection. FIN=1 means that the sender of this segment has no more data to send and asks to release the connection. Later articles will cover connection establishment and release.

Window: the means by which TCP does flow control. (Remember the second condition for reliability mentioned earlier, controlling the sending speed? It is done through the window.)

The window here is the receive window of the party sending this segment (that party also has a send window, but this field is about its receive window). The field occupies 2 bytes, so its value is an integer in the range 0 to 2^16 - 1.

The window value tells the other party (this is the key point: it is addressed to the other party) how much data it is currently allowed to send, counted from the acknowledgment number of this segment. In other words, it says how many bytes the sender of this segment can still accommodate, so that the other party can control its sending speed. That is a bit confusing, so here is an example:

If the segment I send has acknowledgment number 701 and window value 1000, it means that, starting from sequence number 701, I can still receive 1000 bytes of data (the free space in my receive buffer).

The window value is the basis on which the other party sets its send window.
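
The example can be turned into a tiny helper that answers "which sequence numbers may the other party send right now?". This is a sketch with a made-up function name, `allowed_range`, and it ignores window scaling.

```python
def allowed_range(ack, window):
    """First and last sequence numbers the peer may currently send.

    Returns None for a zero window (the peer must stop sending data).
    """
    if window == 0:
        return None
    return ack, (ack + window - 1) % 2 ** 32


print(allowed_range(701, 1000))   # (701, 1700)
print(allowed_range(701, 0))      # None
```

With acknowledgment number 701 and window 1000, the other party may send bytes 701 through 1700 and must then wait for a new acknowledgment or a larger window.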

 

Checksum: filled in by the sender. The receiver recomputes it over the TCP segment to check whether the segment was damaged in transit. It is a 16-bit one's-complement sum, not a CRC.

The check covers not only the header but also the data part (plus a pseudo-header built from the IP addresses). It is an important guarantee of reliable transmission.
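
For reference, TCP's checksum is the 16-bit one's-complement "Internet checksum" rather than a CRC. Below is a minimal sketch of that sum over an arbitrary byte string. It is only the core arithmetic: a real TCP checksum is computed over a pseudo-header, the TCP header, and the data together, which this sketch does not assemble. The function name is my own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of `data`, then complemented."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)
print(hex(csum))                                               # 0x220d
# Receiver side: data plus its checksum sums to all ones, so this gives 0.
print(internet_checksum(payload + csum.to_bytes(2, "big")))    # 0
```

The receiver's check is the second call: if any bit of the data or the checksum changed in transit, the result is very likely non-zero and the segment is discarded.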

Urgent pointer: meaningful only when URG=1. It gives the number of bytes of urgent data in this segment, that is, the position in the segment where the urgent data ends. (If this is hard to follow now, don't worry; it will be covered later.)
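
Putting the fixed header fields together, here is a sketch that decodes the first 20 bytes of a segment with Python's `struct` module. The function name `parse_tcp_header` and the dictionary keys are my own choices; the field order and sizes are those of the standard TCP header, including the FIN bit, which sits next to SYN in the flags.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 .. bit 5


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header."""
    (src, dst, seq, ack, off_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (off_byte >> 4) * 4          # data offset counts 4-byte words
    flags = [n for i, n in enumerate(FLAG_NAMES) if flag_byte & (1 << i)]
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "header_len": header_len, "flags": flags, "window": window,
            "checksum": checksum, "urgent_ptr": urgent,
            "options": segment[20:header_len]}


# A hand-built SYN+ACK: data offset 5 (20 bytes), flags 0x12, window 1000.
raw = struct.pack("!HHIIBBHHH", 80, 50000, 501, 701, 5 << 4, 0x12, 1000, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr["flags"], hdr["ack"], hdr["window"], hdr["header_len"])
# ['SYN', 'ACK'] 701 1000 20
```

The checksum in the hand-built segment is left at 0, so this only demonstrates the layout, not a segment that a real peer would accept.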

 

Header options:

A typical header option has the following structure:

The first field, kind, specifies the type of the option. The second, length, specifies the total length of the option, including the two bytes occupied by kind and length themselves. The third, info, holds the option's specific information.

Some TCP options consist of the kind field only. There are 7 common TCP options:

kind=0: end of option list

kind=1: no-operation (NOP); it has no meaning of its own and is typically used as padding to align the options to 4-byte boundaries

kind=2: Maximum segment size (MSS) option. The TCP module usually sets the MSS to the MTU (the maximum transmission unit seen by the IP layer, 1500 bytes on Ethernet) minus 40 (20 bytes of TCP header + 20 bytes of IP header). So on Ethernet the MSS is typically 1500 - 40 = 1460 bytes. If the option is absent, the default MSS is 536 bytes.

kind=3: Window scale option. It keeps the 16-bit window field from being too small. If the window value in the header is N and the window scale factor (a shift count) is M, the actual window is N * 2^M, that is, N shifted left by M bits. M can be at most 14.

The window scale option is negotiated when the two parties establish the TCP connection, and it is carried only in SYN segments. An end that supports the option but does not need a larger window can advertise a scale factor of 0.
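
In the standard window scale option, the 16-bit window field is shifted left by the negotiated shift count (at most 14), so the real window is the field value times a power of two. A small sketch, with a function name of my own:

```python
def scaled_window(window_field, shift):
    """Actual receive window for a 16-bit window field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= 14:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


print(scaled_window(65535, 0))    # 65535: no scaling
print(scaled_window(65535, 7))    # 8388480: about 8 MB
print(scaled_window(65535, 14))   # 1073725440: the largest possible window
```

Without scaling, the window can never exceed 65535 bytes, which is far too little to keep a fast, long-distance link busy.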

kind=4: SACK-permitted option. As we saw with Go-back-N in the continuous ARQ protocol, losing one packet can force the sender to resend packets that already arrived. Selective acknowledgment (SACK) was invented to improve this: it lets a TCP endpoint resend only the segments that were actually lost. This option is exchanged when the connection is established to indicate that SACK may be used.

kind=5: the SACK option itself. Its parameters tell the sender which non-contiguous blocks of data have been received and buffered (the gaps between them are missing data), so that the sender can work out which blocks were lost and resend only those.

Selective acknowledgment will be covered in more detail later.

kind=8: Timestamp option. This option has two main functions:

1. Calculating the round-trip time RTT. The sender puts its current clock value into the timestamp value field when sending a segment, and the receiver copies that value into the timestamp echo reply field when acknowledging the segment.

2. Handling sequence numbers that wrap past 2^32. This is called Protection Against Wrapped Sequence numbers (PAWS). The sequence number is only 32 bits, so on a high-speed network it may well be reused within a single TCP connection. The timestamp makes it possible to tell an old segment from a new one that carries the same sequence number.
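
To tie the options together, here is a sketch of a parser that walks the kind/length/info layout described above. The function name `parse_tcp_options` is mine, and it returns the raw info bytes without interpreting them. Kinds 0 and 1 are the two single-byte options.

```python
def parse_tcp_options(raw: bytes):
    """Walk the kind/length/info layout and return (kind, info) pairs."""
    options, i = [], 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                 # end of option list
            break
        if kind == 1:                 # no-operation: a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]           # total length, incl. kind and length
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


# MSS=1460, NOP, window scale shift 7, SACK permitted, then end-of-list padding
raw = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7, 4, 2, 0, 0])
for kind, info in parse_tcp_options(raw):
    print(kind, info.hex())
```

In this example the MSS option's info bytes are 05b4, which is 1460 in hexadecimal, and the SACK-permitted option has an empty info part because its length is just 2.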
