[Network Principles, Part 1] Application-layer protocols, the transport-layer protocols UDP and TCP, the TCP three-way handshake and four-way wave, and TCP's reliability and efficiency mechanisms

application layer protocol

In the TCP server and client implemented earlier, both communicating parties agreed on an application-layer protocol that uses a newline character to mark the end of each message. The sender appends a newline when encoding a message, and the receiver splits the incoming data on newlines when decoding. When developing an application, a large part of the work is designing this protocol. Common application-layer protocols include HTTP, FTP...
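The following is a minimal Java sketch that makes the newline convention concrete. The class name `LineProtocol` and its methods are hypothetical and are not taken from the earlier server and client. The sketch assumes three things:
• the text is UTF-8;
• a message never contains a newline itself;
• each received chunk ends on a character boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating a newline-terminated application protocol.
// Assumptions: messages never contain '\n', text is UTF-8, and each received
// chunk ends on a UTF-8 character boundary.
class LineProtocol {

    // Encode: append the agreed terminator '\n' to the message.
    static byte[] encode(String message) {
        return (message + "\n").getBytes(StandardCharsets.UTF_8);
    }

    // Decode: split received bytes on '\n' and return every complete message.
    // Bytes after the last '\n' stay in 'pending' until the next read completes them.
    static List<String> decode(StringBuilder pending, byte[] received) {
        pending.append(new String(received, StandardCharsets.UTF_8));
        List<String> messages = new ArrayList<>();
        int newline;
        while ((newline = pending.indexOf("\n")) >= 0) {
            messages.add(pending.substring(0, newline));
            pending.delete(0, newline + 1);
        }
        return messages;
    }
}
```

The `pending` buffer exists because a single read may return half a message, or several messages at once.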

XML protocol

XML is primarily a format for organizing data. In an XML file, tags appear in pairs, and the closing tag starts with a /. A tag that contains sub-tags can represent an object. A tag that contains multiple identical sub-tags represents a collection.

Disadvantages: the structure is complex and hard to read, there are many redundant characters, and transmitting it over the network consumes more bandwidth.

JSON

1. Use {} to represent an object;
2. Use [] to represent a collection;
3. Use "key": "value" for attributes. If the value is a number, no quotation marks are needed;
4. Separate multiple attributes with commas; there is no comma after the last attribute.

The advantages of JSON are good readability, a clean appearance, and strong extensibility. The disadvantage is that the keys and punctuation add extra characters, which take up extra bandwidth.

HTTP

HTTP will be covered in detail later.

transport layer protocol

The core protocols:
UDP: connectionless, unreliable transmission, datagram-oriented, full-duplex, limited datagram size.
TCP: connection-oriented, reliable transmission, byte-stream-oriented, full-duplex, no limit on data size.

UDP protocol

Features of UDP

1. Connectionless: UDP transmission is like sending a text message. If you know the peer's IP address and port number, you can send data directly without establishing a connection.
2. Unreliable transmission: there is no reliability mechanism. Suppose a datagram cannot reach the other party because of a network failure after it is sent. In that case the UDP layer does not return any error to the application layer.
3. Datagram-oriented: UDP sends each message handed to it by the application layer as-is, without splitting or merging it.

For example, suppose the sender sends 100 bytes of data in one UDP datagram. The receiver must then receive all 100 bytes in a single receive; it cannot read them 10 bytes at a time in a loop of 10 reads.

4. Buffers:
• UDP has no real send buffer. Data being sent is handed directly to the kernel, which passes it to the network-layer protocol for transmission;
• UDP does have a receive buffer, but it cannot guarantee that datagrams are received in the order they were sent. If the buffer is full, newly arriving datagrams are discarded;
• A UDP socket can both read and write. This is called full-duplex.

5. Limited size:
The UDP header has a 16-bit length field, so a UDP datagram can be at most 64 KB (65535 bytes), including the 8-byte header.
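The datagram-oriented behavior can be checked on the loopback interface with Java's standard `DatagramSocket`. This is a sketch that assumes the machine allows UDP on 127.0.0.1. The class `UdpBoundaryDemo` and method `sendAndReceive` are illustrative names. One `receive()` call always returns exactly one datagram. If the receive buffer is smaller than the datagram, the excess is discarded rather than kept for the next read.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Sketch: UDP preserves message boundaries. Assumes UDP on 127.0.0.1 is allowed
// and lets the OS choose the receiver's port.
class UdpBoundaryDemo {

    // Sends one datagram of 'sendSize' bytes and receives it into a 'bufSize'-byte buffer.
    // Returns how many bytes a single receive() call delivered.
    static int sendAndReceive(int sendSize, int bufSize) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not block forever if the datagram is lost
            byte[] payload = new byte[sendSize];
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            DatagramPacket packet = new DatagramPacket(new byte[bufSize], bufSize);
            receiver.receive(packet); // one receive() == one whole datagram (truncated if the buffer is too small)
            return packet.getLength();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAndReceive(100, 1024)); // the whole 100-byte datagram arrives at once
        System.out.println(sendAndReceive(100, 10));   // only 10 bytes fit; the other 90 are discarded
    }
}
```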

UDP protocol format

UDP is a transport-layer protocol, and transport-layer protocols are implemented by the operating system. The operating system manages processes, and each process that communicates over the network binds a port number. A 16-bit field can hold values up to 65535, so port numbers range from 0 to 65535. Likewise, the 16-bit UDP length field allows at most 65535 bytes, roughly 64 KB.

The 16-bit checksum is used to detect corrupted data. Strictly speaking it is not a CRC. UDP uses the Internet checksum: the one's complement of the one's-complement sum of the data taken as 16-bit words. The underlying idea is the same as in the example below: accumulate the bytes of the data (a byte array) into a single value.

When parsing a UDP datagram, the fields come in this order:
• the first 16 bits give the source port;
• the next 16 bits give the destination port;
• then come the 16-bit length and the 16-bit checksum;
• the data that follows is as long as the length field minus the 8-byte header.
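Here is a minimal sketch of that parsing order. It assumes the datagram arrives as a byte array in network byte order (big-endian). `UdpHeader` is a hypothetical class for illustration. It only reads the four 16-bit fields and does not verify the checksum.

```java
import java.nio.ByteBuffer;

// Sketch: parse the fixed 8-byte UDP header (four 16-bit fields).
class UdpHeader {
    final int srcPort, dstPort, length, checksum;

    UdpHeader(int srcPort, int dstPort, int length, int checksum) {
        this.srcPort = srcPort;
        this.dstPort = dstPort;
        this.length = length;
        this.checksum = checksum;
    }

    static UdpHeader parse(byte[] datagram) {
        ByteBuffer buf = ByteBuffer.wrap(datagram); // big-endian by default = network byte order
        int src = buf.getShort() & 0xFFFF;  // mask: Java shorts are signed, ports are 0..65535
        int dst = buf.getShort() & 0xFFFF;
        int len = buf.getShort() & 0xFFFF;  // header + data in bytes, at most 65535
        int sum = buf.getShort() & 0xFFFF;
        return new UdpHeader(src, dst, len, sum);
    }

    // The payload is what follows the 8-byte header, as long as the length field says.
    int payloadLength() {
        return length - 8;
    }
}
```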

Example of byte accumulation:

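```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Demo_CRC {

    // Adds up every byte of the array. Bytes are masked with 0xFF so they count as 0..255
    // (Java bytes are signed). This is a plain additive sum, not a real CRC.
    static int byteSum(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc += b & 0xFF;
        }
        return crc;
    }

    public static void main(String[] args) {
        // Define two strings: a Chinese one ("Hi, let's go eat hot pot later!!!"),
        // where each Chinese character takes 3 bytes in UTF-8, and an ASCII one
        String str = "你好啊,一会去吃火锅吧!!!";
        String abc = "how are you.";

        // Convert to a byte array, then accumulate every byte to get the "CRC" result
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes));
        System.out.println(bytes.length);
        System.out.println("str crc = " + byteSum(bytes));

        // Same for the second string
        bytes = abc.getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);
        System.out.println(Arrays.toString(bytes));
        System.out.println("abc crc = " + byteSum(bytes));
    }
}
```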
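For comparison, here is a sketch of the checksum that UDP and TCP actually use, following the algorithm in RFC 1071. Add the data as 16-bit big-endian words with end-around carry, then take the one's complement. `InternetChecksum` is an illustrative name. The sketch leaves out the pseudo-header (IP addresses, protocol, length) that the real UDP/TCP checksum also covers.

```java
// Sketch of the Internet checksum (RFC 1071): one's complement of the
// one's-complement sum of 16-bit words. The UDP/TCP pseudo-header is omitted.
class InternetChecksum {

    static int compute(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad an odd final byte with zero
            sum += (hi << 8) | lo;
            sum = (sum & 0xFFFF) + (sum >>> 16); // end-around carry: fold overflow back into the low 16 bits
        }
        return (int) (~sum & 0xFFFF); // one's complement of the sum
    }
}
```

Suppose the receiver runs the same computation over the data plus the transmitted checksum. It gets 0 when no error is detected.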

TCP protocol

Characteristics of TCP

1. Connection-oriented: TCP transmission is like a phone call; a connection must be established before any data is exchanged.
2. Reliable transmission: TCP guarantees reliable delivery through its own mechanisms, described below.
3. Byte-stream-oriented: data is sent and received as a stream of bytes.
4. Buffers: TCP has both a receive buffer and a send buffer, and it is full-duplex.
5. Unlimited size.

TCP protocol format

The 16-bit source and destination ports are the same as in UDP and identify the communicating processes.
4-bit header length: measured in units of 4 bytes. The maximum value 1111 = 15 means the header can be at most 15 * 4 = 60 bytes. The fixed part before the options is 5 * 4 = 20 bytes, so the options can take at most 40 bytes.
The data is the payload sent by the application layer.
Six flags:
• URG: whether the urgent pointer is valid.
• ACK: whether the acknowledgment number is valid.
• PSH: prompts the receiving application to read the data from the TCP buffer immediately.
• RST: asks the other side to reset the connection. A segment carrying the RST flag is called a reset segment.
• SYN: requests to establish a connection. A segment carrying the SYN flag is called a synchronization segment.
• FIN: tells the other side that this end is about to close. A segment carrying the FIN flag is called a finish segment.
16-bit checksum: like UDP's, this is the Internet checksum (not a CRC), computed over the header, the data, and a pseudo-header of IP information.
Options carry optional extensions, such as the maximum segment size and the window scale factor.
16-bit urgent pointer: identifies which part of the data is urgent; it is not covered here.
The 32-bit sequence number, 32-bit acknowledgment number, and 16-bit window size are introduced later.
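Here is a sketch of reading the fixed 20-byte part of the header. It assumes a big-endian byte array that starts at the TCP header. `TcpHeader` is a hypothetical class. It decodes the ports, the 32-bit numbers, the 4-bit header length and the six flags, and skips the remaining reserved/ECN bits.

```java
import java.nio.ByteBuffer;

// Sketch: decode the fixed part (first 20 bytes) of a TCP header.
class TcpHeader {
    int srcPort, dstPort, headerLength, window;
    long seq, ack;
    boolean urg, ackFlag, psh, rst, syn, fin;

    static TcpHeader parse(byte[] segment) {
        ByteBuffer buf = ByteBuffer.wrap(segment); // big-endian = network byte order
        TcpHeader h = new TcpHeader();
        h.srcPort = buf.getShort() & 0xFFFF;
        h.dstPort = buf.getShort() & 0xFFFF;
        h.seq = buf.getInt() & 0xFFFFFFFFL;   // 32-bit sequence number, unsigned
        h.ack = buf.getInt() & 0xFFFFFFFFL;   // 32-bit acknowledgment number, unsigned
        int offsetAndFlags = buf.getShort() & 0xFFFF;
        h.headerLength = (offsetAndFlags >>> 12) * 4; // 4-bit header length in 4-byte units
        h.urg = (offsetAndFlags & 0x20) != 0;
        h.ackFlag = (offsetAndFlags & 0x10) != 0;
        h.psh = (offsetAndFlags & 0x08) != 0;
        h.rst = (offsetAndFlags & 0x04) != 0;
        h.syn = (offsetAndFlags & 0x02) != 0;
        h.fin = (offsetAndFlags & 0x01) != 0;
        h.window = buf.getShort() & 0xFFFF;   // 16-bit window size
        // Checksum and urgent pointer (2 bytes each) follow; options run from byte 20
        // up to headerLength, and the data starts after that.
        return h;
    }
}
```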

Reliability and Efficiency Mechanisms of TCP

Acknowledgment (reliability mechanism)

Acknowledgment works like a conversation: when you receive a message, you reply to confirm it. Because of network conditions, data may arrive out of order. To solve this, TCP numbers every byte of data; this number is the sequence number.

These numbers are carried in the 32-bit sequence number and 32-bit acknowledgment number fields mentioned above. Each data segment carries the sequence number of its first byte. The receiver answers with a segment whose ACK flag is set to 1. Its acknowledgment number tells the sender: "I have received everything before this byte; start from this byte next time." The SYN flag is not used for ordinary data; it only appears when establishing a connection.

Timeout retransmission (reliability mechanism)

On its way through the network, a message passes through the operating system, network card, switches, routers, and other network devices. Each device has limited capacity; when that capacity is exceeded, the packet may be delayed or dropped.

1. The data packet is lost

The sender waits for a while. If it still has not received an ACK, it resends the data after the specified timeout.

2. The ACK is lost

Host B received the data and sent an ACK, but host A never received that ACK. A therefore retransmits, and B receives the same data twice. B uses the 32-bit sequence numbers to recognize the data already in its buffer and discard the duplicate, and simply sends the ACK again.
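The duplicate filtering in case 2 can be sketched as follows. The sketch assumes sequence numbers never wrap around, and it does not handle gaps. `DedupReceiver` is a hypothetical name. `onSegment` returns the acknowledgment number to send, which is the next byte the receiver expects.

```java
// Sketch: drop duplicate segments (e.g. retransmissions caused by a lost ACK) by
// sequence number, but still acknowledge them. No wraparound; gaps are not buffered.
class DedupReceiver {
    private long nextExpected;                 // next byte expected = ACK number we send
    private final StringBuilder delivered = new StringBuilder();

    DedupReceiver(long initialSeq) {
        this.nextExpected = initialSeq;
    }

    // Handles a segment whose first byte has sequence number 'seq'; returns the ACK number.
    long onSegment(long seq, String data) {
        long end = seq + data.length();
        if (end <= nextExpected) {
            return nextExpected;               // already received: discard, but ACK again
        }
        if (seq > nextExpected) {
            return nextExpected;               // gap: this sketch does not buffer out-of-order data
        }
        int skip = (int) (nextExpected - seq); // drop any overlap with bytes already delivered
        delivered.append(data.substring(skip));
        nextExpected = end;
        return nextExpected;
    }

    String delivered() {
        return delivered.toString();
    }
}
```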

So how should the timeout be determined?

• Ideally, it would be the shortest time within which the acknowledgment is guaranteed to return.
• However, this time varies with the network environment.
• If the timeout is too long, retransmission is slow and overall efficiency suffers. If it is too short, duplicate packets may be sent frequently.
• To communicate efficiently in any environment, TCP calculates the timeout dynamically. In Linux (and likewise in BSD Unix and Windows), the timeout is controlled in units of 500 ms, and each timeout is an integer multiple of 500 ms. If there is no response after one retransmission, TCP waits 2 × 500 ms before retransmitting again. If there is still no response, it waits 4 × 500 ms, and so on, growing exponentially.
• After a certain number of retransmissions, TCP concludes that the network or the peer host has failed and forcibly closes the connection.
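The doubling schedule can be written as a short sketch. The 500 ms unit comes from the text. `RetransmitBackoff` and the `maxRetries` cap are hypothetical. Real stacks derive the base timeout from measured round-trip times instead of using a fixed constant.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of exponential retransmission backoff in 500 ms units.
// The retry cap is a hypothetical parameter; after the last attempt the connection is closed.
class RetransmitBackoff {
    static final long UNIT_MS = 500;

    // Returns the wait before each retransmission attempt.
    static List<Long> schedule(int maxRetries) {
        List<Long> waits = new ArrayList<>();
        long wait = UNIT_MS;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            waits.add(wait);
            wait *= 2; // 500, 1000, 2000, 4000, ...
        }
        return waits;
    }
}
```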

Connection management (reliability mechanism)

Before hosts communicate as sender and receiver, they must confirm that both sides can send and receive data. This is the negotiation performed when establishing and closing a connection. A high-speed railway runs an empty train before the first service each day; in the same way, a network connection first checks the sending and receiving capabilities of both parties.

Three-way handshake (connection process)

Each side sends a SYN and gets an ACK back. These two SYN/ACK exchanges confirm that both sides' networks work, after which normal data transfer can begin. As an optimization, TCP merges the server's ACK and its own SYN into a single SYN+ACK segment. Only three segments are then needed: this is the three-way handshake.

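Two handshakes are not enough to confirm the sending and receiving capabilities of both parties, because the verification would be incomplete. The server would never learn whether its own segment reached the client. Four handshakes would work (sending the server's SYN and ACK separately), but this is unnecessary.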

Another important function of the three-way handshake is to negotiate the initial sequence numbers.

port status

Check port states with the netstat -an command:

Four-way wave (disconnection process)

The operating system's TCP implementation sends the first ACK automatically. The second FIN is sent only when the application calls close(). Because there is a time gap between these two operations, they are usually not merged into one segment, which is why the process is described as a four-way wave. What if the second FIN is lost? It is then retransmitted after a timeout, like any other lost segment.

state transition

Server:

[CLOSED -> LISTEN] After the server calls listen, it enters the LISTEN state and waits for clients to connect;
[LISTEN -> SYN_RCVD] Once it receives a connection request (a synchronization segment), it puts the connection into the kernel's waiting queue and sends a SYN+ACK segment to the client;
[SYN_RCVD -> ESTABLISHED] Once the server receives the client's acknowledgment, it enters the ESTABLISHED state and can read and write data;
[ESTABLISHED -> CLOSE_WAIT] When the client actively closes the connection (calls close), the server receives a finish segment, replies with an acknowledgment, and enters CLOSE_WAIT;
[CLOSE_WAIT -> LAST_ACK] CLOSE_WAIT means the server is preparing to close the connection; it must first finish processing earlier data. When the server actually calls close, it sends a FIN to the client and enters LAST_ACK. There it waits for the last ACK, which is the client's acknowledgment of that FIN. A large number of connections in CLOSE_WAIT usually means the program has not called close();
[LAST_ACK -> CLOSED] When the server receives the ACK for its FIN, the connection is fully closed, and the system then reclaims its resources.

client:

[CLOSED -> SYN_SENT] The client calls connect and sends a synchronization segment;
[SYN_SENT -> ESTABLISHED] If connect succeeds, it enters ESTABLISHED and starts reading and writing data;
[ESTABLISHED -> FIN_WAIT_1] When the client actively calls close, it sends a finish segment to the server and enters FIN_WAIT_1;
[FIN_WAIT_1 -> FIN_WAIT_2] After the server acknowledges the finish segment, the client enters FIN_WAIT_2 and waits for the server's finish segment;
[FIN_WAIT_2 -> TIME_WAIT] When the client receives the server's finish segment, it enters TIME_WAIT and sends the last ACK;
[TIME_WAIT -> CLOSED] The client waits 2MSL (MSL: Maximum Segment Lifetime) before entering CLOSED.
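The transitions above can be collected into one table-driven sketch. The event names (`RECV_SYN`, `TIMEOUT_2MSL`, ...) are hypothetical labels for "what the socket did or received". Rarer paths such as simultaneous open, simultaneous close and RST handling are left out.

```java
// Sketch of the TCP connection state transitions for client and server.
class TcpStateMachine {
    enum State { CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED,
                 FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSE_WAIT, LAST_ACK }

    enum Event { LISTEN, CONNECT, RECV_SYN, RECV_SYN_ACK, RECV_ACK, CLOSE, RECV_FIN, TIMEOUT_2MSL }

    static State next(State s, Event e) {
        switch (s) {
            case CLOSED:
                if (e == Event.LISTEN) return State.LISTEN;            // server: listen()
                if (e == Event.CONNECT) return State.SYN_SENT;         // client: connect() sends SYN
                break;
            case LISTEN:
                if (e == Event.RECV_SYN) return State.SYN_RCVD;        // reply with SYN+ACK
                break;
            case SYN_SENT:
                if (e == Event.RECV_SYN_ACK) return State.ESTABLISHED; // reply with ACK
                break;
            case SYN_RCVD:
                if (e == Event.RECV_ACK) return State.ESTABLISHED;
                break;
            case ESTABLISHED:
                if (e == Event.CLOSE) return State.FIN_WAIT_1;         // active close: send FIN
                if (e == Event.RECV_FIN) return State.CLOSE_WAIT;      // passive close: ACK the FIN
                break;
            case FIN_WAIT_1:
                if (e == Event.RECV_ACK) return State.FIN_WAIT_2;
                break;
            case FIN_WAIT_2:
                if (e == Event.RECV_FIN) return State.TIME_WAIT;       // send the last ACK
                break;
            case TIME_WAIT:
                if (e == Event.TIMEOUT_2MSL) return State.CLOSED;
                break;
            case CLOSE_WAIT:
                if (e == Event.CLOSE) return State.LAST_ACK;           // application calls close(): send FIN
                break;
            case LAST_ACK:
                if (e == Event.RECV_ACK) return State.CLOSED;
                break;
        }
        throw new IllegalStateException(e + " not expected in " + s);
    }
}
```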

Sliding window (efficiency mechanism)

Sending one segment and waiting for its ACK before sending the next keeps communication correct, but it is slow. To improve performance, the sender sends multiple segments at once, as shown in the figure:

1. Diagram
• The sliding window is a data structure that tracks the window size and the data that has been sent or is being sent.
• The data in the white box is the data waiting for an ACK.
• The window size is the maximum amount of data that can be sent without waiting for an acknowledgment. The window size in the above figure is 4000 bytes (four segments).
• The first four segments are sent immediately, without waiting for any ACK;
• After the first ACK arrives, the window slides forward and the fifth segment is sent, and so on;
• To maintain the sliding window, the sender keeps a send buffer recording which data has not yet been acknowledged. Only acknowledged data can be removed from the buffer;
• The larger the window, the higher the network throughput.
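Here is a sketch of the sender side using the figure's numbers: 1000-byte segments and a 4000-byte window. It assumes there is always more data to send and that sequence numbers start at 1. `SlidingWindowSender` is a hypothetical class. Its deque of unacknowledged segments plays the role of the send buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a byte-based sliding window on the sender side.
// Assumes unlimited data to send and sequence numbers starting at 1.
class SlidingWindowSender {
    private final long windowSize;
    private final int segmentSize;
    private long nextSeq = 1;                                  // first byte not yet sent
    private final Deque<Long> unacked = new ArrayDeque<>();    // start seq of each unACKed segment ("send buffer")

    SlidingWindowSender(long windowSize, int segmentSize) {
        this.windowSize = windowSize;
        this.segmentSize = segmentSize;
    }

    private long bytesInFlight() {
        return unacked.size() * (long) segmentSize;
    }

    // Sends as many segments as the window allows without waiting; returns how many were sent.
    int sendAvailable() {
        int sent = 0;
        while (bytesInFlight() + segmentSize <= windowSize) {
            unacked.addLast(nextSeq);
            nextSeq += segmentSize;
            sent++;
        }
        return sent;
    }

    // Cumulative ACK: every byte before ackNo arrived, so those segments leave the buffer
    // and the window slides forward.
    void onAck(long ackNo) {
        while (!unacked.isEmpty() && unacked.peekFirst() + segmentSize <= ackNo) {
            unacked.removeFirst();
        }
    }

    int unackedCount() {
        return unacked.size();
    }
}
```

With a 4000-byte window, `sendAvailable()` first sends four segments. Then `onAck(1001)` frees one slot, so a fifth segment goes out. Because ACKs are cumulative, a single `onAck(5001)` clears everything even if earlier ACKs were lost.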

2. Packet loss scenarios

The ACK is lost:

Even if an ACK in the middle is lost, it does not matter. The acknowledgment number in a later ACK shows that all the data before it has been received.

A data segment is lost:

Suppose the receiver finds a gap in the sequence numbers. It keeps sending ACKs carrying the first missing sequence number, asking the sender for the missing data. Meanwhile, it caches the other data it has received and assembles everything once the missing data arrives.
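Receiver-side reassembly can be sketched with a `TreeMap` keyed by sequence number. The sketch assumes no wraparound and no partially overlapping segments. `ReassemblyReceiver` is a hypothetical name. Each call returns the acknowledgment number, which stays at the first missing byte until the gap is filled.

```java
import java.util.TreeMap;

// Sketch: cache out-of-order segments and keep ACKing the first missing byte.
// Assumes no sequence-number wraparound and no partially overlapping segments.
class ReassemblyReceiver {
    private long nextExpected;
    private final TreeMap<Long, String> outOfOrder = new TreeMap<>(); // seq -> data waiting for the gap
    private final StringBuilder delivered = new StringBuilder();

    ReassemblyReceiver(long initialSeq) {
        this.nextExpected = initialSeq;
    }

    long onSegment(long seq, String data) {
        if (seq > nextExpected) {
            outOfOrder.put(seq, data);                 // hole before this segment: cache it
        } else if (seq == nextExpected) {
            delivered.append(data);
            nextExpected += data.length();
            // The gap may now be filled: deliver cached segments that follow on directly.
            while (!outOfOrder.isEmpty() && outOfOrder.firstKey() == nextExpected) {
                String next = outOfOrder.pollFirstEntry().getValue();
                delivered.append(next);
                nextExpected += next.length();
            }
        }                                              // seq < nextExpected: duplicate, ignore
        return nextExpected;                           // ACK number = first byte still missing
    }

    String delivered() {
        return delivered.toString();
    }
}
```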

3. Efficiency of the sliding window

The efficiency depends on the window size:
the larger the window, the higher the efficiency;
the smaller the window, the lower the efficiency;
if the window were infinite, the sender would never wait for an ACK, and the efficiency would match UDP's.

Flow control (reliability mechanism)

The efficiency of the sliding window was discussed above, but how big should the window be? Flow control determines the window size through dynamic negotiation between the sender and the receiver. It is like serving rice: first ask how much the other person can eat, then serve exactly that much.

1. Send and receive buffers

When a program creates a socket, the operating system allocates resources for it, including the send and receive buffers. These are areas of memory that hold the byte stream. In each ACK, the receiver fills the 16-bit window size field with the free space remaining in its receive buffer. In this way the receiver limits the sender's window, so the sender cannot enlarge the window without bound just to improve efficiency. The used and free space change dynamically: each time the receiving application reads data from the buffer, the free space grows.

2. Specific process

• The sender sends data to the receiver;
• The receiver stores the data in its receive buffer (an area of memory);
• The receiving application reads data from the buffer through the socket API (InputStream), which reduces the data held in the buffer. The remaining free space is reported back to the sender in the ACK;
• The receiver puts the amount of buffer space it can still accept into the "window size" field of the TCP header and notifies the sender through the ACK;
• The larger the window size field, the higher the network throughput;
• Once the receiver finds its buffer is almost full, it sets the window size to a smaller value and notifies the sender;
• On receiving this window, the sender slows down. When the buffer is full, the window is set to 0. The sender then stops sending data, but periodically sends a window probe segment so that the receiver can tell it the current window size.

If the receiver processes data slowly, its buffer may fill up. The sender then sends a window probe at intervals. The probe carries no real data and simply asks the receiver how much more it can accept.
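The sender's side of this process fits in a few lines. Here is a minimal sketch with the hypothetical class `FlowControlledSender`. It tracks the latest advertised window and the unacknowledged bytes, and it switches to window probes when the window is 0. It ignores congestion control and timers.

```java
// Sketch of flow control on the sender side: never have more unacknowledged bytes
// in flight than the receiver last advertised. Names are hypothetical.
class FlowControlledSender {
    private long advertisedWindow; // latest "window size" field from the receiver's ACK
    private long bytesInFlight;    // sent but not yet acknowledged

    FlowControlledSender(long initialWindow) {
        this.advertisedWindow = initialWindow;
    }

    // How many more bytes may be sent right now.
    long usableWindow() {
        return Math.max(0, advertisedWindow - bytesInFlight);
    }

    void onSend(long bytes) {
        if (bytes > usableWindow()) {
            throw new IllegalArgumentException("would overflow the receiver's buffer");
        }
        bytesInFlight += bytes;
    }

    // Each ACK reports what was received and how much buffer space remains.
    void onAck(long bytesAcked, long newWindow) {
        bytesInFlight -= bytesAcked;
        advertisedWindow = newWindow;
    }

    boolean shouldSendWindowProbe() {
        return advertisedWindow == 0; // receiver buffer full: periodically ask for the window size
    }
}
```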

3. Actual window size

The TCP header has a 16-bit window field that stores the window size. The largest 16-bit value is 65535, so is the maximum TCP window 65535 bytes? No. The options field (up to 40 bytes) can also carry a window scale factor M, and the actual window size is the value of the window field shifted left by M bits.
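Here is a worked example of the scaling rule. The limit of 14 on the shift count comes from the TCP window scale specification (RFC 7323). `WindowScale` is an illustrative name.

```java
// Worked example: actual window = 16-bit window field << M, with 0 <= M <= 14.
class WindowScale {
    static long effectiveWindow(int windowField, int shift) {
        if (shift < 0 || shift > 14) {
            throw new IllegalArgumentException("shift must be 0..14");
        }
        return ((long) (windowField & 0xFFFF)) << shift;
    }
}
```

With the maximum shift, 65535 << 14 = 1,073,725,440 bytes, about 1 GiB.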

Congestion control (reliability mechanism)

Data transmission across the network is complicated. A packet may pass through many switches, routers, and other devices, and a problem at any of them affects the transmission.

TCP introduces a slow start mechanism. It first sends a small amount of data to probe the path and learn the current congestion state, then decides how fast to transmit.
This introduces a concept called the congestion window.

1. When sending starts, the congestion window is 1. Each time an ACK is received, the congestion window grows by 1;
2. As a result, the window grows exponentially from one round of sending to the next: 2, 4, 8, 16, ...;
3. When it reaches the slow start threshold, it stops growing exponentially and grows linearly instead, by 1 each round;
4. When the window reaches a certain size, heavy packet loss occurs. Timeout retransmissions become frequent, which means the network is congested;
5. The congestion window drops straight back to the minimum value of 1, and the new threshold is set to half of the congestion window at that moment;
6. Steps 1-5 repeat.
Each time data is sent, the congestion window is compared with the receiver's window, and the smaller of the two is used as the actual send window.
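The six steps can be simulated round by round. `CongestionWindowSim` is an illustrative name. The sketch makes these simplifications:
• the window is counted in segments per round trip;
• slow start is capped at the threshold;
• a loss is a hypothetical event in a given round;
• it models the classic behavior described here (as in TCP Tahoe), not any modern kernel's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the congestion window rules above, in segments per round trip.
// The threshold and the loss round are hypothetical inputs.
class CongestionWindowSim {

    // Returns the congestion window used in each round.
    static List<Integer> simulate(int rounds, int initialThreshold, int lossRound) {
        List<Integer> history = new ArrayList<>();
        int cwnd = 1;
        int ssthresh = initialThreshold;
        for (int round = 0; round < rounds; round++) {
            history.add(cwnd);
            if (round == lossRound) {                   // heavy loss => congestion
                ssthresh = Math.max(cwnd / 2, 1);       // new threshold: half the current window
                cwnd = 1;                               // restart slow start
            } else if (cwnd < ssthresh) {
                cwnd = Math.min(cwnd * 2, ssthresh);    // slow start: exponential growth up to the threshold
            } else {
                cwnd += 1;                              // congestion avoidance: linear growth
            }
        }
        return history;
    }

    // The window actually used is the smaller of the congestion window and the receiver's window.
    static int sendWindow(int cwnd, int receiverWindow) {
        return Math.min(cwnd, receiverWindow);
    }
}
```

For instance, `simulate(10, 8, 6)` yields 1, 2, 4, 8, 9, 10, 11, then 1, 2, 4 after the loss in round 6, with the threshold reset to 5.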

A small amount of packet loss only triggers timeout retransmission; heavy packet loss means the network is congested. When TCP communication starts, throughput increases gradually. Once the network becomes congested, throughput drops immediately. In the end, congestion control is a compromise: TCP wants to deliver data to the other side as fast as possible, but without putting too much pressure on the network.

Delayed acknowledgment (efficiency mechanism)

While data is flowing, the receiving application keeps reading from the buffer, so the free part of the receive buffer keeps growing. If the receiver delays the ACK a little, it can report a more up-to-date (larger) free space to the sender. This increases the window size and improves transmission efficiency.

1. Count limit: generally, an ACK is sent for every 2 segments rather than for every segment. For example, ACKs go out after segments 2, 4, 6, 8. But if the whole transfer ends after 3 segments, the count rule alone would never acknowledge the third, which is why a time limit is also needed.
2. Time limit: an ACK is sent once the maximum delay is exceeded. The system has a default value, generally 200 ms, which can be changed.
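The two rules can be combined into one small policy object. This is a sketch with the hypothetical class `DelayedAckPolicy`. The current time is passed in as a parameter so that the behavior is deterministic. The values of 2 segments and 200 ms come from the text.

```java
// Sketch of a delayed-ACK policy: ACK every 2nd segment, or when the oldest
// unacknowledged segment has waited MAX_DELAY_MS. Time is passed in explicitly.
class DelayedAckPolicy {
    static final int SEGMENTS_PER_ACK = 2;
    static final long MAX_DELAY_MS = 200;

    private int pendingSegments = 0;
    private long firstPendingAtMs = -1;

    // Called when a data segment arrives; returns true if an ACK should be sent now.
    boolean onSegment(long nowMs) {
        if (pendingSegments == 0) {
            firstPendingAtMs = nowMs;
        }
        pendingSegments++;
        if (pendingSegments >= SEGMENTS_PER_ACK) {
            reset();
            return true;
        }
        return false;
    }

    // Called by a timer; returns true if the delay limit forces an ACK for pending data.
    boolean onTimer(long nowMs) {
        if (pendingSegments > 0 && nowMs - firstPendingAtMs >= MAX_DELAY_MS) {
            reset();
            return true;
        }
        return false;
    }

    private void reset() {
        pendingSegments = 0;
        firstPendingAtMs = -1;
    }
}
```

This matches the three-segment example: segment 2 triggers an ACK by count, and segment 3 is acknowledged only when the timer fires 200 ms later.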

Piggybacking (efficiency mechanism)

Normally, when the receiver receives data, the kernel responds with an ACK right away. The actual response is produced later by the application program, so the two happen at different times. Because of delayed acknowledgment, the ACK and the application's response data may be ready to send at the same time. The system then combines them into one segment. This mechanism is called piggybacking.

Note: piggybacking does not happen every time; the system kernel decides whether it happens.

stream-oriented

The receiver puts the data it receives into its receive buffer, which is just a byte array. It therefore cannot tell where one message ends and the next begins. This phenomenon is called the sticky packet problem.

Solving the sticky packet problem:

1. Add a special delimiter at the end of each message to mark where it ends; the receiver then splits the buffer contents at that character.
2. Add a field dedicated to the length of the message body, stating exactly how long the body is.

Before reading a message, first read the 4-byte field holding the length of the message body (42 in the example);
then read the next 42 bytes from the buffer: these 42 bytes are the message content;
then read the next 4 bytes, which give the length of the next message, and repeat.
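Here is a sketch of length-prefixed framing with the standard `DataOutputStream` and `DataInputStream` classes. It assumes a 4-byte big-endian length and UTF-8 bodies. `LengthPrefixedCodec` is a hypothetical name. With a real connection, `readMessage` would be passed the socket's `InputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: each message = 4-byte big-endian length + body. readFully keeps reading
// until the whole body has arrived, however TCP split or merged the bytes.
class LengthPrefixedCodec {

    static byte[] encode(String... messages) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String m : messages) {
            byte[] body = m.getBytes(StandardCharsets.UTF_8);
            out.writeInt(body.length); // 4-byte length header
            out.write(body);           // message body
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads exactly one message from the stream.
    static String readMessage(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        int length = in.readInt();     // 1. read the 4-byte length
        byte[] body = new byte[length];
        in.readFully(body);            // 2. read exactly 'length' bytes
        return new String(body, StandardCharsets.UTF_8);
    }
}
```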

JSON wraps each message in braces, so you can think of the braces as the special characters that mark the end of a message.
HTTP, an application-layer protocol, uses both approaches: a delimiter (an empty line) ends the headers, and a length field (Content-Length) gives the size of the body.

Handling of TCP exceptions

1. Program crash: the operating system notices and handles it. It reclaims the process's resources, including releasing its file descriptors, which is equivalent to calling close on the socket. This sends a FIN and starts the four-way wave, no different from a normal one.
2. Normal shutdown: when the machine is shut down from the Start menu or with the shutdown command, the system forcibly ends all processes and reclaims their resources, much as in a program crash.
3. Host power failure: the operating system gets no chance to respond.
• Receiver loses power: the sender does not know the receiver is down. It keeps sending data, receives no ACK, and triggers timeout retransmission. After several retransmissions without an ACK, it tries to reset the connection (RST flag). If the reset also fails, it gives up on the connection.
• Sender loses power: this usually matters for long-lived connections, where the server and client exchange heartbeat packets. For example, the client sends a packet to the server every second to prove it is alive. If the server receives no heartbeat for some time, say 10 seconds, it concludes the client is down and closes the connection itself. The client reconnects after its network recovers.
The host is working normally.
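Here is a minimal sketch of the server-side heartbeat check, assuming the 10-second timeout from the example. `HeartbeatMonitor`, the client IDs, and the explicit `nowMs` parameter are hypothetical. A real server would call `isDead` from a periodic timer and close the matching socket.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: track the last heartbeat per client; a client silent for longer than
// the timeout is considered down. Names and the 'nowMs' parameter are hypothetical.
class HeartbeatMonitor {
    private final long timeoutMs;
    private final Map<String, Long> lastSeenMs = new HashMap<>();

    HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void onHeartbeat(String clientId, long nowMs) {
        lastSeenMs.put(clientId, nowMs);
    }

    // True if the client should be considered down and its connection closed.
    boolean isDead(String clientId, long nowMs) {
        Long last = lastSeenMs.get(clientId);
        return last == null || nowMs - last > timeoutMs;
    }
}
```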


Keep going~

Origin blog.csdn.net/qq_43243800/article/details/131535424