Transport Layer: TCP and UDP

Transport layer: responsible for delivering data from the sender to the receiver, i.e. process-to-process communication

Let's talk about port numbers

The port number identifies a specific process in the application layer. Only with the port number can the transport layer deliver data to the right process.

In the TCP/IP protocol suite, the five-tuple of "source IP", "source port number", "destination IP", "destination port number", and "protocol number" identifies one communication session
insert image description here
port number range division:

0 - 1023: well-known port numbers; widely used application-layer protocols such as HTTP, FTP, and SSH all have fixed port numbers in this range
1024 - 65535: port numbers dynamically allocated by the operating system; a client program's port number is allocated by the OS from this range

When we write our own programs, we should avoid binding to these well-known port numbers

pidof command

Syntax: pidof [process name]
Function: find a process's pid by its name

UDP

① UDP segment format

insert image description here
16-bit UDP length: the length of the entire datagram (UDP header + UDP data)
16-bit checksum: verifies that the received data matches what was sent; if the checksum is wrong, the datagram is discarded directly
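The four 16-bit fields above form a fixed 8-byte header. A minimal sketch of packing and unpacking it in Python (the sample ports and payload are made up for illustration):

```python
import struct

def parse_udp_header(segment: bytes):
    """Parse the fixed 8-byte UDP header: four 16-bit big-endian fields."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum, "payload": segment[8:]}

# Build a sample segment: source port 12345, destination port 53, 5-byte payload.
payload = b"hello"
header = struct.pack("!HHHH", 12345, 53, 8 + len(payload), 0)
info = parse_udp_header(header + payload)
```

Note that the length field counts the header too, so it is 8 + payload size.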

How does UDP separate the header from the payload?
The UDP header is a fixed 8 bytes, so everything after the first 8 bytes is the payload.

How does UDP decide which upper-layer process to deliver its payload to?
Through the 16-bit destination port number: the target process is bound to that port, and the port number is used to find it.
How does the transport layer find the corresponding service through the port number?
Through a direct-mapped hash table: an array of size 65536 is indexed by port number, and each slot stores the pid of the process bound to that port.

Understanding of headers:
insert image description here

②Characteristics of UDP

1. No send buffer: as soon as data arrives from the application layer, UDP hands it to the network as-is, without caring whether the peer can receive it.
2. Sending with UDP is like mailing a letter: UDP just sends it out and does not care whether the other side receives it.
3. Connectionless: knowing the peer's IP and port number is enough to transmit directly, with no connection setup.
4. Unreliable: there is no acknowledgment mechanism and no retransmission mechanism; if a segment cannot reach the other side due to a network failure, the UDP protocol layer returns no error information to the application layer.
5. Datagram-oriented: the application cannot flexibly control the number and size of reads and writes. The application layer hands a message to UDP, and UDP sends it as-is, neither splitting nor merging it.

For datagram-oriented understanding:
If UDP is used to transmit 100 bytes of data: the sender calls sendto once to send 100 bytes, and the receiver must call recvfrom once, reading all 100 bytes in one call; it cannot call recvfrom 10 times in a loop, receiving 10 bytes each time
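This one-send/one-receive behaviour can be sketched locally with a Unix-domain datagram socket pair, which preserves message boundaries the same way UDP does (a sketch, not real network UDP):

```python
import socket

# AF_UNIX + SOCK_DGRAM gives local datagram semantics without touching the network.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b"x" * 100)    # one send of 100 bytes...
data = b.recv(4096)   # ...arrives as one 100-byte datagram, never split
a.close()
b.close()
```

A single recv returns the whole datagram regardless of how large a buffer is passed.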

③ UDP buffer

UDP has no real send buffer: data passed to sendto is handed straight to the kernel, which passes it directly to the network-layer protocol for subsequent transmission

UDP does have a receive buffer, which can temporarily hold arriving packets, but this buffer cannot guarantee that UDP packets are received in the order they were sent. If the buffer is full, further arriving UDP data is discarded

④Precautions for using UDP

The UDP header has a 16-bit length field, so the maximum size of one UDP datagram is 64K (including the UDP header).
If the data to transmit exceeds 64K, the application layer must split it manually, send it in multiple datagrams, and reassemble it manually at the receiving end
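A sketch of such application-layer splitting and reassembly (the 65507-byte limit is the usual maximum UDP payload over IPv4: 65535 minus the 8-byte UDP header and 20-byte IP header; the (index, total) tagging scheme is a made-up example):

```python
MAX_UDP_PAYLOAD = 65507  # 65535 - 8 (UDP header) - 20 (IP header)

def split_message(data: bytes, chunk_size: int = MAX_UDP_PAYLOAD):
    """Split data into chunks that each fit in one UDP datagram,
    tagging every chunk with (index, total) so the receiver can reassemble."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    total = len(chunks)
    return [(i, total, c) for i, c in enumerate(chunks)]

def reassemble(tagged_chunks):
    """Receiver side: reorder by index and concatenate."""
    ordered = sorted(tagged_chunks, key=lambda t: t[0])
    return b"".join(c for _, _, c in ordered)

big = bytes(200_000)       # larger than a single datagram can carry
parts = split_message(big)
```

A real receiver would also need to handle lost or duplicated chunks, since UDP guarantees neither.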

TCP

TCP is a reliable protocol. To guarantee reliability, TCP does more processing, so its efficiency is lower; UDP does not guarantee reliability, so it is simpler and faster.

TCP protocol segment format

insert image description here

Sequence number: the starting byte position of the data being sent; the initial value is randomly generated
Acknowledgment number: the sequence number of the data expected next; it tells the peer that all bytes before this number have been received
4-bit header length: the length of the TCP header in units of 4 bytes, so the maximum header is 15 * 4 = 60 bytes (including options) and the minimum is the standard 20-byte header
Reserved: six bits not yet used
Six flag bits, used to distinguish different kinds of segments:
SYN: request to establish a connection
ACK: the acknowledgment number is valid
PSH: push; normally the upper layer is told to read only once the buffer reaches a watermark, and PSH asks it to read even before that
RST: something is wrong with the connection between the two parties; the sender asks the peer to re-establish the connection
URG: the urgent pointer is valid; if set, the receiver consults the urgent pointer and reads that data first
FIN: request to close the connection
16-bit urgent pointer: the offset of the urgent data within the segment; how much urgent data to read is described by the options field
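A minimal sketch of reading the header length and flag bits out of a raw TCP header in Python (the sample SYN segment is fabricated for illustration):

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # lowest bit first

def parse_tcp_flags(segment: bytes):
    """Extract the data offset and flag bits from a 20-byte TCP header."""
    # Byte 12: upper 4 bits = header length in 4-byte words.
    # Byte 13: low 6 bits = URG/ACK/PSH/RST/SYN/FIN.
    offset_byte, flag_byte = segment[12], segment[13]
    header_len = (offset_byte >> 4) * 4
    flags = [name for i, name in enumerate(FLAG_NAMES) if flag_byte & (1 << i)]
    return header_len, flags

# A minimal SYN segment: ports, seq, ack, offset = 5 words, SYN bit (0x02) set.
syn = struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
hlen, flags = parse_tcp_flags(syn)
```

The 5-word offset corresponds to the minimum 20-byte header described above.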




For RST:
insert image description here

In fact, if one party disconnects abnormally partway through, then when it later receives a segment from the peer (who still believes the connection exists), it sends RST back, asking the peer to re-establish the connection.

How the header and payload are separated: through the 4-bit header length field

TCP not only guarantees reliability, but also implements measures to improve transmission efficiency

reliability

The reliability of TCP has two aspects: one is reflected in the header fields, the other in TCP's protocol logic

①Checksum

The checksum covers the entire segment; if verification fails, the segment is discarded directly
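The checksum used by TCP (and UDP) is the Internet checksum of RFC 1071: a ones'-complement sum over 16-bit words. A minimal sketch, checked against the worked example from RFC 1071:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement sum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

# RFC 1071's example input; its documented checksum is 0x220d.
cksum = internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))
```

A useful property: summing the data together with its own checksum yields 0, which is exactly the verification the receiver performs.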

②Acknowledgement acknowledgment (ACK) mechanism

Generally speaking, if a segment we send receives an acknowledgment, we can consider it reliably received by the peer; this is the acknowledgment mechanism.

If no acknowledgment arrives within a certain time, the sender assumes the data was lost and retransmits it, so even when packets are lost, the data is still guaranteed to reach the peer: reliable transmission.
insert image description here

③Serial number

TCP numbers every byte of data it sends; this number is the sequence number.

The sequence number lets the receiver identify which data has been received
insert image description here
Each ACK carries an acknowledgment number, which tells the sender that everything before that number has been received, and to continue sending from there next time.

Sequence numbers keep data in order:
TCP is byte-stream oriented, and the order in which sent segments arrive in the receiver's buffer is unpredictable. So how is the order of segments in the receive buffer guaranteed?
Using the 32-bit sequence number, the transport layer re-sorts arriving segments before placing them in the receive buffer, ensuring the receiving order matches the sending order

Sequence number deduplication:
If a segment with an already-received sequence number arrives, the receiver discards it directly, achieving deduplication
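Re-sorting and deduplicating by sequence number can be sketched like this (simplified to whole-segment granularity; real TCP sequence numbers count bytes):

```python
def reassemble_segments(segments):
    """segments: list of (seq, payload) pairs. Sort by sequence number,
    drop duplicates, and concatenate -- mirroring what the receive buffer
    does before handing bytes to the application."""
    seen = set()
    ordered = []
    for seq, payload in sorted(segments):
        if seq in seen:
            continue            # duplicate sequence number: discard
        seen.add(seq)
        ordered.append(payload)
    return b"".join(ordered)

# Segments arrive out of order, with one duplicate.
arrived = [(200, b"world"), (100, b"hello "), (200, b"world")]
result = reassemble_segments(arrived)
```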

Why are two sets of sequence numbers needed?
TCP is full-duplex: both parties can send data at the same time, and both need the acknowledgment mechanism, so two sets are required: one numbering the data a host sends (the sequence number), the other tracking the data it has received (the acknowledgment number).

④Timeout retransmission mechanism

As mentioned above, the basis for judging whether sent data was received is whether an acknowledgment arrives; if none arrives within a certain time, the data is retransmitted.

There are only two reasons for not receiving one: the data segment was lost, or the acknowledgment was lost.
Case 1: the data segment is lost.

insert image description here
Case 2: the acknowledgment is lost
insert image description here
When a duplicate segment is received, it is identified as a duplicate by its sequence number and discarded directly

So how is the timeout determined?
The timeout is controlled in units of 500ms; the timeout for each retransmission is an integer multiple of 500ms.

If there is still no response after one retransmission, wait 2 * 500ms before retransmitting; if there is still no response, wait 4 * 500ms, and so on, growing exponentially

After a certain number of retransmissions have accumulated, TCP concludes that the network or the peer host is abnormal, stops retransmitting, forcibly closes the connection, and notifies the application that communication was abnormally terminated
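The doubling timeouts form a simple geometric schedule. A sketch, using the 500ms base and exponential growth described above (the function name and retry count are illustrative):

```python
def retransmit_schedule(base_ms: int = 500, max_retries: int = 5):
    """Timeout before each successive retransmission:
    500ms, 2 * 500ms, 4 * 500ms, ... (exponential backoff)."""
    return [base_ms * (2 ** n) for n in range(max_retries)]

schedule = retransmit_schedule()
```

After max_retries attempts with no acknowledgment, the connection would be forcibly closed as described.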

⑤Flow control

Data sending process:
insert image description here

Flow control solves the packet loss caused when the sender transmits faster than the receiver can take the data in; that is, it matches the sending rate to the peer's receiving capacity

insert image description here
After receiving this window size, the sender adjusts its sliding window to slow down its sending rate.
insert image description here

So how do you know the window size of the peer end when sending data for the first time?
The window size is negotiated during the three-way handshake

So the window size field in a segment refers to the remaining space in that sender's own receive buffer

Adjust the speed of sending data by receiving the window size of the other party, so as to achieve the purpose of flow control

⑥Connection management

Under normal circumstances, TCP needs to go through a three-way handshake to establish a connection, and wave four times to disconnect the connection.
Three-way handshake:
insert image description here
Four-way wave:
insert image description here

TIME_WAIT status

The TCP protocol stipulates that the party that actively closes the connection must be in the TIME_WAIT state and wait for two MSL (maximum segment lifetime) times before it can reach the CLOSED state.

Therefore, while in the TIME_WAIT state the connection has not been fully released and the port number is still occupied, so the port cannot be bound again.

So how to solve this binding failure problem?
After creating the listening socket, call setsockopt() to set the SO_REUSEADDR option to 1 on the socket descriptor; this allows multiple socket descriptors with the same port number (but different IP addresses) to be created, and in particular lets bind() succeed while an old connection on that port is still in TIME_WAIT
insert image description here
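A minimal sketch of setting SO_REUSEADDR before bind in Python (port 0 asks the OS for any free port, just so the sketch binds cleanly):

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow bind() to succeed even if the port is held by a TIME_WAIT connection.
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(5)
listener.close()
```

The option must be set before bind(); setting it afterwards has no effect on an already-bound socket.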

⑦Congestion control

The flow control above only considers the two end hosts; it does not consider the state of the network itself.
Although congestion control cannot eliminate network congestion, it can avoid making the congestion worse and wait for the network to recover, just as keeping vehicles out of a jammed area relieves a traffic jam.
Network congestion affects every host in the network, so all hosts share the responsibility; congestion control is not implemented by a single host alone.
With a small amount of packet loss, we treat it as ordinary loss that triggers timeout retransmission; with a large amount of packet loss, we conclude the network is congested.

To limit the amount of data the sender transmits, the concept of a congestion window is introduced:
the congestion window is a numeric value; sending more data at once than the congestion window allows may cause network congestion.

Each host has its own congestion window; its size changes dynamically, and different hosts' congestion windows may differ

Sliding window VS TCP packet window VS congestion window

Sliding window: the amount the sender may transmit, equal to min(peer's TCP window size, own congestion window size)
TCP window size: the peer's receiving capacity (the remaining space in its receive buffer)
Congestion window: a value estimating how much can be sent before risking network congestion
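The comparison above boils down to one line; a sketch (names are illustrative):

```python
def effective_send_window(recv_window: int, congestion_window: int) -> int:
    """The sliding window is bounded by both the peer's advertised
    receive window and the sender's own congestion window."""
    return min(recv_window, congestion_window)

w = effective_send_window(65535, 4000)
```

Whichever constraint is tighter, receiver capacity or network capacity, decides how much may be in flight.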

The following discusses how a single host deals with congestion:

TCP introduces a slow-start mechanism:
first send a small amount of data to probe the path and find out how congested the network currently is, then gradually ramp communication back up.
The congestion window starts at 1 and doubles each round, so the growth is exponential.

Why reset to 1 and then grow exponentially?
1. It is necessary to confirm, using only a small amount of data, whether the network is still dropping packets.
2. Exponential growth starts small, so the early rounds probe accurately, and then communication is restored quickly in exponential fashion.

However, during recovery exponential growth is too fast; if the window outgrows the peer's receive window, the congestion window loses its meaning. Therefore slow start is given a threshold: once the window exceeds it, growth switches to linear.
insert image description here
Every time network congestion occurs, the slow-start threshold is halved and the congestion window is reset back to 1
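The growth pattern can be sketched as a toy simulation of the behaviour described above: exponential below the threshold, linear above it, and on loss the threshold halves and the window resets to 1 (classic Tahoe-style behaviour; all names and values are illustrative):

```python
def simulate_cwnd(rounds, ssthresh=8, loss_rounds=()):
    """Trace of the congestion window (in segments) per round trip."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 2)  # halve the threshold on loss
            cwnd = 1                      # restart slow start from 1
        elif cwnd < ssthresh:
            cwnd *= 2                     # slow start: exponential growth
        else:
            cwnd += 1                     # congestion avoidance: linear growth
    return trace

no_loss = simulate_cwnd(8)
with_loss = simulate_cwnd(8, loss_rounds={4})
```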

Improve performance

①Sliding window and fast retransmission

With only the per-segment acknowledgment mechanism just described, data segments are sent serially one by one, which is very inefficient.
insert image description here

So why doesn't the sender send the data all at once?
When the sender sends data, not only the receiving ability of the other party but also the congestion of the network must be considered.

Therefore, a sliding window is introduced so that data can be sent in batches, improving efficiency.
The sliding window describes the maximum amount of data the sender can transmit at once without waiting for an ACK
insert image description here
When an acknowledgment is received, the window slides forward to the sequence number carried in the acknowledgment; the window size changes dynamically

There are two situations in which packet loss occurs when sending data in a sliding window:
1. The data packet has arrived, and the ACK is lost
insert image description here

2. The packet sent is lost
insert image description here

Why is timeout retransmission still needed on top of fast retransmission?
Fast retransmission only triggers after the same acknowledgment is received 3 times; with fewer duplicates it never fires.
Therefore timeout retransmission is the fallback strategy that guarantees lost data is eventually retransmitted.
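The 3-duplicate-ACK trigger can be sketched as a simple counter (function name and threshold handling are illustrative):

```python
def needs_fast_retransmit(ack_history, dup_threshold=3):
    """Scan a stream of acknowledgment numbers; return the value to
    retransmit immediately once some ACK repeats dup_threshold extra
    times (i.e. 3 duplicate ACKs), else None."""
    counts = {}
    for ack in ack_history:
        counts[ack] = counts.get(ack, 0) + 1
        if counts[ack] == dup_threshold + 1:  # original ACK + 3 duplicates
            return ack
    return None

triggered = needs_fast_retransmit([100, 200, 200, 200, 200])
not_triggered = needs_fast_retransmit([100, 200, 200])
```

With only two duplicates the fast path never fires, which is exactly why the timeout fallback remains necessary.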

②Delayed response

If the receiving host returns an ACK immediately, the window it advertises may be relatively small, because in fact the upper layer usually drains the buffer faster than a response can cross the network.

If the receiver waits a short while before responding, the upper layer will have taken more data out of the buffer, so the advertised window is larger. A larger window means greater network throughput and higher transmission efficiency, while the network is still kept from congesting.

③Piggyback response

While sending an acknowledgment, the segment also carries the data to be sent, so a single segment both acknowledges and transmits data.

For example, in the four-way wave, the middle two segments are often merged into one.

byte stream

Byte stream:
The data TCP sends is like one uninterrupted stream, so the sticky-packet problem must be handled

Features:
When writing data into the send buffer or reading data from the receive buffer, a whole message need not be written or read at once;
data can be written or read in arbitrary amounts.

sticky package problem

Sticky packets: the application layer reads too many or too few bytes in one read, so message boundaries are lost and the remaining messages become unusable

So how is the sticky-packet problem avoided?
By making the boundary between two messages explicit

There are mainly the following methods
1. Fixed-length messages: always read the fixed size
2. Special delimiter characters: reading the delimiter means one message is complete
3. Self-describing length + fixed-length header (as UDP does), or self-describing length + delimiter (as HTTP does)
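Method 3, a self-describing length prefix, can be sketched like this (a 4-byte big-endian length header; the framing format itself is a made-up example):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix each message with a 4-byte big-endian length."""
    return struct.pack("!I", len(payload)) + payload

def unframe(stream: bytes):
    """Split a raw byte stream back into complete messages;
    return (messages, leftover bytes of any partial message)."""
    messages, pos = [], 0
    while pos + 4 <= len(stream):
        (length,) = struct.unpack("!I", stream[pos:pos + 4])
        if pos + 4 + length > len(stream):
            break               # incomplete message: wait for more data
        messages.append(stream[pos + 4:pos + 4 + length])
        pos += 4 + length
    return messages, stream[pos:]

# Two messages "stuck together" in one TCP read, plus the start of a third.
stream = frame(b"hello") + frame(b"world") + b"\x00\x00"
msgs, rest = unframe(stream)
```

The leftover bytes would be kept and prepended to the next read, which is how the application layer handles the boundary problem TCP leaves to it.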

For the UDP protocol, is there also a "sticky packet problem"?
No. UDP uses a self-describing length with a fixed-length header, which marks the message boundary;
moreover, UDP is datagram-oriented: either a complete UDP message is received or nothing is received at all, never "half" a message

So why does the application layer protocol using TCP have the problem of sticky packets?
TCP is byte-stream oriented, so the data written and read may be incomplete messages; and the TCP header has no field identifying the length of an application message, so there is no boundary between messages. The application layer must solve this problem itself

TCP exceptions

1. Process termination: The process termination will release the file descriptor, and FIN can still be sent. It is no different from a normal shutdown.
2. Machine restart: It will kill all processes, the same as the process termination.
3. Machine power-off / network cable unplugged: the peer still believes the connection exists. Once the peer performs a write and discovers the connection is gone, it resets the connection.
Even without write operations, TCP has a built-in keep-alive timer (keep-alive is also commonly implemented at the application layer): if the peer has sent nothing for a long time, it periodically asks whether the other side is still there, and releases the connection if not; alternatively, either side can periodically report its own liveness to show it still exists

the second parameter of listen

The Linux kernel protocol stack uses two queues for a tcp connection management:

  1. Half-connection queue (used to hold requests in the SYN_SENT and SYN_RECV states)
  2. Full connection queue (used to save connections that are in the established state, have been handshake three times, and can be directly taken away by accept)

The length of the full connection queue is affected by the second parameter of listen: the queue length is listen's second parameter + 1.
When the full connection queue is full, new connections can no longer enter the ESTABLISHED state.

Why have a queue at all?
When the server finishes handling a connection, it can immediately take the next one from the queue, keeping the server loaded almost 100% of the time.
Why can't the queue be too long?
A queue that is too long increases the server's maintenance cost, and it lengthens clients' waiting time until they give up waiting.

SYN flood attack:
insert image description here

Origin blog.csdn.net/hbbfvv1h/article/details/123567591