"Network programming" Transport layer protocols: learning UDP and understanding its principles in depth

"Preface" This article introduces the transport layer and explains the UDP protocol.

"Column" Network programming

"Author" Mr. Maple Leaf (fy)

UDP

1. Transport layer

Ordinary HTTP users tend to think that requests and responses are sent straight onto the network. In fact, the application layer first hands its data to the transport layer, which processes it and passes it further down. The data travels through the whole network protocol stack this way before it finally goes out onto the network.

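The transport layer is responsible for carrying data between the two ends of a communication. Building on the host-to-host delivery of the network layer, it delivers data to the right process on each host. Whether that delivery is reliable depends on the protocol: TCP provides reliability, UDP does not.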

Common transport layer protocols include TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). The transport layer is part of the operating system kernel. This article covers UDP; TCP is covered in the next article.

2. UDP protocol

2.1 Port numbers revisited

Revisiting the concept

A port number is an identifier that the transport layer uses to tell applications or services apart. It is a 16-bit integer (2 bytes) ranging from 0 to 65535.

  • An IP address combined with a port number forms a socket, which uniquely identifies a communication endpoint in the network.
  • In transport layer communication, each message carries a source port number (the sending application on the source host) and a destination port number (the receiving application on the destination host), so that data reaches the correct application or service.
  • By using different port numbers, the transport layer lets multiple applications communicate at the same time and keeps their data from being mixed up.

The role of the port number

  • The port number (Port) identifies the different applications doing network communication on a host, that is, it uniquely identifies a process on that host.
  • When data received from the network is delivered upward, the transport layer extracts the destination port number and uses it to decide which service process on the current host should get the data.
  • This is why the header of every transport layer protocol contains port fields.
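The role of the destination port can be observed with two UDP sockets on the same machine. The following is a minimal sketch, assuming a loopback interface is available; the function name `demo_port_demux` and the message contents are made up for illustration.

```python
import socket


def demo_port_demux():
    """Send one datagram to each of two local UDP sockets and return what each received."""
    # Two "services" on the same host. Port 0 asks the OS to pick any free port.
    svc_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    svc_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        svc_a.bind(("127.0.0.1", 0))
        svc_b.bind(("127.0.0.1", 0))
        svc_a.settimeout(2.0)
        svc_b.settimeout(2.0)

        # Same destination IP address, different destination port numbers.
        client.sendto(b"for A", svc_a.getsockname())
        client.sendto(b"for B", svc_b.getsockname())

        got_a, _ = svc_a.recvfrom(1024)
        got_b, _ = svc_b.recvfrom(1024)
        return got_a, got_b
    finally:
        for sock in (svc_a, svc_b, client):
            sock.close()


if __name__ == "__main__":
    print(demo_port_demux())
```

Both datagrams go to the same IP address, yet each socket only sees the one addressed to its own port.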

These concepts were covered in the socket chapter, so they are not repeated here.

Five-tuple

In the TCP/IP protocol suite, a communication is identified by a five-tuple: source IP address, source port number, destination IP address, destination port number, and protocol number.

For example, several client hosts may access the same server at the same time, and each client host may run one or more client processes.

The server tells these communications apart by their five-tuples: source IP address, source port number, destination IP address, destination port number, and protocol number.
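The way a server can tell communications apart by five-tuple can be sketched with a dictionary keyed on the tuple. This is only an illustration: the IP addresses, ports, and the names `FiveTuple`, `connections`, and `owner_of` are hypothetical, and a real kernel uses its own internal tables.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "TCP" or "UDP" here; on the wire this is a protocol number


# Hypothetical server endpoint (documentation-only address range).
SERVER_IP, SERVER_PORT = "203.0.113.10", 80

# Two client hosts; the first one runs two client processes.
connections = {
    FiveTuple("198.51.100.1", 50001, SERVER_IP, SERVER_PORT, "TCP"): "host 1, process A",
    FiveTuple("198.51.100.1", 50002, SERVER_IP, SERVER_PORT, "TCP"): "host 1, process B",
    FiveTuple("198.51.100.2", 50001, SERVER_IP, SERVER_PORT, "TCP"): "host 2, process A",
}


def owner_of(src_ip, src_port, protocol="TCP"):
    """Look up which client a packet belongs to, or None if the five-tuple is unknown."""
    key = FiveTuple(src_ip, src_port, SERVER_IP, SERVER_PORT, protocol)
    return connections.get(key)


if __name__ == "__main__":
    print(owner_of("198.51.100.1", 50002))
```

Two entries may share a source IP or a source port, but changing any one of the five fields yields a different communication.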

2.1.1 Port number ranges

The length of the port number is 16 bits, so the range of port numbers is 0 ~ 65535:

  • 0 to 1023: well-known ports. Widely used application layer protocols such as HTTP, FTP, and SSH have fixed port numbers in this range.
  • 1024 to 65535: ports that the operating system can allocate dynamically. The port number of a client program is assigned from this range by the operating system. (IANA further divides this range into registered ports, 1024 to 49151, and dynamic ports, 49152 to 65535.)
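The split can be checked with a small sketch. It follows the two ranges listed above; `classify_port` and `os_assigned_port` are hypothetical helper names, and the second one assumes a loopback interface so that binding to port 0 lets the operating system choose a port.

```python
import socket


def classify_port(port: int) -> str:
    """Classify a port number using the two ranges from the text."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in 16 bits (0 to 65535)")
    return "well-known" if port <= 1023 else "dynamic"


def os_assigned_port() -> int:
    """Bind a UDP socket to port 0 and return the port the OS picked for it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = os_assigned_port()
    print(port, classify_port(port))
```

This is what happens to a typical client program: it never names a port itself, so the operating system hands it one from the upper range.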

2.1.2 Well-known port numbers

Some services are used so often that, for convenience, they have agreed fixed port numbers:

  • SSH server: port 22.
  • FTP server: port 21.
  • Telnet server: port 23.
  • HTTP server: port 80.
  • HTTPS server: port 443.

View well-known port numbers:

vim /etc/services

When our own programs choose a port number, they should avoid these well-known ports.

2.1.3 Notes on port numbers

Can a process bind multiple port numbers?

  • A process can bind multiple port numbers. This does not conflict with the rule that a port number must uniquely identify a process: each port still maps to exactly one process.
  • In some cases a process needs to provide several services or protocols at once, or needs to listen on several ports to handle different kinds of data.

Can a port number be bound by multiple processes?

  • A port number can usually be bound by only one process (in 99% of cases). At any given time a port belongs to a single process, so that data is delivered to the right receiver.
  • Binding a port number that is already bound fails with an error.
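Both rules can be tried out with UDP sockets. The sketch below is a simplification: it uses several sockets inside one process, assuming default socket options (no address or port reuse), in which case the kernel rejects a duplicate bind the same way no matter which process asks. `try_bind` and `demo_bind_rules` are hypothetical names.

```python
import socket


def try_bind(port):
    """Return (socket, None) if the bind succeeds, or (None, error) if it fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", port))
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc


def demo_bind_rules():
    """Return (second_port_ok, duplicate_bind_failed)."""
    first, _ = try_bind(0)                 # port 0: let the OS pick a free port
    taken_port = first.getsockname()[1]
    second, _ = try_bind(0)                # same process, another port: allowed
    duplicate, err = try_bind(taken_port)  # same port again: expected to fail
    try:
        return second is not None, err is not None
    finally:
        for sock in (first, second, duplicate):
            if sock is not None:
                sock.close()


if __name__ == "__main__":
    print(demo_bind_rules())
```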

2.1.4 The netstat and pidof commands

netstat command

netstat is a command that displays information about network connections, routing tables, and network interfaces.

The name netstat is short for "network statistics".

Common options:

  • -a: all (display all connections and listening ports)
  • -t: tcp (display only TCP connections)
  • -u: udp (display only UDP connections)
  • -n: numeric (display IP addresses and port numbers in numeric form)
  • -p: program (display the process associated with each connection)
  • -l: listen (list only services in the listening state)
  • -r: route (display routing table information)
  • -s: statistics (display network statistics)

Command demo

To view TCP-related network information, the option combination -nltp is commonly used.

netstat -nltp

To view UDP-related network information, the option combination -lunp is commonly used.

netstat -lunp

pidof command

pidof is a command for finding running processes. Given a process name, it prints the process ID (PID) of each matching process, which is convenient.

Syntax : pidof [process name]
Function : View the process id through the process name

Command demo

pidof can be combined with other commands, for example to kill a process by name:

pidof test | xargs kill -9

Note: xargs is a utility for building and executing command lines. It reads data from standard input and passes it as arguments to the specified command.

2.2 UDP protocol format

The UDP protocol format is as follows. The first 8 bytes are the UDP header, which consists of:

  • 16-bit source port number: indicates where the data comes from.
  • 16-bit destination port number: indicates where the data is going.
  • 16-bit UDP length: the length of the entire datagram (UDP header + UDP data) in bytes. Because the field is 16 bits wide, a UDP message can be at most 64KB (65535 bytes).
  • 16-bit UDP checksum: if the checksum of a UDP message does not verify, the message is discarded directly.

The data part (user data) follows the header and is the payload; it is optional.

How does UDP separate the header from the payload?

The UDP header has a fixed length of 8 bytes. When UDP reads a message, it reads the first 8 bytes as the header, and everything after that is the payload.
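Because the header is fixed-length, separating it from the payload is a matter of slicing. Here is a minimal sketch using Python's `struct` module; `split_udp` and the sample values are made up for illustration, and the checksum is not verified.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def split_udp(datagram: bytes):
    """Split a raw UDP datagram into (header fields, payload)."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("shorter than the 8-byte UDP header")
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    # The length field counts header + data, so the payload ends at that offset.
    payload = datagram[UDP_HEADER.size:length]
    header = {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
    }
    return header, payload


if __name__ == "__main__":
    # Hand-built sample: 8-byte header followed by a 5-byte payload, checksum left at 0.
    raw = UDP_HEADER.pack(50000, 53, 8 + 5, 0) + b"hello"
    print(split_udp(raw))
```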

How does UDP decide which upper-layer process gets the payload?

  • UDP uses the destination port number in the header to find the corresponding application layer process.
  • When a UDP datagram arrives, the operating system looks up the application or service bound to the destination port number and delivers the datagram to it for processing.

Understanding protocol headers

The Linux kernel is written in C, and UDP and TCP are implemented in the kernel, so a protocol header is simply a structured data object (a C struct).

For example, the UDP header can be written as a C struct whose members are bit fields, one member per header field.

Data encapsulation:

Taking UDP as an example, when the application layer hands data to the transport layer, the transport layer creates a UDP header and fills in its fields, including the source and destination port numbers. The operating system then allocates space in the kernel and copies the UDP header and the payload (the application layer data) into it together, forming a UDP message.

Data demultiplexing:

Taking UDP as an example, when the transport layer receives a message from the lower layer, it reads the first 8 bytes and extracts the destination port number. It uses that port number to find the corresponding application layer process and delivers the remaining payload to it.
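Encapsulation and delivery to the upper layer can be modeled together in a few lines. This is a toy model under stated assumptions, not kernel code: the checksum is left at 0 for simplicity, and `bound_ports` is a hypothetical table mapping port numbers to handler functions that stand in for application processes.

```python
import struct

HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def encapsulate(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a UDP-style header to the application data."""
    length = HEADER.size + len(payload)
    if length > 0xFFFF:
        raise ValueError("does not fit in the 16-bit length field")
    return HEADER.pack(src_port, dst_port, length, 0) + payload


def deliver(datagram: bytes, bound_ports: dict):
    """Read the header, find the handler for the destination port, hand it the payload."""
    _src, dst_port, length, _checksum = HEADER.unpack_from(datagram)
    handler = bound_ports.get(dst_port)
    if handler is None:
        return None  # nothing is bound to this port, so the data is dropped
    return handler(datagram[HEADER.size:length])


if __name__ == "__main__":
    bound_ports = {
        8080: lambda data: ("web", data),
        9090: lambda data: ("metrics", data),
    }
    print(deliver(encapsulate(40000, 8080, b"hi"), bound_ports))
```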

2.3 Features of UDP

The process of UDP transmission is similar to sending a letter, and its characteristics are as follows:

  • Connectionless: the sender only needs the peer's IP address and port number, and transmits data directly without establishing a connection.
  • Unreliable: there is no acknowledgment mechanism and no retransmission mechanism. If a datagram cannot reach the other side because of a network failure, the UDP layer does not report any error to the application layer.
  • Datagram-oriented: the application cannot flexibly control how many times data is read or written, or how much is read or written each time.

Connectionlessness already shows up in the socket code (compare it with TCP), so it is not explained further here; unreliability is easier to understand after learning TCP.

The datagram-oriented property is explained below.

Datagram-oriented

  • However long the message that the application layer hands to UDP, UDP sends it as is, without splitting or merging it. This is what datagram-oriented means.
  • Think of it as a parcel: if a friend sends you 1 parcel, you receive exactly 1 parcel, not 0.5 of a parcel and not 2 parcels. What is sent in one piece is received in one piece, and reading data works the same way.
  • The data sent by UDP cannot be too large: the maximum length of a UDP message is 64KB (UDP header + UDP data).
  • Data larger than 64KB must be split into pieces that each fit within 64KB. The application layer has to do this in its own code; a single oversized send fails instead of being sent.
  • For example, to transmit 100 bytes over UDP: if the sender calls the send function once with all 100 bytes, the receiver must call the receive function once to receive those 100 bytes; if the sender calls the send function ten times, the receiver must call the receive function ten times. In other words, with UDP the number of send calls and the number of receive calls are in a 1:1 ratio.
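The 1:1 relationship between send calls and receive calls can be seen on the loopback interface, where loss and reordering are unlikely for a handful of small datagrams. The sketch below assumes such an environment; `demo_boundaries` is a hypothetical name.

```python
import socket


def demo_boundaries(messages):
    """Send each message with one sendto call and read each back with one recvfrom call."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        for message in messages:
            sender.sendto(message, receiver.getsockname())
        # Each recvfrom returns exactly one datagram, even though the
        # buffer passed in is large enough to hold all of them at once.
        return [receiver.recvfrom(65535)[0] for _ in messages]
    finally:
        receiver.close()
        sender.close()


if __name__ == "__main__":
    print([len(m) for m in demo_boundaries([b"a" * 100, b"bb", b"ccc"])])
```

Three sends produce three separate reads with the original sizes; the datagrams are never merged into one 105-byte read.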

2.4 UDP buffer

  • UDP has no real send buffer. A call to sendto hands the data directly to the kernel, which passes it to the network layer protocol for transmission.
  • UDP has a receive buffer. This buffer does not guarantee that UDP messages are received in the order they were sent, and if the buffer is full, newly arriving UDP data is discarded.

Buffers, taking TCP as an example

Sending and receiving data:

  • The caller of a send function (write, etc.) has its own send buffer at the application layer. Calling the send function actually copies the data from it into the send buffer of the transport layer.
  • When the data actually goes onto the network is decided by the TCP protocol, which is why TCP is called the Transmission Control Protocol. The key phrase is transmission control.
  • Likewise, the caller of a receive function (read, etc.) has its own receive buffer at the application layer. Calling the receive function actually copies data from the receive buffer of the transport layer into it.
  • So read, write, send, recv, sendto and similar functions are essentially copy functions.
  • Both sides can send at the same time without affecting each other, because each has a pair of buffers, one for sending and one for receiving. This is why TCP is full-duplex.
  • The buffers follow the idea of the producer-consumer model and bring the advantages of that model.

Why UDP has no send buffer

  • Because it is not needed: the design goal of UDP is to provide a simple, connectionless way to communicate.
  • UDP was designed for efficient data transmission with minimal protocol overhead.
  • UDP provides neither reliability nor flow control, so it has no need for a send buffer.
  • A call to sendto hands the data directly to the kernel, which passes it to the network layer protocol for transmission.

Why does UDP have a receive buffer?

If UDP had no receive buffer, the upper layer would have to read every message as soon as UDP obtained it. Whenever a message had not yet been read, UDP would be forced to throw away the next message arriving from the lower layer.

Note: when the UDP receive buffer is full, the next message is discarded directly.
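The size of a UDP socket's receive buffer can be inspected, and a different size requested, through the standard `SO_RCVBUF` socket option. This is a small sketch; `udp_receive_buffer_size` is a hypothetical helper, and the value the kernel actually applies may differ from the one requested (it may be rounded, clamped, or adjusted by the platform).

```python
import socket


def udp_receive_buffer_size(requested=None):
    """Return the receive buffer size of a fresh UDP socket, optionally after requesting a size."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if requested is not None:
            # This is a request; the kernel decides the effective size.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    print("default:", udp_receive_buffer_size())
    print("after requesting 65536:", udp_receive_buffer_size(65536))
```

A receiver that expects bursts of datagrams can request a larger buffer so that fewer of them are discarded while the application is busy.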

UDP needs a receive buffer to cope with the unreliability and unpredictability of network transmission.

Because UDP itself is unreliable, the receiver may face the following situations:

  • Data arrives faster than the application can process it: if the receiving application cannot handle incoming datagrams in time, the receive buffer stores them temporarily so they are not lost.
  • Data may arrive in a different order than it was sent: UDP does not preserve ordering, and the receive buffer does not restore it. The buffer only holds the datagrams until the application reads them; sorting and reassembly, if needed, are up to the application.
  • Network congestion or packet loss: datagrams may be delayed and then arrive in bursts, which the receive buffer can absorb. Datagrams that are lost in the network are not recovered by UDP; retransmission, if needed, must be implemented by the application.

Although UDP does not have a send buffer, the UDP socket can read and write, so it is also full-duplex

2.5 Precautions for UDP

This was already discussed above:

  • The length field in the UDP header is 16 bits, so the maximum length of a UDP message is 64K (including the UDP header).
  • 64K is very small in today's Internet environment. If the data to be transmitted exceeds 64K, the application layer has to split it into packets manually, send them in multiple calls, and reassemble them manually at the receiving end.
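Manual splitting and reassembly at the application layer might look like the sketch below. The 8-byte chunk header (chunk index and total count), the 1024-byte chunk size, and the function names are all assumptions made for illustration; a real protocol would also need to handle lost chunks, for example with timeouts and retransmission.

```python
import struct

# Hypothetical application-level header: chunk index and total number of chunks.
CHUNK_HEADER = struct.Struct("!II")
MAX_CHUNK = 1024  # assumed payload bytes per datagram, far below the 64K limit


def split_message(data: bytes, chunk_size: int = MAX_CHUNK):
    """Split data into datagram-sized pieces, each tagged with (index, total)."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division
    return [
        CHUNK_HEADER.pack(i, total) + data[i * chunk_size:(i + 1) * chunk_size]
        for i in range(total)
    ]


def reassemble(datagrams):
    """Rebuild the original data from chunks that may arrive in any order."""
    parts = {}
    total = None
    for datagram in datagrams:
        index, total = CHUNK_HEADER.unpack_from(datagram)
        parts[index] = datagram[CHUNK_HEADER.size:]
    if total is None or len(parts) != total:
        raise ValueError("some chunks are missing")
    return b"".join(parts[i] for i in range(total))


if __name__ == "__main__":
    data = bytes(range(256)) * 300  # 76800 bytes, more than one UDP message can carry
    chunks = split_message(data)
    print(len(chunks), reassemble(reversed(chunks)) == data)
```

Each element returned by `split_message` would be sent with its own sendto call, and the receiver would collect them with one receive call per chunk before calling `reassemble`.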

2.6 UDP-based application layer protocols

  • NFS: Network File System.
  • TFTP: Trivial File Transfer Protocol.
  • DHCP: Dynamic Host Configuration Protocol.
  • BOOTP: Bootstrap Protocol (for booting diskless devices).
  • DNS: Domain Name System.

--------------------- END ----------------------

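"Author" Mr. Maple Leaf
"Updated" 2023.7.18
"Statement" My knowledge is limited, so omissions in this article are inevitable.
          Readers are welcome to point out any errors or inaccuracies.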

Origin blog.csdn.net/m0_64280701/article/details/131724799