UDP protocol [transport layer protocol]


Related reading: Getting Started with Network Basics

This article is intended for readers who already know the basics of networking, i.e. the content of the first two chapters of "Illustrated TCP/IP", and ideally also the principles of HTTP/HTTPS (or at least what a "protocol" is). It draws on Chapter 6 of "Illustrated TCP/IP".

1. Transport layer

Application-layer protocols such as HTTP/HTTPS run on top of transport-layer protocols such as TCP and UDP. Different communication scenarios have different requirements, and transport protocols with different characteristics serve them. Thanks to the layered model, each protocol only deals with its own layer: to it, sending data looks like handing the data directly to the same layer on the other host. In reality, the data travels down the sender's protocol stack, across the network, and back up the receiver's stack, and this whole process is transparent to the peer protocols at the same layer on the two hosts.

Therefore, the choice of transport protocol does not change the data itself, and the application layer only cares whether the data reaches the other side intact. The protocols differ in transmission efficiency and reliability.

The role of the transport layer:

The transport layer is of great significance in the OSI model. It is the highest layer concerned with data communication, and it also sits between the lower three layers (network communication) and the upper three layers (information processing).

The main function of the transport layer is to provide end-to-end, transparent data transfer to the session layer above it, using the services of the network layer below it. Depending on the application's requirements, it can offer connection-oriented or connectionless service and different levels of reliability and quality of service (TCP guarantees complete, in-order delivery; UDP does not).

The transport layer acts as a bridge in the OSI model: upper-layer applications do not need to care about the details of network communication and only call the service interface the transport layer provides, while the lower-layer network is free to choose protocols and technologies suited to the actual situation without affecting the upper-layer applications.


The transport layer has two representative protocols: TCP and UDP.

  • TCP (Transmission Control Protocol): connection-oriented and byte-stream-oriented; provides reliable transmission.
  • UDP (User Datagram Protocol): connectionless and datagram-oriented; often used for broadcast, or when fine-grained control is left to the application.

The meaning of "connection-oriented" versus "connectionless", and of "byte-stream-oriented" versus "datagram-oriented", will become clearer in the discussion that follows.

1.1 TCP and UDP

"Reliable" is used here as a neutral adjective: it describes a property of the protocol itself, not whether the protocol is good or bad (advantages and disadvantages are relative anyway, as they often are in real life). UDP is unreliable but efficient; TCP is reliable but pays a price for that reliability.

A simple example: real-time live streaming often uses UDP. When the network is poor, the picture becomes blurry (packets are lost), but playback stays smooth. With TCP, lost packets would be retransmitted and playback would stall; in a live-streaming scenario, sacrificing smoothness for image quality is not reasonable.

The following is quoted from "Illustrated TCP/IP".

One might think that, since TCP is a reliable transport protocol, it must be better than UDP. It is not. TCP and UDP cannot be compared simply and absolutely. So how should the two protocols be told apart and used? Below is a brief explanation.

TCP is used when reliable delivery is necessary at the transport layer. Because it is connection-oriented and has mechanisms such as sequence control and retransmission control, it can provide reliable transmission for applications.

UDP, on the other hand, is mainly used for communication that demands high-speed transmission and real-time performance, and for broadcast communication. Take a call over an IP phone as an example. With TCP, data lost in transit is retransmitted, so the caller's voice cannot be delivered smoothly and normal conversation breaks down. UDP does not retransmit, so there is no problem of the voice arriving with large delays. Even if some data is lost, only a small part of the call is affected. (When audio or video is transmitted in real time, losing a few packets along the way may cause a short pause or glitch in the picture or sound, but in practice this small disturbance is not a big deal.)

Therefore, TCP and UDP should be used as needed according to the purpose of the application.

2. Port number

Summary:

  • IP address + port number identifies a process on a computer in the network (the process provides a service at the application layer).
  • A port number is the address of a process in the networked world.

The "end" in what we usually call "end-to-end" communication refers to a specific process on each communicating host.

2.1 Port number identifies the process

In the OSI model, the transport layer receives data from the application layer, processes it a little, and passes it to the network layer for transmission. Just as someone mailing a package must fill in the sender's name and address as well as the recipient's, data transmitted in the network must carry addresses: with the help of various network protocols, the data sent from the source IP address is delivered to the computer at the destination IP address.

However, this is not enough. Most network applications follow the client-server model: the client sends data to the server for processing, and the server returns the result when needed. The entity that sends, receives, and processes the data is not the computer but a process running on it, and a computer runs many processes. The transport layer therefore also needs a way, similar to the IP address, to identify a process on a given computer: the port number.

The IP address marks the identity of a computer in the network, and the port number marks the identity of a process in the computer, so the IP+port number marks the identity of a process in a computer in the network.
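To make "IP address + port number" concrete: in the POSIX sockets API such an endpoint is described by `struct sockaddr_in`, which a program passes to `bind()`, `connect()` or `sendto()`. The sketch below fills one in; the helper name `make_endpoint` and the sample address are illustrative assumptions, and only IPv4 is covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: describe the endpoint "ip:port" (IPv4 only).
 * Returns 0 on success, -1 if `ip` is not a valid dotted-quad address. */
int make_endpoint(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);                 /* port in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;                               /* not a valid IPv4 address */
    return 0;
}
```

For example, `make_endpoint("192.168.1.10", 8080, &ep)` describes port 8080 on host 192.168.1.10. The port is stored with `htons()` because the sockets API expects it in network byte order.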

The roles of client and server are relative:

A client is the party that uses a service and initiates requests. A server is the program or computer that provides the service and processes requests.

Port numbers identify the service processes on a host.

In general, the addresses used at the data link layer and the IP layer are MAC addresses and IP addresses, respectively. The former identifies different computers on the same link; the latter identifies interconnected hosts and routers in a TCP/IP network. The transport layer has a similar concept of an address: the port number. Port numbers identify the different applications communicating on the same computer, which is why they are also called program addresses.

2.2 Communication identification through IP address, port number, and protocol number

However, a communication cannot be identified by the source/destination IP addresses and the destination port number alone. If a client sends several requests to the same server process at the same time, all of them share the same source IP, destination IP, and destination port. The IP addresses correctly identify the two hosts, but the communicating entities are processes, so extra information is needed to tell the requests apart; otherwise the server process on that port would mix them up. That information is the source port number and, as described below, the protocol number.

As shown in the figure, communications ① and ② take place between the same two computers and have the same destination port number, 80. This happens, for example, when you open two browser windows and visit two different pages on the same server at the same time: there are then two separate communications between the browser and the server, and they must be strictly distinguished. They can be distinguished by their source port numbers.

In the same figure, ③ and ① have exactly the same destination and source port numbers, but different source IP addresses. There is also a case not shown in the figure: the IP addresses and ports are all the same, but the protocol number (which indicates whether the upper layer is TCP or UDP) differs. These are also treated as two different communications.

[Figure: communications ①, ② and ③ distinguished by IP address and port number]

Therefore, TCP/IP and UDP/IP communication usually identifies a communication by five pieces of information (a five-tuple): **"Source IP Address", "Destination IP Address", "Protocol Number", "Source Port Number", "Destination Port Number"**. If any one item differs, it is a different communication. The server uses the source IP address and source port in the five-tuple to tell where the data comes from, and the destination port number to determine which service on the host it is for.
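The five-tuple can be modeled as a plain struct. The sketch below is a hypothetical illustration (IPv4 only, names chosen for this example), not a kernel data structure: two packets belong to the same communication only when all five fields are equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* One communication, identified by its five-tuple (IPv4 for simplicity). */
struct five_tuple {
    uint32_t src_ip;    /* source IP address                      */
    uint32_t dst_ip;    /* destination IP address                 */
    uint8_t  protocol;  /* IP protocol number: 6 = TCP, 17 = UDP  */
    uint16_t src_port;  /* source port number                     */
    uint16_t dst_port;  /* destination port number                */
};

/* Same communication only if every one of the five items matches. */
bool same_communication(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->protocol == b->protocol &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}
```

The fields are compared one by one rather than with `memcmp()`, because the compiler may insert padding bytes between them whose contents are unspecified.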

2.3 Protocol number

The protocol number is an 8-bit field in the header of the IP datagram that indicates which protocol the data carried in the datagram belongs to, so that the IP layer of the destination host knows which protocol to hand the data to. For example, protocol number 6 means TCP, 17 means UDP, and 1 means ICMP (a network-layer protocol that is itself carried inside IP).

The protocol number ranges from 0 to 255. Commonly used values have been assigned to specific protocols by IANA, and some unassigned or reserved values (for example 253 and 254, reserved for experimentation and testing) can be used for user-defined purposes.
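The protocol numbers mentioned above are available as constants in `<netinet/in.h>`. This small lookup is only an illustration: `protocol_name` is a hypothetical helper covering three values, whereas a real system would consult the full IANA list (on Linux, also `/etc/protocols`).

```c
#include <netinet/in.h>   /* IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP */
#include <stdint.h>

/* Map a few IANA-assigned IP protocol numbers to names. */
const char *protocol_name(uint8_t proto)
{
    switch (proto) {
    case IPPROTO_ICMP: return "ICMP";   /* 1  */
    case IPPROTO_TCP:  return "TCP";    /* 6  */
    case IPPROTO_UDP:  return "UDP";    /* 17 */
    default:           return "other";
    }
}
```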

Both the protocol number and the port number are used to realize the end-to-end data transmission service, but they belong to different levels. The protocol number is a concept of the network layer, and the port number is a concept of the transport layer. The difference between the protocol number and the port number is:

  • The port number is a transport-layer concept used to distinguish different applications or processes on the same host. The protocol number is a network-layer concept used to distinguish the different protocols carried above the network layer (IP).
  • The port number is a 16-bit field in the TCP and UDP headers, ranging from 0 to 65535 (port 0 is reserved and not used as a real destination port). The protocol number is an 8-bit field in the IP header, ranging from 0 to 255.
  • Both the port number and the protocol number are used to realize the end-to-end data transmission service, so that the data can be delivered to the destination application program or protocol correctly. Both the port number and the protocol number can be defined by the user, but usually follow some standards or conventions.

2.4 Range of port numbers

The port number occupies 16 bits, so the range it can represent is $[0, 2^{16}-1]$, that is, $[0, 65535]$. Port numbers are divided in the following ways:

  • According to the range of port numbers, they can be divided into three categories:
    • Well-known ports (Well-Known Ports): range from 0 to 1023. These port numbers are generally assigned to some common services. For example, port 80 corresponds to HTTP service, port 21 corresponds to FTP service, port 25 corresponds to SMTP service, etc.
    • Registered Ports: The range is from 1024 to 49151. These port numbers are loosely bound to some services and can also be customized by users. For example, port 8080 is often used for Web proxy servers, and port 3306 is often used for MySQL database servers.
    • Dynamic or private ports (Dynamic/Private Ports): range from 49152 to 65535. These port numbers are generally not assigned to any service, but are dynamically assigned by the operating system to processes that require network communication.

[Note]

When actually communicating, the port number must be determined in advance. There are two ways to determine the port number:

  • Standard, fixed port numbers: these are the "well-known port numbers" (Well-Known Port Numbers). Applications should avoid using a well-known port number for anything other than its intended service, to avoid conflicts.

  • Dynamic (sequential) allocation, which uses "private" ports. In this case the server must decide its listening port number, but the client that uses the service does not need to choose one.

  • The client application does not need to set a port number itself; the operating system assigns one. The operating system can give each application a non-conflicting port number, for example by adding 1 to the previously assigned number each time a new port is needed. In this way the operating system manages port numbers dynamically.

    Thanks to this dynamic allocation mechanism, even when the same client program initiates several TCP connections, the five items identifying those connections are never all the same.

  • By protocol type, ports can be divided into two categories:
    • TCP port: that is, the transmission control protocol port, which needs to establish a connection between the client and the server to provide reliable data transmission services.
    • UDP port: the User Datagram Protocol port, which does not need to establish a connection between the client and the server, and provides unreliable data transmission services.

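In general, client programs may be assigned any port outside 0 to 1023. The exact range an operating system uses for automatically assigned ports is configurable; on Linux it is set in /proc/sys/net/ipv4/ip_local_port_range.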
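The three IANA ranges and the "add 1 each time" allocation from the note can be sketched as follows. `classify_port`, `struct port_allocator` and `alloc_port` are hypothetical names, and the allocator uses the dynamic range 49152 to 65535 only as an illustration; real kernels use their own configurable range and often randomize the choice.

```c
#include <stdbool.h>
#include <stdint.h>

enum port_class { PORT_WELL_KNOWN, PORT_REGISTERED, PORT_DYNAMIC };

/* Classify a port according to the IANA ranges. */
enum port_class classify_port(uint16_t port)
{
    if (port <= 1023)  return PORT_WELL_KNOWN;
    if (port <= 49151) return PORT_REGISTERED;
    return PORT_DYNAMIC;
}

#define EPH_LOW  49152u   /* illustrative client port range */
#define EPH_HIGH 65535u

/* Hypothetical sequential allocator: remembers the next candidate port. */
struct port_allocator {
    uint16_t next;                 /* next port to try          */
    bool in_use[EPH_HIGH + 1];     /* which ports are taken     */
};

/* Return a free port in [EPH_LOW, EPH_HIGH], or 0 if all are taken. */
uint16_t alloc_port(struct port_allocator *a)
{
    for (unsigned tries = 0; tries <= EPH_HIGH - EPH_LOW; tries++) {
        uint16_t p = a->next;
        /* "add 1 each time", wrapping around at the top of the range */
        a->next = (p == EPH_HIGH) ? (uint16_t)EPH_LOW : (uint16_t)(p + 1);
        if (!a->in_use[p]) {
            a->in_use[p] = true;
            return p;
        }
    }
    return 0;
}
```

Usage: initialize with `static struct port_allocator a = { .next = EPH_LOW };` and call `alloc_port(&a)` whenever a client socket needs a port; ports that are already taken are skipped.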

Common well-known port numbers can be viewed in /etc/services:

[Screenshot: excerpt of /etc/services]

Among them we can see the commonly used ssh and telnet services, which use well-known ports (22 and 23). Each row describes one service, and the columns are the service name, the port number and protocol (written as port/protocol), and optional aliases.
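Each line of /etc/services has the form `name port/protocol [aliases...]`, with `#` starting a comment. The sketch below parses one such line; `struct service_entry` and `parse_services_line` are hypothetical names, and aliases are ignored. In real programs, the standard `getservbyname()` function (declared in `<netdb.h>`) looks up the same database.

```c
#include <stdint.h>
#include <stdio.h>

/* One entry of /etc/services: "name port/protocol [aliases...]". */
struct service_entry {
    char name[32];
    uint16_t port;
    char proto[8];
};

/* Parse one line. Returns 1 on success, 0 for comments, blank or malformed lines. */
int parse_services_line(const char *line, struct service_entry *e)
{
    unsigned int port;
    if (line[0] == '#')
        return 0;                               /* whole-line comment */
    if (sscanf(line, "%31s %u/%7s", e->name, &port, e->proto) != 3 || port > 65535)
        return 0;                               /* not "name port/proto" */
    e->port = (uint16_t)port;
    return 1;
}
```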

2.5 Common commands

netstat

netstat is used to view network system status information in Linux.

For example, the netstat -nltp command displays five-tuple information:

[Screenshot: output of netstat -nltp]

In the output:

  • Local Address: the local IP address and port number.
  • Foreign Address: the remote (peer) IP address and port number.
  • Proto: the protocol type.

Common options for this command:

  • -n: show numeric addresses and ports instead of resolving them into host, service, or user names.
  • -l: show only sockets in the LISTEN state.
  • -p: show the PID and name of the program that owns each socket.
  • -t (TCP): show only TCP sockets.
  • -u (UDP): show only UDP sockets.
  • -a (all): show all sockets, including listening ones (by default, listening sockets are not shown).

Therefore, to view TCP-related network information, use netstat -nltp; to view UDP-related network information, use netstat -nlup:

[Screenshot: output of netstat -nlup]

Removing the -l option shows sockets that are not in the LISTEN state.

iostat

iostat is used to monitor system input and output devices and CPU usage.

It reports disk activity statistics as well as CPU usage. Like vmstat, iostat has a limitation: it cannot analyze a particular process in depth, only the system as a whole.

Common options:

  • -c: display CPU usage.
  • -d: display device (disk) usage.
  • -N: display registered device-mapper names (useful for LVM volumes).
  • -n: display NFS usage.
  • -k: display statistics in kilobytes.
  • -m: display statistics in megabytes.
  • -t: print the time of each report.
  • -V: display version information.
  • -x: display extended statistics.
  • -p: display statistics for block devices and their partitions.

[Screenshot: output of iostat]

CPU fields:

  • %user: percentage of time the CPU spent at user level (applications).
  • %nice: percentage of time the CPU spent at user level with nice priority.
  • %system: percentage of time the CPU spent at system (kernel) level.
  • %iowait: percentage of time the CPU was idle while the system had an outstanding disk I/O request.
  • %steal: percentage of time the virtual CPU spent in involuntary wait while the hypervisor was servicing another virtual processor.
  • %idle: percentage of time the CPU was idle with no outstanding disk I/O request.

pidof

pidof finds the process IDs (PIDs) of the running processes with a given name.

[Screenshot: output of pidof]

In this way, the PID of a named process can be found with a single command, which is simpler than combining ps and grep.

This command is often combined with kill through a pipe and the xargs tool, for example:

pidof [NAME] | xargs kill -9

[Note]

The xargs command is a filter that builds command-line arguments for another command: it reads from a pipe or standard input and turns what it reads into arguments for that command. In this example, the output of pidof (the PIDs) is piped to xargs, which passes it as arguments to kill -9.

2.6 Common questions

Can a process bind (Bind) multiple port numbers?

Yes. A process can create multiple sockets and bind each one to a different port number, for example to communicate with different clients.

Can a port number be bound by multiple processes?

By default, no. At any given time a port number can be bound by only one socket for a given protocol; if two processes try to bind the same port, the second bind fails with an "address already in use" error (socket options such as SO_REUSEPORT are explicit exceptions). However, if a process binds a port first and then forks child processes, the children inherit the same socket, so multiple processes share one port number. In addition, TCP and UDP can bind the same port number at the same time, because they are different transport protocols; when data arrives, the receiver is determined from the five-tuple (protocol, source IP, destination IP, source port, destination port).
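These answers can be checked with a small experiment, sketched below under the assumption of a Linux or other POSIX system with a working loopback interface and neither `SO_REUSEADDR` nor `SO_REUSEPORT` set. The helper names are hypothetical. A first UDP socket lets the OS pick a port; a second UDP bind to that port is expected to fail with `EADDRINUSE`, while a TCP bind to the same number is expected to succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new socket of `type` to 127.0.0.1:port (0 = let the OS choose).
 * Returns the descriptor, or -1 with errno set. */
static int bind_loopback(int type, uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;          /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Return the local port a socket is bound to, or 0 on error. */
static uint16_t local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* results[0] = 1 if a second UDP bind to the same port fails with EADDRINUSE;
 * results[1] = 1 if a TCP bind to the same port number succeeds.
 * Returns 0 on success, -1 if the first socket could not be set up. */
int port_sharing_demo(int results[2])
{
    int udp1 = bind_loopback(SOCK_DGRAM, 0);
    if (udp1 < 0)
        return -1;
    uint16_t port = local_port(udp1);
    if (port == 0) {
        close(udp1);
        return -1;
    }

    int udp2 = bind_loopback(SOCK_DGRAM, port);   /* same protocol, same port */
    results[0] = (udp2 < 0 && errno == EADDRINUSE);
    if (udp2 >= 0)
        close(udp2);

    int tcp = bind_loopback(SOCK_STREAM, port);   /* different protocol */
    results[1] = (tcp >= 0);
    if (tcp >= 0)
        close(tcp);

    close(udp1);
    return 0;
}
```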

3. UDP protocol

3.1 Status

In the OSI seven-layer model, HTTP belongs to the application layer and runs on top of TCP (transport layer) and IP (network layer). UDP and TCP are typical transport-layer protocols. Looking at their positions in the model, HTTP is closest to the user and only provides network services, while UDP and TCP are the "workhorses" whose job is to deliver the data to the target process on the target host.

In practice, how data is transmitted is managed by the operating system; in other words, the transport layer is implemented in the operating system kernel.

[Figure: the layers of three common network models]

The figure above shows the status of each level in three common models.

Sockets, used in network programming, are a software layer between the application layer and the transport layer, i.e. the system-call interface between the operating system and the user. We don't need to care how sockets themselves are implemented, because the operating system maintains them; network programming is therefore also system programming. (This is why learning the OS matters so much.)

3.2 Separation and delivery of headers

The header and the payload (effective data) together make up the data packet transmitted in the network. The role of the header is:

  • Store some information required by this layer protocol, such as source address, destination address, length, type, checksum, etc., which can help this layer protocol realize its functions, such as addressing, routing, error control, flow control, etc.
  • Identify the protocol of the layer above. For example, the protocol field in the IP header indicates whether the upper-layer protocol is TCP or UDP; the port number field in the TCP header identifies the application process and, by convention, often indicates whether the application-layer protocol is HTTP or FTP. This information lets the different layers cooperate.

As data is encapsulated on its way down the protocol stack, each layer adds its own header; as it travels up the stack on the receiving host, each layer extracts and removes its header.

[Figure: encapsulation and decapsulation through the protocol stack]

From the model's perspective, data is generated by a process at the application layer, passed down from the application layer to the transport layer and below, sent to the peer host, and then passed back up. This involves two steps:

  • Encapsulation: when a layer receives data from the layer above, it prepends the header required by its own protocol and passes the result to the layer below.
  • Delivery (demultiplexing): when a layer receives data from the layer below, it removes its own header and hands the payload to the appropriate protocol or process in the layer above.

Almost every protocol has to perform these steps, so on the receiving side each protocol must solve two problems:

  1. Separation: how to split the header from the payload?
  2. Delivery: which upper-layer protocol or process should the payload be handed to?

Key point: transmitting data through the network is really a series of data copies.

UDP's solution:

  • Separation: the header length is fixed at 8 bytes.
  • Delivery: since a port number is bound to a process, UDP uses the destination port number in the header to find the upper-layer (application-layer) process. In socket programming, port numbers are stored as uint16_t because the port field is 16 bits.

Any host, whether big-endian or little-endian and whatever its operating system, knows which part of the message holds the port number, because the protocol fixes the layout (multi-byte fields are sent in big-endian "network byte order"). The operating system on each machine is the software layer that implements the protocol, so programs that use system calls must follow its rules.
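The byte-order point can be shown directly. The helpers below (hypothetical names) read and write a 16-bit field in big-endian order one byte at a time, which produces the same bytes on any host; the standard `htons()`/`ntohs()` functions from `<arpa/inet.h>` perform the same conversion on whole values.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htons()/ntohs() do the same conversion */
#include <stdint.h>

/* Read a 16-bit field stored in network byte order (big-endian). */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);   /* high byte comes first */
}

/* Write a 16-bit value in network byte order. */
void write_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);          /* high byte */
    p[1] = (unsigned char)(v & 0xFF);        /* low byte  */
}
```

For example, port 8080 (0x1F90) is always stored as the bytes 0x1F, 0x90, regardless of the host's own byte order.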

HTTP uses a special marker, an empty line (\r\n), to separate the header from the payload. Whatever method a protocol uses to delimit the parts of a message, every party to the communication must know the rule; that is a requirement of the protocol itself.

3.3 Format of UDP message

The UDP header consists of the source port number, destination port number, length, and checksum. In the figure below, everything except the data part is the header.

[Figure: UDP datagram format]

  • 16-bit source port number (Source Port): Indicates the source port number of the data, and the field length is 16 bits. This field is optional, and sometimes the source port number may not be set. When there is no source port number, the value of this field is set to 0.

  • 16-bit destination port number (Destination Port): Indicates the data destination port.

  • 16-bit packet length (Length): This field stores the sum of the length of the UDP header and the length of the data, that is, the length of the entire UDP datagram.

  • 16-bit checksum (Checksum): used to detect errors in the UDP header and data. If the checksum check fails, the datagram is discarded.

Note that a datagram may contain no data at all, only the header.

To calculate the checksum, the UDP pseudo-header shown in the figure below is placed in front of the UDP datagram, and if the total length is not a multiple of 16 bits, a zero byte ("0") is appended as padding. The checksum field of the UDP header is set to 0. The one's-complement sum of everything is then computed in 16-bit units, and the one's complement of that sum is written into the checksum field. (Computers usually use two's complement for integer arithmetic. One's complement is used for the checksum because a carry out of the top bit wraps around to the lowest bit, so no information is lost. In addition, 0 has two representations in one's complement, all 0s and all 1s, which lets UDP give 0 two meanings: a transmitted checksum of 0 means "no checksum was computed", while a computed result of 0 is sent as all 1s.)

[Figure: UDP pseudo-header used for checksum calculation]

Both the source IP address and the destination IP address are 32-bit fields when they are IPv4 addresses, and both are 128-bit fields when they are IPv6 addresses. This article is all about IPv4 addresses.

Padding: zero bits appended so that the total length becomes a multiple of 16 bits.
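The checksum procedure can be sketched in C for IPv4 as follows, following RFC 768. The function names are hypothetical, and the IP addresses are taken as host-order 32-bit integers while the datagram is raw bytes in network byte order; both are simplifying assumptions. Instead of building the 12-byte pseudo-header in memory, the sketch adds its 16-bit words to the sum directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a running sum as big-endian 16-bit words.
 * An odd trailing byte is padded with a zero byte. */
static uint32_t sum16(uint32_t sum, const unsigned char *data, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;       /* padding: low byte is 0 */
    return sum;
}

/* End-around carry: add any overflow above 16 bits back into the low bits. */
static uint32_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return sum;
}

/* Sum of the IPv4 pseudo-header words. */
static uint32_t pseudo_sum(uint32_t src_ip, uint32_t dst_ip, uint16_t udp_len)
{
    return (src_ip >> 16) + (src_ip & 0xFFFF) +
           (dst_ip >> 16) + (dst_ip & 0xFFFF) +
           17u +          /* zero byte + protocol number 17 (UDP) */
           udp_len;       /* UDP length, repeated in the pseudo-header */
}

/* Sender side: `udp` is the whole datagram with its checksum field set to 0. */
uint16_t udp_checksum(uint32_t src_ip, uint32_t dst_ip,
                      const unsigned char *udp, uint16_t udp_len)
{
    uint32_t sum = fold(sum16(pseudo_sum(src_ip, dst_ip, udp_len), udp, udp_len));
    uint16_t result = (uint16_t)~sum;         /* one's complement of the sum */
    return result == 0 ? 0xFFFF : result;     /* 0 is reserved for "no checksum" */
}

/* Receiver side: summing pseudo-header + datagram (checksum included) must give
 * all ones. A checksum field of 0 ("not computed") is not handled here. */
int udp_checksum_ok(uint32_t src_ip, uint32_t dst_ip,
                    const unsigned char *udp, uint16_t udp_len)
{
    return fold(sum16(pseudo_sum(src_ip, dst_ip, udp_len), udp, udp_len)) == 0xFFFF;
}
```

Usage: zero bytes 6 and 7 of the datagram, call `udp_checksum()`, and store the result there in network byte order; the receiver then calls `udp_checksum_ok()` on the datagram as received.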

How does UDP separate header and payload?

Each field in the UDP header is 16 bits, so the four fields together always take 8 bytes. UDP only needs to strip the first 8 bytes; the rest is the payload.

How does UDP know which application-layer process to deliver the payload to?

The kernel maintains the mapping from port numbers to sockets (and thus to the processes that own them) in a hash table. When the transport layer receives a UDP datagram, it can quickly look up the socket bound to the destination port number and deliver the datagram to the corresponding application-layer process.

The kernel creates and updates this mapping automatically when a socket is bound to a port, either explicitly with bind() or implicitly when the operating system assigns a port to a client socket.
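A minimal sketch of this port-based lookup, assuming a fixed-size chained hash table keyed by port. It is a simplification: the real Linux kernel maps ports to socket structures rather than directly to PIDs, and its tables and hash functions differ. All names (`demux_bind`, `demux_lookup`, `NBUCKETS`) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>   /* pid_t */

#define NBUCKETS 128     /* hypothetical table size */

struct binding {                 /* one bound port */
    uint16_t port;
    pid_t owner;                 /* process that owns the socket */
    struct binding *next;        /* chain for ports in the same bucket */
};

static struct binding *table[NBUCKETS];

static unsigned bucket(uint16_t port) { return port % NBUCKETS; }

/* On bind: record port -> owner. Returns 0, or -1 if the port is taken. */
int demux_bind(uint16_t port, pid_t owner)
{
    unsigned i = bucket(port);
    for (struct binding *b = table[i]; b != NULL; b = b->next)
        if (b->port == port)
            return -1;                       /* port already bound */
    struct binding *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->port = port;
    node->owner = owner;
    node->next = table[i];                   /* push onto the bucket's chain */
    table[i] = node;
    return 0;
}

/* On receipt: find the owner of the destination port, or -1 if none
 * (a real stack would typically answer with ICMP "port unreachable"). */
pid_t demux_lookup(uint16_t dst_port)
{
    for (struct binding *b = table[bucket(dst_port)]; b != NULL; b = b->next)
        if (b->port == dst_port)
            return b->owner;
    return -1;
}
```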

3.4 UDP data encapsulation and demultiplexing

The header is structured data. The operating system must work across many kinds of machines and should be as efficient as possible, so it is designed to save space (even a few bytes) and time wherever it can.

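In the Linux kernel, the UDP header is defined as a struct in include/uapi/linux/udp.h. All four members are whole 16-bit fields, so no bit fields are needed here; bit fields do appear in other headers, such as the flag bits of the TCP header:

```c
struct udphdr {
	__be16	source;   /* source port, big-endian            */
	__be16	dest;     /* destination port, big-endian       */
	__be16	len;      /* length of header + data, in bytes  */
	__sum16	check;    /* checksum                           */
};
```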

[Note]

A bit field is a struct member declared with an explicit width in bits rather than bytes. In standard C, a bit-field member has type _Bool, int, signed int, or unsigned int (many compilers also accept other integer types such as char as an extension), followed by a colon and the number of bits the member occupies.

The main differences between ordinary struct members and bit fields are:

  • Ordinary members can be of any type, while bit fields are restricted to a few integer types.
  • Ordinary members occupy whole bytes, while a bit field occupies only the specified number of bits.
  • Ordinary members are laid out according to the compiler's alignment rules, while the layout of bit fields (bit order, whether a field may straddle a storage unit) is implementation-defined, so different compilers may lay them out differently.
  • You can take the address of an ordinary member, but not of a bit field.

Ordinary members and bit fields each have their strengths: ordinary members are faster to access, while bit fields save storage space.
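A short illustration of the bit-field rules (the struct and function names are hypothetical). A 4-bit unsigned field can only hold 0 to 15, so a larger value is reduced modulo 16 when stored, and taking the address of a bit field is rejected by the compiler.

```c
/* Hypothetical example: three small values packed with bit fields. */
struct flags_byte {
    unsigned int version : 4;   /* holds 0..15 */
    unsigned int kind    : 3;   /* holds 0..7  */
    unsigned int marker  : 1;   /* holds 0..1  */
};

/* Store v in the 4-bit field and read it back: only the low 4 bits survive. */
unsigned int store_version(struct flags_byte *f, unsigned int v)
{
    f->version = v;                        /* converted modulo 2^4 */
    /* unsigned int *p = &f->version;         error: cannot take a bit field's address */
    return f->version;
}
```

How many bytes `struct flags_byte` occupies is up to the compiler, which is one reason network headers are usually decoded with explicit shifts and masks rather than by overlaying bit fields.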

Data encapsulation

  1. The application layer passes the data down to the transport layer, which creates a header and fills in its fields. (The transport layer is maintained by the operating system.)
  2. The operating system allocates a block of kernel memory and copies the header and the payload into it, producing a UDP datagram.
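The two encapsulation steps can be imitated in user-space C, assuming `malloc()` and `memcpy()` stand in for the kernel's buffer handling. `udp_encapsulate` is a hypothetical helper; it writes the header fields in network byte order and leaves the checksum at 0, which over IPv4 means "no checksum computed".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define UDP_HDR_LEN 8

/* Build header + payload in one new buffer. Returns NULL on failure;
 * the caller frees the result. *out_len receives the datagram length. */
unsigned char *udp_encapsulate(uint16_t src_port, uint16_t dst_port,
                               const void *payload, size_t payload_len,
                               size_t *out_len)
{
    if (payload_len > 65535 - UDP_HDR_LEN)
        return NULL;                                  /* length field is only 16 bits */
    size_t total = UDP_HDR_LEN + payload_len;
    unsigned char *buf = malloc(total);               /* step 2: allocate memory */
    if (buf == NULL)
        return NULL;
    buf[0] = (unsigned char)(src_port >> 8);          /* source port      */
    buf[1] = (unsigned char)(src_port & 0xFF);
    buf[2] = (unsigned char)(dst_port >> 8);          /* destination port */
    buf[3] = (unsigned char)(dst_port & 0xFF);
    buf[4] = (unsigned char)(total >> 8);             /* length           */
    buf[5] = (unsigned char)(total & 0xFF);
    buf[6] = 0;                                       /* checksum: not computed */
    buf[7] = 0;
    if (payload_len > 0)
        memcpy(buf + UDP_HDR_LEN, payload, payload_len);  /* copy the payload */
    *out_len = total;
    return buf;
}
```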

Data demultiplexing

  1. After the transport layer obtains the message submitted by the lower layer, it first reads and removes the first 8 bytes of the header, and extracts the destination port number of the message.
  2. Deliver the payload up to the application layer process corresponding to the destination port number.

How to remove the first 8 bytes of the header and extract the port number?

Since Linux is implemented in C, the kernel can extract the fields by casting a pointer to the start of the datagram into a pointer to the header struct.
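A sketch of the demultiplexing side, assuming the datagram is already in a byte buffer. The kernel can cast the buffer pointer to a header-struct pointer because it controls the buffer's alignment; in portable user-space code, copying the first 8 bytes with `memcpy()` avoids alignment and strict-aliasing problems. `struct udp_header` and `udp_decapsulate` are hypothetical names that mirror the layout of the kernel's `struct udphdr`.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* ntohs() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the 8-byte header layout (all fields big-endian on the wire). */
struct udp_header {
    uint16_t source;
    uint16_t dest;
    uint16_t len;
    uint16_t check;
};
_Static_assert(sizeof(struct udp_header) == 8, "UDP header must be 8 bytes");

/* Strip the fixed 8-byte header: fill *hdr with host-order values and return
 * a pointer to the payload, or NULL if the datagram is malformed. */
const unsigned char *udp_decapsulate(const unsigned char *dgram, size_t n,
                                     struct udp_header *hdr, size_t *payload_len)
{
    struct udp_header raw;
    if (n < sizeof raw)
        return NULL;                          /* shorter than a header */
    memcpy(&raw, dgram, sizeof raw);          /* safe alternative to a pointer cast */
    hdr->source = ntohs(raw.source);
    hdr->dest   = ntohs(raw.dest);            /* used to find the receiving process */
    hdr->len    = ntohs(raw.len);
    hdr->check  = ntohs(raw.check);
    if (hdr->len < sizeof raw || hdr->len > n)
        return NULL;                          /* length field inconsistent */
    *payload_len = hdr->len - sizeof raw;
    return dgram + sizeof raw;                /* everything after the 8 bytes */
}
```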

In practice, while a machine is processing data, more datagrams may keep arriving, so the unprocessed data must be stored somewhere: a "receive buffer". Correspondingly, there is also a "send buffer".

3.5 The characteristics and purpose of UDP

UDP does not provide complex control mechanisms; it uses IP to provide a connectionless communication service. It is a mechanism that, as soon as it receives data from the application, sends it onto the network as is.

Even when the network is congested, UDP does not perform flow control or anything else to avoid congestion. If a packet is lost in transit, UDP does not retransmit it, and it does not even correct the order when packets arrive out of order. If such fine-grained control is needed, the application using UDP must handle it. (Since there is no mechanism on the Internet that controls the network as a whole, every node should try not to cause trouble for other users when sending large amounts of data over the Internet. Congestion control is therefore a necessary function, often needed for other users' sake rather than one's own. If you do not want to implement congestion control yourself, you should use TCP.) UDP is somewhat like a mechanism that does whatever its user says, but the user must carefully consider the type of upper-layer protocol and build the application accordingly. So one can say that UDP "does what the users who write the program tell it to do".

The above description is quoted from "Illustrated TCP/IP", section 6.3.

In other words, UDP does not split data according to the sending and receiving capabilities of the two sides or the state of the network; it sends each piece of data as a whole, and what the client sends is delivered by UDP to the peer process exactly as it was sent, as one datagram. This is called "datagram-oriented", and it is reflected in UDP's name: User Datagram Protocol.
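The datagram-oriented behavior can be observed with two sockets on the loopback interface. The sketch below assumes a POSIX system with loopback available; `datagram_boundary_demo` is a hypothetical name. It sends a 5-byte and a 7-byte datagram, and each `recvfrom()` is expected to return exactly one of them, never a merged or split piece.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send two datagrams over loopback and receive them. Stores the sizes
 * returned by the two recvfrom() calls in got[0] and got[1].
 * Returns 0 on success, -1 on any socket error. */
int datagram_boundary_demo(ssize_t got[2])
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char buf[1024];
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                  /* OS picks a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* Two sendto() calls produce two separate datagrams. */
    if (sendto(tx, "hello", 5, 0, (struct sockaddr *)&addr, sizeof addr) != 5 ||
        sendto(tx, "udp!!!!", 7, 0, (struct sockaddr *)&addr, sizeof addr) != 7)
        goto out;

    /* Each recvfrom() returns exactly one whole datagram. */
    got[0] = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);
    got[1] = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);
    rc = (got[0] < 0 || got[1] < 0) ? -1 : 0;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

With a TCP byte stream, by contrast, the same two writes could arrive merged or split, and a single read might return all 12 bytes at once.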

What does "connectionless" mean?

UDP only cares whether the data has been sent onto the network, not whether the designated process on the peer host actually receives it. "Connection-oriented" is the opposite: in TCP, for example, establishing a connection is the process of confirming that a data communication channel exists. Even if the channel has problems during transmission, or the sending and receiving capabilities of the two sides are limited, TCP has mechanisms to ensure that the peer process receives the data completely.

3.6 Buffer

As mentioned above, UDP may receive data while it is sending data, and vice versa.

TCP is similar.

  • UDP's receive buffer holds received datagrams until the application reads them. If the application does not read them in time and the receive buffer fills up, newly arriving datagrams are discarded.
  • UDP has no real send buffer. It only has a send buffer size (SO_SNDBUF), which acts as an upper limit on the length of each UDP datagram written to the socket. If the application sends a datagram larger than this limit, the call fails with an error (EMSGSIZE).
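Both points can be checked from user space. The sketch below assumes Linux; `probe_udp_buffers` and `struct udp_limits` are hypothetical names, and the 70000-byte and 200-datagram figures are arbitrary. It reads the SO_RCVBUF and SO_SNDBUF sizes with getsockopt(), shows that a datagram over UDP's 65535-byte maximum is rejected with EMSGSIZE (the default SO_SNDBUF on Linux is usually larger than that, so the protocol maximum is the limit that triggers here), and shrinks a receiver's SO_RCVBUF so that datagrams sent while nobody reads are silently dropped.

```c
#include <assert.h>
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct udp_limits {
    int rcvbuf, sndbuf;     /* default sizes reported by getsockopt() */
    int oversize_errno;     /* errno after sending a 70000-byte datagram */
    int sent, received;     /* datagrams sent vs. datagrams still readable afterwards */
};

int probe_udp_buffers(struct udp_limits *out) {
    static char big[70000];                 /* more than one UDP datagram can carry */
    char msg[1000] = {0};
    struct sockaddr_in rx_addr;
    socklen_t len = sizeof(int);
    int tiny = 1;                           /* the kernel rounds this up to its minimum */
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0) {
        if (rx >= 0) close(rx);
        if (tx >= 0) close(tx);
        return -1;
    }

    /* Buffer sizes of a freshly created UDP socket. */
    getsockopt(tx, SOL_SOCKET, SO_RCVBUF, &out->rcvbuf, &len);
    len = sizeof(int);
    getsockopt(tx, SOL_SOCKET, SO_SNDBUF, &out->sndbuf, &len);

    /* Receiver on an ephemeral loopback port with a deliberately tiny receive buffer. */
    setsockopt(rx, SOL_SOCKET, SO_RCVBUF, &tiny, sizeof tiny);
    memset(&rx_addr, 0, sizeof rx_addr);
    rx_addr.sin_family = AF_INET;
    rx_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    len = sizeof rx_addr;
    if (bind(rx, (struct sockaddr *)&rx_addr, sizeof rx_addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&rx_addr, &len) < 0) {
        close(tx);
        close(rx);
        return -1;
    }

    /* 1) An over-limit datagram is rejected locally instead of being split. */
    out->oversize_errno = 0;
    if (sendto(tx, big, sizeof big, 0, (struct sockaddr *)&rx_addr, sizeof rx_addr) < 0)
        out->oversize_errno = errno;

    /* 2) Send while nobody reads: datagrams that do not fit are dropped silently. */
    out->sent = 0;
    for (int i = 0; i < 200; i++)
        if (sendto(tx, msg, sizeof msg, 0, (struct sockaddr *)&rx_addr,
                   sizeof rx_addr) == (ssize_t)sizeof msg)
            out->sent++;

    /* Drain whatever survived, without blocking. */
    out->received = 0;
    while (recv(rx, msg, sizeof msg, MSG_DONTWAIT) >= 0)
        out->received++;

    close(tx);
    close(rx);
    return 0;
}
```

On a typical Linux machine only a handful of the 200 datagrams survive, and the sender is never told about the drops.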

Why doesn't UDP have a send buffer?

[Note] Some of the points below will be covered in more detail in the TCP article.

  • UDP is an unreliable transport protocol. Unlike TCP, it does not guarantee the reliability, order, or integrity of data, so the sender has no need for a send buffer holding data that has been sent but not yet acknowledged, and no need for congestion control or flow control.
  • UDP is a connectionless transport protocol. It does not establish or maintain connection state, and it does not track the peer's window size or receiving capacity, so the sender has no need for a send buffer that adapts to the peer's receiving rate, and no need for window control or a sliding window mechanism.
  • UDP is a datagram-oriented transport protocol. Each send produces one complete datagram, without splitting or merging the data, so the sender has no need for a send buffer to hold split or merged data, and no need for a reassembly or grouping mechanism.

When UDP sends data, the kernel simply copies the data into a kernel buffer and then places the datagram, or all of its fragments, on the link layer's output queue as soon as possible. If the output queue does not have room for the datagram or one of its fragments, the kernel usually returns an ENOBUFS error to the application. Once the link layer has sent the data, the kernel frees the copy in the kernel buffer. A successful return from a UDP send therefore only means that the datagram, or all of its fragments, has been added to the link layer's output queue; it does not mean that the peer has received the data.
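Because a successful sendto() only means the datagram was queued, the most an application can do at this point is react to local queueing errors. A minimal sketch, assuming a POSIX system; `udp_send_retry` is a hypothetical helper, and the retry count and 1 ms pause are arbitrary choices. It retries only on ENOBUFS (output queue full), EAGAIN (non-blocking socket with no room) and EINTR. Note that the Linux send(2) man page says ENOBUFS normally does not occur there: packets are silently dropped when a device queue overflows.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <assert.h>
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: retry sendto() while the local output path reports "no room".
 * Returning len still only means "queued", never "received by the peer". */
ssize_t udp_send_retry(int fd, const void *buf, size_t len,
                       const struct sockaddr *dst, socklen_t dstlen, int attempts) {
    struct timespec pause = {0, 1000000};   /* 1 ms between attempts */
    ssize_t n = -1;
    for (int i = 0; i < attempts; i++) {
        n = sendto(fd, buf, len, 0, dst, dstlen);
        if (n >= 0)
            break;                          /* queued for transmission */
        if (errno != ENOBUFS && errno != EAGAIN && errno != EINTR)
            break;                          /* e.g. EMSGSIZE: retrying will not help */
        if (i + 1 < attempts)
            nanosleep(&pause, NULL);        /* give the queue a moment to drain */
    }
    return n;
}
```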

The role of the buffer

Network I/O interfaces such as send() and recv()/recvfrom() are essentially "copy functions".

  • send() copies data from the application's buffer into the kernel, and the kernel then takes care of transmitting it on the network. For TCP, if the data is larger than the free space in the send buffer, a blocking send() waits until enough space is freed, while a non-blocking send() copies only what fits and returns the number of bytes copied; the kernel later splits the byte stream into segments. At the UDP layer there is no real send buffer, only a receive buffer: each call is handed down as one datagram, and a datagram over the size limit is rejected rather than split.

  • recv()/recvfrom() copies data from the kernel receive buffer into the application's buffer and returns the number of bytes read. For TCP, if more data is available than the application's buffer can hold, a call returns at most one buffer's worth and leaves the rest for subsequent calls. For UDP, each call returns exactly one datagram; if the datagram is larger than the application's buffer, the excess bytes are discarded. When the receive buffer is full, TCP advertises a zero receive window to tell the peer to stop sending (flow control), whereas UDP simply discards new datagrams.

recv() is a general-purpose receive function that works with both TCP and UDP sockets. recvfrom() also works with any socket type but is typically used with UDP, because its two extra parameters return the address of the datagram's sender. Apart from that, the two behave the same way: recv(fd, buf, len, flags) is equivalent to recvfrom(fd, buf, len, flags, NULL, NULL). The only difference is whether the peer's address is returned.
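The sketch below, assuming Linux and the loopback interface, shows both UDP behaviors described above: recvfrom() fills in the sender's address, and a 100-byte datagram read into a 10-byte buffer yields 10 bytes, with the remaining 90 discarded rather than kept for the next call. `bind_loopback`, `recv_truncation_demo` and `struct recv_result` are hypothetical names; MSG_DONTWAIT is a Linux/BSD extension, used here only to check without blocking that nothing is left to read.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on a kernel-chosen port and report the address. */
static int bind_loopback(struct sockaddr_in *addr) {
    socklen_t alen = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &alen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

struct recv_result {
    ssize_t first;      /* bytes returned by recvfrom() into a 10-byte buffer */
    int port_matches;   /* recvfrom() reported the sender's port */
    ssize_t second;     /* a following non-blocking recv(): -1, nothing left */
};

int recv_truncation_demo(struct recv_result *out) {
    struct sockaddr_in rx_addr, tx_addr, from;
    socklen_t fromlen = sizeof from;
    char msg[100], small[10];
    int rx = bind_loopback(&rx_addr);
    int tx = bind_loopback(&tx_addr);         /* bound explicitly so its port is known */
    if (rx < 0 || tx < 0) {
        if (rx >= 0) close(rx);
        if (tx >= 0) close(tx);
        return -1;
    }
    memset(msg, 'x', sizeof msg);
    sendto(tx, msg, sizeof msg, 0, (struct sockaddr *)&rx_addr, sizeof rx_addr);

    /* One datagram per call; only the first 10 of its 100 bytes fit. */
    out->first = recvfrom(rx, small, sizeof small, 0, (struct sockaddr *)&from, &fromlen);
    out->port_matches = (from.sin_port == tx_addr.sin_port);

    /* The other 90 bytes were discarded, so there is nothing left to read. */
    out->second = recv(rx, small, sizeof small, MSG_DONTWAIT);
    close(tx);
    close(rx);
    return 0;
}
```

Reading 100 bytes of TCP data with the same 10-byte buffer would instead return them at most 10 bytes per call over successive calls.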

Buffer size

The UDP length field is 16 bits wide and covers both the 8-byte header and the data, so a UDP datagram can be at most 65535 bytes (64 KB). Over IPv4, whose 16-bit total length also includes the 20-byte IP header, that leaves at most 65535 - 20 - 8 = 65507 bytes of user data. In practice, however, UDP runs on top of IP, and each IP packet must fit within the maximum transmission unit (MTU) of the underlying link, typically 1500 bytes. If a UDP datagram does not fit within the MTU, the IP layer fragments it, splitting the datagram into several pieces that each fit within the MTU.

MTU (Maximum Transmission Unit) is the largest packet, in bytes, that a network link can carry in a single frame. It is a data link layer concept: the limit the link layer places on the payload of a frame. Different types of networks have different default MTUs; for example, the default MTU of Ethernet is 1500 bytes.
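The figures above can be turned into a small calculation. The sketch assumes IPv4 with a 20-byte header (no options); `ipv4_fragment_sizes` and the macro names are hypothetical. Every fragment except the last must carry a multiple of 8 bytes of data, because the fragment offset field counts in 8-byte units, so a 1500-byte MTU leaves 1480 bytes of data per fragment. For example, 4000 bytes of user data become a 4008-byte UDP datagram, sent as fragments carrying 1480, 1480 and 1048 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20                   /* assumes no IP options */
#define UDP_HDR_LEN 8
#define MAX_UDP_PAYLOAD_V4 (65535 - IPV4_HDR_LEN - UDP_HDR_LEN)   /* 65507 */

/* Split one UDP datagram carrying `payload` bytes into IPv4 fragments for a link
 * with the given MTU. Writes each fragment's data length (the UDP header travels
 * in the first one) to sizes[] and returns the fragment count, or -1 on bad input. */
int ipv4_fragment_sizes(size_t payload, size_t mtu, size_t *sizes, int max_frags) {
    size_t remaining, per_frag;
    int n = 0;
    if (payload > MAX_UDP_PAYLOAD_V4 || mtu < IPV4_HDR_LEN + 8)
        return -1;
    remaining = UDP_HDR_LEN + payload;            /* the whole UDP datagram is the IP payload */
    per_frag = (mtu - IPV4_HDR_LEN) / 8 * 8;      /* offsets are in 8-byte units */
    while (remaining > 0) {
        size_t chunk = remaining < per_frag ? remaining : per_frag;
        if (n == max_frags)
            return -1;                            /* caller's array is too small */
        sizes[n++] = chunk;
        remaining -= chunk;
    }
    return n;
}
```

A 1472-byte payload is the largest that fits in a single Ethernet frame without fragmentation (1472 + 8 + 20 = 1500).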

Put simply, UDP is a connectionless protocol. It only cares whether data is handed to the network, not how the data gets there. If the data (including the header) exceeds what UDP can send at once (64 KB), UDP will not split it into several datagrams; the send simply fails. Splitting larger messages and reassembling them therefore has to be implemented at the application layer, that is, manually by the programmer, which adds processing overhead and more opportunities for errors.

This also matches the "User" in UDP's name, which does not just mean "Internet users" but refers more to programmers. In other words, it is reasonable to think of UDP as transmitting datagrams exactly as the programmer's code instructs it to. In contrast, because TCP has various control mechanisms, it may not send data exactly the way the programmer's code expresses it.

TCP, on the other hand, handles splitting data into segments and reassembling them at the transport layer, that is, inside the operating system.
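The application-level splitting and reassembly that UDP leaves to the programmer might look like the following sketch. Everything here is an assumption for illustration: the 6-byte header (message id, chunk index, chunk count, 16 bits each), the 1400-byte chunk size chosen so each datagram stays under a 1500-byte Ethernet MTU, and the names `make_chunk`, `add_chunk` and `struct reassembly`. It tolerates reordering and duplicates but does not request retransmission of missing chunks, which a real protocol would also need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_HDR 6        /* hypothetical header: id, index, count (16 bits each) */
#define CHUNK_DATA 1400    /* 6 + 1400 + 8 (UDP) + 20 (IPv4) = 1434 <= 1500 */
#define MAX_CHUNKS 64

static void put16(unsigned char *p, uint16_t v) { p[0] = (unsigned char)(v >> 8); p[1] = (unsigned char)v; }
static uint16_t get16(const unsigned char *p) { return (uint16_t)(p[0] << 8 | p[1]); }

/* Build chunk `idx` of a `len`-byte message (len > 0) into `out`; returns the datagram length. */
size_t make_chunk(uint16_t id, const unsigned char *msg, size_t len,
                  uint16_t idx, unsigned char *out) {
    size_t count = (len + CHUNK_DATA - 1) / CHUNK_DATA;
    size_t off = (size_t)idx * CHUNK_DATA;
    size_t n;
    assert(len > 0 && idx < count && count <= MAX_CHUNKS);
    n = len - off < CHUNK_DATA ? len - off : CHUNK_DATA;
    put16(out, id);
    put16(out + 2, idx);
    put16(out + 4, (uint16_t)count);
    memcpy(out + CHUNK_HDR, msg + off, n);
    return CHUNK_HDR + n;
}

struct reassembly {
    uint16_t id, count, received;
    unsigned char seen[MAX_CHUNKS];               /* which chunk indices have arrived */
    unsigned char data[MAX_CHUNKS * CHUNK_DATA];
    size_t total;                                 /* message length, known once the last chunk arrives */
};

/* Feed one received datagram, in any order; returns 1 once the whole message is present. */
int add_chunk(struct reassembly *r, const unsigned char *dg, size_t len) {
    uint16_t id, idx, count;
    size_t n;
    if (len < CHUNK_HDR) return 0;
    id = get16(dg);
    idx = get16(dg + 2);
    count = get16(dg + 4);
    n = len - CHUNK_HDR;
    if (count == 0 || count > MAX_CHUNKS || idx >= count || n > CHUNK_DATA) return 0;
    if (r->received == 0 || r->id != id) {        /* first chunk of a new message */
        memset(r->seen, 0, sizeof r->seen);
        r->id = id;
        r->count = count;
        r->received = 0;
    }
    if (!r->seen[idx]) {                          /* duplicates are ignored */
        memcpy(r->data + (size_t)idx * CHUNK_DATA, dg + CHUNK_HDR, n);
        r->seen[idx] = 1;
        r->received++;
        if (idx == count - 1)
            r->total = (size_t)idx * CHUNK_DATA + n;
    }
    return r->received == r->count;
}
```

For example, a 3000-byte message becomes three datagrams carrying 1400, 1400 and 200 bytes of data, and add_chunk() reports completion only after all three have arrived, whatever their order.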

More content about the buffer will be supplemented in the TCP topic.

References

  • "Illustrated TCP/IP"


Origin blog.csdn.net/m0_63312733/article/details/131524817