The UDP protocol from a programmer's perspective

Hello, everyone~ I am your old friend: Protect Xiaozhou ღ. This issue covers the UDP protocol, part of the basic principles of networking: what a protocol is, what UDP is, the UDP message format, UDP's strategy for transferring large files, and the UDP workflow. We programmers can't get by without networking knowledge, so be sure to take a look~~
Stay tuned for more highlights: Protect Xiaozhou ღ *★,°*:.☆( ̄▽ ̄)/$:*.°★*'

1. What is a protocol

In order for data to be transmitted on the network (from source to destination), participants in network communication must follow the same rules, such as: how to establish a connection, how to transmit, how to parse information from each other, and so on. Only by following this convention can computers communicate with each other. Such a rule is called a protocol, and it is ultimately reflected in the format of data packets transmitted on the network.

Protocols also stipulate how the layers relate to each other: an upper-layer protocol calls the protocol in the layer directly below it, the lower layer provides services to the layer above, and layers cannot be skipped.

For details, see: "Protocol in the eyes of programmers: TCP/IP five-layer network model" on Protect Xiao Zhou ღ's CSDN blog.

Real networks are organized according to the TCP/IP five-layer network model, which is also the most widely used network model at present.

UDP, the protocol described this time, is a commonly used transport-layer protocol.

The other, more powerful transport-layer protocol is TCP; look forward to it in the next blog~


2. Understanding the UDP protocol

UDP (User Datagram Protocol) is a connectionless transport-layer protocol. It does not guarantee reliable delivery, but it is fast and has low overhead. UDP is mainly used in scenarios that need fast transmission and can tolerate some data loss, such as online games and message passing inside distributed systems (over short distances, UDP transmission is very efficient).

2.1 Features of UDP protocol:

  1. Connectionless: UDP does not need to establish a connection before transmitting data, nor does it maintain any connection state. It simply sends the data without any preparation, so transmission is fast.

  2. Unreliable: UDP does not guarantee reliable delivery, because it provides no acknowledgment, retransmission, or other error-recovery mechanism. If data is lost or corrupted in transit, UDP does nothing to recover it.

  3. Datagram-oriented: A datagram is the basic unit of data transmitted over the network. It consists of a header and the data itself (the payload); the header describes where the data is going and how it relates to the payload. The UDP header is only 8 bytes, much smaller than the TCP header (at least 20 bytes), so the overhead is low. This point is covered in detail below.

  4. Full-duplex communication: Communication allows data to be transmitted in both directions at the same time, such as a phone call, where both parties can speak at the same time.

  5. Support broadcast and multicast: The UDP protocol supports broadcast and multicast, and can send data to multiple hosts at the same time.
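Several of these features can be seen in a few lines of code. The following is a minimal sketch, assuming Python's standard socket module and the loopback address 127.0.0.1 (the payloads and the 2-second timeout are arbitrary choices for illustration). There is no connect() or handshake, each sendto() produces one datagram, and both sides can send on the same socket.

```python
import socket

# Receiver: bind a UDP socket to an OS-chosen port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
receiver.settimeout(2.0)          # UDP gives no delivery guarantee, so never wait forever
address = receiver.getsockname()

# Sender: no connect(), no handshake; each sendto() emits one datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.settimeout(2.0)
sender.sendto(b"hello", address)
sender.sendto(b"world", address)

# Each recvfrom() returns exactly one datagram, so message boundaries are kept.
first, peer = receiver.recvfrom(2048)
second, _ = receiver.recvfrom(2048)

# Full duplex: the receiver can answer on the very same socket.
receiver.sendto(b"ack", peer)
reply, _ = sender.recvfrom(2048)

sender.close()
receiver.close()
print(first, second, reply)
```

On a real network either datagram could be lost, duplicated, or reordered, and UDP would not report it; the timeout is the only protection this sketch has.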

Key point, a common interview question: what is the difference between TCP and UDP?

TCP is a connection-oriented, full-duplex protocol that provides reliable, byte-stream transmission.

UDP is a connectionless, full-duplex protocol that provides unreliable, datagram-based transmission.


2.2 Message format of UDP protocol

An important part of learning a protocol is understanding its message format, that is, how the protocol organizes data.

A UDP message is divided into two parts: the UDP header (8 bytes) + the UDP data (payload)

  • Payload: This part is easy to understand; the application-layer data is stored here.

  • UDP header: consists of four fields: source port, destination port, message length, and checksum. Each field occupies two bytes (16 bits), so each can hold a value in the range [0, 65535].
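The layout above maps directly onto four unsigned 16-bit integers in network byte order. Here is a small sketch using Python's struct module; the helper names build_datagram and parse_datagram are made up for this illustration, and the checksum is simply left at 0 here.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order = 8 bytes.
UDP_HEADER = struct.Struct("!HHHH")

def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend a UDP header to the payload (hypothetical helper)."""
    length = UDP_HEADER.size + len(payload)   # length covers header + payload
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_datagram(datagram):
    """Split a raw UDP datagram into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload

datagram = build_datagram(50000, 53, b"hi")
print(len(datagram), parse_datagram(datagram))
```

In everyday programming the operating system builds and parses this header for us; writing it out by hand is only useful for seeing what the 8 bytes contain.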

Parse the message:

  1. Source port: indicates where the data comes from. A port number can be regarded as the identifier of a process (the program that sent the data); after all, the transport layer provides services to the application layer.

  2. Destination port: indicates where the data is going. After the datagram arrives at the destination host, the destination port decides which program (process) receives it. The two communicating parties agree on a common protocol format so that each can parse the other's data.

  3. Once the transport-layer datagram is handed to the network layer, a network-layer header is added. That header carries the source IP and destination IP, which are what locate the target host in a huge network.

  4. Message length: the size in bytes of the whole UDP datagram (header + payload). Because the field occupies only 2 bytes, the range it can describe is [0, 65535], so the maximum length of a UDP datagram is 65,535 bytes, just under 64 KB.

  5. Checksum: Network transmission is not completely stable. Interference, signal attenuation, a faulty transmission medium and so on can flip bits in transit, so the checksum exists to detect whether the received data has been corrupted. The sender runs a checksum algorithm over the datagram (the UDP header and payload, plus a pseudo-header built from the IP addresses) and stores the result in this field. The receiver runs the same algorithm over what it received and compares the results; the same input always produces the same checksum. If the results differ, the data is certainly corrupted. If they match, the data is very probably intact, but there is still a small chance that it is wrong, because different data can produce the same checksum. Only the receiving end can make this judgment. UDP provides no retransmission mechanism: a datagram that fails the check is simply discarded, and the sender is not told.

  6. UDP data (payload): contains the application-layer data. Maximum payload = maximum message length - UDP header size. The length field allows at most 65,535 bytes (about 64 KB) and the header takes 8 bytes, so a single UDP datagram can carry at most 65535 - 8 = 65527 bytes of application-layer data.
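To make the checksum less abstract, here is a sketch of the 16-bit one's-complement sum that UDP over IPv4 uses. In the UDP specification the sum covers a pseudo-header (source IP, destination IP, protocol number 17, UDP length), the UDP header, and the payload. The function names and the 192.0.2.x addresses in the usage note are assumptions for illustration only.

```python
import socket
import struct

def ones_complement_sum(data):
    """Add up 16-bit words, folding any carry back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src_ip, dst_ip, udp_length):
    # source IP, destination IP, zero byte, protocol 17 (UDP), UDP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field zeroed
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF                # 0 means "no checksum" in UDP over IPv4

def verify(src_ip, dst_ip, datagram):
    """Receiver side: summing everything, checksum included, must give 0xFFFF."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram)
    return total == 0xFFFF
```

A possible use: compute udp_checksum("192.0.2.1", "192.0.2.2", 50000, 9000, payload), place it in the header, and verify() accepts the datagram; flip a single bit and it is rejected. Because addition does not care about order, swapping two aligned 16-bit words leaves the checksum unchanged, which is one concrete way corrupted data can still pass the check.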

Port number details:

A port can be regarded as the house number of an application. The IP address gets the data to the right host, but a host runs many applications, and each one is identified by its own port number.

For example, QQ on my computer sends you a message. The message travels over the network to your computer, but once it arrives, how does the system know which application (process) to hand it to? WeChat is also made by Tencent, yet QQ messages never show up in WeChat. The key point is that an application binds a port number when it starts (the port can be fixed, or assigned by the system from a certain range), and two processes on one host cannot bind the same port for the same protocol at the same time. So after the message enters the host, it is delivered to the right application according to the destination port, and the application parses it according to its application-layer protocol (either a protocol agreed on by the communicating parties or QQ's own protocol).

A port number occupies only two bytes, so its range is [0, 65535]; those are all the port numbers there are. Port numbers below 1024 are called "well-known port numbers" and are reserved for well-known services, for example: HTTP servers use port 80, SSH uses port 22, and FTP servers use port 21. So we should avoid these ports when designing our own programs.
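The binding rule is easy to try out. This is a small sketch, assuming Python's socket module, the loopback address, and default socket options (no SO_REUSEADDR or SO_REUSEPORT): the first socket lets the system pick a free port, and a second socket that asks for the same address and port is refused.

```python
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
first.bind(("127.0.0.1", 0))           # port 0 asks the OS for a free port
port = first.getsockname()[1]          # the port the OS actually assigned

second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    second.bind(("127.0.0.1", port))   # same address and port, already taken
    conflict = False
except OSError:                        # typically "Address already in use"
    conflict = True
finally:
    second.close()
    first.close()

print(port, conflict)
```

Binding a port below 1024 usually also needs administrator privileges on Unix-like systems, which is one more reason to stay out of that range.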


2.3 Strategies for transferring large files using the UDP protocol

As mentioned above, a UDP datagram can be at most 64 KB. After removing the 8-byte header, the effective payload is at most 65,527 bytes. That is far too small for large files: a single song is already several megabytes (MB).
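The size arithmetic can be written out explicitly. The 16-bit length field caps a UDP datagram at 65,535 bytes including its 8-byte header; over IPv4 the practical limit is a little lower, because the IP packet is itself capped at 65,535 bytes and its header takes at least 20 of them. The 5 MB song below is a hypothetical figure chosen only for illustration.

```python
import math

MAX_LENGTH_FIELD = 2**16 - 1       # largest value a 16-bit length field can hold
UDP_HEADER_SIZE = 8
IPV4_HEADER_SIZE = 20              # minimum IPv4 header, no options

# Limit imposed by the UDP length field alone.
max_udp_payload = MAX_LENGTH_FIELD - UDP_HEADER_SIZE

# Limit when the datagram travels inside one IPv4 packet.
max_payload_over_ipv4 = MAX_LENGTH_FIELD - IPV4_HEADER_SIZE - UDP_HEADER_SIZE

song_size = 5 * 1024 * 1024        # hypothetical 5 MB file
datagrams_needed = math.ceil(song_size / max_payload_over_ipv4)

print(max_udp_payload, max_payload_over_ipv4, datagrams_needed)
```

Real applications usually pick a much smaller chunk size (around a thousand bytes or so) so that each datagram fits in a single link-layer frame and avoids IP fragmentation, which means even more datagrams per file.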

Faced with a large file, the sending application splits the data into multiple parts and sends each part in its own UDP datagram; UDP itself does not do the splitting. The receiver (application layer) and the sender (application layer) must agree on a protocol for parsing and reassembling these datagrams.

For example: I use QQ to send my girlfriend a long essay (application-layer data) of several hundred KB, transmitted over UDP. One UDP datagram is definitely not enough, so the essay is split across several datagrams. QQ on the receiving side (my girlfriend's) then has to reassemble the received UDP datagrams into the original application-layer data, which it does according to the mutually agreed protocol.

How does the receiver combine multiple UDP datagrams back into one piece of application-layer data?

Encapsulating UDP datagrams and splitting application-layer data is essentially string splicing: a datagram is in essence a sequence of bytes, and the protocol describes how to encapsulate, split, and parse that sequence.

UDP does not provide for the reassembly of split data. That is the job of the application-layer protocol (since it is the application layer that chose UDP), so the application-layer data has to be designed around UDP's length limit. Generally speaking, the application-layer protocol splits the original data on the sending side, sends the pieces as multiple UDP datagrams, and reassembles them on the receiving side. There are many ways to implement this. For example, a sequence number and similar information can be placed in a small application-layer header at the start of each UDP payload (the UDP header itself has no field for it), and the receiver uses it to splice the pieces together in order. Special delimiters can also be used to mark where one piece of application-layer data ends. All of this is under the application layer's control.

Example using some special delimiters:

This is just a strategy agreed by the application layer protocol, which can be customized, provided that both parties to the communication adopt the protocol.
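One such agreed strategy can be sketched as follows. This version uses a small sequence-number header rather than delimiters, and everything about it is an assumption made up for illustration: the 8-byte header layout (message id, chunk index, chunk count), the function names, and the rule that an incomplete message yields None.

```python
import random
import struct

# Hypothetical application-layer header: message id, chunk index, chunk count.
CHUNK_HEADER = struct.Struct("!IHH")

def split_message(message_id, data, chunk_size):
    """Cut data into chunks and prefix each with the application-layer header."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [CHUNK_HEADER.pack(message_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]

def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order.

    Assumes all packets belong to one message; a real receiver would
    group them by message id first.
    """
    parts, expected = {}, None
    for packet in packets:
        _, index, count = CHUNK_HEADER.unpack_from(packet)
        parts[index] = packet[CHUNK_HEADER.size:]   # a duplicate just overwrites
        expected = count
    if expected is None or len(parts) != expected:
        return None                                 # at least one chunk is missing
    return b"".join(parts[i] for i in range(expected))

essay = b"long essay " * 500
packets = split_message(1, essay, chunk_size=1000)
random.shuffle(packets)                             # simulate out-of-order arrival
print(len(packets), reassemble(packets) == essay)
```

Each packet would be sent as the payload of one UDP datagram. What to do when reassemble() reports a missing chunk (ask again, give up, wait longer) is again something the two sides have to agree on themselves.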


When we need to transfer large files, another option is simply to use TCP instead. Haha, even if it is not as fast as UDP, its transmission is reliable, and it is far less trouble than UDP. TCP is byte-stream-oriented, so the communicating parties still need to agree on an application-layer protocol in order to parse the data.


2.4 Workflow of the UDP protocol

  1. The application transmits application layer datagrams to the UDP protocol.

  2. The UDP protocol encapsulates the data into a UDP datagram by adding a UDP header directly in front of the application-layer data. The header includes the source port number, destination port number, message length, and checksum.

  3. The UDP protocol transmits UDP datagrams to the IP protocol.

  4. The IP protocol encapsulates UDP datagrams into IP datagrams, including information such as source IP address and destination IP address.

  5. The IP datagram is transmitted over the network to the destination host.

  6. The IP protocol on the destination host removes the IP header and hands the enclosed UDP datagram to the UDP protocol.

  7. The UDP protocol removes the UDP header and delivers the payload to the application bound to the destination port.
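The steps above can be simulated in memory to see how each layer adds and removes its own header. This is only a sketch: the function names are invented, both checksums are left at 0 to keep it short (so these bytes are not a valid packet on a real network), and the addresses are documentation examples.

```python
import socket
import struct

UDP_HEADER = struct.Struct("!HHHH")             # ports, length, checksum
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")    # minimal 20-byte IPv4 header
PROTOCOL_UDP = 17

def udp_encapsulate(src_port, dst_port, payload):
    # Steps 1-2: UDP header in front of the application data (checksum left 0).
    return UDP_HEADER.pack(src_port, dst_port, UDP_HEADER.size + len(payload), 0) + payload

def ip_encapsulate(src_ip, dst_ip, segment):
    # Steps 3-4: IP header in front of the UDP datagram (header checksum left 0).
    total_length = IPV4_HEADER.size + len(segment)
    header = IPV4_HEADER.pack(0x45, 0, total_length, 0, 0, 64, PROTOCOL_UDP, 0,
                              socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    return header + segment

def ip_decapsulate(packet):
    # Step 6: strip the IP header and hand the rest to UDP.
    fields = IPV4_HEADER.unpack_from(packet)
    total_length, protocol, src, dst = fields[2], fields[6], fields[8], fields[9]
    if protocol != PROTOCOL_UDP:
        raise ValueError("not a UDP packet")
    return socket.inet_ntoa(src), socket.inet_ntoa(dst), packet[IPV4_HEADER.size:total_length]

def udp_decapsulate(segment):
    # Step 7: strip the UDP header; the destination port selects the application.
    src_port, dst_port, length, _ = UDP_HEADER.unpack_from(segment)
    return src_port, dst_port, segment[UDP_HEADER.size:length]

packet = ip_encapsulate("192.0.2.1", "192.0.2.2", udp_encapsulate(50000, 9000, b"hello"))
src_ip, dst_ip, segment = ip_decapsulate(packet)
print(len(packet), src_ip, dst_ip, udp_decapsulate(segment))
```

Step 5, the actual trip across the network, is the part this sketch leaves out; in a real program the operating system performs all of this when we call sendto() and recvfrom().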

Summary: The UDP protocol is a simple and fast transmission protocol, suitable for real-time applications and application scenarios that require fast data transmission. But because it does not guarantee the reliability of data transmission, other protocols, such as TCP, need to be used when transmitting important data.


That's all for this share on the UDP protocol in network programming. I hope it is helpful to everyone. If anything is wrong, criticism and corrections are welcome.

This issue is included in the blogger's column - JavaEE, which is suitable for programming beginners. Interested friends can subscribe to view other "JavaEE basics".

See you next time~

Thank you to everyone who read this article, and more exciting events are coming: Protect Xiaozhou ღ *★,°*:.☆( ̄▽ ̄)/$:*.°★* 

Met you, all the stars are falling on my head...

Origin blog.csdn.net/weixin_67603503/article/details/130108983