Core Technologies for Distributed Application Development Series: Message Design Based on TCP/IP


This article is original and was first published by the Grape City technical team. Please credit the source when reprinting: Grape City official website. Grape City provides developers with professional development tools, solutions and services to empower developers.

Preface

The content of this article mainly focuses on the following parts:

A brief introduction to TCP/IP.
An introduction to messages.
Message classification by transmission format (stream messages and XML messages).
The composition of the message system.

A brief introduction to TCP/IP

TCP/IP (Transmission Control Protocol/Internet Protocol) is the basic communication protocol suite of the Internet. It consists of protocols at two layers. The higher layer, the Transmission Control Protocol (TCP), splits a message or file into smaller packets for transmission; the packets travel across the network to the TCP layer at the receiving end, which reassembles them into the original data. The lower layer, the Internet Protocol (IP), handles the address portion of each packet so that it reaches the correct destination. Gateways (routers) on the network forward each packet based on its address, so packets belonging to the same file may take different routes and still converge at the destination. Applications built on TCP/IP typically communicate using a client/server model.

Architecturally, TCP/IP does not fully conform to the OSI 7-layer reference model. The Open Systems Interconnection (OSI) reference model is an abstract 7-layer model of communication protocols in which each layer performs a specific task; its purpose is to let heterogeneous hardware and software communicate with each other layer by layer. The 7 layers are the physical, data link, network, transport, session, presentation and application layers. TCP/IP instead adopts a 4-layer hierarchy in which each layer uses the services provided by the layer below it. The 4 layers are:

Application layer: the layer at which applications communicate with each other, for example the Simple Mail Transfer Protocol (SMTP), the File Transfer Protocol (FTP) and the remote login protocol (Telnet).
Transport layer: provides end-to-end data transmission between hosts and communication services between applications. Its main functions include data formatting, acknowledgment of received data and retransmission of lost data. Examples are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP and UDP add their own headers to the application data and pass the resulting packets down to the next layer. TCP additionally ensures that the data has been delivered and received; UDP makes no such guarantee.
Internet layer: provides the basic packet delivery function so that each packet can reach the destination host (without checking whether it was received correctly), for example the Internet Protocol (IP).
Network interface layer (host-to-network layer): sends IP datagrams out onto the physical network, receives physical frames from the network, extracts the IP datagrams and passes them up to the Internet layer. It manages the actual network media and defines how a particular network (such as Ethernet or a serial line) is used to transmit data.

Commonly used TCP/IP socket functions

1. socket function

int socket(int domain, int type, int protocol);

domain specifies the protocol family to use, usually PF_INET, the Internet protocol family (the TCP/IP suite). type specifies the socket type: SOCK_STREAM for TCP or SOCK_DGRAM for UDP. protocol is usually 0, which selects the default protocol for the given type. On success, socket returns an integer socket descriptor that is passed to the subsequent calls described below; on error it returns -1 and sets errno.
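A minimal sketch of creating sockets with the parameters described above, assuming a POSIX system. It uses AF_INET, which has the same value as PF_INET on common platforms; the helper names create_tcp_socket and create_udp_socket are hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP (stream) socket, or return -1 on error. */
int create_tcp_socket(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);  /* protocol 0: default for streams (TCP) */
    if (fd == -1) {
        perror("socket");
    }
    return fd;
}

/* Hypothetical helper: create a UDP (datagram) socket, or return -1 on error. */
int create_udp_socket(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* protocol 0: default for datagrams (UDP) */
    if (fd == -1) {
        perror("socket");
    }
    return fd;
}
```

Each returned descriptor should eventually be released with close() from <unistd.h>.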

2. bind function:

The bind function associates a socket with a local address and port; a server then listens for service requests on that port. Its prototype is:

int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

sockfd is the socket descriptor returned by socket; my_addr points to a sockaddr structure containing the local IP address and port number; addrlen is the size of that structure, often sizeof(struct sockaddr). bind returns 0 on success and -1 on error.
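A minimal sketch, assuming IPv4 and a POSIX system: in practice the address passed to bind is a struct sockaddr_in cast to struct sockaddr *. The helpers bind_ipv4 and bound_port are hypothetical names. Passing port 0 asks the kernel for any free port, and getsockname() reports which one was chosen.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: bind sockfd to the IPv4 address ip and the given port
   (host byte order; 0 = let the kernel choose). Returns 0 or -1. */
int bind_ipv4(int sockfd, const char *ip, uint16_t port) {
    struct sockaddr_in my_addr;
    memset(&my_addr, 0, sizeof(my_addr));        /* zero the unused fields */
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(port);              /* convert to network byte order */
    if (inet_pton(AF_INET, ip, &my_addr.sin_addr) != 1) {
        fprintf(stderr, "invalid IPv4 address: %s\n", ip);
        return -1;
    }
    if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof(my_addr)) == -1) {
        perror("bind");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: the local port (host byte order) sockfd is bound to, or -1. */
int bound_port(int sockfd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(sockfd, (struct sockaddr *)&addr, &len) == -1) {
        perror("getsockname");
        return -1;
    }
    return ntohs(addr.sin_port);
}
```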

3. connect function:

A connection-oriented client uses the connect function to establish a TCP connection between its socket and a remote server. Its prototype is:

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

sockfd is the socket descriptor returned by socket; serv_addr points to a structure containing the IP address and port number of the remote host; addrlen is the length of that address structure. connect returns 0 on success, or -1 on error with errno set to the corresponding error code. A client program normally does not need to call bind(): it only needs to know the address of the destination machine and does not care which local port it uses. The kernel automatically assigns an unused local port and delivers data arriving on that port to the socket.
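The client side described above can be sketched as follows, assuming IPv4 and a blocking socket. connect_ipv4 is a hypothetical helper; note that it never calls bind(), so the kernel picks the local port.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: open a TCP connection to ip:port (port in host byte
   order). Returns the connected socket descriptor, or -1 on error. */
int connect_ipv4(const char *ip, uint16_t port) {
    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &serv_addr.sin_addr) != 1) {
        fprintf(stderr, "invalid IPv4 address: %s\n", ip);
        return -1;
    }
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return -1;
    }
    /* No bind() here: the kernel assigns an unused local port. */
    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) == -1) {
        perror("connect");                 /* errno holds the reason, e.g. ECONNREFUSED */
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```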

4. listen function:

The listen function puts a socket into passive listening mode and creates a queue for it in which incoming connection requests are held until the program processes them.

int listen(int sockfd, int backlog);

sockfd is the socket descriptor returned by socket; backlog specifies the maximum number of pending connection requests allowed in the queue. Incoming connection requests wait in this queue until they are taken by accept() (see below). A value such as 20 is commonly used, and the system may silently cap it (for example at SOMAXCONN). If a request arrives while the queue is full, the connection may be refused or ignored and the client may eventually receive an error; the exact behavior depends on the system.
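Putting socket, bind and listen together, a server's listening socket might be set up as in this minimal sketch. It assumes IPv4 on a POSIX system; tcp_listen is a hypothetical helper, and INADDR_ANY makes the socket accept connections on every local interface.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP socket listening on all local IPv4
   addresses at port (host byte order; 0 = any free port).
   Returns the listening descriptor, or -1 on error. */
int tcp_listen(uint16_t port, int backlog) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* every local interface */
    addr.sin_port = htons(port);
    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(sockfd, backlog) == -1) {        /* backlog: pending-queue length */
        perror("bind/listen");
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```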

5. accept function:

The accept function lets a server accept a client's connection request. After setting up the queue with listen, the server calls accept, which blocks (sleeps) until a connection request arrives.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

sockfd is the listening socket descriptor. addr usually points to a sockaddr_in variable that receives the address of the host requesting the connection (its IP address and the port the request was sent from). addrlen points to an integer that must be initialized to sizeof(struct sockaddr_in) and, on return, holds the actual size of the address. On success, accept returns a new socket descriptor for the accepted connection while the original socket keeps listening; on error it returns -1 and sets errno.
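A minimal sketch of accepting one connection, assuming IPv4. accept_one is a hypothetical helper; it shows the value-result use of addrlen and that the descriptor accept returns is a new one, separate from the listening socket.

```c
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: wait for one client on listening socket lfd.
   Returns the NEW connected descriptor (lfd keeps listening), or -1.
   The client's address and port are printed for illustration. */
int accept_one(int lfd) {
    struct sockaddr_in client;
    socklen_t addrlen = sizeof(client);   /* in: buffer size; out: actual size */
    int cfd = accept(lfd, (struct sockaddr *)&client, &addrlen);
    if (cfd == -1) {
        perror("accept");
        return -1;
    }
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip)) != NULL) {
        printf("connection from %s:%u\n", ip, (unsigned)ntohs(client.sin_port));
    }
    return cfd;
}
```

A typical server calls accept_one in a loop and hands each returned descriptor to the code that serves that client.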

6. sendto and recvfrom functions:

These functions are typically used with connectionless (UDP) sockets.

ssize_t sendto(int sockfd, const void *msg, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);

msg and len give the data to send, and flags is usually 0. to holds the IP address and port number of the destination machine, and tolen is usually sizeof(struct sockaddr). sendto returns the number of bytes actually sent, or -1 on error.

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);

buf and len describe the receive buffer. from points to a struct sockaddr that receives the IP address and port number of the source host. fromlen must be initialized to the size of that structure, often sizeof(struct sockaddr); when recvfrom() returns, it contains the number of bytes actually stored in from. recvfrom() returns the number of bytes received, or -1 on error with errno set.
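A minimal UDP sketch of sendto and recvfrom, assuming IPv4. udp_send and udp_recv are hypothetical helpers; udp_recv shows fromlen being initialized before the call and the sender's port being read back from the from address.

```c
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send one datagram to ip:port (host byte order).
   Returns bytes sent, or -1 on error. */
ssize_t udp_send(int sockfd, const char *ip, unsigned short port,
                 const void *msg, size_t len) {
    struct sockaddr_in to;
    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &to.sin_addr) != 1) {
        return -1;
    }
    ssize_t n = sendto(sockfd, msg, len, 0, (struct sockaddr *)&to, sizeof(to));
    if (n == -1) {
        perror("sendto");
    }
    return n;
}

/* Hypothetical helper: receive one datagram into buf and report the sender's
   port via from_port (may be NULL). Returns bytes received, or -1 on error. */
ssize_t udp_recv(int sockfd, void *buf, size_t len, unsigned short *from_port) {
    struct sockaddr_in from;
    socklen_t fromlen = sizeof(from);      /* in: buffer size; out: size stored */
    ssize_t n = recvfrom(sockfd, buf, len, 0, (struct sockaddr *)&from, &fromlen);
    if (n == -1) {
        perror("recvfrom");
        return -1;
    }
    if (from_port != NULL) {
        *from_port = ntohs(from.sin_port);
    }
    return n;
}
```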

7. shutdown function

The shutdown function shuts down a socket connection in one or both directions. It lets you stop data transfer in one direction while transfer in the other direction continues. (shutdown does not release the descriptor; close() does that.)

int shutdown(int sockfd, int how);

sockfd is the descriptor of the socket to shut down. The how parameter selects the kind of shutdown:

0 (SHUT_RD): no further data may be received
1 (SHUT_WR): no further data may be sent
2 (SHUT_RDWR): no further data may be sent or received

shutdown returns 0 when the operation is successful, returns -1 when an error occurs, and sets the corresponding errno error code.
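A common use of shutdown is the "half-close": a client sends its whole request, shuts down only its sending direction, and the server reads until end-of-file while the client can still receive the reply. This is a minimal sketch; send_and_half_close and read_until_eof are hypothetical helpers, and they work on any stream socket (a UNIX-domain socketpair is convenient for trying them out).

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send a request, then shut down the sending direction with SHUT_WR.
   The peer sees end-of-file after the data, but this side can still
   receive. Returns 0 or -1 (short writes are treated as errors here). */
int send_and_half_close(int sockfd, const char *request) {
    size_t len = strlen(request);
    if (write(sockfd, request, len) != (ssize_t)len) {
        perror("write");
        return -1;
    }
    if (shutdown(sockfd, SHUT_WR) == -1) {       /* SHUT_WR == 1 */
        perror("shutdown");
        return -1;
    }
    return 0;
}

/* Read until the peer half-closes or buf is full.
   Returns the total number of bytes read, or -1 on error. */
ssize_t read_until_eof(int sockfd, char *buf, size_t cap) {
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(sockfd, buf + total, cap - total);
        if (n == -1) {
            perror("read");
            return -1;
        }
        if (n == 0) {                            /* EOF: peer used SHUT_WR or close() */
            break;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```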

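8. fcntl function

The fcntl function can query and change the properties of an open file descriptor, including a socket. With sockets it is commonly used to switch a descriptor into non-blocking mode (the O_NONBLOCK flag, read with F_GETFL and set with F_SETFL).

int fcntl(int fd, int cmd, ... /* int arg */);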
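A minimal sketch of the usual non-blocking pattern: read the current flags with F_GETFL, add O_NONBLOCK, and write them back with F_SETFL so other flags are preserved. set_nonblocking and read_would_block are hypothetical helpers; the same calls work for sockets and pipes.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode, keeping its other
   status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        perror("fcntl(F_GETFL)");
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("fcntl(F_SETFL)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: try to read one byte without waiting.
   Returns 1 if no data is available yet (EAGAIN/EWOULDBLOCK), 0 if a byte
   was read into *out or the stream ended, -1 on any other error. */
int read_would_block(int fd, char *out) {
    ssize_t n = read(fd, out, 1);
    if (n >= 0) {
        return 0;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return 1;
    }
    return -1;
}
```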

9. getsockopt and setsockopt functions

These two functions get or set options associated with a socket. To manipulate options at the socket level, level must be SOL_SOCKET. To manipulate options at another protocol level, the corresponding protocol number must be given; for example, for an option to be interpreted by TCP, level should be IPPROTO_TCP.

int getsockopt(int sock, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sock, int level, int optname, const void *optval, socklen_t optlen);
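A minimal sketch of setting and reading back integer-valued options at two levels: SO_REUSEADDR at SOL_SOCKET and TCP_NODELAY at IPPROTO_TCP. The helpers enable_option and get_int_option are hypothetical; the exact nonzero value reported for an enabled option can differ between systems.

```c
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on an integer (boolean) option.
   level is SOL_SOCKET for socket-level options or IPPROTO_TCP for TCP. */
int enable_option(int sockfd, int level, int optname) {
    int on = 1;
    if (setsockopt(sockfd, level, optname, &on, sizeof(on)) == -1) {
        perror("setsockopt");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read back an integer option, or -1 on error. */
int get_int_option(int sockfd, int level, int optname) {
    int value = 0;
    socklen_t len = sizeof(value);    /* in: buffer size; out: bytes written */
    if (getsockopt(sockfd, level, optname, &value, &len) == -1) {
        perror("getsockopt");
        return -1;
    }
    return value;
}
```

For example, a server usually enables SO_REUSEADDR before bind so it can restart on the same port, and latency-sensitive clients may enable TCP_NODELAY to disable Nagle's algorithm.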

10. select function

select is a system call for I/O multiplexing: it lets a single thread monitor multiple descriptors (input and output streams) at once and wait until one of them is ready.

int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

n is the highest-numbered descriptor plus 1. readfds, writefds and exceptfds are descriptor sets: on input they specify which descriptors to watch for readability, writability and exceptional conditions, and on return they indicate which descriptors are actually ready. timeout limits how long select waits (NULL waits indefinitely). select returns the number of ready descriptors, 0 on timeout, or -1 on error.
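A minimal sketch of using select to read with a timeout. read_with_timeout and its return convention (-2 for a timeout) are hypothetical; because select overwrites both the descriptor set and the timeout, they are rebuilt on every call.

```c
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to become
   readable, then read from it. fd must be below FD_SETSIZE.
   Returns bytes read (0 = end of stream), -1 on error, -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);                  /* watch only this descriptor */

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* n = highest descriptor + 1; no write or exception sets are needed */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready == -1) {
        perror("select");
        return -1;
    }
    if (ready == 0) {
        return -2;                         /* timed out, nothing to read */
    }
    return read(fd, buf, len);             /* fd reported readable */
}
```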

11. poll function

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

fds is an array of struct pollfd holding the descriptors whose status is to be checked, nfds is the number of elements in the array, and timeout is the wait time in milliseconds (-1 waits indefinitely). poll returns the number of descriptors with nonzero revents, 0 on timeout, or -1 on error. struct pollfd is defined as follows:

struct pollfd {
        //descriptor to check
        int fd;
        //events of interest on fd
        short events;
        //events that occurred on fd
        short revents;
};
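A minimal sketch of watching several descriptors with poll and reading from whichever becomes ready first. poll_read_any, MAX_WATCH and the -2 timeout convention are hypothetical. Unlike select, poll takes an array rather than a bitmap, so it has no FD_SETSIZE limit on descriptor values.

```c
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WATCH 16   /* arbitrary limit for this sketch */

/* Hypothetical helper: wait up to timeout_ms milliseconds (-1 = forever)
   until any of the n descriptors in fds is readable, read from the first
   ready one and store its index in *which.
   Returns bytes read (0 = end of stream), -1 on error, -2 on timeout. */
ssize_t poll_read_any(const int *fds, int n, void *buf, size_t len,
                      int timeout_ms, int *which) {
    struct pollfd pfds[MAX_WATCH];
    if (n <= 0 || n > MAX_WATCH) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;               /* events of interest */
        pfds[i].revents = 0;                   /* filled in by poll() */
    }
    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready == -1) {
        perror("poll");
        return -1;
    }
    if (ready == 0) {
        return -2;                             /* timed out */
    }
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLNVAL) {
            return -1;                         /* fds[i] is not an open descriptor */
        }
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            *which = i;
            return read(pfds[i].fd, buf, len); /* read reports data, EOF or the error */
        }
    }
    return -1;
}
```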

What is a message

A message is the smallest programming-level unit of communication between two logical entities on the network in distributed application development.

Here are some explanations for the above definition:

(1) The concept of a message belongs to development work, at the programming level. While the system is running, messages are transparent to the application's users.

(2) The two logical entities on the network are two programs that can run independently. They can be deployed on two different physical devices in the network or on the same device, but they are generally two independent processes without a parent-child relationship (this differs from the most basic notion of a message in IPC programming).

(3) A message is the smallest programming-level unit of distributed communication. That is, however much or little data is involved, the program code carries out the communication by sending and receiving one or more messages.

(4) Communication between two applications on the network includes two types: data stream transmission and remote procedure (function) calling.

(5) Messages make structured data communication between distributed applications possible. In other words, at the communication level programmers no longer deal with a raw byte stream but with a structured data unit that can combine multiple data types.

In fact, this structured data unit is itself the "message", and in code it can be represented as a structure or a class. Once a message mechanism based on this definition is in place, whenever distributed communication is needed, programmers only have to build the corresponding message and call the corresponding send and receive interfaces. They do not need in-depth TCP/IP knowledge or socket programming skills, and they do not have to handle issues such as messages running together on the byte stream, large numbers of concurrent messages, or network flow control. Development effort can then be concentrated on the business logic, which greatly improves the development efficiency and quality of distributed systems, especially large ones.

Regarding the existence form of the message, in the traditional C language, it can be a structure struct; in an object-oriented language (C++ or Java), it can be a class.

Message classification by transmission format

By transmission format, messages can be divided into stream messages and XML messages. Stream messages are transmitted as binary byte streams, while XML messages are transmitted as strings in XML format.

Stream messages

Stream messages are messages that a computer system transmits and processes as a stream. A stream message consists of a sequence of contiguous data that is generated at the sending end in a defined order and delivered to the receiving end as a stream, from which the receiver reads the data item by item. However the programmer represents a stream message in code, it must be converted into a binary stream before it is actually sent. This conversion is called serialization (also known as streamlization).
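To make serialization concrete, here is a minimal binary sketch for the Person structure used in the example later in this article. The fixed 28-byte layout, the function names, and the assumption that both hosts use 32-bit IEEE 754 floats are hypothetical choices for illustration; multi-byte fields are converted to network byte order so hosts with different endianness agree.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

struct Person {
    char name[20];
    float height;
    int age;
};

/* Assumption: float is 32 bits (IEEE 754) on both sender and receiver. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

#define PERSON_WIRE_SIZE 28   /* 20-byte name + 4-byte height + 4-byte age */

/* Serialize p into out[PERSON_WIRE_SIZE] in a fixed, byte-order-neutral layout. */
void person_serialize(const struct Person *p, unsigned char *out) {
    uint32_t h, a;
    memset(out, 0, PERSON_WIRE_SIZE);
    strncpy((char *)out, p->name, 19);        /* name, NUL-padded to 20 bytes */
    memcpy(&h, &p->height, sizeof(h));        /* raw float bits */
    h = htonl(h);                             /* network byte order */
    memcpy(out + 20, &h, sizeof(h));
    a = htonl((uint32_t)p->age);
    memcpy(out + 24, &a, sizeof(a));
}

/* Reverse of person_serialize. */
void person_deserialize(const unsigned char *in, struct Person *p) {
    uint32_t h, a;
    memcpy(p->name, in, 20);
    p->name[19] = '\0';                       /* never trust the terminator */
    memcpy(&h, in + 20, sizeof(h));
    h = ntohl(h);
    memcpy(&p->height, &h, sizeof(h));
    memcpy(&a, in + 24, sizeof(a));
    p->age = (int)ntohl(a);
}
```

A fixed binary layout is compact and fast to parse, but both sides must agree on it exactly; the text format in the later example trades size for readability.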

XML message

XML messaging is a data transmission method that uses the Extensible Markup Language (XML) as the message format. XML is a text markup language for describing and storing data; it uses tags to define the structure and attributes of the data. In an XML message mechanism, once programmers have expressed the message content in XML, no further format conversion is needed before sending (apart from any encryption required for secure transmission): the message can be sent directly as an XML string. XML messages are widely used; for example, the SOAP protocol used by Web Services is designed and implemented on top of XML messages.
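For comparison with stream messages, this minimal sketch renders the same Person information as an XML message string. The element names and the helper person_to_xml are hypothetical. A real implementation must escape characters such as & and < in text; to stay short, this sketch simply rejects names that would need escaping.

```c
#include <stdio.h>
#include <string.h>

struct Person {
    char name[20];
    float height;
    int age;
};

/* Hypothetical helper: render p as an XML message in buf.
   Returns the length written (excluding the NUL), or -1 if buf is too
   small or the name contains characters that would need escaping. */
int person_to_xml(const struct Person *p, char *buf, size_t cap) {
    if (strpbrk(p->name, "<>&\"'") != NULL) {
        return -1;                         /* this sketch does no escaping */
    }
    int n = snprintf(buf, cap,
                     "<Person><name>%s</name><height>%.2f</height><age>%d</age></Person>",
                     p->name, p->height, p->age);
    if (n < 0 || (size_t)n >= cap) {
        return -1;                         /* output was truncated */
    }
    return n;
}
```

The resulting string can be sent as is, which is the property described above: no separate serialization step is needed.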

Example: design and implementation based on stream messages

The following example briefly shows how two applications can send and receive a person's information (name, height and age).

(1) Define a structure to store the person's information:

#include <string.h>

struct Person {
        char name[20];
        float height;
        int age;
};
struct Person p;
strcpy(p.name, "Michael Zhang");   /* must fit in 20 bytes, including '\0' */
p.height = 170.00;
p.age = 30;

(2) Serialize the structure into a byte stream:

char sendStream[1024] = {0};
/* fields separated by '|': |name|height|age */
snprintf(sendStream, sizeof(sendStream), "|%s|%f|%d", p.name, p.height, p.age);

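(3) The sender sends the byte stream, preceded by its length as a 4-digit decimal header:

/* Note: code for establishing, managing and closing the TCP connection is omitted here;
   sockfd is the connected socket descriptor. */
char datalen[4+1] = {0};
snprintf(datalen, sizeof(datalen), "%04d", (int)strlen(sendStream));
if (SendBytes(sockfd, datalen, 4) == -1) {
        return -1;
}
if (SendBytes(sockfd, sendStream, strlen(sendStream)) == -1) {
        return -1;
}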

Note that SendBytes returns only after the whole buffer of the given length has been sent. This is necessary because a single call to send or write on a socket is not guaranteed to send all of the requested bytes at once. The basic idea of SendBytes is to keep sending in a loop until every byte has been sent. An implementation is as follows:

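#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Send exactly len bytes from buffer on socket sd.
   Returns len on success, -1 on a fatal error. */
int SendBytes(int sd, const void *buffer, unsigned len) {
        unsigned sentlen = 0;                      /* bytes sent so far */
        while (sentlen < len) {
                ssize_t rez = write(sd, (const char *)buffer + sentlen, len - sentlen);
                if (rez < 0) {
                        if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN) {
                                continue;          /* interrupted or buffer full: retry (busy-waits on a non-blocking socket) */
                        }
                        perror("SendBytes: serious error");
                        close(sd);                 /* disconnect on a fatal error */
                        return -1;
                }
                sentlen += (unsigned)rez;          /* a partial write advances the offset */
        }
        return (int)len;
}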

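(4) The receiver receives the byte stream:

#include <stdlib.h>   /* atoi */

char datalen[4+1] = {0};
char receiveStream[1024] = {0};
/* read the 4-digit length header first; sockfd is the connected socket */
if (ReceiveBytes(sockfd, datalen, 4) == -1) {
        return -1;
}
int packet_len = atoi(datalen);
if (packet_len <= 0 || packet_len >= (int)sizeof(receiveStream)) {
        return -1;                   /* reject a malformed or oversized length */
}
if (ReceiveBytes(sockfd, receiveStream, packet_len) == -1) {
        return -1;
}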

ReceiveBytes mirrors SendBytes from step (3): it reads in a loop until exactly the requested number of bytes has been received.
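A minimal sketch of ReceiveBytes, assuming a blocking socket and the same calling convention as SendBytes (return len on success, -1 on failure). Unlike the sending side, a receiver must also treat a return value of 0 from read() as end of stream, otherwise a closed connection would make the loop spin forever.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive exactly len bytes from socket sd into buffer, looping over
   short reads. Returns len on success, or -1 on error or if the peer
   closes the connection before len bytes arrive. */
int ReceiveBytes(int sd, void *buffer, unsigned len) {
    unsigned recvlen = 0;                       /* bytes received so far */
    while (recvlen < len) {
        ssize_t rez = read(sd, (char *)buffer + recvlen, len - recvlen);
        if (rez < 0) {
            if (errno == EINTR) {
                continue;                       /* interrupted by a signal: retry */
            }
            perror("ReceiveBytes");
            return -1;
        }
        if (rez == 0) {                         /* peer closed: message truncated */
            fprintf(stderr, "ReceiveBytes: connection closed early\n");
            return -1;
        }
        recvlen += (unsigned)rez;
    }
    return (int)len;
}
```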

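(5) Deserialize the byte stream back into the structure:

struct Person p;
/* %19[^|] reads up to 19 characters other than '|' into name */
if (sscanf(receiveStream, "|%19[^|]|%f|%d", p.name, &p.height, &p.age) != 3) {
        return -1;   /* malformed message */
}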

Summary

This article briefly introduced the TCP/IP protocol suite and its commonly used socket functions, then described how messages built on TCP/IP are classified by transmission format, and finally walked through a simple example of sending and receiving a stream message. If you have comments or suggestions, you are welcome to leave a message and discuss it in the comment area.

Reference book: "Message Design and Development - Core Technology of Distributed Application Development" He Xiaochao

Extension link:

From form-driven to model-driven, interpret the development trend of low-code development platforms

What is a low-code development platform?

Branch-based version management helps low-code move from project delivery to customized product development
