Linux socket programming: network programming foundation

In Linux there is a popular saying: everything is a file. Indeed, most devices in Linux are operated through files, and the files that represent peripheral devices are usually called device files. Network communication in Linux works the same way, by operating on network file descriptors.
In the previous post, "Introduction to the Transport Layer", we saw that two hosts communicating over the Internet must know each other's IP address and port number. The IP address locates the host, and the port number identifies the communicating process on that host. This pairing is often referred to as a socket: socket = <IP>:<port>. Linux network programming needs the same information and stores it in a socket address structure. The names of these structures usually start with sockaddr. Several commonly used ones are introduced below.

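struct sockaddr {
	sa_family_t		sa_family;	/* address family; matches the first argument of socket(), usually PF_INET (AF_INET) */
	char			sa_data[14];	/* protocol-specific address data */
};

/* Linux layout. BSD-derived systems add a leading one-byte sin_len field (fixed value 16). */
struct sockaddr_in {
	sa_family_t		sin_family;	/* address family (domain), AF_INET */
	in_port_t		sin_port;	/* port number, network byte order */
	struct in_addr		sin_addr;	/* IPv4 address, network byte order */
	unsigned char		sin_zero[8];	/* reserved padding, set to 0 */
};

struct sockaddr_un {
	sa_family_t		sun_family;	/* domain, AF_UNIX */
	char			sun_path[UNIX_PATH_MAX];	/* path name, 108 bytes */
};

typedef unsigned short sa_family_t;
struct in_addr {
	in_addr_t s_addr;	/* 32-bit IPv4 address, network byte order */
};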

struct sockaddr is the generic socket address structure: a pointer to any protocol-specific address structure can be cast to struct sockaddr * when it is passed to the socket functions. struct sockaddr and struct sockaddr_in have the same size (16 bytes); the following figure shows how their fields correspond. struct sockaddr_in is the IPv4 (Internet) address structure, and struct sockaddr_un is the address structure of the Unix domain protocol family. The Unix domain is used mainly for communication between processes on the same host, where it is usually faster than going through the IP stack. A figure of roughly twice as fast is often quoted, but the actual gain depends on the workload.
[Figure: correspondence between struct sockaddr and struct sockaddr_in]
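To make the relationship between the two structures concrete, here is a minimal sketch that fills a struct sockaddr_in and then reads it back through the generic struct sockaddr view. The helper names make_ipv4_addr and family_via_generic are made up for this example; only the structures and the libc calls come from the system headers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill an IPv4 address structure from a dotted-decimal
 * string and a port given in host byte order. Returns 0 on success and -1
 * if the string is not a valid IPv4 address. */
int make_ipv4_addr(struct sockaddr_in *out, const char *ip, unsigned short port)
{
    assert(out != NULL && ip != NULL);
    memset(out, 0, sizeof(*out));        /* also clears sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);         /* stored in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: read the family through the generic view, which is
 * how bind(), connect() and friends first look at the structure. */
int family_via_generic(const struct sockaddr_in *in)
{
    const struct sockaddr *generic = (const struct sockaddr *)in;
    return generic->sa_family;
}
```

The cast works because both structures start with the address family field, so the kernel can inspect sa_family first and then interpret the rest according to the protocol family.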

APIs for building the network framework

This section focuses on what each function does when building a network program. Full test programs are easy to find online and are not reproduced here.

Creating a network socket: socket()

Everything is a file in Linux, so devices are operated by reading and writing files. The same is true for network communication: socket() creates a socket object in the kernel and associates it with a file descriptor. Later operations on this file descriptor control the socket. On success, socket() returns the socket file descriptor.

int socket(int domain, int type, int protocol);

The prototype of socket() is shown above. domain sets the communication domain, and socket() selects the protocol family based on it. IPv4 networking uses PF_INET (AF_INET has the same value). The possible values of domain are listed below; the commonly used ones are highlighted.
[Figure: table of domain values, with the commonly used ones highlighted]

  • PF_UNIX: used for Unix domain communication, that is, local communication between processes on the same host. It is usually faster than the network domains because it bypasses the IP stack.
  • PF_INET: IPv4 communication; this domain is used in most cases.
  • PF_NETLINK: used for netlink communication, that is, communication between user space and the kernel.
  • PF_PACKET: used to access link-layer (MAC) data directly, that is, to operate on raw frames at the network card.

type sets the socket type, which determines the communication semantics: SOCK_STREAM for the Transmission Control Protocol (TCP), SOCK_DGRAM for the User Datagram Protocol (UDP), and SOCK_RAW for raw sockets. The following figure lists its possible values.
[Figure: table of type values]
Note: not every protocol family implements all of these types.
The third parameter, protocol, selects a specific protocol within the given type. Many types have only one protocol, in which case it is set to 0. Types such as SOCK_RAW and SOCK_PACKET, however, need this parameter to select the specific protocol.
To create a TCP socket, use socket(PF_INET, SOCK_STREAM, 0); to create a UDP socket, use socket(PF_INET, SOCK_DGRAM, 0). The process of creating a socket is as follows.
[Figure: socket creation flow]
When user space calls socket(), the kernel runs sys_socket(). Its main job is to create a kernel socket structure (which is not the same thing the application sees), allocate resources such as the receive, send and error queues, and set ops and type according to the parameters. The kernel socket is also bound to a file descriptor, and that file descriptor is returned to the application. From then on the kernel socket structure can be found through the file descriptor, so network communication is carried out by operating on a file.
Note: a socket file descriptor looks no different from an ordinary file descriptor. To find out whether a descriptor is a socket, call fstat() to obtain its mode, mask the mode with S_IFMT, and compare the result with S_IFSOCK. The following code does this.
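#include <sys/stat.h>

/* Returns 1 if fd is a socket, 0 if it is not, -1 if fstat() fails. */
int issockettype(int fd)
{
	struct stat st;
	int err = fstat(fd, &st);
	if (err < 0) {
		return -1;
	}

	/* Mask out the file-type bits and compare them with S_IFSOCK. */
	if ((st.st_mode & S_IFMT) == S_IFSOCK) {
		return 1;
	} else {
		return 0;
	}
}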

Binding the address structure: bind()

Once the socket has been created, the kernel socket can be found through the file descriptor, along with its protocol parameters and operation functions. At this point, however, the descriptor is not yet associated with any IP address or port. bind() associates the file descriptor with a network address structure; after binding, the descriptor is tied to the IP address, port and address family stored in that structure. bind() is used mostly on the server side to fix the address the server listens on. Clients normally skip it and let the kernel choose a local address and port. The function prototype is as follows

int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

bind() ties the network file descriptor to a network address structure, so that sockfd can be used to watch for traffic arriving at that address. It is used mainly on the server side because the server program runs from the moment the server starts and cannot know when a client will connect (the server is the passive side). In TCP, the server therefore calls listen() and accept() after bind() to establish connections, and in UDP it calls recvfrom() to receive incoming data. Without a bound address, the kernel would not know which IP address and port belong to the server, so clients could not reach it.
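The binding step can be sketched as follows. The example binds to the loopback address and, when the port is 0, lets the kernel pick a free port, then reads the chosen port back with getsockname(). The helper name bind_loopback is an assumption made for this example, as is the choice of the loopback address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket of `type` and bind it to
 * 127.0.0.1:port (port 0 asks the kernel for any free port).
 * Returns the descriptor, or -1 on error. The port actually bound is
 * stored in *bound_port in host byte order. */
int bind_loopback(int type, unsigned short port, unsigned short *bound_port)
{
    assert(bound_port != NULL);
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A real server would normally bind a fixed, well-known port (and often INADDR_ANY) so that clients know where to find it; port 0 is used here only so the sketch does not collide with anything already running.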

Listening on a local port: listen() and accept() (used in TCP communication)

In the C/S architecture, once the service program on the server is running it must keep listening for connection requests from clients. Only after a request has been received and the connection established can the two sides communicate. Server resources are limited and the server program accepts connections one at a time, so when several requests arrive together, those that cannot be handled immediately are placed in a waiting queue. The length of this queue is specified by the backlog argument of listen(). The queue cannot be arbitrarily long because of operating system and hardware limits; if the requested length exceeds the system maximum, the kernel uses the maximum instead. When the queue is full, new requests from clients are not accepted: the client's connect() may fail with ECONNREFUSED, or, for a protocol that retransmits such as TCP, the request may simply be ignored so that a later retry can succeed. The listen() call path is similar to that of bind() above: the kernel first finds the kernel socket that corresponds to the descriptor sockfd, and then calls the listen function in socket->ops.
accept() creates a new socket that represents the established connection and records the client's information. On success it returns a new file descriptor for that connection, and the client's IP address, port and address family can be obtained through its address parameters. All communication with that client uses the newly returned descriptor, while the listening descriptor keeps accepting further connections. The process is as follows:
[Figure: accept() call flow]
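The listen/accept sequence can be tried inside a single process, as in this sketch. It relies on the fact that the kernel completes the TCP handshake and queues the connection, so on loopback connect() returns before accept() is called. The helper names start_listener and accept_one_roundtrip, the backlog of 4 and the use of loopback with a kernel-chosen port are all assumptions made for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: start a TCP listener on 127.0.0.1 with a
 * kernel-chosen port; the bound address is copied to *addr_out. */
static int start_listener(int backlog, struct sockaddr_in *addr_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *addr_out = addr;
    return fd;
}

/* Connect one client, accept it, and pass `msg` from client to server.
 * Returns the number of bytes the accepted socket received, or -1. */
ssize_t accept_one_roundtrip(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in srv, peer;
    socklen_t peer_len = sizeof(peer);
    ssize_t n = -1;
    int conn = -1;
    assert(msg != NULL && buf != NULL);

    int lfd = start_listener(4, &srv);
    if (lfd < 0)
        return -1;
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0 && connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) == 0) {
        /* accept() returns a NEW descriptor for this one connection. */
        conn = accept(lfd, (struct sockaddr *)&peer, &peer_len);
        if (conn >= 0 && write(cfd, msg, strlen(msg)) == (ssize_t)strlen(msg))
            n = read(conn, buf, buflen);
    }
    if (conn >= 0) close(conn);
    if (cfd >= 0) close(cfd);
    close(lfd);
    return n;
}
```

A real server would call accept() in a loop and hand each returned descriptor to a worker (a process, a thread, or an event loop); the single round trip here only shows which descriptor plays which role.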

Connecting to the target: connect()

After creating its socket, the client usually does not need to bind an address; it can send a connection request to the server directly. The function for this is connect(), which takes the client's socket descriptor and the server's address structure. Once the connection is established, the same descriptor is used to communicate with the server. If no program is listening on the target port, connect() fails with ECONNREFUSED; the same error may be reported when the server's connection request queue is full. The call flow is as follows:
[Figure: connect() call flow]
As the sections above show, the later functions (bind, listen, accept, connect and so on) all rely on the file descriptor sockfd created by socket(), through which the corresponding kernel socket is found. The ops (operation functions) of the kernel socket then select the protocol-specific implementation according to the parameters that were passed to socket().
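The error path of connect() is easy to observe on loopback. In this sketch, try_connect_loopback is a made-up helper that attempts a TCP connection to 127.0.0.1 and returns the errno value instead of printing it, so the caller can compare it with ECONNREFUSED.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to connect to 127.0.0.1:port.
 * Returns 0 if the connection was established, otherwise the errno
 * value reported by socket() or connect(). */
int try_connect_loopback(unsigned short port)
{
    assert(port != 0);                   /* port 0 is not a valid destination */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int err = 0;
    /* No bind() on the client: the kernel picks the local address and port. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}
```

If the port belongs to a socket that is bound but has not called listen(), the call returns ECONNREFUSED; once listen() has been called on that socket, the same call returns 0.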

Closing the socket: close()

When communication is over, call close() to close sockfd. close() releases the file descriptor and, once no other descriptor refers to the socket, the corresponding kernel socket resources. Network programs can also use shutdown() to stop communication in one or both directions: SHUT_RD (stop reading), SHUT_WR (stop writing) or SHUT_RDWR (stop both). Unlike close(), shutdown() does not release the file descriptor, so close() must still be called afterwards.
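The difference between shutdown() and close() shows up most clearly with a half-close. The sketch below uses a pair of connected descriptors, which the caller can obtain from socketpair(AF_UNIX, SOCK_STREAM, 0, sv) (not mentioned above, used here only to get two connected sockets without a server). The helper name half_close_works is made up for the example.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: after `local` shuts down its write side, check that
 * `peer` reads end-of-file while data can still travel from peer to local.
 * Returns 1 if both hold, 0 otherwise. The descriptors stay open. */
int half_close_works(int local, int peer)
{
    char c = 0;
    assert(local >= 0 && peer >= 0);

    if (shutdown(local, SHUT_WR) < 0)    /* local will send nothing more */
        return 0;
    if (read(peer, &c, 1) != 0)          /* peer sees EOF: read() returns 0 */
        return 0;
    if (write(peer, "x", 1) != 1)        /* the other direction still works */
        return 0;
    return read(local, &c, 1) == 1 && c == 'x';
}
```

After the function returns, both descriptors are still valid and still have to be released with close().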

Network byte order and IP address conversion processing

Multi-byte data may be laid out differently on the network and on the local host, so byte order has to be handled in network programming. This section gives a brief introduction to byte order and the functions related to it.
Byte order arises because different CPUs and operating systems store multi-byte variables in memory in different orders. There are two conventions, big-endian and little-endian. Little-endian stores the least significant byte at the lowest address of the variable; big-endian stores the most significant byte at the lowest address.
Hosts differ widely, so host byte order cannot be unified, but values transmitted on the network must have a single representation. Network byte order is the representation of multi-byte values during network transmission, and it is big-endian. The main conversion functions under Linux are as follows:

       #include <arpa/inet.h>
       uint32_t htonl(uint32_t hostlong);
       uint16_t htons(uint16_t hostshort);
       uint32_t ntohl(uint32_t netlong);
       uint16_t ntohs(uint16_t netshort);

The argument is the value to convert and the return value is the converted value. The function names follow the pattern "source byte order" to "target byte order" plus "variable type": h stands for host byte order, n stands for network byte order, l stands for a long (32-bit) value and s stands for a short (16-bit) value. When programming, call these functions to convert from host byte order to network byte order: port numbers and IP addresses must be converted first and only then stored in the address structure.
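A small sketch makes the conversion visible by looking at the bytes in memory. The helper names host_is_little_endian and port_wire_bytes are invented for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);           /* byte at the lowest address */
    return first == 0x02;
}

/* Copy the in-memory bytes of htons(value) into out[0] and out[1].
 * In network byte order the most significant byte always comes first. */
void port_wire_bytes(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);
    uint16_t net = htons(value);
    memcpy(out, &net, sizeof(net));
}
```

For port 8080 (0x1F90), port_wire_bytes() yields 0x1F followed by 0x90 on any host, whereas the host representation on a little-endian machine starts with 0x90.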
Computers work with binary data only, and a 32-bit value such as an IPv4 address is inconvenient to remember and write. People therefore split the address into four 8-bit fields and write each one in decimal, separated by dots. This is the dotted-decimal notation, for example 192.168.1.123. Linux provides functions to convert between string IP addresses and binary addresses. The commonly used ones are as follows:

       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <arpa/inet.h>

       int inet_aton(const char *cp, struct in_addr *inp);
       in_addr_t inet_addr(const char *cp);
       in_addr_t inet_network(const char *cp);
       char *inet_ntoa(struct in_addr in);
       struct in_addr inet_makeaddr(in_addr_t net, in_addr_t host);
       in_addr_t inet_lnaof(struct in_addr in);
       in_addr_t inet_netof(struct in_addr in);

       struct in_addr {
           in_addr_t s_addr;   /* IP address, network byte order */
       };

       const char *inet_ntop(int af, const void *src,
                             char *dst, socklen_t size);
       int inet_pton(int af, const char *src, void *dst);
  • inet_aton() converts the dotted-decimal string in cp into a binary IP address in network byte order and stores it in inp. It returns nonzero on success and 0 if the string is invalid.
  • inet_addr() converts the dotted-decimal string in cp into a binary IP address in network byte order and returns it. On error it returns INADDR_NONE, which is also the value of the valid address 255.255.255.255, so inet_aton() or inet_pton() is usually the safer choice.
  • inet_network() converts the dotted-decimal string in cp into a binary IP address in host byte order. cp may have the form a.b.c.d, a.b.c or a.b.
  • inet_ntoa() is the inverse of inet_aton(): it converts a binary IP address into a dotted-decimal string. The string lives in a static buffer, so each call overwrites the result of the previous one.
  • inet_makeaddr() combines a network number and a host address, both in host byte order, into an IP address in network byte order.
  • inet_lnaof() returns the host part of the IP address.
  • inet_netof() returns the network part of the IP address.
  • inet_pton() and inet_ntop() convert between text and binary form for the address family given in af (AF_INET or AF_INET6). inet_ntop() writes into a buffer supplied by the caller instead of a static one.
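The conversions listed above can be sketched with the two functions that take an explicit address family and a caller-supplied buffer. The helper names parse_ipv4 and format_ipv4 are made up for this example; they wrap inet_pton() and inet_ntop() and work with host byte order on the caller's side.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse dotted-decimal text. On success stores the
 * address in HOST byte order in *host_order and returns 1; returns 0 if
 * the text is not a valid IPv4 address. */
int parse_ipv4(const char *text, uint32_t *host_order)
{
    struct in_addr a;
    assert(text != NULL && host_order != NULL);
    if (inet_pton(AF_INET, text, &a) != 1)
        return 0;
    *host_order = ntohl(a.s_addr);       /* s_addr is in network byte order */
    return 1;
}

/* Hypothetical helper: format a host-byte-order address into buf, which
 * should hold at least INET_ADDRSTRLEN bytes. Returns buf, or NULL on error. */
const char *format_ipv4(uint32_t host_order, char *buf, socklen_t len)
{
    struct in_addr a;
    assert(buf != NULL);
    a.s_addr = htonl(host_order);
    return inet_ntop(AF_INET, &a, buf, len);
}
```

Because format_ipv4() writes into the caller's buffer, two formatted addresses can be kept side by side, which is not possible with two consecutive inet_ntoa() calls that share one static buffer.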

Conversion between IP address and domain name

In practice, hosts are usually referred to by domain name rather than by IP address, because names are easier to remember: www.baidu.com and www.google.com are far more memorable than their dotted-decimal addresses. The socket APIs, however, work with IP addresses, so host names have to be translated to IP addresses and back. This is the job of the DNS (Domain Name System) service.
The gethostbyname() and gethostbyaddr() functions obtain information about a host. gethostbyname() looks the host up by name, and gethostbyaddr() looks it up by IP address.

       #include <netdb.h>
       extern int h_errno;

       struct hostent *gethostbyname(const char *name);
       #include <sys/socket.h>       /* for AF_INET */
       struct hostent *gethostbyaddr(const void *addr,
                                     socklen_t len, int type);

[Figure: definition of struct hostent]
Both functions return a pointer to a struct hostent that describes the host; the structure is shown in the figure above. h_name is the official name of the host, such as www.baidu.com. h_length is the length of each address, which is 4 (bytes) for IPv4. h_addr_list holds the list of the host's IP addresses; each entry is h_length bytes long, and the list is terminated by a NULL pointer. The relationship is as follows:
[Figure: layout of the h_addr_list address list]
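Walking the address list can be sketched as follows. The helper name first_ipv4_of is made up for this example. Note that gethostbyname() returns a pointer to static data and is marked obsolete in current manual pages in favour of getaddrinfo(); it is used here only because it is the function discussed above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `name`, write its first IPv4 address as
 * dotted-decimal text into buf, and return how many addresses the host
 * entry lists. Returns -1 if the lookup fails or the result is not IPv4. */
int first_ipv4_of(const char *name, char *buf, socklen_t buflen)
{
    assert(name != NULL && buf != NULL);
    struct hostent *he = gethostbyname(name);
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof(struct in_addr))
        return -1;

    int count = 0;
    /* h_addr_list is an array of pointers terminated by NULL; each entry
     * points at h_length bytes in network byte order. */
    for (char **p = he->h_addr_list; *p != NULL; ++p) {
        if (count == 0) {
            struct in_addr a;
            memcpy(&a, *p, sizeof(a));
            if (inet_ntop(AF_INET, &a, buf, buflen) == NULL)
                return -1;
        }
        ++count;
    }
    return count > 0 ? count : -1;
}
```

Passing a real host name triggers a DNS lookup. Passing a dotted-decimal string such as "127.0.0.1" skips the lookup: the library simply copies the address into the returned entry, which is convenient for trying the function without network access.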

Protocol name processing function

For convenience, Linux provides a set of functions for looking up protocol numbers and names. This section briefly introduces these functions, how to use them, and what to watch out for. The functions are listed below; they operate on the records in the file /etc/protocols.

       #include <netdb.h>

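       struct protoent *getprotoent(void);                  /* read the next entry from the protocols file */
       struct protoent *getprotobyname(const char *name);   /* find the entry that matches a protocol name */
       struct protoent *getprotobynumber(int proto);        /* find the entry that matches a protocol number */
       void setprotoent(int stayopen);                      /* open the protocols file and set whether it stays open */
       void endprotoent(void);                              /* close the protocols file */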

The contents of /etc/protocols are shown in the following figure: each record holds a protocol's name, number and aliases. Linux defines struct protoent to describe this information.
[Figure: sample contents of /etc/protocols]
[Figure: definition of struct protoent]
p_name points to the protocol name, p_aliases points to a NULL-terminated list of alias strings, and p_proto is the protocol number.
Before reading the records in /etc/protocols, call setprotoent() to open the file. If its parameter is 1, the file stays open between calls. When you are done, call endprotoent() to close it.
getprotobyname() returns the entry for the given protocol name. It is useful with IP raw sockets: creating one requires a protocol number, and since these numbers are hard to remember, this function can be used to look them up by name.
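A lookup by name takes only a few lines. In this sketch the helper name protocol_number is made up; it returns -1 when the name is unknown, which also covers systems where the protocols database is not installed.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical helper: look up a protocol number by name in the protocols
 * database (/etc/protocols on a typical Linux system).
 * Returns the number, or -1 if no entry matches. */
int protocol_number(const char *name)
{
    assert(name != NULL);
    struct protoent *pe = getprotobyname(name);
    return pe != NULL ? pe->p_proto : -1;
}
```

The returned value is what the third argument of socket() expects for a raw socket, for example socket(AF_INET, SOCK_RAW, protocol_number("icmp")). Creating raw sockets normally requires elevated privileges, so that call is shown here only as an illustration.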

TCP/UDP communication process

Network communication follows several models, mainly C/S (client/server), B/S (browser/server) and P2P, of which C/S is the most widely used. The C/S model is explained below. At the transport layer, communication is either TCP or UDP; the structure of each is described below.

TCP communication

[Figure: TCP client/server call flow]
The TCP flow follows the functions introduced above (socket, bind, listen and accept on the server; socket and connect on the client), so it is not described further here.

UDP communication

UDP is a connectionless, unreliable protocol, so the structure of a UDP program differs considerably from TCP. No connection is established, so connect(), listen() and accept() are not required. The programming flow is as follows:
[Figure: UDP client/server call flow]
A UDP socket is also created with socket(), but with different parameters from TCP: type is SOCK_DGRAM. Binding the listening port works as in TCP: fill in the members of the address structure and then call bind(). Data is exchanged with sendto(), recvfrom() and related functions. Because UDP has no connection setup, one socket can receive data from different senders through recvfrom(), and the address of each sender (IP address, port, address family) is returned through its parameters.
The prototype of recvfrom() is shown below; src_addr receives the sender's address information. The function flow is as follows:

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);

[Figure: recvfrom() flow in the kernel]

  • Find the kernel socket that corresponds to the file descriptor.
  • Create a message structure and pack the user-space address buffer pointer and data buffer pointer into it.
  • Find the data in the socket's receive queue and copy it into the message.
  • Remove the data from the queue, copy it to the application's buffer, and drop the reference count on the file descriptor.

Kernel space uses the message structure msghdr to hold all of this information.
[Figure: definition of struct msghdr]
msg_name and msg_namelen store the sender's address information, and the message data is described by msg_iov: iov_base is the address of the receive buffer passed in from user space, and iov_len is the length of that buffer.
UDP data is normally sent with sendto(), whose prototype is given below. Since UDP does not establish a connection, the destination address must be supplied in dest_addr. If the socket has not been bound to a local IP address and port when data is sent, the network protocol stack fills them in automatically, as shown in the figure:
ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);

[Figure: sendto() flow in the kernel]
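The sendto()/recvfrom() pair described above can be tried within one process on loopback. In this sketch the receiver is bound to a kernel-chosen port and the sender is left unbound, so the protocol stack assigns its local port on the first sendto(). The helper name udp_roundtrip and the loopback setup are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send `msg` from an unbound UDP socket to a bound one.
 * Returns the number of bytes received, or -1 on error. The sender's port,
 * as reported by recvfrom(), is stored in *sender_port (host byte order). */
ssize_t udp_roundtrip(const char *msg, char *buf, size_t buflen,
                      unsigned short *sender_port)
{
    ssize_t n = -1;
    struct sockaddr_in rx_addr, from;
    socklen_t len = sizeof(rx_addr), from_len = sizeof(from);
    assert(msg != NULL && buf != NULL && sender_port != NULL);

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    memset(&rx_addr, 0, sizeof(rx_addr));
    rx_addr.sin_family = AF_INET;
    rx_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&rx_addr, sizeof(rx_addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&rx_addr, &len) == 0 &&
        sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&rx_addr,
               sizeof(rx_addr)) == (ssize_t)strlen(msg)) {
        /* src_addr is filled in with the address of whoever sent the datagram. */
        n = recvfrom(rx, buf, buflen, 0, (struct sockaddr *)&from, &from_len);
        if (n >= 0)
            *sender_port = ntohs(from.sin_port);
    }
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

The nonzero sender port that comes back shows the automatic local binding: the sending socket never called bind(), yet the datagram carries a source port that a reply could be sent to.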
UDP does not guarantee quality of service, so a UDP program has to deal with datagram loss, out-of-order delivery, the lack of flow control, and the choice of outgoing network interface. Loss and reordering can be handled the way TCP handles them. Put a sequence number in each UDP message. When the receiver gets a message, it returns an acknowledgment to tell the sender that the message arrived. If the sender does not receive the acknowledgment within a specified time, it considers the message lost and retransmits it. Messages that arrive out of order can be put back in order according to their sequence numbers.
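The reordering half of that scheme can be sketched without any sockets. What follows is one possible receive-side buffer, not a standard API: the names reorder_buf, reorder_init, reorder_put and reorder_get, the window of 8 messages and the 64-byte payload limit are all assumptions made for this example. Acknowledgments and retransmission timers are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WINDOW      8     /* hypothetical: how far ahead a message may arrive */
#define PAYLOAD_MAX 64    /* hypothetical: maximum payload size, including '\0' */

struct reorder_buf {
    uint32_t next_seq;                   /* next sequence number to deliver */
    int      present[WINDOW];            /* 1 if the slot holds a message */
    char     data[WINDOW][PAYLOAD_MAX];
};

void reorder_init(struct reorder_buf *rb, uint32_t first_seq)
{
    assert(rb != NULL);
    memset(rb, 0, sizeof(*rb));
    rb->next_seq = first_seq;
}

/* Store a received message. Returns 0 if stored, -1 if it is a duplicate,
 * was already delivered, or lies beyond the window. */
int reorder_put(struct reorder_buf *rb, uint32_t seq, const char *payload)
{
    assert(rb != NULL && payload != NULL);
    uint32_t offset = seq - rb->next_seq;    /* unsigned wrap makes old seqs huge */
    if (offset >= WINDOW)
        return -1;
    uint32_t slot = seq % WINDOW;
    if (rb->present[slot])
        return -1;                           /* duplicate, e.g. a retransmission */
    strncpy(rb->data[slot], payload, PAYLOAD_MAX - 1);
    rb->data[slot][PAYLOAD_MAX - 1] = '\0';
    rb->present[slot] = 1;
    return 0;
}

/* Deliver the next in-order message into out (at least PAYLOAD_MAX bytes).
 * Returns 1 if a message was delivered, 0 if the next one is still missing. */
int reorder_get(struct reorder_buf *rb, char *out)
{
    assert(rb != NULL && out != NULL);
    uint32_t slot = rb->next_seq % WINDOW;
    if (!rb->present[slot])
        return 0;
    strcpy(out, rb->data[slot]);
    rb->present[slot] = 0;
    rb->next_seq++;
    return 1;
}
```

In a real receiver, reorder_put() would be called with the sequence number parsed from each datagram returned by recvfrom(), and reorder_get() would be called in a loop afterwards to hand every message that has become deliverable to the application.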
sendto() and recvfrom() make it easy to communicate with any number of hosts, because the peer's address is specified or obtained on every call. connect() can also be used on a UDP socket. It sends nothing on the network; it only records the peer's address in the socket. After that, the socket exchanges datagrams only with that peer, and data is normally sent and received with send()/recv() or read()/write(), which take no address argument. In short, connect() on a UDP socket fixes the address of the other party, while bind() fixes the local address and port used for receiving.
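The connected form can be sketched in the same single-process style. The helper name udp_connected_roundtrip and the loopback setup are assumptions made for this example; the point is only that after connect() the sender uses send() without naming the destination.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect() a UDP socket to a bound loopback receiver,
 * then send with send(), which takes no address argument.
 * Returns the number of bytes the receiver got, or -1 on error. */
ssize_t udp_connected_roundtrip(const char *msg, char *buf, size_t buflen)
{
    ssize_t n = -1;
    struct sockaddr_in rx_addr;
    socklen_t len = sizeof(rx_addr);
    assert(msg != NULL && buf != NULL);

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    memset(&rx_addr, 0, sizeof(rx_addr));
    rx_addr.sin_family = AF_INET;
    rx_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&rx_addr, sizeof(rx_addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&rx_addr, &len) == 0 &&
        /* On UDP, connect() only records the peer; no packet is sent. */
        connect(tx, (struct sockaddr *)&rx_addr, sizeof(rx_addr)) == 0 &&
        send(tx, msg, strlen(msg), 0) == (ssize_t)strlen(msg))
        n = recv(rx, buf, buflen, 0);

    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

Connecting a UDP socket is mainly a convenience when a program talks to a single peer for its whole lifetime; a server that answers many clients from one socket keeps using recvfrom() and sendto().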


Origin blog.csdn.net/lzj_linux188/article/details/105206832