Linux socket programming (not limited to Linux): everything is a socket

 "Everything is Socket!"

Although it is a bit exaggerated, the fact is that almost all network programming now uses sockets.

——I am interested in actual programming and open source project research.

We all appreciate the value of exchanging information, but how do processes on a network communicate? When we open a browser every day, how does the browser process talk to the web server? When you chat on QQ, how does your QQ process talk to the server, or to your friend's QQ process? All of this relies on sockets. So what is a socket? What types of sockets are there? And what are the basic socket functions? These are the questions this article sets out to answer. The main content is as follows:

1. How to communicate between processes in the network?

2. What is Socket?

3. The basic operation of socket

  3.1, socket() function

  3.2, bind() function

  3.3, listen(), connect() functions

  3.4, accept() function

  3.5, read(), write() functions, etc.

  3.6, close() function

4. Detailed explanation of TCP three-way handshake connection establishment in socket

5. Detailed explanation of TCP's four-way handshake release connection in socket

6. An example (practice it)

7. A question for you (replies welcome!)




1. How to communicate between processes in the network?

There are many ways of local inter-process communication (IPC), but they can be summarized into the following four categories:

Message passing (pipes, FIFOs, message queues)

Synchronization (mutexes, condition variables, read-write locks, file and record locks, semaphores)

Shared memory (anonymous and named)

Remote procedure calls (Solaris doors and Sun RPC)

But these are not the subject of this article. What we want to discuss is how processes communicate over a network. The first problem to solve is how to uniquely identify a process, because without that, communication is impossible. Locally, the process PID uniquely identifies a process, but this does not work across a network. The TCP/IP protocol family already solves this problem for us: the "IP address" of the network layer uniquely identifies a host on the network, and the "protocol + port" of the transport layer uniquely identifies an application (process) on that host. A triple (IP address, protocol, port) therefore identifies a process on the network, and processes can use this identifier to interact with each other.

Applications that use TCP/IP usually rely on one of two application programming interfaces to communicate between network processes: UNIX BSD sockets or UNIX System V TLI (now obsolete). Today almost all applications use sockets, and in the network age inter-process communication over the network is everywhere, which is why I say "everything is a socket".

2. What is Socket?

We already know that processes on a network communicate through sockets. So what is a socket? Sockets originated in Unix, and one of the basic philosophies of Unix/Linux is that "everything is a file": everything can be operated on with the pattern "open -> read/write -> close". My understanding is that a socket is one implementation of this pattern. A socket is a special kind of file, and the socket functions are operations on it (read/write I/O, open, close). We will introduce these functions later.

The word socket was first used in networking in IETF RFC 33, published on February 12, 1970 and written by Stephen Carr, Steve Crocker and Vint Cerf. According to the Computer History Museum, Crocker wrote that the elements of a name space can be called sockets, that a socket forms one end of a connection, and that a connection is fully specified by a pair of sockets. The Computer History Museum adds that this is about 12 years before the BSD socket interface definition.

3. The basic operation of socket

Since the socket is an implementation of the "open-write/read-close" mode, the socket provides functional interfaces corresponding to these operations. Let's take TCP as an example to introduce several basic socket interface functions.

3.1, socket() function

int socket(int domain, int type, int protocol);

socket() corresponds to the open operation on an ordinary file. Opening an ordinary file returns a file descriptor, and socket() likewise creates a socket descriptor, which uniquely identifies a socket. Like a file descriptor, the socket descriptor is passed as a parameter to subsequent operations, such as reads and writes.

Just as you can pass different arguments to fopen to open different files, you can pass different arguments to socket() to create different kinds of sockets. The three parameters of socket() are:

domain: The protocol domain, also known as the protocol family. Commonly used protocol families are AF_INET, AF_INET6, AF_LOCAL (also called AF_UNIX, the Unix domain socket), AF_ROUTE and so on. The protocol family determines the address type of the socket, and the matching address type must be used in communication. For example, AF_INET means an address is the combination of an IPv4 address (32-bit) and a port number (16-bit), while AF_UNIX means an absolute path name is used as the address.

type: The socket type. Commonly used socket types are SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_PACKET, SOCK_SEQPACKET, etc.

protocol: As the name suggests, the protocol to use. Commonly used protocols are IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP, IPPROTO_TIPC, etc., which correspond to the TCP, UDP, SCTP and TIPC transport protocols respectively (I will discuss TIPC separately!).

Note: type and protocol cannot be combined arbitrarily. For example, SOCK_STREAM cannot be combined with IPPROTO_UDP. When protocol is 0, the default protocol for the given type is selected automatically.
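The note above can be checked with a few lines of C. This is only a minimal sketch: the helper name combination_works is made up for illustration, and it assumes an ordinary host with IPv4 enabled.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper for illustration: try to create an IPv4 socket
 * with the given type and protocol. Returns 1 if socket() succeeded
 * and 0 if it failed. The descriptor is closed again immediately,
 * because only the outcome matters here. */
int combination_works(int type, int protocol)
{
    int fd = socket(AF_INET, type, protocol);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```

For example, combination_works(SOCK_STREAM, 0) lets the system pick TCP, while combination_works(SOCK_STREAM, IPPROTO_UDP) is expected to fail because a stream socket cannot use UDP.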

When we call socket() to create a socket, the returned socket descriptor exists in the namespace of its protocol family (address family, AF_XXX) but has no specific address yet. To assign an address you must call bind(); otherwise the system assigns a port automatically when you call connect() or listen().

3.2, bind() function

As mentioned above, bind() assigns a specific address from an address family to the socket. For AF_INET and AF_INET6, for example, this means assigning a combination of an IPv4 or IPv6 address and a port number to the socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The three parameters of the function are:

sockfd: the socket descriptor, created by socket(), which uniquely identifies a socket. bind() binds a name to this descriptor.

addr: a const struct sockaddr * pointer to the protocol address to be bound to sockfd. The layout of this address structure depends on the address family used when the socket was created.

IPv4 corresponds to:

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address. */
struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};

IPv6 corresponds to:

struct sockaddr_in6 {
    sa_family_t     sin6_family;   /* AF_INET6 */
    in_port_t       sin6_port;     /* port number */
    uint32_t        sin6_flowinfo; /* IPv6 flow information */
    struct in6_addr sin6_addr;     /* IPv6 address */
    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
};

struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};

The Unix domain corresponds to:

#define UNIX_PATH_MAX    108

struct sockaddr_un {
    sa_family_t sun_family;               /* AF_UNIX */
    char        sun_path[UNIX_PATH_MAX];  /* pathname */
};

addrlen: the length of the address structure.

A server usually binds to a well-known address (such as IP address + port number) when it starts, and clients use that address to connect to the service. A client does not need to specify an address: the system automatically combines a free port number with the client's own IP address. This is why a server usually calls bind() before listen(), while a client does not call it and lets the system pick a port at connect() time.
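As a rough illustration of binding, the sketch below binds a socket to the loopback address and then asks the kernel which port the socket ended up with. The helper names bind_loopback and local_port are invented for this example, and binding to 127.0.0.1 rather than a public address is an assumption made to keep it harmless.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind fd to 127.0.0.1:port. The port is given in
 * host byte order; 0 asks the kernel to choose a free port.
 * Returns 0 on success, -1 on error. */
int bind_loopback(int fd, uint16_t port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}

/* Hypothetical helper: return the local port of fd in host byte order,
 * or 0 if the socket has no port yet or getsockname() fails. */
uint16_t local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

A server would typically pass its well-known port, for instance bind_loopback(fd, 6666), while passing 0 mimics what the system does for a client that never calls bind().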

Network byte order and host byte order

Host byte order is what we usually call big-endian or little-endian mode: different CPUs store multi-byte integers in memory in different orders, and that order is the host byte order. The standard definitions of big-endian and little-endian are:

a) Little-endian: the low-order byte is stored at the low memory address and the high-order byte at the high memory address.

b) Big-endian: the high-order byte is stored at the low memory address and the low-order byte at the high memory address.

Network byte order: the 4 bytes of a 32-bit value are transmitted in the following order: first bits 0~7, then bits 8~15, then bits 16~23, and finally bits 24~31, where bit 0 is the most significant bit. This transmission order is big-endian. Because every binary integer in the TCP/IP headers must be transmitted over the network in this order, it is also called network byte order. Byte order, as the name implies, is the order in which data larger than one byte is stored in memory; a single byte has no ordering issue.

Therefore, when binding an address to a socket, always convert from host byte order to network byte order first, and never assume that the host byte order happens to be big-endian like the network byte order. This mistake has caused real damage: in our company's project code it led to many baffling problems. So make no assumptions about the host byte order, and always convert values to network byte order before assigning them to the socket address.
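To make the conversion concrete, here is a small sketch built on the standard htonl() function. The helper names wire_bytes and host_is_little_endian are illustrative only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a 32-bit value with htonl() and copy its
 * in-memory bytes into out[0..3]. Whatever the host's endianness,
 * out[0] then holds the most significant byte, which is the byte that
 * goes on the wire first. */
void wire_bytes(uint32_t host_value, unsigned char out[4])
{
    uint32_t net = htonl(host_value);

    memcpy(out, &net, sizeof(net));
}

/* Hypothetical helper: return 1 on a little-endian host and 0 on a
 * big-endian host, by looking at the first byte of the value 1. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

The same rule applies to ports: a port number such as 6666 is stored in sin_port as htons(6666), and ntohs() converts it back when reading it out.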

3.3, listen(), connect() functions

On the server side, after calling socket() and bind(), you call listen() to listen on the socket. If a client then calls connect() to send a connection request, the server will receive it.

int listen(int sockfd, int backlog);
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The first parameter of listen() is the descriptor of the socket to listen on, and the second is the maximum number of connections that can be queued for that socket. A socket created by socket() is an active socket by default; listen() turns it into a passive socket that waits for client connection requests.

The first parameter of connect() is the client's socket descriptor, the second is the server's socket address, and the third is the length of that address. The client establishes a connection to the TCP server by calling connect().
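The difference between an active and a passive socket can be observed directly. In the sketch below (helper names bound_socket and try_connect are made up, and the loopback address is an assumption), a connection attempt to a socket that is bound but not yet listening is expected to be refused, while the same attempt after listen() should succeed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1 on a
 * kernel-chosen port and store the resulting address in *addr.
 * The socket is NOT listening yet. Returns the descriptor or -1. */
int bound_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: connect a fresh TCP socket to *addr. Returns 0
 * on success, otherwise the errno value reported by socket() or
 * connect(). The client socket is closed in both cases. */
int try_connect(const struct sockaddr_in *addr)
{
    int err = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0)
        err = errno;
    close(fd);
    return err;
}
```

Calling try_connect() before listen() on the bound socket should yield ECONNREFUSED; after listen(fd, 5) the same call should return 0.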

3.4, accept() function

After the TCP server has called socket(), bind() and listen() in turn, it is listening on the specified socket address. After the TCP client has called socket() and connect() in turn, it sends a connection request to the TCP server. Once the TCP server has received the request, it calls accept() to accept it, and the connection is established. Network I/O can then begin, that is, read and write operations similar to those on ordinary files.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

The first parameter of accept() is the server's socket descriptor, the second is a pointer to a struct sockaddr that receives the client's protocol address, and the third is a pointer to the length of that address. If accept() succeeds, its return value is a brand-new descriptor generated by the kernel, which represents the TCP connection to the client that was just accepted.

Note: The first parameter of accept() is the server's socket descriptor, created by the server's call to socket(); it is called the listening socket descriptor. The descriptor that accept() returns is the connected socket descriptor. A server usually creates only one listening socket descriptor, which exists for the whole lifetime of the server. The kernel creates one connected socket descriptor for each client connection the server process accepts, and when the server has finished serving a client, it closes the corresponding connected socket descriptor.
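The distinction between the listening descriptor and the connected descriptor can be sketched as follows. The helper names make_listener and accept_one are hypothetical, and the loopback address and backlog of 5 are arbitrary choices for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on 127.0.0.1 with
 * a kernel-chosen port. The bound address is stored in *addr.
 * Returns the listening descriptor or -1. */
int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one connection on listen_fd. On success
 * returns the connected descriptor and stores the client's port (host
 * byte order) in *peer_port; returns -1 on error. */
int accept_one(int listen_fd, uint16_t *peer_port)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer); /* in: buffer size, out: address size */
    int conn_fd = accept(listen_fd, (struct sockaddr *)&peer, &len);

    if (conn_fd < 0)
        return -1;
    *peer_port = ntohs(peer.sin_port);
    return conn_fd;
}
```

The descriptor returned by accept_one() differs from listen_fd, so the server can close it after serving the client and keep listening on the original one.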

3.5, read(), write() and other functions

Everything is now in place: the server and the client have established a connection, and network I/O calls can be used to read and write, which is how different processes on the network communicate. The network I/O operations come in the following groups:

read()/write()

recv()/send()

readv()/writev()

recvmsg()/sendmsg()

recvfrom()/sendto()

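I recommend the recvmsg()/sendmsg() functions. These two are the most general I/O functions; in fact, every other function above can be replaced by them. The declarations are as follows:

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);

#include <sys/types.h>
#include <sys/socket.h>

ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t recv(int sockfd, void *buf, size_t len, int flags);

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);

ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);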
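To give a feel for sendmsg()/recvmsg(), here is a minimal sketch that sends two separate buffers with a single call and receives them with recvmsg(). The function names send_two_parts and recv_into are invented for the example; it works on any connected stream socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send two separate strings as one message with a
 * single sendmsg() call (a "gather" write). Returns the number of
 * bytes sent, or -1 on error. */
ssize_t send_two_parts(int fd, const char *head, const char *body)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = (void *)head;
    iov[0].iov_len = strlen(head);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = strlen(body);

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    return sendmsg(fd, &msg, 0);
}

/* Hypothetical helper: receive up to cap bytes into buf with recvmsg().
 * With a single iovec and no flags this behaves like
 * recv(fd, buf, cap, 0). */
ssize_t recv_into(int fd, char *buf, size_t cap)
{
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = buf;
    iov.iov_len = cap;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    return recvmsg(fd, &msg, 0);
}
```

A quick way to try it in one process is a socketpair(AF_UNIX, SOCK_STREAM, 0, sv) pair: send on sv[0] with send_two_parts() and read on sv[1] with recv_into().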

 

read() reads from fd. On success it returns the number of bytes actually read. A return value of 0 means end of file has been reached, and a value less than 0 means an error occurred. If the error is EINTR, the read was interrupted by a signal; if it is ECONNRESET, there is a problem with the network connection.

write() writes nbytes bytes from buf to the file descriptor fd. On success it returns the number of bytes written; on failure it returns -1 and sets errno. In a network program, writing to a socket descriptor has two possible outcomes. 1) The return value is greater than 0: part or all of the data has been written. 2) The return value is less than 0: an error occurred, and we must handle it according to its type. If the error is EINTR, the write was interrupted by a signal; if it is EPIPE, there is a problem with the network connection (the peer has closed the connection).
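Because write() may write only part of the data and read() may return less than requested, programs usually wrap them in loops. The following is one possible sketch of such loops; the names write_all and read_full are made up here, and other error-handling policies are equally valid.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd. Retries
 * after EINTR and continues after a partial write. Returns 0 on
 * success, -1 on any other error (errno is left as set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: try again */
            return -1;    /* e.g. EPIPE (seen only if SIGPIPE is ignored) */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read until len bytes have arrived or the peer
 * closes the connection. Returns the number of bytes read (possibly
 * fewer than len at end of file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;        /* end of file: the peer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: try again */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Both helpers work on any descriptor, so they can be tried on a pipe before being used on a socket.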

I will not introduce each of these I/O function pairs one by one. For details, please refer to the man pages, or search Baidu or Google. send()/recv() will be used in the example below.

3.6, close() function

After the server has established a connection with the client, they perform some read and write operations. When these are finished, the corresponding socket descriptor must be closed, just as you call fclose to close an open file when you are done with it.

#include <unistd.h>
int close(int fd);

By default, close() on a TCP socket marks the socket as closed and returns to the calling process immediately. The descriptor can no longer be used by the calling process, that is, it can no longer be passed as the first parameter of read() or write().

Note: close() only decrements the reference count of the corresponding socket descriptor by 1. Only when the reference count reaches 0 is the TCP client triggered to send a connection termination request to the server.
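The reference-count behaviour can be demonstrated with dup(), which creates a second descriptor for the same socket. The helper name peer_has_closed is made up, and the sketch assumes a platform that supports the MSG_DONTWAIT flag (Linux and the BSDs do). A Unix domain socketpair is enough to show the effect, since the descriptor reference counting works the same way for TCP sockets.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report whether the peer of fd has closed its end.
 * Returns 1 if a read would report end of file, 0 if the connection is
 * still open with no data pending, and -1 for any other outcome.
 * MSG_DONTWAIT keeps the call from blocking, and MSG_PEEK leaves any
 * pending data in the receive queue. */
int peer_has_closed(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_DONTWAIT | MSG_PEEK);

    if (n == 0)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

If one end of a socket pair is duplicated with dup(), closing the original descriptor alone should leave peer_has_closed() at 0 for the other end; only after the duplicate is closed as well should it return 1.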

4. Detailed explanation of TCP three-way handshake connection establishment in socket

We know that TCP uses a "three-way handshake" to establish a connection, that is, it exchanges three segments. The general process is as follows:

The client sends a SYN J to the server.

The server responds to the client with a SYN K and acknowledges SYN J with ACK J+1.

The client then sends the acknowledgment ACK K+1 to the server.

That is all there is to the three-way handshake itself, but in which socket functions does it take place? See the figure below:

Figure 1. The TCP three-way handshake in the socket functions

As the figure shows, when the client calls connect(), it triggers a connection request and sends the SYN J packet to the server, and connect() blocks. The server, which is listening for connection requests, receives SYN J and replies to the client with SYN K and ACK J+1, while accept() remains blocked. When the client receives SYN K and ACK J+1 from the server, connect() returns and the client acknowledges SYN K. When the server receives ACK K+1, accept() returns. At this point the three-way handshake is complete and the connection is established.

Summary: The client's connect() returns at the second step of the three-way handshake, while the server's accept() returns at the third step.
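One consequence of this timing can be tried out in a single process. Because the kernel carries out the handshake for a listening socket, connect() can return, and the client can even send data, before the server process has called accept(). The sketch below assumes the loopback interface; the helper names make_listener and connect_and_send are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: listening TCP socket on 127.0.0.1 with a
 * kernel-chosen port; the bound address is stored in *addr. */
int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: connect to *addr and send msg right away,
 * without waiting for the server to call accept(). Returns the client
 * descriptor, or -1 on error. */
int connect_and_send(const struct sockaddr_in *addr, const char *msg)
{
    size_t len = strlen(msg);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        send(fd, msg, len, 0) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After connect_and_send() has returned, a later accept() on the listening socket simply picks up the already established connection, and recv() on the new descriptor returns the data that was sent earlier.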

5. Detailed explanation of TCP's four-way handshake release connection in socket

The above describes how the TCP three-way handshake is established in the socket functions. Now let's look at how the four-way handshake releases the connection. See the following figure:

Figure 2. The TCP four-way handshake in the socket functions

The process shown in the figure is as follows:

An application process first calls close() to actively close the connection, and its TCP sends a FIN M.

The other end receives FIN M and performs a passive close, acknowledging the FIN. The receipt of the FIN is also passed to the application process as an end of file, because a FIN means the application can no longer receive any more data on that connection.

Some time later, the application process that received the end of file calls close() on its socket, which causes its TCP to send a FIN N as well.

The TCP of the original sender receives this FIN and acknowledges it.

So there is a FIN and an ACK in each direction.
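The two directions of the close sequence can be followed in code. This sketch deliberately uses shutdown(fd, SHUT_WR) instead of close() on the active side: it also makes TCP send a FIN, but keeps the descriptor readable so the second half of the exchange can be observed. The helper names tcp_pair and close_sequence are hypothetical, and the pair is built over the loopback interface.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over loopback.
 * fds[0] is the client end, fds[1] the accepted server end.
 * Returns 0 on success, -1 on error. */
int tcp_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    fds[0] = fds[1] = -1;
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        listen(lfd, 1) == 0 &&
        (fds[0] = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(fds[0], (struct sockaddr *)&addr, sizeof(addr)) == 0)
        fds[1] = accept(lfd, NULL, NULL);
    close(lfd);
    if (fds[1] < 0) {
        if (fds[0] >= 0)
            close(fds[0]);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: walk through the close sequence. Returns 0 if
 * every step behaved as described in the text, -1 otherwise. */
int close_sequence(int active, int passive)
{
    char buf[8];

    if (shutdown(active, SHUT_WR) < 0)              /* active end: FIN M */
        return -1;
    if (recv(passive, buf, sizeof(buf), 0) != 0)    /* passive end sees EOF */
        return -1;
    if (send(passive, "bye", 3, 0) != 3)            /* it can still send */
        return -1;
    if (recv(active, buf, sizeof(buf), 0) != 3)
        return -1;
    if (close(passive) < 0)                         /* passive end: FIN N */
        return -1;
    if (recv(active, buf, sizeof(buf), 0) != 0)     /* active end sees EOF */
        return -1;
    return close(active);
}
```

The acknowledgments (ACK M+1 and ACK N+1) are sent by the kernel and are not visible at this level; what the program sees is the end of file on each side.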

6. An example (practice it)

With that said, let's practice by writing a simple server and client (using TCP). The server listens on port 6666 of the local machine; when a connection request arrives, it accepts the request and receives the message sent by the client. The client connects to the server and sends a message.

Server-side code:
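A minimal sketch of such a server is given here. It is not a definitive implementation: the function names server_listen and serve_one, the backlog of 10 and the use of SO_REUSEADDR are assumptions made for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on all local
 * addresses at the given port (host byte order).
 * Returns the listening descriptor, or -1 on error. */
int server_listen(uint16_t port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Allow a quick restart of the server on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 10) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one client, receive one message into buf
 * (NUL-terminated), print it and close the connection.
 * Returns the number of bytes received, or -1 on error. */
ssize_t serve_one(int listen_fd, char *buf, size_t cap)
{
    ssize_t n;
    int conn_fd;

    assert(cap > 0);
    conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0)
        return -1;
    n = recv(conn_fd, buf, cap - 1, 0);
    close(conn_fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    printf("recv msg from client: %s\n", buf);
    return n;
}
```

An iterative server would call server_listen(6666) once and then call serve_one() in an endless loop, handling one client at a time.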


Client code:
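A matching client could look roughly like this. Again this is only a sketch: the function name send_message is made up, and taking the server address as a dotted-decimal IPv4 string is an assumption.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port over TCP, send msg and close
 * the connection. ip is a dotted-decimal IPv4 string and port is in
 * host byte order. Returns 0 on success, -1 on error. */
int send_message(const char *ip, uint16_t port, const char *msg)
{
    struct sockaddr_in addr;
    size_t len = strlen(msg);
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1; /* not a valid IPv4 address string */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* No bind(): the system picks the local port during connect(). */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        send(fd, msg, len, 0) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

With a server listening on port 6666 of the same machine, send_message("127.0.0.1", 6666, "hello") would deliver one message and disconnect.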


Of course, the code above is very simple and has many shortcomings; it only demonstrates the basic socket functions. But however complex a network program is, it uses these same basic functions. The server above is iterative: it handles the next client request only after it has finished the current one. Such a server has very little processing capacity. In practice a server needs to handle requests concurrently, for example by calling fork() to create a new process, or by creating a new thread, for each request.

7. Hands-on

Here is an exercise; everyone is welcome to reply! Are you familiar with network programming under Linux? If so, write a program that does the following:

Server:

Receive messages from the client at address 192.168.100.2; if a message is "Client Query", print "Receive Query".

Client:

Send the messages "Client Query test", "Client Query" and "Client Query Quit", in that order, to the server at address 192.168.100.168, and then exit.

The IP addresses in the exercise can be adapted to your actual environment.

——This article only introduces simple socket programming.

More complex topics require deeper study.

(Unix domain socket) Using udp to send a message >=128K will report an ENOBUFS error (a problem encountered in actual socket programming, I hope it will help you)



Origin blog.csdn.net/linuxguitu/article/details/109360198