Socket network programming: a detailed walk through the protocols at each layer of the OSI seven-layer architecture

Socket principle

Reposted from: https://www.jianshu.com/p/066d99da7cbd

1. What is Socket

In computer communications, a socket is a convention, or a mechanism, by which computers communicate with each other. Through a socket, a computer can receive data from other computers or send data to them.
  Sockets originate from Unix, and one of the basic philosophies of Unix/Linux is "everything is a file": anything can be operated on with the pattern open -> read/write -> close.
  My understanding is that a socket is one implementation of this pattern: a socket is a special file, and the socket functions are operations on it (read/write I/O, open, close).
  The socket() function returns an integer socket descriptor, and subsequent operations such as establishing a connection and transferring data all go through that descriptor.
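
The "socket is a special file" idea can be seen in a minimal sketch. It assumes a POSIX system and uses socketpair(), which is not mentioned above, so that no network is needed; the helper name roundtrip_through_socket is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg through one end of a connected socket pair
 * and reads it back from the other end using the ordinary file calls
 * write(), read() and close(). Returns the number of bytes read, or -1. */
ssize_t roundtrip_through_socket(const char *msg, char *out, size_t outlen)
{
    int fds[2]; /* two integer descriptors, just like file descriptors */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    ssize_t n = -1;
    if (write(fds[0], msg, strlen(msg)) == (ssize_t)strlen(msg))
        n = read(fds[1], out, outlen);   /* same call used for regular files */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling roundtrip_through_socket("hello", buf, sizeof(buf)) should hand back the same five bytes, without any socket-specific I/O function being involved.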

2. How processes communicate on the network

Since sockets are mainly used for network communication, let's first look at how processes communicate over a network.

2.1. Local inter-process communication

  a. Message passing (pipes, FIFOs, message queues)
  b. Synchronization (mutexes, condition variables, read-write locks, file and record locks, semaphores) [the author is not entirely sure about this category]
  c. Shared memory (anonymous and named, e.g. channel)
  d. Remote procedure calls (RPC)

2.2. How the processes communicate in the network

To understand how processes communicate over a network, we have to solve two problems:
  a. How do we identify a host, that is, how do we determine which host the process we want to talk to is running on?
  b. How do we uniquely identify a process? Locally a pid is enough, but how should a process be identified across the network?
Solution:
  a. The TCP/IP protocol family has already solved this for us: the IP address of the network layer uniquely identifies a host on the network.
  b. The "protocol + port" of the transport layer uniquely identifies an application (process) on that host. We can therefore use the triple (IP address, protocol, port) to identify a process on the network, and processes on the network can use this identifier to interact with one another.
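
As a rough illustration of the triple, the sketch below stores an IPv4 address, a protocol and a port together and compares two such triples. The struct endpoint type and both function names are hypothetical; only inet_pton(), htons() and struct sockaddr_in come from the standard socket headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical type: the (IP address, protocol, port) triple. */
struct endpoint {
    struct sockaddr_in addr; /* IPv4 address + port, network byte order */
    int protocol;            /* IPPROTO_TCP or IPPROTO_UDP */
};

/* Fills ep from a dotted-quad string, a protocol and a port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int endpoint_init(struct endpoint *ep, const char *ip, int protocol, uint16_t port)
{
    memset(ep, 0, sizeof(*ep));
    ep->addr.sin_family = AF_INET;
    ep->addr.sin_port = htons(port);
    ep->protocol = protocol;
    return inet_pton(AF_INET, ip, &ep->addr.sin_addr) == 1 ? 0 : -1;
}

/* Two triples identify the same endpoint only if all three parts match. */
int endpoint_equal(const struct endpoint *a, const struct endpoint *b)
{
    return a->addr.sin_addr.s_addr == b->addr.sin_addr.s_addr &&
           a->addr.sin_port == b->addr.sin_port &&
           a->protocol == b->protocol;
}
```

With this sketch, (127.0.0.1, TCP, 1234) and (127.0.0.1, UDP, 1234) compare as different endpoints, which is why the protocol has to be part of the identifier.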

3. How to communicate with Socket

Now we know that processes on a network can be addressed with the triple [IP address, protocol, port]. But how do we actually implement the communication? This is where sockets come in: a socket is a middleware tool that uses this triple to carry out network communication. Almost all network applications today use sockets. Historically there were two such programming interfaces, the socket interface of BSD UNIX and the TLI of UNIX System V (now obsolete).
There are two commonly used data transmission methods for socket communication:
  a. SOCK_STREAM: a connection-oriented transmission method. Data arrives at the other computer without errors; if it is damaged or lost it is resent, at the cost of lower efficiency. The common HTTP protocol uses SOCK_STREAM, because the data must arrive intact, otherwise the web page could not be parsed correctly.
  b. SOCK_DGRAM: a connectionless transmission method. The computer simply sends the data and does not confirm that it arrived. If the data is damaged in transit or never reaches the other computer, nothing is done about it: wrong or lost data is not retransmitted. Because SOCK_DGRAM does less checking, it is more efficient than SOCK_STREAM.
  For example, QQ video chat and voice chat use SOCK_DGRAM. There the priority is efficient communication with minimal delay, and data accuracy is secondary: even if a small part of the data is lost, video and audio can still be decoded normally, with at most some noise and no substantial impact on communication quality.
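
To make "connectionless" concrete, here is a minimal sketch that sends two UDP datagrams over the loopback interface. It assumes a POSIX system with a working loopback interface; the helper name udp_two_datagrams is hypothetical. Delivery on loopback is practically dependable, which real networks do not promise.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends two datagrams over loopback UDP and stores the
 * size of each received datagram in sizes[0] and sizes[1].
 * Returns 0 on success, -1 on error. */
int udp_two_datagrams(ssize_t sizes[2])
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    int rc = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[64];

    if (rx == -1 || tx == -1)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* let the system pick a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) == -1)
        goto out;

    /* No connect()/accept(): each sendto() names its destination. */
    if (sendto(tx, "abc", 3, 0, (struct sockaddr *)&addr, sizeof(addr)) != 3 ||
        sendto(tx, "defgh", 5, 0, (struct sockaddr *)&addr, sizeof(addr)) != 5)
        goto out;

    /* Each recvfrom() returns exactly one datagram, never a merged stream. */
    sizes[0] = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
    sizes[1] = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
    rc = (sizes[0] == -1 || sizes[1] == -1) ? -1 : 0;
out:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return rc;
}
```

Note that nothing in the sketch retransmits or acknowledges anything: if a datagram were lost on a real network, the second recvfrom() would simply keep waiting.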

4. TCP / IP protocol

4.1 Concept

TCP/IP [TCP (Transmission Control Protocol) and IP (Internet Protocol)] provides an end-to-end connection mechanism and standardizes how data should be encapsulated, addressed, transmitted, routed, and received at the destination. It abstracts the communication process into four layers and implements the different communication protocols as a protocol stack. The protocols in the family are classified into these four layers according to their function, and the model is often regarded as a simplified version of the seven-layer OSI model.

The relationship between the two is like that of a delivery route and a delivery station: to set up a delivery station, you must understand the details of the delivery route.

TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based communication protocol. A connection must be established before data is transmitted and torn down afterwards. Before sending and receiving data, the client calls the connect() function to establish a connection with the server. The purpose of establishing a connection is to make sure that the IP address, port, physical link, and so on are correct, and to open a channel for data transmission.
When establishing a connection, TCP transmits three packets, commonly known as the three-way handshake. It can be pictured as the following dialogue:

[Shake 1] Socket A: "Hello, socket B. I have data to send you; let's establish a connection."
[Shake 2] Socket B: "OK, I'm ready on my side."
[Shake 3] Socket A: "Thank you for accepting my request."

4.2. TCP's "sticky packet" problem and the absence of message boundaries:

https://blog.csdn.net/m0_37947204/article/details/80490512
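
Because TCP is a byte stream, the receiver cannot tell where one send ended and the next began. One common remedy is to prefix every message with its length. The sketch below is an assumption for illustration and not necessarily what the linked article recommends; send_frame and recv_frame are hypothetical names, and the sketch relies on blocking sockets and small messages so that each send() is accepted in full.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends one message as a 4-byte big-endian length
 * followed by the payload. Returns 0 on success, -1 on error. */
int send_frame(int fd, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);
    if (send(fd, &be_len, sizeof(be_len), 0) != (ssize_t)sizeof(be_len))
        return -1;
    if (len > 0 && send(fd, payload, len, 0) != (ssize_t)len)
        return -1;
    return 0;
}

/* Hypothetical helper: receives exactly one framed message into buf.
 * MSG_WAITALL asks recv() to block until the full count has arrived.
 * Returns the payload length, or -1 on error, EOF or oversized frame. */
ssize_t recv_frame(int fd, void *buf, uint32_t cap)
{
    uint32_t be_len;
    if (recv(fd, &be_len, sizeof(be_len), MSG_WAITALL) != (ssize_t)sizeof(be_len))
        return -1;
    uint32_t len = ntohl(be_len);
    if (len > cap)
        return -1;
    if (len > 0 && recv(fd, buf, len, MSG_WAITALL) != (ssize_t)len)
        return -1;
    return (ssize_t)len;
}
```

Even if two frames arrive back to back in the input buffer, each recv_frame() call consumes exactly one of them, so the messages no longer "stick" together.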

4.3. TCP segment structure:

The fields that deserve particular attention (shaded in the figure) are:
(1) Sequence number: Seq (Sequence Number) occupies 32 bits and identifies the sequence number of a packet sent from computer A to computer B. The sending computer sets it when it sends the data.
(2) Acknowledgment number: Ack (Acknowledgment Number) occupies 32 bits. Both client and server can send it. In the handshake described here, Ack = Seq + 1, where Seq is the sequence number of the packet being acknowledged.
(3) Flag bits: each flag occupies 1 bit. There are six of them, namely URG, ACK, PSH, RST, SYN, and FIN, with the following meanings:

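(1) URG: the urgent pointer is valid.
(2) ACK: the acknowledgment number is valid.
(3) PSH: the receiver should hand this segment to the application layer as soon as possible.
(4) RST: reset the connection.
(5) SYN: establish a new connection.
(6) FIN: close a connection.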
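
The six flags can be modeled as bit masks. The bit values below follow the layout of the flags field in the TCP header; the FLAG_ constant names and the helper tcp_flags_to_string are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit values of the six flags in the TCP header flags field. */
enum {
    FLAG_FIN = 0x01,
    FLAG_SYN = 0x02,
    FLAG_RST = 0x04,
    FLAG_PSH = 0x08,
    FLAG_ACK = 0x10,
    FLAG_URG = 0x20
};

/* Hypothetical helper: writes the names of the set flags into out,
 * separated by '|', for example "SYN|ACK". Returns out. */
char *tcp_flags_to_string(uint8_t flags, char *out, size_t outlen)
{
    static const struct { uint8_t bit; const char *name; } table[] = {
        { FLAG_FIN, "FIN" }, { FLAG_SYN, "SYN" }, { FLAG_RST, "RST" },
        { FLAG_PSH, "PSH" }, { FLAG_ACK, "ACK" }, { FLAG_URG, "URG" },
    };
    size_t used = 0;
    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (!(flags & table[i].bit))
            continue;
        int n = snprintf(out + used, outlen - used, "%s%s",
                         used ? "|" : "", table[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break; /* output buffer too small: stop here */
        used += (size_t)n;
    }
    return out;
}
```

For instance, the second packet of the handshake carries FLAG_SYN | FLAG_ACK, which this helper renders as "SYN|ACK".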

4.4. Connection establishment (three-way handshake):

When connect() is used to establish a connection, the client and server exchange three packets, as shown in the following figure:

After the client calls socket() to create a socket, no connection exists yet, so the socket is in the CLOSED state. After the server calls listen(), its socket enters the LISTEN state and starts listening for client requests.
The client then initiates the request:
  1) When the client calls connect(), TCP builds a packet and sets the SYN flag, indicating that the packet is used to establish a synchronized connection. It also generates a random number, say 1000, and puts it in the Seq field as the packet's sequence number. It then sends the packet to the server, and the client enters the SYN_SENT state.
  2) The server receives the packet, sees that SYN is set, and knows this is a request packet sent by the client to establish a connection. The server builds its own packet and sets both SYN and ACK: SYN indicates that the packet is used to establish a connection, and ACK confirms that the client's packet was received.
  The server generates a random number, say 2000, and puts it in the Seq field. The 2000 has nothing to do with the client's packet.
  The server adds 1 to the client's sequence number (1000) to get 1001 and puts that in the Ack field.
  The server sends the packet and enters the SYN_RCVD state.
  3) The client receives the packet, sees that SYN and ACK are set, and knows this is the server's confirmation packet. The client checks whether the Ack field equals 1000 + 1; if so, the connection has been established successfully.
  Next, the client builds another packet with the ACK flag set, indicating that it correctly received the server's confirmation packet. It adds 1 to the sequence number of the server's packet (2000) to get 2001 and puts that in the Ack field.
  The client sends the packet and enters the ESTABLISHED state, indicating that the connection has been established.
  4) The server receives the packet, sees that ACK is set, and knows this is the client's acknowledgment packet. The server checks whether the Ack field equals 2000 + 1; if so, the connection is established and the server enters the ESTABLISHED state.
  At this point both client and server are in the ESTABLISHED state, the connection is established, and data can be sent and received.
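
The Seq/Ack bookkeeping of the handshake can be replayed with plain arithmetic. The following sketch is only a model: struct segment and the three functions are hypothetical, and no real TCP traffic is generated.

```c
#include <stdint.h>

/* Hypothetical model of the three segments exchanged during the handshake. */
struct segment {
    int syn;       /* SYN flag */
    int ack_flag;  /* ACK flag */
    uint32_t seq;  /* sequence number */
    uint32_t ack;  /* acknowledgment number (valid only if ack_flag is set) */
};

/* Step 1: client -> server, SYN with the client's initial sequence number. */
struct segment make_syn(uint32_t client_isn)
{
    struct segment s = { 1, 0, client_isn, 0 };
    return s;
}

/* Step 2: server -> client, SYN+ACK acknowledging client seq + 1. */
struct segment make_syn_ack(const struct segment *syn, uint32_t server_isn)
{
    struct segment s = { 1, 1, server_isn, syn->seq + 1 };
    return s;
}

/* Step 3: client -> server, ACK acknowledging server seq + 1.
 * Returns 0 and fills *out if the SYN+ACK matches our SYN, else -1. */
int make_final_ack(const struct segment *syn, const struct segment *syn_ack,
                   struct segment *out)
{
    if (!syn_ack->syn || !syn_ack->ack_flag || syn_ack->ack != syn->seq + 1)
        return -1;
    out->syn = 0;
    out->ack_flag = 1;
    out->seq = syn->seq + 1;
    out->ack = syn_ack->seq + 1;
    return 0;
}
```

Starting from make_syn(1000) and make_syn_ack(&syn, 2000), the model yields the Ack values 1001 and 2001 used in the walk-through above.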

4.5. Closing a connection: TCP's four-way handshake

Establishing a connection is essential, because it is the precondition for transmitting data correctly. Closing it is just as important, because that lets the computer release resources that are no longer in use. If a connection cannot be closed properly, data transmission errors can result, and the socket may fail to close and keep occupying resources; under high concurrency this puts worrying pressure on the server.
Closing a connection takes four handshakes, which can be pictured as the following dialogue:

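[Shake 1] Socket A: "The task is done; I would like to disconnect."
[Shake 2] Socket B: "Oh, really? Please wait a moment while I get ready."
A moment later...
[Shake 3] Socket B: "I'm ready; we can disconnect now."
[Shake 4] Socket A: "OK, thanks for your cooperation."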

The following figure shows the scenario in which the client actively disconnects:

After the connection is established, both client and server are in the ESTABLISHED state. The client then initiates the disconnect:

  1. The client calls close(), sends a FIN packet to the server, and enters the FIN_WAIT_1 state. FIN is short for Finish and means that the task is complete and the connection should be closed.
  2. The server receives the packet, sees that FIN is set, and knows it must disconnect, so it sends a confirmation packet to the client and enters the CLOSE_WAIT state.
    Note: the server does not disconnect immediately on receiving the request. It first sends a confirmation packet telling the client "I know; I need to get ready before disconnecting."
  3. On receiving the confirmation packet, the client enters the FIN_WAIT_2 state and waits for the server to finish preparing and send its next packet.
  4. After a while the server is ready to disconnect, so it sends a FIN packet to the client, telling it "I'm ready to disconnect", and enters the LAST_ACK state.
  5. On receiving the server's FIN packet, the client sends an ACK packet to the server, telling it to go ahead and disconnect, and enters the TIME_WAIT state.
  6. On receiving the client's ACK packet, the server disconnects, closes the socket, and enters the CLOSED state.
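
The state changes in these steps can be written as a small transition function. This is a simplified sketch: the ST_ and EV_ names are hypothetical, simultaneous close is left out, and the timeout event stands for the wait that ends the TIME_WAIT state.

```c
/* TCP states used in the teardown walk-through (ST_ prefix is hypothetical). */
enum tcp_state {
    ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_TIME_WAIT,
    ST_CLOSE_WAIT, ST_LAST_ACK, ST_CLOSED
};

/* Events: what the local side does or receives. Names are hypothetical. */
enum tcp_event { EV_CLOSE, EV_RECV_FIN, EV_RECV_ACK, EV_TIMEOUT_2MSL };

/* Returns the next state for (state, event); pairs that are not part of
 * the walk-through leave the state unchanged. */
enum tcp_state next_state(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        if (e == EV_CLOSE)    return ST_FIN_WAIT_1;  /* active close: send FIN */
        if (e == EV_RECV_FIN) return ST_CLOSE_WAIT;  /* passive close: send ACK */
        break;
    case ST_FIN_WAIT_1: if (e == EV_RECV_ACK) return ST_FIN_WAIT_2; break;
    case ST_FIN_WAIT_2: if (e == EV_RECV_FIN) return ST_TIME_WAIT;  break; /* send ACK */
    case ST_TIME_WAIT:  if (e == EV_TIMEOUT_2MSL) return ST_CLOSED; break;
    case ST_CLOSE_WAIT: if (e == EV_CLOSE)    return ST_LAST_ACK;   break; /* send FIN */
    case ST_LAST_ACK:   if (e == EV_RECV_ACK) return ST_CLOSED;     break;
    case ST_CLOSED:     break;
    }
    return s;
}
```

Feeding the client side EV_CLOSE, EV_RECV_ACK, EV_RECV_FIN and EV_TIMEOUT_2MSL in that order walks it through FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT and CLOSED, while the server side passes through CLOSE_WAIT and LAST_ACK.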

4.6. About the TIME_WAIT state

After the client sends its final ACK packet, it enters the TIME_WAIT state instead of going straight to CLOSED. Why?

TCP is connection-oriented and must ensure that data reaches the target machine correctly, without loss or errors. The network is unreliable and may corrupt data at any time, so whenever machine A sends a packet to machine B, machine B must acknowledge it by returning an ACK packet that tells machine A "I received it"; only then does machine A know the transmission succeeded. If machine B does not return an ACK packet, machine A resends until it does.

When the client sends its final ACK packet to the server, the server may not receive it because of network problems, in which case the server sends its FIN packet again. If the client had already closed the connection completely, the server would never receive an ACK packet. The client therefore has to wait a while, until it can be confident the other side received the ACK packet, before entering the CLOSED state. So how long does it wait?

A packet has a limited lifetime in the network: if it has not reached the target host within that time, it is discarded. This time is called the Maximum Segment Lifetime (MSL). TIME_WAIT lasts 2MSL before the socket enters the CLOSED state: the ACK packet takes at most one MSL to reach the server, and a FIN packet retransmitted by the server takes at most one MSL to come back, so 2MSL is the maximum round-trip time. If no retransmitted FIN packet has arrived after 2MSL, the server must have received the ACK packet.

4.7. Graceful disconnection: shutdown()

The difference between close()/closesocket() and shutdown()
To be precise, close()/closesocket() closes the socket: it removes the socket descriptor (or handle) from memory, after which the socket can no longer be used, much like fclose() in C. Once the application closes the socket, the connection and buffers associated with it no longer have any meaning to the application, and TCP automatically starts closing the connection.

shutdown() closes the connection, not the socket. No matter how many times shutdown() is called, the socket still exists until close()/closesocket() is called to remove it from memory.
When close()/closesocket() closes the socket, or shutdown() closes the output stream, a FIN packet is sent to the peer. The FIN packet signals that data transmission is complete; a computer that receives it knows that no more data will arrive.

By default, close()/closesocket() returns immediately and the application loses the socket: the system still tries to deliver whatever is left in the output buffer before the FIN packet, but the application can no longer find out whether that succeeded, and unread data in the input buffer is discarded. shutdown() on the output stream also sends the FIN packet only after the data in the output buffer has been transmitted, but it leaves the socket open, so the application can keep receiving until the peer has finished. In that sense a bare close()/closesocket() risks losing data, while shutdown() followed by reading until the peer finishes does not.
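
A half-close can be tried out locally. The sketch below assumes a POSIX system and uses an AF_UNIX socket pair as a stand-in for a TCP connection (an assumption made so that no network is needed); half_close_demo is a hypothetical name.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo on a local socket pair (AF_UNIX stands in for TCP).
 * a sends a request and half-closes; b sees the data, then end-of-stream,
 * and can still reply because only a's output direction was shut down.
 * reply must have room for at least 5 bytes.
 * Returns 0 if everything behaved as described, -1 otherwise. */
int half_close_demo(char *reply, size_t replylen)
{
    int fds[2];
    char buf[16];
    ssize_t n;
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;
    int a = fds[0], b = fds[1];

    if (write(a, "ping", 4) != 4) goto out;
    if (shutdown(a, SHUT_WR) == -1) goto out;     /* "I have no more data" */

    if (read(b, buf, sizeof(buf)) != 4) goto out; /* buffered data still arrives */
    if (read(b, buf, sizeof(buf)) != 0) goto out; /* then 0: end of stream */

    if (write(b, "pong", 4) != 4) goto out;       /* b -> a is still open */
    n = read(a, reply, replylen - 1);
    if (n != 4) goto out;
    reply[n] = '\0';
    rc = 0;
out:
    close(a);   /* close() is what finally releases the descriptors */
    close(b);
    return rc;
}
```

The point of the sketch is the ordering: shutdown() ends only the sending direction, the reply is still received afterwards, and close() is called last to release the sockets.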

5. OSI model

TCP/IP maps the layers of the OSI network model as follows:

The TCP/IP reference model classifies all the protocols of the TCP/IP family into four abstract layers:
Application layer: TFTP, HTTP, SNMP, FTP, SMTP, DNS, Telnet, etc.
Transport layer: TCP, UDP
Network layer: IP, ICMP, OSPF, EIGRP, IGMP
Data link layer: SLIP, CSLIP, PPP (the MTU is a parameter of this layer rather than a protocol)
Each abstraction layer builds on the services provided by the layer below it and provides services to the layer above. It looks like this:

6. Socket commonly used function interface and its principle

The socket functions in a diagram:

6.1. Use the socket () function to create a socket

int socket(int af, int type, int protocol);
  1. af is the address family, that is, the type of IP address; the common values are AF_INET and AF_INET6. AF is short for "Address Family" and INET is short for "Internet". AF_INET denotes an IPv4 address such as 127.0.0.1; AF_INET6 denotes an IPv6 address such as 1030::C9B4:FF12:48AA:1A2B.
    Remember 127.0.0.1: it is a special IP address that refers to the local machine, and it will be used often in what follows.
  2. type is the data transmission method; the common values are SOCK_STREAM and SOCK_DGRAM.
  3. protocol is the transport protocol; the common values are IPPROTO_TCP and IPPROTO_UDP, which denote TCP and UDP respectively.
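
A minimal sketch of how the three arguments fit together, assuming a POSIX system: it creates a socket, asks the system for the socket's type via the SO_TYPE option, and closes it again. The helper name socket_type_of is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a socket with the given arguments, asks the
 * system which type it actually has, and closes it again.
 * Returns the socket type (SOCK_STREAM, SOCK_DGRAM, ...) or -1 on error. */
int socket_type_of(int af, int type, int protocol)
{
    int sock = socket(af, type, protocol);
    if (sock == -1)
        return -1;           /* errno says why, e.g. EPROTONOSUPPORT */

    int actual = -1;
    socklen_t len = sizeof(actual);
    if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &actual, &len) == -1)
        actual = -1;

    close(sock);
    return actual;
}
```

Passing 0 as protocol lets the system choose the default for the given type, so (AF_INET, SOCK_STREAM, 0) also produces a TCP socket, whereas a mismatched pair such as (AF_INET, SOCK_STREAM, IPPROTO_UDP) is rejected.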

6.2. Use bind () and connect () functions

The socket() function creates a socket and determines its properties. On the server side, the bind() function then binds the socket to a specific IP address and port; only then is data arriving at that IP address and port handed to the socket. The client, in turn, uses the connect() function to establish a connection.

int bind(int sock, struct sockaddr *addr, socklen_t addrlen);  

sock is the socket file descriptor, addr is a pointer to a sockaddr structure variable, and addrlen is the size of the addr variable, which can be computed with sizeof().
The following code binds the newly created socket to IP address 127.0.0.1 and port 1234:

// Create the socket
int serv_sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// Create the sockaddr_in structure variable
struct sockaddr_in serv_addr;
memset(&serv_addr, 0, sizeof(serv_addr));  // fill every byte with 0
serv_addr.sin_family = AF_INET;  // use an IPv4 address
serv_addr.sin_addr.s_addr = inet_addr("127.0.0.1");  // the specific IP address
serv_addr.sin_port = htons(1234);  // the port, in network byte order
// Bind the socket to the IP address and port
bind(serv_sock, (struct sockaddr*)&serv_addr, sizeof(serv_addr));

The connect () function is used to establish a connection. Its prototype is:

int connect(int sock, struct sockaddr *serv_addr, socklen_t addrlen); 

6.3. Use listen () and accept () functions

In a server program, after binding the socket with bind(), you also need to call listen() to put the socket into the passive listening state, and then call accept() so that client requests can be answered at any time.
The listen() function puts the socket into the passive listening state. Its prototype is:

int listen(int sock, int backlog); 

sock is the socket that should enter the listening state, and backlog is the maximum length of the request queue.
Passive listening means that while there are no client requests the socket is "asleep"; only when a client request arrives is the socket "woken up" to respond to it.

Request queue
While the socket is handling one client request, it cannot handle a new request that arrives; the new request can only be placed in a buffer and is taken out for processing once the current request is done. If new requests keep arriving, they queue up in the buffer in order until the buffer is full. This buffer is called the request queue.

The length of the buffer (how many client requests it can hold) is specified by the backlog parameter of listen(). There is no standard value; choose one according to your needs. For low concurrency, 10 or 20 is enough.

If backlog is set to SOMAXCONN, the system decides the length of the request queue. This value is generally fairly large, possibly several hundred or more.

When the request queue is full, no new requests are accepted. On Linux the client may receive an ECONNREFUSED error; because TCP retransmits, the request may also simply be ignored so that a later retry can succeed.

Note: listen() only puts the socket into the listening state; it does not receive requests. Receiving a request requires the accept() function.

When the socket is in the listening state, client requests can be received with the accept() function. Its prototype is:

int accept(int sock, struct sockaddr *addr, socklen_t *addrlen); 

Its parameters resemble those of bind() and connect(): sock is the server-side socket, addr is a pointer to a sockaddr_in structure variable, and addrlen is a pointer to a variable holding the length of addr, which can be initialized with sizeof().

accept() returns a new socket for communicating with the client, and addr receives the client's IP address and port number. sock, by contrast, is the server-side listening socket; be careful to tell the two apart. All later communication with the client uses the newly returned socket, not the original server-side socket.

Finally, note again that listen() only puts the socket into the listening state and does not actually receive client requests. The code after listen() keeps executing until it reaches accept(), and accept() blocks the program (the code after it does not run) until a new request arrives.
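
Putting bind(), listen(), connect() and accept() together, a loopback round trip might look roughly like the sketch below. It assumes a POSIX system with a working loopback interface; the three helper names are hypothetical, and port 0 is used so that the system picks a free port instead of the fixed 1234 from the earlier example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP socket bound to 127.0.0.1 on a port
 * chosen by the system, puts it into the listening state, and stores the
 * chosen port (host byte order) in *port. Returns the socket or -1. */
int make_listener(int backlog, unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* 0 = any free port */

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(sock, backlog) == -1 ||
        getsockname(sock, (struct sockaddr *)&addr, &len) == -1) {
        close(sock);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return sock;
}

/* Hypothetical helper: connects a new TCP socket to 127.0.0.1:port. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Hypothetical helper: blocks in accept() until a client has connected and
 * returns the NEW socket for that client; the listener stays open. */
int accept_one(int listener, struct sockaddr_in *peer)
{
    socklen_t len = sizeof(*peer);   /* in: buffer size, out: address size */
    return accept(listener, (struct sockaddr *)peer, &len);
}
```

Even in a single thread the calls can run in the order make_listener(), connect_loopback(), accept_one(): the connection request waits in the request queue of the listening socket until accept() picks it up, which is the role of backlog described above.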

6.4. Receiving and sending socket data

Receiving and sending data under Linux
Linux does not distinguish between socket files and ordinary files: use write() to write data to a socket and read() to read data from it.

As noted earlier, communication between two computers amounts to communication between two sockets. When the server writes data to its socket with write(), the client receives it and reads it from its own socket with read(), which completes one exchange.
The prototype of write() is:

ssize_t write(int fd, const void *buf, size_t nbytes);

fd is the descriptor of the file to write to, buf is the address of the buffer holding the data, and nbytes is the number of bytes to write.
write() writes up to nbytes bytes from the buffer buf to the file fd. On success it returns the number of bytes written (which can be smaller than nbytes); on failure it returns -1.
The prototype of read () is:

ssize_t read(int fd, void *buf, size_t nbytes);

fd is the descriptor of the file to read from, buf is the address of the buffer that receives the data, and nbytes is the number of bytes to read.

read() reads up to nbytes bytes from the file fd and stores them in the buffer buf. On success it returns the number of bytes read (0 at end of file); on failure it returns -1.
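
Because both calls may transfer fewer bytes than requested, real code usually wraps them in loops. The following is a minimal sketch under that assumption; write_all and read_full are hypothetical helper names, not part of any library.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all nbytes are written,
 * because one write() may accept fewer bytes than requested.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t nbytes)
{
    const char *p = buf;
    while (nbytes > 0) {
        ssize_t n = write(fd, p, nbytes);
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        nbytes -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: reads until nbytes have arrived or the peer closes.
 * Returns the number of bytes read (less than nbytes at end of stream),
 * or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t nbytes)
{
    char *p = buf;
    size_t got = 0;
    while (got < nbytes) {
        ssize_t n = read(fd, p + got, nbytes - got);
        if (n == 0)
            break;               /* 0 means end of file / connection closed */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Since the helpers only rely on read() and write(), they work the same way on sockets, pipes and regular files.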

6.5. Socket buffers and blocking mode

Socket buffers
When a socket is created, two buffers are allocated for it: an input buffer and an output buffer.

write()/send() does not transmit data to the network immediately. It first writes the data into the output buffer, and TCP then sends the data from the buffer to the target machine. As soon as the data is in the buffer the function can return successfully, regardless of whether the data has reached the target machine or when it is put on the network; those are TCP's responsibilities.

TCP operates independently of write()/send(). Data may be sent to the network as soon as it is written into the buffer, or it may accumulate in the buffer so that the data from several writes is sent in one go. This depends on many factors, such as network conditions at the time and whether the current thread is idle, and is not under the programmer's control.

The same holds for read()/recv(), which reads data from the input buffer rather than directly from the network.

The characteristics of these I/O buffers can be summarized as follows:

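(1) Each TCP socket has its own I/O buffers;
(2) The I/O buffers are created automatically when the socket is created;
(3) Data left in the output buffer continues to be transmitted even after the socket is closed;
(4) Closing the socket discards the data in the input buffer.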

The default size of the input and output buffers depends on the system (8K is a commonly quoted figure) and can be obtained with the getsockopt() function:

int optVal;
socklen_t optLen = sizeof(optVal);  // in: size of optVal, out: bytes actually written
getsockopt(servSock, SOL_SOCKET, SO_SNDBUF, (char*)&optVal, &optLen);
printf("Buffer length: %d\n", optVal);

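Blocking mode
For TCP sockets (by default), when write()/send() is used to send data:

1) The buffer is checked first. If the free space in the buffer is smaller than the data to be sent, write()/send() blocks (execution pauses) until data in the buffer has been sent to the target machine and enough space has been freed; then write()/send() is woken up and continues writing.
2) If TCP is in the middle of sending data to the network, the output buffer is locked and cannot be written to; write()/send() blocks until the data has been sent and the buffer is unlocked.
3) If the data to be written is larger than the maximum size of the buffer, it is written in batches.
4) write()/send() does not return until all the data has been written into the buffer.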

When using read () / recv () to read data:

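1) The buffer is checked first. If there is data in it, the data is read; otherwise the function blocks until data arrives from the network.
2) If the length to be read is smaller than the amount of data in the buffer, the buffer cannot be emptied in one call; the remaining data stays queued until read()/recv() is called again.
3) read()/recv() does not return until it has read some data; until then it stays blocked.
This is the blocking mode of TCP sockets. Blocking means that the next action is suspended until the previous one has finished, which keeps things synchronized.

TCP sockets are in blocking mode by default.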
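
Whether a socket is in blocking mode can be checked with fcntl(). The sketch below assumes a POSIX system; for contrast it also switches a socket to non-blocking mode, which goes slightly beyond what is described above. All three helper names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd is in blocking mode, 0 if non-blocking, -1 on error. */
int is_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 0 : 1;
}

/* Switches fd to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: tries to read from a socket whose input buffer is
 * empty. In non-blocking mode read() returns -1 at once with errno set to
 * EAGAIN or EWOULDBLOCK instead of putting the caller to sleep.
 * Returns 1 if that happened, 0 otherwise. */
int read_would_block(int fd)
{
    char c;
    ssize_t n = read(fd, &c, 1);
    return n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

On a freshly created socket is_blocking() should report 1, matching the default described above. read_would_block() is only meant to be called after set_nonblocking(); on a blocking socket with an empty input buffer it would simply wait.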

Origin www.cnblogs.com/zhangchaocoming/p/12699662.html