Learning Network Programming No.3 [socket theory and practice]

Introduction:

Beijing time: 2023/8/12/15:32. Since updating the blog the night before last, I watched Goose Factory's new film "Sweeping Drugs 3", and I still haven't gotten over it: it left me underwhelmed. Maybe the domestic market is just not very friendly to Hong Kong films, and the quality was not very high anyway. Yesterday I spent most of the day catching up on the novel "I Want to Seal the Sky", mainly because the plot is so gripping that I couldn't put it down. Er Gen really is synonymous with classics. Later in the evening I watched some videos, first clips of "The Hunger Platform" and then clips of "The Hunger Games". I have just woken up from a nap, and to get today's post out I have to go all out now. I have been slacking for a while, but it doesn't matter; call it a reward for the frantic writing of the past few days. So let's pick up the keyboard and have a crazy Saturday! In this blog we will learn about sockets.


In-depth study of the basic process of network transmission

In the previous blog we focused on the network layer and the data link layer of the network protocol stack. We saw why routers work at the network layer and switches at the data link layer, and we gained a basic understanding of IP addresses, MAC addresses, and data transmission. Let's now build on that knowledge.

Reviewing encapsulation, decapsulation, and demultiplexing
In the previous blog we walked through the basic process of network transmission. When data is sent from host A to host B, every layer of the network protocol stack has its own protocol and its own job to do, so on the way down the stack each layer encapsulates the data by adding its own header, which carries that layer's identification and control information. Because every layer adds its own header, the data can be handed down or up through the layers in the correct order, and each layer can recognize its own header and carry out its function. On the receiving side the process runs in reverse: each layer strips its own header (decapsulation) and uses it to decide which upper-layer protocol should receive the payload (demultiplexing). Note the name of the data unit at each layer: at the transport layer it is called a segment (TCP) or a datagram (UDP), at the network layer a packet (IP datagram), at the data link layer a frame, and at the physical layer it is simply a stream of bits.
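
To make encapsulation and its reverse concrete, here is a minimal sketch in C. It is a toy model under stated assumptions, not how a real protocol stack is written: the headers are just the text labels "TCP", "IP", and "ETH" (real headers are binary and carry many fields), and the names encapsulate, decapsulate, send_down, and receive_up are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PDU 256

/* Encapsulate: write "<header>|<payload>" into out.
 * Returns 0 on success, -1 if out is too small. */
int encapsulate(const char *header, const char *payload,
                char *out, size_t out_size)
{
    assert(header && payload && out);
    int n = snprintf(out, out_size, "%s|%s", header, payload);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Decapsulate: if pdu starts with "<header>|", return a pointer to the
 * payload that follows; otherwise return NULL (not this layer's protocol). */
const char *decapsulate(const char *header, const char *pdu)
{
    assert(header && pdu);
    size_t len = strlen(header);
    if (strncmp(pdu, header, len) != 0 || pdu[len] != '|')
        return NULL;
    return pdu + len + 1;
}

/* Sender side: application data goes down through three layers. */
int send_down(const char *data, char *frame, size_t frame_size)
{
    char segment[MAX_PDU], packet[MAX_PDU];
    if (encapsulate("TCP", data, segment, sizeof segment) != 0) return -1;
    if (encapsulate("IP", segment, packet, sizeof packet) != 0) return -1;
    return encapsulate("ETH", packet, frame, frame_size);
}

/* Receiver side: each layer strips its own header, bottom to top. */
const char *receive_up(const char *frame)
{
    const char *p = decapsulate("ETH", frame);
    if (p) p = decapsulate("IP", p);
    if (p) p = decapsulate("TCP", p);
    return p;
}
```

Calling send_down("hello", frame, sizeof frame) leaves the string ETH|IP|TCP|hello in frame, and receive_up(frame) returns a pointer to hello again. A frame whose outermost header is not "ETH" is rejected with NULL, which is the toy version of a layer refusing data that is not in its own protocol.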

In-depth understanding of MAC address and IP address
In the previous blog on the data link layer and the network layer, we briefly met MAC addresses and IP addresses. A MAC address uniquely identifies a network card, and switches use it to deliver data inside a LAN. An IP address identifies a host, and routers use it to deliver data across the wide area network. A host usually obtains its IP address from its router through the DHCP protocol, so that router knows which addresses belong to its own subnet. When some other host wants to send data to the target host across the WAN, each router looks up the target IP address in its routing table and passes the data on to the next router, until the data reaches the router in front of the target host (the target router).

Reaching the target router is not the end, though. Note: the target router knows the IP address of the target host, but an IP address is only an identifier and cannot receive data by itself; only the network card, which is what the MAC address names, can receive data. So the key problem is how the router obtains the MAC address of the target host. There are two situations, depending on whether the host is attached to the router directly or sits behind a switch in the router's subnet. Compare your own laptop with a computer in the school computer room: the laptop on WIFI is connected to the router directly, while the computer-room machine reaches the router only through a switch.

Situation 1: the host is connected to the router directly. The router and the host already talk to each other over that interface (this is how the host obtained its IP address in the first place), so the host can tell the router its MAC address directly, and the router then sends the data straight to the host through that interface.

Situation 2: the host is not connected to the router directly. The router must hand the data to the switch, which then delivers it to the host. A switch forwards by MAC address, so once again the router first needs the MAC address of the target host. How does it get one when there is no direct link? This is what the ARP protocol (Address Resolution Protocol) is for. The router sends an ARP broadcast asking who owns the target IP address, and the switch passes the request to every host connected to it. The host that recognizes the IP address as its own sends an ARP response containing its MAC address, and the router now has the MAC address of the target host.

In essence, ARP turns the IP address, which is only an identifier, into something deliverable (IP address -> MAC address). With the MAC address in hand, the router sends the frame to the switch, and the switch uses its own table of MAC addresses and ports to pass the frame to the right network card. Note: even though the router now has the MAC address, the data still cannot reach the host directly and must go through the switch, which is the difference from situation 1. With this we have a deeper understanding of MAC addresses, IP addresses, and the details of data transmission. Let's now look at the data transmission process in the WAN (read it together with encapsulation and demultiplexing), as shown in the following figure:
insert image description here
As shown in the figure above, and unlike transmission inside a LAN, the IP address is essential in the WAN. Combining encapsulation, decapsulation, and demultiplexing with what we know about the data link layer, we can now see why the data link layer is able to support a variety of different protocols. For a layer's protocol to be recognized during transmission, the data must carry that protocol's header. When we send data, the data link layer encapsulates the packet according to, say, the Ethernet protocol. For the data to get through a router, the router must first strip the Ethernet header according to the Ethernet protocol and then hand the payload upward according to the protocol field (demultiplexing). The router then looks up its routing table to choose the next router on the way to the target IP address. Finally, before passing the data on, the router encapsulates it again in whatever protocol the outgoing link uses (Ethernet protocol, token ring protocol, wireless LAN protocol), because only then can the devices on that link handle it. This cycle of decapsulation, demultiplexing, and re-encapsulation is what absorbs the differences between data link layer protocols. Why those differences exist was stressed in the last blog, when we discussed the TCP/IP protocol stack. And this is why the IP protocol at the network layer and the TCP protocol at the transport layer stay the same from end to end, while the data link layer protocol may differ from hop to hop.
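
As a rough illustration of this re-framing, the sketch below models what a router does on one hop. Everything in it is an assumption made for the example: the struct packet and struct frame layouts, the string addresses, and the function names forward_hop and copy_str are invented here and do not match any real header format.

```c
#include <assert.h>
#include <string.h>

/* Simplified network-layer packet: its addresses stay fixed end to end. */
struct packet {
    char src_ip[16];
    char dst_ip[16];
    char payload[32];
};

/* Simplified link-layer frame: rebuilt on every hop. */
struct frame {
    char link_type[16];   /* e.g. "ethernet", "wlan" (labels only) */
    char src_mac[18];
    char dst_mac[18];
    struct packet pkt;
};

/* Bounded string copy that always NUL-terminates. */
void copy_str(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* One hop: strip the incoming frame, keep the packet as it is, and wrap it
 * in a new frame suited to the outgoing link. */
struct frame forward_hop(const struct frame *in, const char *out_link,
                         const char *my_mac, const char *next_hop_mac)
{
    struct frame out;
    assert(in && out_link && my_mac && next_hop_mac);
    memset(&out, 0, sizeof out);
    out.pkt = in->pkt;   /* decapsulate: the packet is untouched */
    copy_str(out.link_type, sizeof out.link_type, out_link);
    copy_str(out.src_mac, sizeof out.src_mac, my_mac);
    copy_str(out.dst_mac, sizeof out.dst_mac, next_hop_mac);
    return out;
}
```

After a call to forward_hop, the link type and both MAC addresses of the frame are new, while src_ip, dst_ip, and payload inside pkt are exactly what the sender wrote.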

Note: with the above, plus our deeper understanding of IP addresses and MAC addresses, only one point remains: why the source IP address and the target IP address do not change along the way, while the MAC addresses keep changing. Once this clicks, everything else falls into place. The key idea from above is that an IP address is only an identifier, while the MAC address is the substantive one: it stands for a network card, and only a network card can send and receive data. So when routers in the wide area network look up their routing tables to reach the target IP, each step still comes down to finding a MAC address. When a router wants to pass data to the next router (the next hop), it does what we described above for finding a host's MAC address. If it does not yet know the next hop's MAC address, it first sends an ARP broadcast for the next hop's IP address; the router that owns that IP address answers with an ARP response carrying its own MAC address; and the sending router then transmits the data to that MAC address. It also records the mapping between the IP address and the MAC address, so the next transmission can use the MAC address directly without another ARP broadcast. Every later hop works the same way. Each hop therefore puts new source and target MAC addresses on the frame, while the source and target IP addresses inside it stay the same. Once you understand this point, the whole data transmission process is as easy as chopping melons and vegetables.
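
The "look in the recorded mappings first, broadcast only when nothing is recorded" behaviour can be sketched as follows. This is only a model: arp_broadcast is a hypothetical stand-in that consults a fixed table instead of sending a real ARP request, the addresses are example values, and the struct and function names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARP_CACHE_SIZE 8

struct arp_entry { char ip[16]; char mac[18]; };

struct arp_cache {
    struct arp_entry entries[ARP_CACHE_SIZE];
    size_t count;
    unsigned broadcasts;   /* how many times we had to ask the LAN */
};

/* Hypothetical stand-in for a real ARP request/response exchange: the fixed
 * table plays the role of the hosts that would answer on the LAN. */
const char *arp_broadcast(const char *ip)
{
    static const struct arp_entry lan[] = {
        { "192.168.1.10", "aa:bb:cc:00:00:10" },
        { "192.168.1.20", "aa:bb:cc:00:00:20" },
    };
    for (size_t i = 0; i < sizeof lan / sizeof lan[0]; i++)
        if (strcmp(lan[i].ip, ip) == 0)
            return lan[i].mac;
    return NULL;   /* nobody answered */
}

/* Resolve ip to a MAC address: recorded mappings first, broadcast on a miss. */
const char *arp_resolve(struct arp_cache *cache, const char *ip)
{
    assert(cache && ip);
    for (size_t i = 0; i < cache->count; i++)
        if (strcmp(cache->entries[i].ip, ip) == 0)
            return cache->entries[i].mac;

    cache->broadcasts++;
    const char *mac = arp_broadcast(ip);
    if (mac == NULL || cache->count == ARP_CACHE_SIZE)
        return mac;   /* unresolved, or no room left: skip recording */

    struct arp_entry *e = &cache->entries[cache->count++];
    strncpy(e->ip, ip, sizeof e->ip - 1);
    e->ip[sizeof e->ip - 1] = '\0';
    strncpy(e->mac, mac, sizeof e->mac - 1);
    e->mac[sizeof e->mac - 1] = '\0';
    return e->mac;
}
```

Resolving the same IP address twice increments broadcasts only once, because the second lookup is answered from the recorded mapping. A real ARP cache also expires old entries, which this sketch leaves out.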
The figure below shows the IP address and MAC address on our cloud server:

insert image description here

What is a socket

With the above in place, let's turn to the topic of this blog: sockets. Don't panic. As I always say, the best way to understand something unfamiliar is by analogy with something you already know. First, a socket is a means of network communication built on the TCP/UDP protocols of the transport layer in the network protocol stack: the operating system provides a set of interfaces (system calls) on top of the transport layer, and we use them to implement network communication. Second, because network communication is in essence inter-process communication, a socket is an inter-process communication mechanism just like message queues, shared memory, and pipes. The difference is that sockets can be used both between processes on the same system (local sockets) and across the network (network sockets), whereas the others only work between processes on the same host. Why network communication is in essence inter-process communication is explained in detail below.
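
To see a socket acting as ordinary local inter-process communication, here is a minimal sketch using the POSIX socketpair call, which creates two connected local sockets. For brevity both ends live in one process; in practice each end would belong to a different process (for example a parent and a child after fork). The function name local_socket_roundtrip is made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg into one end of a local socket pair and read it back from the
 * other end. Returns the number of bytes received, or -1 on error. */
ssize_t local_socket_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];
    ssize_t n = -1;
    size_t len;

    assert(msg && out && out_size > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    len = strlen(msg);
    if (write(fds[0], msg, len) == (ssize_t)len) {
        n = read(fds[1], out, out_size - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Note that the code uses plain read and write, exactly as it would with a pipe. Switching from a local socket to a network socket changes how the socket is created and addressed, not how the data is read and written.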

Looking at sockets from the perspective of network data transmission
We now see sockets as a form of network communication, and network communication as a form of inter-process communication. Let's connect the two. We already know well how data travels from host A to host B, but getting the data from A to B is not the goal in itself. The goal is for host B, once it receives the data, to perform some service and return the result to host A. From the network protocol stack we know that requesting and providing a service happen at the application layer, which is where the concepts of client and server come from: network communication between two hosts is really communication between a client (the APP) and a server (the program that implements the APP's functions). That already makes "network communication is inter-process communication" clearer, but there is more. We can request a function through a client (APP) anytime and anywhere, which means the service we access must already be started and must keep running. Putting the two points together, the client and the server are simply two processes: opening a client (APP) starts a process, and the server is a process that was started earlier and runs in an endless loop. In short: network communication is inter-process communication.

In-depth understanding of network communication
By now it is clear that a socket is both a means of network communication and a means of inter-process communication, so we can look more closely at how data gets from host A to host B. Host A creates a process (the user opens the APP at the application layer); the process runs its code (creating a socket and so on); then, using the socket and the TCP/UDP protocol (which establishes contact with host B), the data is sent to host B. On host B the data climbs from the physical layer through the data link layer and the network layer to the transport layer, where the socket delivers it to the process designated by the upper layer.

That passage still leaves questions open. We won't repeat how the network layer, routers, and switches handle IP addresses and encapsulation. The questions to answer now are: how does the socket deliver data to the process designated by the upper layer, and what is a socket in essence? Start with the first one: with so many processes in the system, how does the socket find the target process? The reasoning is the same as for IP addresses. Before a host can communicate over the network, it must know both the IP address of the target host and an identifier for the target process. Since network communication is inter-process communication, once a host and a process on that host are both identified, we can pick out one unique server process on the entire Internet and complete the communication between the two processes. This is where the port number comes from: a port number identifies a unique process on a host. Inter-process communication carried out with a source IP, source port number, target IP, and target port number is what we call socket network communication. The operating system can therefore find the target process directly from the port number and copy the data into that process's buffer. Note: the operating system already has a process identifier (PID) for managing processes, but network communication identifies a process only by its port number, which reduces the coupling between the process management module and the network module.
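
To make the port number tangible, the sketch below binds a UDP socket to the loopback address 127.0.0.1 with port 0, which asks the operating system to pick a free port, and then reads back the port that was chosen with getsockname. The function name bind_any_port is made up for this example; the calls themselves are the standard socket interfaces.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 and let the OS choose the port.
 * Stores the chosen port (host byte order) in *port_out.
 * Returns the socket descriptor, or -1 on error. */
int bind_any_port(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd;

    assert(port_out);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);   /* 0 means "any free port" */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

Calling bind_any_port twice yields two different port numbers on the same IP address: within one host and one protocol, the port is what tells the two sockets (and so the processes behind them) apart.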

In-depth understanding of receiving and sending data
We now have a deeper understanding of network communication: it is inter-process communication, it needs IP addresses and port numbers to identify the two unique processes involved, and sockets are the way to do it. How this is actually done with sockets will be covered in detail below and in the next blog. It comes down to using the various socket interfaces, so socket network communication is in practice programming against the socket interfaces. That leaves one last point about network communication: how exactly a process receives and sends data.

Receiving first. When we related the network protocol stack to the operating system, we said that the protocol stack is tied to the file system, and here that shows up in full. When the target process starts, it opens a file for its network communication (creating a socket hands the process a file descriptor). As with any open file, the operating system creates a struct file structure for it, holding information such as the file's inode and buffers. When data arrives for the target process, the protocol stack processes it according to the TCP protocol, separating the header from the payload (data), and reads the target port number from the header. From the port number it finds the socket, and through the socket the open file that the process holds as a file descriptor. Finally the data is copied from the socket's buffer into the file's buffer, where the process reads it.

Sending is the mirror image. A process is created (the APP starts), opens the file, and writes data into the file's buffer. The operating system copies the data from that buffer into the buffer of the socket that corresponds to the process, and the socket sends it to the target host through the network protocol stack. In this way a process receives and sends network data by means of files.

Understand the TCP protocol and UDP protocol

Now that we understand socket network communication, let's take a brief look at the TCP and UDP protocols. Both are transport layer protocols, and sockets work on top of the transport layer, so we can implement socket network communication with either one. Their characteristics, and therefore the differences between them, are as follows:

1. TCP
The TCP protocol is a transport layer protocol, the Transmission Control Protocol. It is a connection-oriented, byte-stream-oriented, reliable protocol. What does that mean? Connection-oriented means that the sender and the receiver must establish a connection before any data is sent. Byte-stream-oriented means that the protocol does not preserve message boundaries: the data is one continuous stream of bytes, and each read or send call may handle as many or as few bytes as it likes. Reliable means that TCP works to make sure the data reaches the receiver: if something is lost along the way, TCP retransmits it, and it uses a range of mechanisms (segmentation, flow control, congestion control, and so on) to make the transmission succeed.

2. UDP
The UDP protocol is also a transport layer protocol, the User Datagram Protocol. It is a connectionless, datagram-oriented, unreliable protocol. It can be understood by contrast with TCP, and it is the simpler of the two. UDP does not need to establish a connection between the sender and the receiver before sending data. Datagram-oriented means that each datagram is sent and read as a whole: one send produces one datagram, and one read returns one datagram. Unreliable means that once the sender has sent the data, it no longer cares whether the receiver gets it.

To sum up: TCP suits scenarios that demand reliable data transmission, while UDP suits scenarios that demand real-time performance and can tolerate some data loss. [For further understanding, please refer to this link: In-depth study of TCP/UDP]
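
The datagram-oriented behaviour of UDP can be observed with a small experiment on the loopback address. The sketch below sends two datagrams of 5 and 3 bytes to a socket on 127.0.0.1 and records how many bytes each recvfrom call returns. The function name udp_boundaries_demo and the 1-second receive timeout are choices made for this example, and running sender and receiver in one process is only for brevity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Sends "hello" and "udp" as two datagrams and stores the size returned by
 * each recvfrom() in sizes[0] and sizes[1]. Returns 0 on success, -1 on error. */
int udp_boundaries_demo(ssize_t sizes[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct timeval tv = { 1, 0 };   /* give up after 1 s instead of hanging */
    char buf[64];
    int rc = -1;
    int rx, tx;

    assert(sizes);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx == -1 || tx == -1) goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* let the OS choose the port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) == -1) goto out;
    if (getsockname(rx, (struct sockaddr *)&addr, &len) == -1) goto out;
    if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1) goto out;

    /* No connection is set up: every sendto() names its destination. */
    if (sendto(tx, "hello", 5, 0, (struct sockaddr *)&addr, sizeof addr) != 5) goto out;
    if (sendto(tx, "udp", 3, 0, (struct sockaddr *)&addr, sizeof addr) != 3) goto out;

    sizes[0] = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);
    sizes[1] = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);
    rc = (sizes[0] >= 0 && sizes[1] >= 0) ? 0 : -1;
out:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return rc;
}
```

Even though the receive buffer has room for both messages, each recvfrom returns exactly one datagram. With a TCP socket the same 8 bytes would form a single stream, and one read could return any part of it. On the loopback interface the datagrams normally arrive, and arrive in order, but UDP itself promises neither.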

Network byte order

Now that we know TCP and UDP, we are one step closer to socket programming. Later we will implement both kinds of socket network communication, TCP and UDP, according to their respective characteristics. Before that, we need to understand network byte order, which addresses the question of whether multi-byte data is stored big-endian or little-endian, as shown in the following figure:

insert image description here
As shown in the figure above, computers store multi-byte values in one of two ways depending on their architecture: big-endian (the most significant byte at the lowest address) or little-endian (the least significant byte at the lowest address). If a big-endian machine and a little-endian machine communicate over the network without agreeing on an order, the receiver reads the bytes in reverse and the data comes out wrong. To solve this problem, network byte order was defined: multi-byte values on the network are always big-endian. So a little-endian sender must convert its data from little-endian to big-endian before sending, and a little-endian receiver must convert it from big-endian back to little-endian after receiving. That is the concept of network byte order.

Note: if our host is little-endian, our user-level code also has to do the little-endian to big-endian conversion, so the operating system provides interfaces for it: htonl and htons convert from host byte order to network byte order, and ntohl and ntohs convert from network byte order to host byte order (the l versions work on 32-bit values, the s versions on 16-bit values). On a big-endian host they return the value unchanged, so the same code works on both kinds of machine. The specific use will be discussed in detail later.
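
As a quick preview of these conversion functions, the sketch below shows what the bytes look like after conversion. The helper names port_to_wire, addr_to_wire, and host_is_little_endian are made up for this example; htons, htonl, ntohs, and ntohl are the real interfaces.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value (for example a port number) in out[] in the byte
 * order it would have on the network. */
void port_to_wire(uint16_t host_port, unsigned char out[2])
{
    uint16_t net = htons(host_port);
    assert(out);
    memcpy(out, &net, sizeof net);
}

/* Same idea for a 32-bit value (for example an IPv4 address). */
void addr_to_wire(uint32_t host_addr, unsigned char out[4])
{
    uint32_t net = htonl(host_addr);
    assert(out);
    memcpy(out, &net, sizeof net);
}

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

For port 8080 (0x1F90), port_to_wire produces the bytes 0x1F, 0x90 on any machine, and for the address 0x7F000001 (127.0.0.1), addr_to_wire produces 0x7F, 0x00, 0x00, 0x01. On the receiving side, ntohs and ntohl undo the conversion, so ntohs(htons(x)) always gives back x.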

Know the socket programming interface

Function: corresponding interface
Create a socket file descriptor (TCP/UDP, client + server): int socket(int domain, int type, int protocol);
Bind a port number (TCP/UDP, server): int bind(int socket, const struct sockaddr *address, socklen_t address_len);
Start listening on a socket (TCP, server): int listen(int socket, int backlog);
Accept a connection request (TCP, server): int accept(int socket, struct sockaddr *address, socklen_t *address_len);
Establish a connection (TCP, client): int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

(We will focus on introducing and using the above interfaces in the next blog)
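
As a small preview, here is a minimal sketch that calls all five interfaces once. It makes several simplifying assumptions: the "server" and the "client" run in the same process and talk over the loopback address 127.0.0.1, the port is chosen by the operating system, only one short message is sent, and the function name tcp_loopback_demo is invented for this example. A real server would run as its own process and keep accepting connections in a loop.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends msg from a client socket to a server socket over 127.0.0.1.
 * Returns the number of bytes the server side read, or -1 on error. */
ssize_t tcp_loopback_demo(const char *msg, char *out, size_t out_size)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    size_t msg_len;
    ssize_t n = -1;
    int listener = -1, client = -1, conn = -1;

    assert(msg && out && out_size > 0);
    msg_len = strlen(msg);

    listener = socket(AF_INET, SOCK_STREAM, 0);            /* socket() */
    if (listener == -1) goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                               /* any free port */
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) == -1)   /* bind() */
        goto done;
    if (listen(listener, 1) == -1) goto done;               /* listen() */
    if (getsockname(listener, (struct sockaddr *)&addr, &len) == -1)
        goto done;                                          /* learn the port */

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client == -1) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) == -1)  /* connect() */
        goto done;

    conn = accept(listener, NULL, NULL);                    /* accept() */
    if (conn == -1) goto done;

    if (write(client, msg, msg_len) != (ssize_t)msg_len) goto done;
    n = read(conn, out, out_size - 1);
    if (n >= 0)
        out[n] = '\0';
done:
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    if (listener != -1) close(listener);
    return n;
}
```

The order of calls follows the table: the server side goes through socket, bind, listen, and accept, while the client side only needs socket and connect. Because TCP is a byte stream, a single read is not guaranteed to return the whole message in general; for a short message on the loopback interface it normally does, which is why the sketch reads only once.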

Summary: That's all for our introduction to sockets today. The actual socket programming will come in the next blog. See you!


Origin blog.csdn.net/weixin_74004489/article/details/132249545