[Linux] Network basics + UDP network socket programming

It is a luxury to only do what you like to do and not be dragged forward by society and the times.



1. Network foundation

1. LAN and WAN

1.
First of all, a computer is a tool humans designed to improve productivity, and human civilization has only developed this far through cooperation between people. Since people need to cooperate to solve more complex problems, the computers they use as tools need to cooperate too. Cooperation between computers is network communication, that is, data exchange between hosts.
So we can conclude that the emergence of computer networks was inevitable.
Early computers were indeed independent of each other. To exchange data, someone had to copy it onto a removable disk (a USB drive, say) by hand, plug the disk into another host, and let that host read it. Any process that depends on people is inefficient, so to avoid this, the first communication scheme introduced a server that multiple hosts communicate through. Each host sends its data to the server, and other hosts read the data directly from the server when they need it.

insert image description here

2.
As the number of hosts and the distance between them grew, one server serving a handful of hosts was clearly not enough, and the LAN emerged. And when two or more hosts in, say, Guangzhou and Inner Mongolia need to communicate, devices such as switches, hubs, and routers are needed to connect large numbers of computers.
But you can't connect things just by saying so. A lot goes on behind the scenes. Homes, offices, and campuses are all LANs, and LANs need to communicate with each other, but they may be very far apart. How do they communicate? Does everyone pull their own optical fiber between them? That is certainly unrealistic.
The key player in making these connections is the network operator. As early as the rise of networking in the 1960s and 1970s, governments saw where things were heading and pushed operators to build the infrastructure: base stations in every region, fiber-optic cable, large switching rooms, and data conversion centers. That laid the foundation for network communication. All we do ourselves is get fiber access: pull the cable into the home and set up a modem and a router. So connecting LANs is not as simple as plugging in a router as in the logic diagram below. Someone behind the scenes carries the burden and does a great deal of low-level work so that network communication feels this simple today, and that someone is the operator. You can think of the infrastructure of network communication as having been built by governments and operators.
So the network did not grow out of nowhere. Everything you don't see was done for you by someone else, and all we do is fiber to the home.

insert image description here
3.
A WAN connects computers thousands of miles apart so that people all over the world can communicate more conveniently. How is this possible? The answer is the same as before: someone has laid the groundwork for us by building base stations, laying fiber-optic cable, and so on.
A wide area network is usually formed by interconnecting multiple local area networks through routers, and the Internet is the typical example. WAN connecting devices mainly include routers, modems, and gateways; LAN connecting devices mainly include routers, hubs, and switches.
(Gateway is a general concept. It can be hardware, such as a router or switch, or software, such as a firewall or proxy server. It usually translates between different networks (for example IPv6 and IPv4) or between protocols (for example HTTP to FTP) and exchanges data between them.)
LANs usually use twisted pair, optical fiber, or wireless technology as the transmission medium. WANs use a variety of media, including optical fiber, satellite, microwave, and telephone lines.

Before LANs can be interconnected, someone has to spend money. Laying the infrastructure for network communication takes money, manpower, and a lot of time.
insert image description here

2. Introduction to protocols and layering of network protocols (TCP/IP four-layer model)

1.
A computer contains a lot of hardware, and it is the cooperation between these components that keeps the computer running properly. Of course, this also depends on software managing the hardware.
Suppose we think bigger: pull the hardware components out of the computer and place them far apart. The computer could still run, as long as the components follow protocols, and a protocol is essentially an agreement. There are protocols between hardware components too: the disk talks to the system through the protocol of its host bus adapter (HBA), and disk and memory I/O follow their own protocols. So protocols are not exclusive to networks; computer architecture has them as well. Isn't the inside of a computer a small network? The devices are connected by wires (buses) and follow agreed protocols, so they can communicate and the computer can serve the user well. Networks and computers are not separate things: there is a network inside the system, and there are systems inside the network!

insert image description here
2.
After talking about the relationship between computers and networks, we still haven't said why protocols are needed.
In fact, almost every network problem comes down to the transmission distance getting longer. If the distance is very short, do you need to worry about data being lost, or the other side not receiving it, or receiving only part of it? Not really: there are few obstacles in between and the probability of error is very low. Over long distances, problems appear easily; for example, the signal attenuates and data is lost. Long-distance transmission introduces new communication problems, and to keep the cost of communication as low as possible, you need to define a protocol!
The two communicating hosts agree on the rules in advance, and that agreement reduces the cost of communication.

3.
But once there are protocols, is everything fine? There are many computer manufacturers, operating system vendors, disk manufacturers, and other hardware vendors. If each had its own protocol, how could their computers communicate with each other? Only machines from the same manufacturer could talk, which obviously won't do. So someone needs to stand up and define a unified network protocol standard. That standard is TCP/IP, which officially replaced NCP in 1983 and became the protocol standard that most of the Internet follows. (You are free not to follow it. You just won't be able to join the network, and if you can't join the network you can't communicate. Your device can play by itself.)

4.
In 1977, the International Organization for Standardization began work on what became the OSI seven-layer network model. Why layer it? Because network transmission runs into problems at many levels (the physical layer, the driver layer, the software layer, the user layer, and so on), and each level has its own problems to solve. Layering is decoupling: each layer is a highly cohesive module that handles the transmission problems of its own level, and coupling between layers is low.
Each layer has its own protocols, which deal with the transmission problems of that layer.
Although OSI has seven layers, in practice the top three are merged into one, collectively called the application layer. So what we usually talk about is the TCP/IP five-layer model, or the four-layer model when the physical layer is not counted.

insert image description here

5.
Let's warm up by understanding the TCP/IP four-layer model through an everyday example. Say you are going to cycle from Beijing to Yunnan. The first problem you face is not how to reach Yunnan but where the first stop is, then the second, and so on until Yunnan. So the first problem is how to get to the next stop, for example riding to Henan first. The second problem is which way Henan is: north or south? So you need the ability to choose a path. Could you ride in the wrong direction? Of course, so you also need to tolerate and correct errors. Once you finally arrive in Yunnan, you start sightseeing, which for a computer corresponds to processing the data. These four problems map onto the four layers:
Data link layer: delivers data to the next host directly connected to this one.
Network layer: plans the route for data between two hosts using the routing table.
Transport layer: responsible for data transmission between two hosts; the Transmission Control Protocol (TCP), for example, makes it reliable by tolerating and correcting errors.
Application layer: responsible for communication between applications. Network programming mainly targets the application layer, and between the application layer and the transport layer sits the system call interface provided by the operating system.
TCP and IP are the leading protocols of the transport layer and the network layer, which is why the model is named after them. The network layer and the transport layer are implemented inside the operating system, so for our code to reach the lower layers it must go through the system calls between the application layer and the transport layer.

insert image description here

insert image description here

6.
Physical layer : Responsible for transmitting optical and electrical signals. Early Ethernet used coaxial cable (now mostly seen in cable TV); today's Ethernet cables are twisted pair. The capabilities of the physical layer determine the maximum transmission rate, transmission distance, interference resistance, and so on. A hub works at the physical layer: it re-amplifies a signal that is about to attenuate so that it can travel farther. Optical fiber and the optical modem also work at the physical layer. The fiber plugs into the optical modem, which converts between the signal carried on the line and the digital 0101 signal, since what the router actually understands is a binary 0101 sequence. The signal-handling part of the network card also belongs to this layer and converts between the signals on the medium and bits.
Data link layer : Converts between data frames and the binary bit stream, performs error checking and correction on the data, and drives the network card. Its standards include Ethernet, token ring, and wireless LAN. Switches work at the data link layer.
Network layer : Responsible for address management and route selection. With the IP protocol, a host is identified by its IP address, and routers plan the route for data between two hosts using their routing tables. Routers work at the network layer.
Transport layer : Responsible for data transfer between two hosts through a transport protocol, including, for protocols such as TCP, the reliability of that transfer.
Application layer : Network programming mainly targets the application layer, because the four layers below it are already implemented. What we program is communication between applications, such as email transfer and remote access.

insert image description here

insert image description here

7.
Today's network devices have long since outgrown the strict layer boundaries of the TCP/IP five-layer model; a single device often works at several layers.

insert image description here

3. MAC address and IP address (subnet mask, routing table, IP address cross-network communication, Ethernet and Internet)

1.
When data is actually sent, more than the data itself goes out. The extra part is called the protocol header. Since each layer of the model has its own protocol, the data picks up that layer's header on its way down the stack. Once it reaches the physical layer at the bottom, it is transmitted to the other host, and as that host passes the data upward, each of its layers can understand the header written by the same layer on my side. So data must carry headers when it is sent. The actual data content is what we like to call the message, a bit like a newspaper: the header is the newspaper's conventions, such as the layout and the publisher's name (Xinhua News Agency, for example). These usually sit at the very top of the page in large, eye-catching type, that is, the masthead.
The protocol header carries the information that the layer's protocol needs.
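To make the idea of "data plus header" concrete, here is a minimal sketch in Python. The header layout (a 2-byte sender id followed by a 2-byte receiver id) and the names `TOY_HEADER`, `encapsulate`, and `decapsulate` are invented for illustration and do not correspond to any real protocol.

```python
import struct

# Hypothetical toy header: 2-byte sender id + 2-byte receiver id,
# packed in network byte order ("!"). Not a real protocol.
TOY_HEADER = struct.Struct("!HH")

def encapsulate(sender, receiver, payload):
    """Prepend the toy header to the payload before sending."""
    return TOY_HEADER.pack(sender, receiver) + payload

def decapsulate(packet):
    """Split a received packet back into (sender, receiver, payload)."""
    sender, receiver = TOY_HEADER.unpack_from(packet)
    return sender, receiver, packet[TOY_HEADER.size:]

packet = encapsulate(1, 7, b"hello")
print(len(b"hello"), len(packet))   # 5 9: four extra bytes of header
print(decapsulate(packet))          # (1, 7, b'hello')
```

What goes on the wire is always longer than the message itself, and the receiver can only make sense of the extra bytes because both sides agreed on the header layout in advance.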

insert image description here

2.
Two hosts in the same LAN can communicate directly, because each host has its own "name": every host has a network card, and every network card has its own address, the MAC address. A MAC address is 48 bits of binary data, that is, 6 bytes, and is usually written in hexadecimal. For example, the physical address of my network card is shown below in hexadecimal.
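As a small illustration of the 6-byte / hexadecimal relationship, the following Python sketch converts between the raw bytes and the familiar colon-separated text form. The helper names and the example address are made up for this sketch.

```python
def format_mac(raw):
    """Render 6 raw bytes as the usual colon-separated hex string."""
    if len(raw) != 6:
        raise ValueError("a MAC address is exactly 6 bytes (48 bits)")
    return ":".join(f"{b:02x}" for b in raw)

def parse_mac(text):
    """Parse 'aa:bb:cc:dd:ee:ff' or 'AA-BB-CC-DD-EE-FF' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in text.replace("-", ":").split(":"))
    if len(raw) != 6:
        raise ValueError("a MAC address is exactly 6 bytes (48 bits)")
    return raw

example = bytes([0x00, 0x16, 0x3E, 0x12, 0x34, 0x56])  # made-up address
print(format_mac(example))             # 00:16:3e:12:34:56
print(parse_mac("00-16-3E-12-34-56") == example)  # True
```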

insert image description here

3.
The MAC address is the data link layer address, used for communication between hosts on the same network. If the target host and the sending host are on the same network, the packet can be sent directly to the target's MAC address. A MAC address only needs to be unique within its local network, so the same MAC address appearing on two different networks causes no conflict.
The IP address is the network layer address, used for routing and addressing between different networks so that data can travel across them.
If the target host and the sending host are not on the same network, the sender first sends the packet to a router, and the router decides the packet's next hop from its routing table and the destination IP address. The routing table records networks and hosts by IP address, along with how to reach them. So where is the IP address? It is in the datagram's header: the network layer (IP) header contains the source and destination IP addresses. Routers do their core work at the network layer, so a router understands the IP header and can decide which network the datagram should be forwarded to next.
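The routing-table lookup described above can be sketched with Python's standard `ipaddress` module. The table contents, the next-hop values, and the function name `next_hop` are assumptions made up for this example; real routers hold far more information per route.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "directly connected"),
    (ipaddress.ip_network("10.0.0.0/8"), "10.0.0.1"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),  # default route
]

def next_hop(dest_ip):
    """Pick the most specific route (longest prefix) that contains dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop

print(next_hop("192.168.1.20"))  # directly connected
print(next_hop("10.2.3.4"))      # 10.0.0.1
print(next_hop("8.8.8.8"))       # 192.168.1.1
```

The lookup uses only the destination address; the default route `0.0.0.0/0` matches everything, so it is chosen only when no more specific entry applies.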
Protocols at the transport layer and above use IP addresses to identify where a host is. A public IP address is an identifier assigned to a host on the Internet; it is globally unique and globally routable. A MAC address, by contrast, only has meaning within the local network, so it cannot be used to locate a host across networks. In addition, a MAC address is 48 bits, 2 bytes longer than a 32-bit IPv4 address, which makes it less convenient to carry in protocols. Common protocols that rely on IP addresses to locate hosts include HTTP, FTP, SMTP, and SSH: they use IP addresses to set up connections and exchange data between applications, while routers use IP addresses to forward and route datagrams. This is how cross-network communication is achieved.
We keep mentioning "cross-network" and "different networks". So what counts as a different network, and how do you tell?
An IP address has two parts: the network address and the host address. If the network parts of two hosts' IP addresses are the same, the hosts are on the same network; otherwise they are not. The subnet mask is paired with the IP address and splits it into these two parts. If the bitwise AND of each host's IP address with the subnet mask gives the same result, the two hosts are on the same network. Note that different networks may have different subnet masks, so a mask always goes together with its IP address.
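The bitwise AND check can be written out in a few lines of Python. This is only a sketch: the addresses and mask are example values, and the helper names `network_of` and `same_network` are invented here.

```python
import ipaddress

def network_of(ip, mask):
    """Return the network address: the bitwise AND of IP and subnet mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))

def same_network(ip_a, ip_b, mask):
    """Two hosts share a network if their network addresses are equal."""
    return network_of(ip_a, mask) == network_of(ip_b, mask)

print(network_of("192.168.1.10", "255.255.255.0"))                      # 192.168.1.0
print(same_network("192.168.1.10", "192.168.1.200", "255.255.255.0"))  # True
print(same_network("192.168.1.10", "192.168.2.10", "255.255.255.0"))   # False
```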

4.
In ancient Greek thought, aether was the substance that fills the cosmos, and it was often regarded as the basis of all things.
Ethernet is a data link layer protocol whose data travels as electromagnetic signals. It uses MAC addresses to identify each device on the local network: every Ethernet card is assigned a unique MAC address at the factory as its permanent identifier.
Ethernet, in the broad sense used here, carries data over cables or through the air. Ethernet cables include twisted-pair and coaxial cable. They connect computers to other network devices such as switches and routers to interconnect a local area network. For example, UTP (unshielded twisted pair) connects computers to a switch, and switches are connected to each other to form the LAN.
Within a LAN, if you need freedom of movement and easier deployment, you can use Wi-Fi instead. The wireless signal uses the air (the "ether") as its medium, with data modulated onto high-frequency electromagnetic waves. Examples are Wi-Fi standards such as IEEE 802.11a, as well as Bluetooth and ZigBee (similar to Bluetooth).
Wired Ethernet gives a stable, high-speed connection over cable, while wireless Ethernet offers freedom and convenience.
Ethernet is the physical basis of the Internet. The Internet is formed by interconnecting many LANs and WANs, and the most common LAN technology is Ethernet. Ethernet defines physical layer and data link layer standards and gives the Internet efficient local data transmission, that is, transmission within a LAN, over cable or wireless signals. On top of the LANs built with Ethernet, the Internet interconnects different LANs through devices such as routers. TCP/IP provides the network and transport layers, and Ethernet and TCP/IP work together to make the Internet. The Internet identifies devices by IP address, Ethernet identifies them by MAC address, and ARP (Address Resolution Protocol) maps one to the other.
127.0.0.1 is a special IP address called the local loopback address. It is reserved in the IPv4 address space and always refers to the current computer itself; other computers cannot reach your machine through it. When a request is sent to this address, the packet goes down the sending host's protocol stack and comes straight back up without leaving the host. It is generally used for testing network code locally, or for accessing locally running server software such as the Apache web server.
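As a quick check, Python's standard `ipaddress` module can tell whether an address falls in the reserved loopback range. The non-loopback address below is just an example value.

```python
import ipaddress

for text in ["127.0.0.1", "127.5.5.5", "192.168.1.1"]:
    addr = ipaddress.ip_address(text)
    print(text, addr.is_loopback)
# 127.0.0.1 True
# 127.5.5.5 True
# 192.168.1.1 False
```

The second line shows that the whole 127.0.0.0/8 block is reserved for loopback, although 127.0.0.1 is the address used in practice.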

insert image description here

5.
The IP protocol has two versions, IPv4 and IPv6. IPv4 is still the mainstream, but an IPv4 address is a 4-byte, 32-bit integer, so there are at most about 4.29 billion addresses, which is not enough to go around globally. IPv4 addresses are usually written in dotted decimal. They are divided into public and private addresses: because public addresses ran short, private addressing was introduced.
An IPv6 address is a 16-byte, 128-bit integer, giving at most 2^128 addresses, supposedly enough to assign an IP address to every grain of sand on Earth. Because it is so long, an IPv6 address is generally written in hexadecimal. IPv6 is being promoted actively in China, but rolling it out across the wide area network is difficult. That said, many Chinese companies already use IPv6 addresses in their data centers, and intranets and LANs in China have begun to adopt it.
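The numbers above are easy to verify. The sketch below uses Python's standard `ipaddress` module; the two addresses are example values (a private IPv4 address and an address from the IPv6 documentation range), not addresses taken from this article.

```python
import ipaddress

ipv4_total = 2 ** 32     # 4294967296, about 4.29 billion
ipv6_total = 2 ** 128

# Dotted decimal is just a readable spelling of one 32-bit integer.
v4 = ipaddress.IPv4Address("192.168.1.10")
v4_as_int = int(v4)
v4_back = str(ipaddress.IPv4Address(v4_as_int))

# IPv6 addresses are written as eight groups of four hex digits.
v6 = ipaddress.IPv6Address("2001:db8::1")
v6_full = v6.exploded

print(ipv4_total)            # 4294967296
print(v4_as_int, v4_back)    # 3232235786 192.168.1.10
print(v6_full)               # 2001:0db8:0000:0000:0000:0000:0000:0001
```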
The IP address in the printed output is not a real public IP. It is a LAN IP that Tencent creates to identify each cloud host (I use a Tencent Cloud server).
The MAC address is the address used inside a LAN. You can print it with ipconfig /all on Windows and ifconfig on Linux. IP addresses are usually associated with the wide area network, but in fact they are used in both LANs and WANs; here we set LAN use aside and only discuss the WAN.

insert image description here

insert image description here

4. Communication diagram of LAN and WAN (wireless LAN and wireless Ethernet)

1.
The following is a simple diagram of host communication inside a LAN. For example, when host MAC1 asks MAC7 whether it has eaten, only MAC7 accepts the message and responds. The other hosts also receive the message but do not respond. Why? The Ethernet header of the frame sent by MAC1 contains the target host's MAC address. Every host that receives the frame checks whether its own MAC address matches the one in the header. If it does, the host decapsulates and demultiplexes the frame; if not, it does nothing with it.
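The filtering behaviour described above can be simulated in a few lines of Python. This is a toy model, not real Ethernet: the `Frame` and `Host` classes, the `broadcast_on_lan` helper, and the use of names like "MAC1" in place of real 6-byte addresses are all simplifications invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    dst_mac: str   # destination address carried in the header
    src_mac: str
    payload: str

class Host:
    def __init__(self, mac):
        self.mac = mac
        self.inbox = []

    def on_frame(self, frame):
        """Accept the frame only if it is addressed to this host."""
        if frame.dst_mac != self.mac:
            return False               # not for us: ignore it
        self.inbox.append(frame.payload)
        return True

def broadcast_on_lan(hosts, frame):
    """Every other host sees the frame; return the MACs that accepted it."""
    return [h.mac for h in hosts
            if h.mac != frame.src_mac and h.on_frame(frame)]

hosts = [Host(f"MAC{i}") for i in range(1, 8)]
accepted = broadcast_on_lan(hosts, Frame("MAC7", "MAC1", "Have you eaten?"))
print(accepted)          # ['MAC7']
print(hosts[6].inbox)    # ['Have you eaten?']
```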

insert image description here
2.
The following is the flow of a packet during LAN communication under the TCP/IP four-layer model. First, note that hosts in the same LAN do not need a router to communicate: the packet can be delivered directly using the Ethernet protocol. Keep in mind that the data still has to pass through the physical layer underneath, which the figure does not show.
From the application layer at the top to the data link layer, each layer has its own protocol. When the user sends data to the target host, the data is first handed down the stack, and each layer encapsulates it with its own protocol header. At the data link layer, because the sender and the target are in the same network segment, the target can be located directly by the MAC address in the Ethernet header, and the packet is delivered to the target host's data link layer (via the physical layer). The target host then decapsulates and demultiplexes the packet: each layer separates the header from the payload and hands the payload up. Handing the payload up to the right upper-layer protocol is called demultiplexing. Each layer on the target host pairs with the same layer on the sending host, so the target can understand every layer of encapsulation the sender applied. Once the payload has been passed all the way up, the target host's application layer gets the data.
A more concrete example: you chat with a friend on QQ. You open QQ, which is application layer software, pick a friend, and send a message. Your message does not go straight to the other side's QQ. It is first encapsulated with the headers of your host's protocol stack; then the Ethernet protocol at your host's data link layer uses the other host's MAC address to deliver the packet. The other host decapsulates and demultiplexes it upward until the message reaches its QQ application, which completes one round of communication between you and your friend.
In network protocols, we can think of protocols at the same layer as talking directly to each other. We can also think of the packet as being passed down the sending host for encapsulation and passed up the target host so that headers and payloads are separated. These are two different views, but they do not conflict; they complement each other.
Here are two small questions whose answers are implied in the process above. A message is just a string of binary data, so when a layer decapsulates it, how does it tell where the header ends and the payload begins? And after separating them, how does it decide which upper-layer protocol to hand the payload to?
The answers are in the protocol header. Every protocol faces these two problems, so its header must indicate where the payload is and which upper-layer protocol it belongs to, among other things. With that information the target host can decapsulate and demultiplex without trouble.
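Here is one way the two questions could be answered in code. The header format (a 1-byte upper-protocol id followed by a 2-byte payload length), the handler table, and the name `demultiplex` are all hypothetical; real protocols encode this information in their own ways, but the idea is the same.

```python
import struct

# Hypothetical header: 1-byte upper-protocol id + 2-byte payload length.
HEADER = struct.Struct("!BH")

# Hypothetical upper-layer protocols, keyed by the id in the header.
HANDLERS = {1: "protocol-A", 2: "protocol-B"}

def demultiplex(message):
    """Use the header to find the payload and pick the upper-layer protocol."""
    upper_id, length = HEADER.unpack_from(message)
    start = HEADER.size                      # the header has a fixed size
    payload = message[start:start + length]  # the length field bounds the payload
    return HANDLERS[upper_id], payload

# Build a message by hand: header, 4 payload bytes, then 2 bytes of padding.
message = HEADER.pack(2, 4) + b"data" + b"\x00\x00"
print(demultiplex(message))   # ('protocol-B', b'data')
```

A fixed header size plus a length field answers "where is the payload", and the protocol id answers "who gets it next".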

insert image description here

3.
Besides the common wired Ethernet, which uses cable as its medium, there is wireless Ethernet, the air-as-medium variety mentioned above. Wireless LAN is a broader concept: it covers local area networks built with any wireless technology. Wireless Ethernet is one such technology, based on the IEEE 802.11 standard.
Wi-Fi is the brand name for products that use the IEEE 802.11 standard, so Wi-Fi is the commercialized form of wireless Ethernet: Wi-Fi = IEEE 802.11 standard = wireless Ethernet.
Besides Wi-Fi, wireless LAN also includes technologies such as infrared, Bluetooth, and ZigBee.
In other words, wireless Ethernet is the Ethernet-style member of the wireless LAN family, built on the IEEE 802.11 standard.

The following are several common kinds of LAN. Ethernet and wireless LAN were covered above; the only one left to mention is token ring. A token ring works like a mutex lock: a token circulates among the hosts, a host may send data only while it holds the token, and when it sends a message the token travels along with it. The principle is very simple.
When sending in a LAN, only one host may send at any given moment; otherwise messages may collide in transit. That is why a LAN also goes by another technical name: a collision domain.
Network resources in a LAN are shared resources: every host in the network segment can use them to send and receive data.
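The mutex-like behaviour of the token can be illustrated with a short simulation. This is a toy model under simplifying assumptions (each token holder sends at most one message per visit, and the function name `run_token_ring` and host names are invented); it is not how a real token ring adapter is implemented.

```python
from collections import deque

def run_token_ring(ring, outbox, rounds):
    """Pass the token around `ring` for `rounds` laps; return the send order."""
    queues = {host: deque(outbox.get(host, [])) for host in ring}
    log = []
    for _ in range(rounds):
        for holder in ring:           # the token visits hosts in ring order
            if queues[holder]:        # only the current token holder may send
                log.append((holder, queues[holder].popleft()))
    return log

log = run_token_ring(["A", "B", "C"], {"A": ["a1", "a2"], "C": ["c1"]}, rounds=2)
print(log)   # [('A', 'a1'), ('C', 'c1'), ('A', 'a2')]
```

Because only the token holder sends, two messages are never on the shared medium at the same time, which is exactly the collision problem the token is meant to avoid.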

insert image description here

4.
The following is a schematic of network communication in which routers route by IP address. For now you can think of it as WAN communication, although strictly speaking that is not accurate (later articles explain this in detail); you can also think of it as communication between one LAN and another.
Above the network layer, the transport and application layers behave exactly as in same-LAN communication. The difference is that when the sending host's packet reaches the data link layer, it cannot be delivered directly to the target host over Ethernet, because the two hosts are not in the same network segment and the target cannot be located by MAC address. So the packet is delivered to the router first. The router strips the Ethernet header and hands the datagram up to its network layer, uses its routing table and the destination IP address to decide where to forward it, then encapsulates it again on the way down and delivers it to the data link layer of the target host, here the token ring driver. From there the target host decapsulates and demultiplexes upward until the data reaches the application layer and the target host gets it.
The router decapsulates, demultiplexes, and re-encapsulates packets at the network layer in order to hide the differences between the underlying networks of different LANs and give users fast, stable network communication. In the example below, the sending host and the target host use the Ethernet and token ring protocols respectively, but the router hides this difference at the network layer. How? By locating hosts in different network segments through their IP addresses, which do not depend on the link layer underneath. (Note that token ring is rarely used today; the data link layer is usually Ethernet.)
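The router's "strip the old link header, keep the datagram, add a new link header" step can be sketched as follows. Frames and datagrams are modelled as plain dictionaries, and every name and address here (`link_encapsulate`, `route`, "router-mac", the IP addresses) is a placeholder invented for the sketch.

```python
def link_encapsulate(link_type, dst_link_addr, datagram):
    """Wrap an IP datagram in a (toy) link-layer frame."""
    return {"link": link_type, "dst": dst_link_addr, "datagram": datagram}

def route(frame, out_link_type, next_hop_link_addr):
    """Strip the incoming link header and re-wrap the untouched datagram."""
    datagram = frame["datagram"]      # decapsulate up to the network layer
    return link_encapsulate(out_link_type, next_hop_link_addr, datagram)

datagram = {"src_ip": "192.168.1.10", "dst_ip": "172.16.0.5", "data": "hi"}
inbound = link_encapsulate("ethernet", "router-mac", datagram)
outbound = route(inbound, "token-ring", "target-mac")

print(outbound["link"])                  # token-ring
print(outbound["datagram"] == datagram)  # True: the IP datagram is unchanged
```

The link-layer wrapper changes from hop to hop, while the IP datagram inside stays the same end to end, which is what lets the network layer hide the differences below it.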

In everyday life, when several devices under the same wifi chat with each other, no ip routing through a router is needed; the devices can locate each other using only the MAC addresses of the Ethernet protocol.
insert image description here

5. Data packet encapsulation and depacketization (data segment, datagram, data frame)

1.
At the transport layer a data packet is generally called a data segment, at the network layer a datagram, and at the data link layer a data frame; at the application layer we speak of requests and responses. After the data has been encapsulated into a frame by the protocol stack, it is sent to the target host over the transmission medium (network cable, of which there are many types). The target host strips the protocol headers layer by layer and, based on the information in each header, hands the payload to the upper-layer protocol that the header specifies.

insert image description here

2.
The following figure is a schematic diagram of encapsulation and depacketization. The depacketization half in particular shows the role of the protocol header vividly: it decides how the data packet is delivered upwards and which protocol it is delivered to.

insert image description here

2. UDP network socket programming

1. The essence of network communication (interprocess communication identified by port)

1.
Are a source IP address and a destination IP address enough for a client and a server to communicate? No. What actually communicates is not the two hosts but the client process and the server process running on them. An IP address identifies a host uniquely across the network, so what identifies the client process and the server process uniquely? The port number. With an ip address plus a port, we can determine which process on which host a data packet should be delivered to.
The port number belongs to the transport layer protocol, and the application layer specifies or obtains it through system calls. It is a 2-byte (16-bit) unsigned integer, so it ranges from 0 to 65535. Because the transport layer and the network layer are implemented by the operating system, the port tells the operating system which process on the target host the packet should be handed to. On a given host, a port number can be occupied by only one process at a time, but the same port number may appear on different hosts. This is normal, because the port identifies a process uniquely only within one host.
For example, in the figure below, host A and host B each identify one of their processes uniquely across the whole network by ip + port, which is what makes network communication across LANs possible.

insert image description here

2.
So network communication is in essence inter-process communication. The difference is that the inter-process communication we learned before took place between processes inside one host, whereas now it crosses hosts and networks. With the support of ip, port and the network protocol stack, inter-process communication across hosts and networks becomes possible, and that is exactly what network communication is.
We said before that the premise of inter-process communication is that the different processes must first see the same resource. What is that resource here? It is the network itself, whether a local area network or a wide area network. The essence of communication is IO: everything we do online comes down to two things, sending our own data out and receiving data that others send to us.

3.
A question: since a process already has a pid, why do we need a port to identify it uniquely?
Reason 1 : The system is the system and the network is the network. We do not want these two modules strongly coupled, because with strong coupling a change in one forces a change in the other, which hurts the robustness of the code. Technically it would be possible to use only the pid and no port, but we want the system and the network decoupled so that they do not affect each other.
Reason 2 : The port number of a server process cannot be changed casually. Once a server has chosen a port, it keeps using it for a long time, because clients must be able to find the server process quickly and accurately every time. The server's ip and port are like the emergency numbers 110, 120 and 119: once set, can they be changed at will? Of course not! A pid, on the other hand, is assigned by the operating system each time the process starts and is generally different on every run, so a port is needed to identify the process.
Reason 3 : Not all processes need to provide network services, but all processes need to have a pid.

4.
So when a process is bound to a port number, we call this process a network service process.
How does the operating system find the corresponding process structure, struct task_struct, from a port number of type uint16_t? Put simply, the kernel uses a hashing scheme: the port number is the key of a hash table, and the hash bucket stores the address of the corresponding PCB, that is, a pointer of type struct task_struct. (This is a simplified picture; in practice the lookup goes through the kernel's socket structures.) Once you have the PCB pointer you can reach all the information related to the process, such as the file descriptor table struct files_struct *files, the process's signal bitmaps, its address space, and so on.
A process does not live only in a doubly linked list; it may also sit in a red-black tree, a hash table and other data structures at the same time. Many data structures in the kernel are interwoven like this, which makes things quite complicated.
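To make the hash idea concrete, here is a minimal user-space sketch. It is not kernel code: FakeTask and PortTable are hypothetical names invented for illustration, and std::unordered_map stands in for the kernel's hash table. It only shows the mapping "port as key, process pointer as value", and that a second bind on the same port is refused.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the kernel's process control block (PCB).
struct FakeTask {
    int pid;
    std::string name;
};

// Toy port table: port number (key) -> pointer to the owning "process".
class PortTable {
public:
    // Registers the port for the task. Returns false if the port is taken.
    bool bind(uint16_t port, FakeTask* task) {
        return table_.emplace(port, task).second;
    }

    // Returns the owner of the port, or nullptr if nobody has bound it.
    FakeTask* lookup(uint16_t port) const {
        auto it = table_.find(port);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<uint16_t, FakeTask*> table_;
};
```

One task may be registered under several ports in this table, but each port maps to exactly one task, which matches the rule discussed in the next item.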

5.
A process can bind multiple port numbers, but a port number cannot be bound by multiple processes.
For example, suppose the service behind one port number transfers data and the service behind another port number executes commands. A single server process may well provide both. When clients send requests to these two port numbers, they may actually be talking to the same server process, which serves both clients at the same time.
But a port number can correspond to only one process. Otherwise, when a client sends a request to that port, the operating system would not know which process should receive it, and the server might lose data or fail to receive it.
Some network services may even leave a back door this way: the process binds two port numbers, one used by normal clients and another used by someone else without the clients' knowledge. That is software with a back door.

6.
Therefore, even before learning any socket programming, we can infer from what we already know that network communication must send more than just the data. ip + port identifies the client or server process uniquely, so in addition to the data, each side must send its own ip and port to the other side, because communication is two-way. After receiving a client's request, the server has to return the corresponding response. How does the server know which client to send it to? By the client's ip and port, of course. So in network communication, besides the data itself, an extra piece of data is always sent: the sender's ip and port, which identify it uniquely on the network.

2. Transport layer protocol UDP/TCP

1.
TCP and UDP are both transport layer protocols. When we do network programming we inevitably deal with the transport layer, because application layer code calls the system call API that sits between the application layer and the transport layer.
TCP stands for Transmission Control Protocol. It must establish a connection before communicating, and it provides reliable transmission. Of course, we cannot feel this reliability directly, because the transport layer lives in the OS and we only work at the application layer. In addition, TCP is byte-stream oriented.
UDP stands for User Datagram Protocol. It does not need to establish a connection, and it provides unreliable transmission; again, we cannot feel this unreliability directly. UDP is datagram oriented.
You will notice the difference later when doing socket programming. With UDP, the server simply receives whatever datagram the client sends, which makes communication very convenient. TCP is more cumbersome: the client and the server communicate using the file IO (byte stream) style of interface.
Note, however, that reliable and unreliable are both neutral words; unreliable is not a derogatory term. The two transport layer protocols suit different scenarios. A bank transfer must use TCP, because the data transmission must be stable and reliable. Some online advertising pushes are better suited to UDP, because stability and reliability come at a price: the code is more cumbersome and complex, and the cost of maintenance and coding is higher. In scenarios such as advertising pushes the requirements for reliability are not so high, so UDP, with its lower maintenance and coding cost, is the natural choice.

insert image description here
2.
Once the protocol is settled, the first problem we face is network byte order. Some hosts are big-endian and others are little-endian (our everyday laptops are generally little-endian). How do we reconcile hosts with different endianness? If one host sends data in little-endian byte order and the receiving host interprets it as big-endian, there will definitely be problems.
So, long before networks became widespread, it was stipulated that data on the network must be big-endian. A little-endian machine must convert its data to big-endian before sending it onto the network; a big-endian machine can send its data directly.
There is some rationale behind this rule. In little-endian order the high-order bytes of the data sit at high addresses and the low-order bytes at low addresses. If we write addresses increasing from left to right, the significance of the bytes also increases from left to right, which is the opposite of how we write numbers, so the layout in memory is the reverse of the logical form and is awkward to read. Big-endian byte order matches our logical intuition better.
When the host sends and receives data, it sends and receives in the order from low address to high address.

insert image description here

3.
Who does the conversion between little endian and big endian? Linux already provides a set of byte order conversion APIs: htonl, htons, ntohl and ntohs. In these names h stands for host and n for network, and l and s stand for long (32 bits) and short (16 bits). Converting from host to network always produces big-endian data. Converting from network to host produces data in the host's own byte order, which may be big-endian or little-endian depending on the host.
These interfaces cover only the short and long integer types, so what about char or double data that must travel between host and network? A char is a single byte, so byte order does not apply to it, and in practice most of the data sent over the network is a string, which is just a sequence of bytes and needs no conversion. Integer fields such as the port and the ip address are converted with the interfaces above. Other types, such as double, are not converted automatically by the system; the application has to agree on a format for them itself, for example by sending them as text.
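The conversion rules above can be checked with a small sketch. htonl, htons, ntohl and ntohs are the real interfaces from <arpa/inet.h>; the two helper names hostIsLittleEndian and firstByteOnWire are made up here for illustration.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>

// True if this host stores the least significant byte at the lowest address.
bool hostIsLittleEndian() {
    uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // read the byte at the lowest address
    return first == 1;
}

// Converts a 32-bit value to network byte order and returns the byte that
// ends up at the lowest address, i.e. the first byte sent onto the network.
unsigned char firstByteOnWire(uint32_t hostValue) {
    uint32_t net = htonl(hostValue);
    unsigned char bytes[4];
    std::memcpy(bytes, &net, sizeof(bytes));
    return bytes[0];
}
```

On any host, firstByteOnWire(0x12345678) gives 0x12, because network byte order is big-endian and the most significant byte is sent first. A round trip such as ntohs(htons(8080)) always gives back 8080.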

insert image description here

3. Socket programming API and sockaddr structure

1.
The following are a few common APIs for socket programming. For now it is enough to get familiar with them; we will see how to use them when we write the code later.

insert image description here
2.
In socket programming the common kinds are network sockets, raw sockets, and Unix domain sockets.
Network sockets support communication across hosts and networks, and they are what the rest of this article is about. Raw sockets are harder: they can bypass the transport layer and access the network layer and below directly, and packet capture and network monitoring tools are built on them. Unix domain sockets can only be used for local communication, not network communication. This article covers neither raw sockets nor Unix domain sockets, only network socket programming; after learning network sockets you can read the relevant Unix domain socket code on your own.

3.
The socket API of the application layer is an abstract network programming interface that works with various underlying protocols, such as IPv4 and IPv6. However, the address formats of these protocols differ; for example, the addresses of an Inet socket and a Unix domain socket are not the same. The struct sockaddr structure was created to solve this. A pointer to a struct sockaddr_in or struct sockaddr_un is cast to struct sockaddr * when it is passed in, and the function reads the first two bytes of the structure, the address family. If they hold AF_INET, the function casts the pointer back to struct sockaddr_in *; if they hold AF_UNIX, it casts the pointer back to struct sockaddr_un *.
Isn't this just like polymorphism in C++? A base class pointer receives a pointer to a derived class object.
Some people may wonder: why not take a void * parameter, read the first two bytes inside the function, compare them, and then convert to the Inet or Unix domain address type? The main reason is that void * did not yet exist in C when these interfaces were designed; they predate the C language standard. By the time void * was standardized, the interfaces had to stay backward compatible: changing them to void * would have been too costly, since interface tests, functional verification and so on would all need to be redone. So the struct sockaddr * signature remains as historical baggage.

insert image description here

4. Server-client simple communication code

1.
The server and the client need to communicate, so we need two executable files, udpclient and udpserver. Variables can be defined in the makefile; for example, a variable cc can be defined as g++, and $(cc) expands to its content. To build both executables with one command, define a pseudo-target all that depends on the two executables and has an empty recipe; make will then generate each of them.

insert image description here
2.
Because there are many demos to go through, a lot of code is posted, but that does not matter. We will go through five demos one by one: simple server-client communication, server-side translation, server-side command execution, server-side routing and group messaging, and communication between a Linux server and a Windows client. Just look at the code that corresponds to the module I am describing.
A. Let's look at the calling logic of the server first; just look at main. First the usage: when running the server, you must explicitly give a port number for the server process to bind. Specifying an ip address for the server is not recommended, because it is inflexible. If that ip address is reassigned or released, the server process can no longer use it and the server goes down. And a server bound to one specific ip cannot use the host's other ip addresses, which reduces its robustness. The recommended way is to bind the wildcard address 0.0.0.0, so that the failure of one ip address does not affect the server and it can still serve clients through its other addresses. This improves the reliability and robustness of the server process. So when starting the server we only pass the port number to bind on the command line, not an ip address.
B. Some people may ask: if multiple clients send requests to port 8080 at the same time, does one server receive all of them, or do multiple servers receive them? If every server binds the wildcard address, several servers will be using the same port, so when a client sends a request to some ip address and port 8080, how is it decided which server responds?
C. Don't worry. Between the clients and the back-end servers there is usually an intermediary, a load balancer or proxy server. A client's request goes to the load balancer first, which routes it to a suitable back-end server according to its load balancing strategy, such as round robin, ip hash or least connections, and that server processes the request. The back-end server then sends the response back to the load balancer, which forwards it to the corresponding client. The client does not need to know which back-end server handled its request; it only needs to send the request to the load balancer, which acts as a proxy between client and server and hides how the back-end servers process requests.
D. A router is a device that works at the network layer, while the load balancer described here works at the application layer. The former forwards data packets towards the right LAN according to the destination ip address; the latter distributes application layer requests across multiple back-end servers to achieve high availability and load sharing. Do not confuse the two.
E. The main code first checks the usage. If the command line does not match the usage, the Usage function prints it; the port number the server should bind must be given when running the server program. Next, a unique_ptr smart pointer manages the udpserver object. The port number from the command line is passed to the udpserver constructor, after atoi converts the port string to an integer. When the smart pointer is destroyed, the heap memory of the udpserver object is released with it. The udpserver class implements two interfaces: initServer, which initializes the server, and start, which runs it. This is the first version of the communication code.
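The calling logic in E might look roughly like the sketch below. The article's real code is in the image, so everything here is an assumption: UdpServer is an empty stand-in for the udpserver class, and makeServer is a hypothetical helper holding what would normally be the body of main.

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical stand-in for the udpserver class. The real initServer and
// start would create the socket and run the receive loop.
class UdpServer {
public:
    explicit UdpServer(uint16_t port) : port_(port) {}
    void initServer() {}
    void start() {}
    uint16_t port() const { return port_; }

private:
    uint16_t port_;
};

// Prints the usage when the command line is wrong.
void Usage(const std::string& proc) {
    std::cerr << "Usage:\n\t" << proc << " local_port\n";
}

// Mirrors the body of main: checks the arguments, converts the port string
// with atoi, and hands the server object to a unique_ptr.
// Returns nullptr when the arguments do not match the usage.
std::unique_ptr<UdpServer> makeServer(int argc, char* argv[]) {
    if (argc != 2) {
        Usage(argv[0]);
        return nullptr;
    }
    uint16_t port = static_cast<uint16_t>(std::atoi(argv[1]));
    return std::unique_ptr<UdpServer>(new UdpServer(port));
}
```

In a real main one would call makeServer(argc, argv), then initServer() and start() on the result. When the unique_ptr goes out of scope, the server object is freed automatically.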

insert image description here

3.
Let's ignore the server's callback for now; it is used by the later versions of the communication code. What member variables does the udpserver class need? Just three: the port number, the ip address, and sockfd. The server binds the wildcard ip address, and it needs a file descriptor, sockfd, for the socket it creates. So the constructor initializes ip to 0.0.0.0 by default.
The first step in initializing the server is to create the server socket. The call is socket, which creates an endpoint for network communication and returns a file descriptor; through this descriptor we get full-duplex UDP communication. Unlike an ordinary file descriptor, sockfd is used specifically for socket communication. The first parameter is the domain of the socket: do you want an inet network socket, a Unix domain socket, or some other kind? This article covers only network socket programming, so we use AF_INET. AF_INET is defined as PF_INET, the macro for the IP protocol family, so PF_INET works as the first parameter too. The second parameter is the type of service the socket provides: SOCK_DGRAM means datagram transmission, which is the UDP protocol, and SOCK_STREAM means byte stream transmission, which is the TCP protocol. Here we use SOCK_DGRAM. Since the second parameter already determines the transmission type, the third parameter can simply be 0, which selects the default protocol for that type, here UDP.

insert image description here

After creating the socket and doing a simple error check, the next job is bind. Creating a socket alone is not enough for network communication; we must also bind an ip and port to sockfd, which tells the operating system that sockfd is associated with that specific ip and port and can be used for network communication.
To bind ip and port to sockfd we need a socket address structure. We define a struct sockaddr_in named local, fill in its fields, and then cast a pointer to it to struct sockaddr * and pass it to bind. Three fields of struct sockaddr_in matter. The first is sin_family, of type sa_family_t, which is really an unsigned short; it holds the address family, and we fill in AF_INET or PF_INET. The other two are the port sin_port, an unsigned short, and sin_addr, a structure whose s_addr field holds the 32-bit ip address of type in_addr_t. We can fill s_addr directly with the return value of inet_addr, which is of type in_addr_t.
Before binding, we define a struct sockaddr_in named local to represent the local server and call bzero to clear all its fields. memset does the same job, but some people like bzero and we use it here too. Then sin_family is set to AF_INET. For sin_port we must go from host to network byte order, so we call htons on the port before storing it. The dotted-decimal ip string we use is not compatible with the in_addr_t type, so we call inet_addr, which does two jobs at once: it converts the char * string to a 32-bit in_addr_t, and it returns the result in network byte order, saving us a separate htonl. We call the string's c_str() to pass the dotted-decimal _ip to inet_addr() and store the return value in local. Alternatively, the ip address can be filled with the INADDR_ANY macro, which means any address and is the same as our default _ip of 0.0.0.0. The third parameter of bind is an input parameter giving the size of local: bind has to parse the structure and extract its fields, and for that it needs the structure's size, so addrlen, of type socklen_t, is simply the size of local in bytes.
Dotted-decimal ip is more readable, but it is not suitable for network communication because it takes too many bytes; a 32-bit integer ip address is a better fit. So when filling the struct sockaddr_in fields we need to convert the string to a uint32_t and convert host to network byte order, and inet_addr does both steps for us. (Network communication is complicated here. We could certainly do the conversion ourselves, but it is better not to: besides the type conversion there may be networking details we have not considered, and these APIs have been rigorously tested, which makes them much safer than a hand-written conversion. Just use the API.)
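The two conversions can be tried out in isolation. inet_addr and inet_ntoa are the real interfaces; toNetworkAddress and toDottedDecimal are just illustrative wrapper names.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

// Dotted-decimal string -> 32-bit address, already in network byte order.
in_addr_t toNetworkAddress(const std::string& ip) {
    return inet_addr(ip.c_str());
}

// 32-bit network-order address -> dotted-decimal string.
std::string toDottedDecimal(in_addr_t addr) {
    struct in_addr in;
    in.s_addr = addr;
    return inet_ntoa(in);  // returns a pointer to a static buffer; copy it
}
```

For instance, toNetworkAddress("127.0.0.1") equals htonl(INADDR_LOOPBACK), and toNetworkAddress("0.0.0.0") equals htonl(INADDR_ANY), which shows that inet_addr really hands back the value in network byte order.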
So socket leans towards the system side, creating a sockfd in the system, while bind leans towards the network side, associating the sockfd with the ip address and port number needed for network communication.

Finally, after a simple error check, the initialization of the udp server is complete. In the whole process we called only two APIs, socket and bind. Simple, isn't it?
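Putting socket and bind together, the initialization could be sketched like this. It is a free function rather than the article's class member, it uses memset and INADDR_ANY instead of bzero and inet_addr("0.0.0.0"), and boundPort is a hypothetical helper added only so the result can be inspected.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Creates a UDP socket bound to the wildcard address and the given port.
// Returns the socket descriptor, or -1 if socket or bind fails.
int initServer(uint16_t port) {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0) {
        return -1;
    }

    struct sockaddr_in local;
    std::memset(&local, 0, sizeof(local));      // same job as bzero
    local.sin_family = AF_INET;
    local.sin_port = htons(port);               // host -> network byte order
    local.sin_addr.s_addr = htonl(INADDR_ANY);  // any address, i.e. 0.0.0.0

    if (::bind(sockfd, reinterpret_cast<struct sockaddr*>(&local),
               sizeof(local)) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}

// Returns the port actually bound to sockfd in host byte order, 0 on error.
uint16_t boundPort(int sockfd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(sockfd, reinterpret_cast<struct sockaddr*>(&addr), &len) < 0) {
        return 0;
    }
    return ntohs(addr.sin_port);
}
```

Calling initServer(8080) corresponds to running the server with port 8080. Passing 0 lets the kernel pick a free port, which is handy for quick experiments. A second initServer call with a port that is already bound returns -1, in line with the rule that one port belongs to one process.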

insert image description here

4.
After the server is initialized, the next thing to implement is the start interface. Once started, a server is essentially an endless loop. The communication logic of this version is simple: the server receives messages from the client and prints them. So start does not yet call the callback function _callback to process the received message; it only receives client messages.
As mentioned before, what the other side sends in network communication is not only the data but also an extra piece of information, namely the client's ip and port. Here the client and the server run on the same Linux cloud server, so the ip addresses available are the public ip of the cloud host (43.143.224.5 for my Tencent cloud server) and the local loopback 127.0.0.1. Because we test on a single host, the client ip the server receives is the same as the destination ip. Other ip addresses may belong to other hosts on the network, so during testing we generally use only these two. The loopback address 127.0.0.1 only travels around inside the protocol stack of the current host and never reaches the real network; it is what programmers usually use to test their own network code. Besides these, the cloud vendor also assigns an internal ip, the inet address 10.0.8.2 shown under eth0, which is the address used between servers on Tencent's internal LAN. It can be used for testing too, but its role overlaps with 127.0.0.1, because it is reachable only from inside Tencent's intranet; hosts at Ali, Byte or Baidu cannot access it. So in general we use only the host's public ip and the local loopback 127.0.0.1.

insert image description here

To view the network status of the host for the udp protocol, use the sudo netstat -unpa command.

insert image description here
The server process udpserver is running, bound to the wildcard address and port 8080. The command output shows that our process does appear in the udp network service information.
insert image description here

Calling recvfrom to receive client messages again requires the socket address structure struct sockaddr_in. We define a structure peer of type struct sockaddr_in and use it as an output parameter to obtain the client process's ip, port number and other information. We also define a buffer to receive the data the client sent. The protocol headers are separated from the payload before the data reaches us, so what recvfrom places in the buf array is only the valid data sent by the client.
The last parameter of recvfrom is an input-output parameter. We pass in the address of a variable holding the size of peer, and when recvfrom returns, the actual address length has been written back to it. recvfrom behaves much like reading from a pipe: by default it blocks until data arrives.
inet_ntoa converts the network byte order ip address in the peer structure to a dotted-decimal string. Like inet_addr, it does two steps in one: type conversion plus byte order conversion.
After the call returns, peer holds the client's port and ip, and we can print the client's message on the server. As for _callback, we do not call it here; we first demonstrate code that only communicates.
At this point the server code is complete. It is actually very simple.

insert image description here

insert image description here

5.
The following is the code of udpclient.cc. When the client runs, it needs to know which host and which server process its messages go to, so the server's ip and port number must be given on the command line. main reads the server's ip and port from the command line and passes them to the constructor of udpclient. As with udpserver, a smart pointer manages the udpclient object, so when the smart pointer is destroyed, the udpclient object is destroyed with it.
udpclient also implements two interfaces: initclient, which initializes the client, and run, which runs it.

insert image description here

6.
The member variables of the client are sockfd, serverip, serverport and a Boolean _quit. Today's client is also an endless loop: unless we kill the process ourselves, the client keeps running.
The first step in initializing the client is the same as for the server: call the socket interface to create a socket.
The second step is binding. Does the client need to bind? It must be bound, because its process has to be uniquely identifiable on the network. But does the programmer need to bind an ip and port number explicitly? The answer is no. The client's job is to initiate requests to the server, so it only needs to know the server's ip and port number and does not care about its own. In fact, the first time the client calls sendto, the operating system automatically binds the client's sockfd; the client's port number (and the source ip of its packets) are assigned dynamically by the operating system.
If the client bound a fixed ip and port number, the program would be much less robust. If another process on the same host had already bound that port, bind would fail and the client could not start; and if the bound ip were released or reassigned, the client could no longer run either. The situation mirrors the server: we recommend that the server bind any ip rather than one specific ip, which makes it more robust. Likewise, the client should not bind an ip and port number itself; the operating system does the binding, and the programmer does not need to worry about it.
So the client's initialization code is very simple: call socket() to create the socket, and leave the binding of sockfd to the operating system.
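The implicit binding can be observed with getsockname. The following is a small sketch under my own assumptions (the names localPort and implicitBindDemo, and the target 127.0.0.1:8080, are made up for the demo): it creates a UDP socket without calling bind, sends one datagram, and reports the local port before and after.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

// Returns the local port of sockfd in host byte order, or -1 on error.
int localPort(int sockfd) {
    sockaddr_in local;
    std::memset(&local, 0, sizeof(local));
    socklen_t len = sizeof(local);
    if (getsockname(sockfd, reinterpret_cast<sockaddr*>(&local), &len) < 0) {
        return -1;
    }
    return ntohs(local.sin_port);
}

// Creates a UDP socket, never calls bind(), sends one datagram to
// 127.0.0.1:8080 (an arbitrary demo target; nobody has to listen there),
// and reports the local port before and after the send.
// Returns false if the socket could not be created or the send failed.
bool implicitBindDemo(int& portBefore, int& portAfter) {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0) return false;
    portBefore = localPort(sockfd);  // nothing bound yet

    sockaddr_in server;
    std::memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);

    const char msg[] = "hello";
    ssize_t n = sendto(sockfd, msg, sizeof(msg) - 1, 0,
                       reinterpret_cast<const sockaddr*>(&server), sizeof(server));
    portAfter = localPort(sockfd);   // the OS has picked a port by now
    close(sockfd);
    return n >= 0;
}
```

On Linux, portBefore should come back as 0 and portAfter as some port chosen by the operating system, which is exactly the work the client leaves to the system.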

7.
When the client calls sendto to send a message to the server, the datagram carries the client's own ip and port in addition to the message data (the operating system fills them into the packet headers). After the server receives and processes the message, it uses this ip and port to locate the client process and return the processed message to it.
The first parameter of sendto is the client's sockfd, which the operating system binds to an ip and port as described above. The second and third parameters are the content of the message and its size in bytes. The fourth parameter, flags, controls additional attributes of the send and can modify the default behavior of sendto. For UDP it is usually set to 0, meaning the default behavior, and that is what we use today.
The client defines a struct sockaddr_in structure named server, fills in its fields, casts its address to struct sockaddr* for the dest_addr parameter, and passes its size in bytes as addrlen. The last parameter of sendto is a pure input parameter, whereas the last parameter of recvfrom is an input-output parameter; the two interfaces differ slightly here.
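Filling in the structure and calling sendto might look like the sketch below. fillAddr and sendMessage are hypothetical helper names, not the functions in the original code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdint>
#include <cstring>
#include <string>

// Fills addr from a dotted-decimal ip and a host-byte-order port.
// Returns false if ip is not a valid IPv4 address.
bool fillAddr(const std::string& ip, uint16_t port, sockaddr_in& addr) {
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);  // the port travels in network byte order
    return inet_pton(AF_INET, ip.c_str(), &addr.sin_addr) == 1;
}

// Sends message to server. flags is 0 (default behavior), and the last
// two arguments are the cast destination address and its size in bytes.
ssize_t sendMessage(int sockfd, const sockaddr_in& server, const std::string& message) {
    return sendto(sockfd, message.c_str(), message.size(), 0,
                  reinterpret_cast<const sockaddr*>(&server), sizeof(server));
}
```

Note that addrlen is passed by value here; recvfrom, by contrast, takes a pointer to a socklen_t that it reads and then overwrites.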
Another important point concerns sockfd, the socket file descriptor. Reads and writes on sockfd actually operate on the socket file control block that the descriptor points to. This control block contains both a send buffer and a receive buffer, so sending and receiving can proceed at the same time. We call this kind of network communication full-duplex communication.
The file control block behind an ordinary file descriptor fd only manages the contents of the file and has no send or receive buffer, because an ordinary file stores data passively and does not need them.

insert image description here
One more thing: the size() of a C++ string does not count a trailing \0, so when calling sendto we simply send string.size() bytes. In C, strlen(str) + 1 bytes are often sent so that the \0 goes along too, but in C++ sending string.size() is enough; the receiver then appends the \0 itself if it treats the buffer as a C string.
insert image description here
8.
The following is the experimental result. The server prints the client's ip and port number together with the content of the message. Today the client and the server both run on the same cloud server, so both use the local loopback ip 127.0.0.1. You can also test with the public ip of the cloud server, but the vendor's firewall blocks the ports of the public ip by default to keep the cloud server secure. To receive packets from other hosts on a port of the public ip, you first need to open that port, for example 8080, in the firewall settings of the server's console.
Ordinary programs cannot bind ports 0-1023 on the cloud server: these are the well-known ports reserved for system services, and binding them requires root privileges. You can bind ports from 1024 to 65535 freely. We usually like to use port 8080 for test code.

insert image description here

5. Translation version and bash command version

1.
Does the server only need to receive the client's message and print it to the monitor? Of course not. Reading the message merely completes the network communication; after reading it, the server must process it. The demos below are two versions of the server: one translates words and the other executes bash commands.

2.
The following is the server's main function in the translation version. It gains two functions, initdict and reload. initdict() reads the file dict.txt, splits each line into a key and a value, and inserts the resulting key-value pair into an unordered_map; that completes the loading of the dictionary. main then passes the translation function translateMessage to the udpserver class, which stores it in the wrapper-type object _callback. After receiving a client message, the server calls _callback with sockfd, clientip, clientport and the message, so translateMessage is called back to translate the received text.
All processing of the message lives in translateMessage and is strongly decoupled from the communication logic of udpserver. translateMessage performs the business logic on the message and does not need to care how the message arrived. Decoupling the business logic from the network communication that obtains the data greatly improves the robustness of the code and is a very good code style.
(You may not appreciate the benefits of decoupling yet. When I wrote the thread pool code earlier, I strongly coupled the initialization of the thread pool, that is, the code that creates the threads, with the code in the thread handler function that assigns thread names. It was a mess: a thread could start running before its thread object pointer had been assigned. I can't explain it fully here, but with several modules of different functions coupled together, the code kept producing nasty problems, and different ones every time, which was really painful. So the code you write should have high cohesion and low coupling. Otherwise debugging will wear you down, especially once multi-threading is mixed in, and hunting a single bug can take forever.)

insert image description here
insert image description here
3.
C++-style getline reads an istream line by line into a string object. So we define an ifstream object named in and associate it with dict.txt: the file is opened automatically when the ifstream object is created and closed automatically when it is destroyed. We pass in as the first parameter of getline, which takes an istream reference; this works because ifstream derives from istream. Each line that getline reads is stored in the string object line. getline reads dict.txt line by line and discards the newline character \n, which differs from fgets: fgets keeps the \n.
After reading a line, the next step is to split it at the delimiter : into two parts, key and value, build a key-value pair from them and insert it into the unordered_map. getline runs in a loop and repeats this until it reaches the end of the file. After the while loop we print load dict success, which means every line of the file has been split into a key-value pair and inserted into the unordered_map dict.
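The loading loop described above can be sketched like this. It is a minimal sketch, not the original initdict(): cutLine, loadDict and loadDictFile are hypothetical names, and the loader takes any std::istream so it can be tried on an in-memory string as well as on dict.txt.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Splits "key<sep>value" at the first separator.
// Returns false if the separator is missing.
bool cutLine(const std::string& line, const std::string& sep,
             std::string& key, std::string& value) {
    std::size_t pos = line.find(sep);
    if (pos == std::string::npos) return false;
    key = line.substr(0, pos);
    value = line.substr(pos + sep.size());
    return true;
}

// Reads lines until end of input, inserts every well-formed line into
// dict, and returns how many lines were well-formed.
std::size_t loadDict(std::istream& in,
                     std::unordered_map<std::string, std::string>& dict) {
    std::string line, key, value;
    std::size_t parsed = 0;
    while (std::getline(in, line)) {  // getline drops the trailing '\n'
        if (cutLine(line, ":", key, value)) {
            dict.insert(std::make_pair(key, value));
            ++parsed;
        }
    }
    return parsed;
}

// File-based wrapper: the ifstream opens the file on construction and
// closes it on destruction. A std::istringstream can be passed to
// loadDict in the same way for a quick experiment without a file.
std::size_t loadDictFile(const std::string& path,
                         std::unordered_map<std::string, std::string>& dict) {
    std::ifstream in(path);
    if (!in.is_open()) return 0;
    return loadDict(in, dict);
}
```

One design point worth knowing: insert does not overwrite an existing key, so if a reload should pick up changed translations, the map would need to be cleared first or filled with operator[] instead.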
In addition, a hot-loading function reload is implemented. Why? Suppose that while the server is running I add a new mapping between Chinese and English to dict.txt. I don't want to stop the server and restart it just to load the file into the dict object again. Instead we use hot loading: the server catches the signal SIGINT and sets its handler to reload, and reload calls initdict() again to reload the file. This is called hot loading.
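Here is a small sketch of catching SIGINT for hot loading. Note that it is a variant, not the approach described above: instead of calling initdict() directly inside the handler, the handler only sets a flag and the main loop does the reload, because file I/O and container updates are not async-signal-safe. All names here (g_reloadRequested, onSigint, installReloadHandler, consumeReloadRequest) are hypothetical.

```cpp
#include <csignal>

// Set by the signal handler, read and cleared by the main loop.
volatile std::sig_atomic_t g_reloadRequested = 0;

// Handler for SIGINT: only records that a reload was asked for.
void onSigint(int) {
    g_reloadRequested = 1;
}

// Replaces the default SIGINT action (terminate) with onSigint.
bool installReloadHandler() {
    return std::signal(SIGINT, onSigint) != SIG_ERR;
}

// Returns true once per request and clears the flag; the caller would
// run initdict() again whenever this returns true.
bool consumeReloadRequest() {
    if (!g_reloadRequested) return false;
    g_reloadRequested = 0;
    return true;
}
```

The server loop would call consumeReloadRequest() on each iteration. One caveat of this variant: if the loop is blocked in recvfrom and the call is restarted after the signal, the flag is only noticed when the next datagram arrives.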

insert image description here

4.
The following is the function that performs the business processing on the message obtained from network communication.
The logic is simple because all the preparation has been done. We call find on dict with the message as the key; find returns an iterator to the matching key-value pair, and through it we get the translation of the message. Then we return the translation to the client, which is also simple: just call the sendto system call. The parameter list of translateMessage includes the client's ip and port as well as the server's own sockfd, so we define a struct sockaddr_in object client, fill in its fields, and send response_message back to the client to complete the return of the translation result.
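The lookup step on its own is tiny. A sketch follows, with translate as a hypothetical name and the fallback string unknown taken from the behavior described for words that are not in dict.txt; the sendto reply is left out so the function stays pure.

```cpp
#include <string>
#include <unordered_map>

// Looks message up in dict. Returns the stored translation, or
// "unknown" when the word is not in the dictionary.
std::string translate(const std::unordered_map<std::string, std::string>& dict,
                      const std::string& message) {
    auto it = dict.find(message);
    if (it == dict.end()) {
        return "unknown";
    }
    return it->second;  // it->first is the key, it->second the value
}
```

Keeping the lookup separate from the reply code also makes it easy to check the dictionary logic without any network at all.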

insert image description here

The following code is the run method of the Client class in udpClient.hpp. Inside the client's infinite loop we add code that receives the translation result from the server. Receiving is not difficult: just call the system call recvfrom. The client already knows the server's ip and port number, since it was told where to send its data before it started, so we only define a struct sockaddr_in temp structure and never read the server's ip and port from it; temp is just a temporary structure to go with recvfrom.
Then we print the server's translation result, and the client's communication code is complete.
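The receiving step might look like the following sketch. recvReply is a hypothetical name and the 1024-byte buffer size is my own assumption.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstddef>
#include <cstring>
#include <string>

// Receives one datagram from sockfd into reply. Returns false on error.
// temp only exists because recvfrom wants somewhere to put the sender's
// address; the client never looks at it.
bool recvReply(int sockfd, std::string& reply) {
    char buffer[1024];
    sockaddr_in temp;
    std::memset(&temp, 0, sizeof(temp));
    socklen_t len = sizeof(temp);  // in: size of temp, out: size actually used
    ssize_t n = recvfrom(sockfd, buffer, sizeof(buffer), 0,
                         reinterpret_cast<sockaddr*>(&temp), &len);
    if (n < 0) return false;
    reply.assign(buffer, static_cast<std::size_t>(n));  // no '\0' needed
    return true;
}
```

Because the reply is copied with an explicit length, the buffer does not have to be null-terminated, which matches sending string.size() bytes on the other side.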

insert image description here
The following is the running result of the translation version. For mappings that exist in dict.txt, the server returns the corresponding translation. If the word you enter does not exist in dict.txt, the server returns unknown as the translation result.
While the server was running, I added two mappings to dict.txt. Before hot loading, the translation result of udp and tcp is unknown. Pressing ctrl+c in the server's terminal sends the SIGINT signal to the server process, which then hot-loads dict.txt; after that, entering tcp and udp gives the correct translations.

insert image description here

5.
The following is the version that executes bash commands. The decoupling of business logic from network communication saves us a lot of work here. To execute bash commands we just write a function execCommand and pass its name to the udpServer object; after receiving a client message, the server class calls back execCommand for the business logic. Almost nothing in main changes except the function name passed to the udpServer object. Isn't decoupling nice?
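The callback mechanism that makes this swap possible can be sketched as below. MiniServer is a hypothetical stand-in for udpServer with the socket code left out, and the Handler signature simply follows the parameters mentioned in the text (sockfd, client ip, client port, message).

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Business-logic callback: receives everything it needs to process a
// message and reply to the sender.
using Handler = std::function<void(int, const std::string&, uint16_t, const std::string&)>;

// Stand-in for the server class: it stores a handler and knows nothing
// about what the handler does with the message.
class MiniServer {
public:
    explicit MiniServer(Handler cb) : _callback(std::move(cb)) {}

    // What the receive loop would do right after a datagram arrives.
    void onMessage(int sockfd, const std::string& clientip,
                   uint16_t clientport, const std::string& message) const {
        if (_callback) {
            _callback(sockfd, clientip, clientport, message);
        }
    }

private:
    Handler _callback;
};
```

Switching from translating to executing commands is then only a matter of constructing the server with a different function, while the receive loop stays untouched.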

insert image description here

The following is the code of the execCommand interface. The key function is popen: it creates a pipe and a child process, has the child execute a shell command, and connects the child's output to the pipe, from which we can read the result. Its prototype is shown below. The command parameter is the command the child process executes, and the type parameter is the mode in which the pipe is opened, "r" for reading or "w" for writing. Today we open it with "r" to read the output that the child process writes at the other end of the pipe.
After popen returns, the file pointer fp refers to the read end of the pipe, and what we read from it is the output of the bash command. We read it with fgets and append (+=) each piece to the string object response, which is what we return to the client. Returning it with sendto follows the same routine as the translation version: create a client structure, fill in its fields, and tell sendto which client to send to. Finally, close the file pointer fp (a stream opened by popen is closed with pclose). That completes the execution of the bash command: the client sends a bash command to the server, and the server returns its output to the client.
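The popen part can be sketched as follows. runCommand is a hypothetical name, and the sendto reply is left out; the sketch assumes a POSIX system where popen is available.

```cpp
#include <cstdio>
#include <string>

// Runs cmd through the shell and collects everything it writes to its
// standard output. Returns false if popen itself failed.
bool runCommand(const std::string& cmd, std::string& response) {
    FILE* fp = popen(cmd.c_str(), "r");  // "r": we read the child's output
    if (fp == nullptr) return false;

    response.clear();
    char line[1024];
    // fgets keeps the '\n', so the lines can be appended as they are.
    while (fgets(line, sizeof(line), fp) != nullptr) {
        response += line;
    }
    pclose(fp);  // waits for the child and closes the pipe
    return true;
}
```

For example, running echo hello this way leaves the text hello followed by a newline in response, which is what would be sent back to the client.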

insert image description here

insert image description here

The client logic is unchanged. I only removed the output prompt "Server's translation result#" and replaced Please Enter with a simulated shell prompt, "[wyn@VM-8-2-centos socket]$ ", so that executing bash commands looks more realistic.

insert image description here

Below is the bash command version in action. General commands such as pwd, ls, touch and clear can be executed. I block rm, mv and rmdir so that a client cannot damage the server, for example with the classic rm -rf ./* command that deletes everything in the current directory.
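A filter of this kind could be sketched as below. This is my own guess at one way to do it, not the original check: isBlocked is a hypothetical name, and it compares whole words so that a harmless word that merely contains rm is not rejected.

```cpp
#include <sstream>
#include <string>

// Returns true if any whitespace-separated word of cmd is exactly one
// of the blocked command names.
bool isBlocked(const std::string& cmd) {
    static const char* const kBlocked[] = {"rm", "mv", "rmdir"};
    std::istringstream words(cmd);
    std::string word;
    while (words >> word) {
        for (const char* name : kBlocked) {
            if (word == name) return true;
        }
    }
    return false;
}
```

Keep in mind that this is only a toy guard for a demo. A command such as ls;rm -rf x or /bin/rm slips through it, so it should not be treated as real protection for a server that executes commands sent by strangers.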

insert image description here

6. Group chat version and win+linux linkage version

1.
Chatting with the server alone is boring. Let's make it a group chat: a message sent by one person can be seen by the other members, which is more interesting. So we implement message routing, that is, forwarding a message sent by one user to all online users. main stays the same; we only change the function pointer passed to the udpserver object.

insert image description here

2.
The following is the code of the file onlineUser.hpp. It implements two classes: one represents an ordinary user, and the other represents all online users and provides functions such as bringing a user online and taking a user offline.
The members of the User class are the client's ip and port number, which together identify a client; the User constructor initializes them.
The main member variable of the OnlineUser class is a hash table that maps a string id to a User object; the id is a string concatenated from clientip and clientport. OnlineUser implements four functions: addUser() when a user goes online, delUser() to remove an online user, isonlineUser() to check whether a user is online, and broadcastMessage() for message routing, which makes your message visible to all online users. The first three are simple: they just call insert(), erase() and find() on the _users hash table. broadcastMessage() deserves a note. Its parameters are the sender's ip and port, the server's sockfd, and the message the client sent to the server.
Its for loop traverses all key-value pairs in the hash table. The value of each pair is an online user, whose two member variables are exactly that user's ip and port. For each user we define a client structure, fill it with the user's information and send the message, so it reaches every online user. We also prefix the message with the sender's ip and port.
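A compact sketch of the two classes follows. The class and method names come from the text; the id format (ip-port), the message prefix and the accessor names are my own assumptions.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <utility>

// One client, identified by ip and port.
class User {
public:
    User(const std::string& ip, uint16_t port) : _ip(ip), _port(port) {}
    const std::string& ip() const { return _ip; }
    uint16_t port() const { return _port; }
private:
    std::string _ip;
    uint16_t _port;
};

// All online users, keyed by an id built from ip and port.
class OnlineUser {
public:
    void addUser(const std::string& ip, uint16_t port) {
        _users.insert(std::make_pair(makeId(ip, port), User(ip, port)));
    }
    void delUser(const std::string& ip, uint16_t port) {
        _users.erase(makeId(ip, port));
    }
    bool isonlineUser(const std::string& ip, uint16_t port) const {
        return _users.find(makeId(ip, port)) != _users.end();
    }
    // Sends "<sender id># <message>" to every online user, sender included.
    void broadcastMessage(int sockfd, const std::string& ip, uint16_t port,
                          const std::string& message) const {
        const std::string out = makeId(ip, port) + "# " + message;
        for (const auto& entry : _users) {
            sockaddr_in client;
            std::memset(&client, 0, sizeof(client));
            client.sin_family = AF_INET;
            client.sin_port = htons(entry.second.port());
            client.sin_addr.s_addr = inet_addr(entry.second.ip().c_str());
            sendto(sockfd, out.c_str(), out.size(), 0,
                   reinterpret_cast<const sockaddr*>(&client), sizeof(client));
        }
    }
private:
    static std::string makeId(const std::string& ip, uint16_t port) {
        return ip + "-" + std::to_string(port);
    }
    std::unordered_map<std::string, User> _users;
};
```

Using ip plus port as the key means two clients on the same machine are still two different users, since the operating system gives each client socket its own port.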

insert image description here
3.
The following is routeMessage, the logic processing function of the server. If the sender is not online, the server tells them (by calling the sendto interface) to go online first; once a user is online, their messages are sent to the whole group.

insert image description here

4.
The following is the code of the client, which uses a multi-threaded scheme: one thread sends messages and another receives the messages returned by the server. Without multiple threads we could not see messages from other clients in time. In a single loop, after a message is sent and the reply is read with recvfrom, run goes back to getline and blocks waiting for keyboard input; while it is blocked, recvfrom cannot run, so the messages that the server routes from other online users are not received. So keyboard input and receiving server messages must be two separate execution flows.
One detail in the code: on the sending side I print with cerr, which writes to the display without buffering, while the receiving thread prints the server's messages with cout, which is line-buffered when it writes to the display.
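The two execution flows can be sketched with std::thread as below. This is a sketch under my own assumptions, and the original code may well use a different threading interface; receiveOne, receiveLoop and runClient are hypothetical names.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstddef>
#include <iostream>
#include <string>
#include <thread>

// Receives one datagram from sockfd. Returns false on error.
bool receiveOne(int sockfd, std::string& message) {
    char buffer[1024];
    sockaddr_in temp;
    socklen_t len = sizeof(temp);
    ssize_t n = recvfrom(sockfd, buffer, sizeof(buffer), 0,
                         reinterpret_cast<sockaddr*>(&temp), &len);
    if (n < 0) return false;
    message.assign(buffer, static_cast<std::size_t>(n));
    return true;
}

// Receiving thread: blocks in recvfrom and prints whatever arrives.
void receiveLoop(int sockfd) {
    std::string message;
    while (receiveOne(sockfd, message)) {
        std::cout << message << std::endl;  // routed messages go to cout
    }
}

// Main thread: blocks on the keyboard and sends each line to the server.
void runClient(int sockfd, const sockaddr_in& server) {
    std::thread receiver(receiveLoop, sockfd);
    receiver.detach();  // the receiver lives until the process exits
    std::string line;
    while (true) {
        std::cerr << "Please Enter# ";      // prompt goes to cerr
        if (!std::getline(std::cin, line)) break;
        sendto(sockfd, line.c_str(), line.size(), 0,
               reinterpret_cast<const sockaddr*>(&server), sizeof(server));
    }
}
```

Both threads share the same sockfd. That is fine here because the socket is full-duplex: one thread only sends and the other only receives.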

insert image description here

5.
I run the server in a vscode terminal and the clients in xshell. The following is the experimental result. The client's cout output is redirected to a pipe file, while its cerr output is still displayed on the terminal immediately, without buffering. This splits sending and receiving into different terminals (different terminals are really different sessions), which makes the result easier to observe.
The message sent by user 1 is received by user 2, which shows that the server forwards user 1's message to all users, including user 1. If there were a user 3, user 4 and so on, they would receive the message too, as long as they have all gone online.

insert image description here

6.
The following is an implementation with windows (vs2022) as the client and the linux cloud server as the server.
The server code is simple: it just echoes the client's message back, that is, a server echo.

insert image description here

The following is the code for network socket programming on windows. The windows client additionally has to initialize the windows socket library at the start (with WSAStartup()) and call WSACleanup() at the end to clean up. The rest is practically identical to the client code we wrote on linux, so I won't go into detail: it just calls the socket(), sendto() and recvfrom() interfaces to send and receive datagrams. You can read the code yourself.
insert image description here
7.
The following are the experimental results. Windows uses GBK encoding by default on a Chinese-language system, while linux uses UTF-8, so Chinese text sent from windows may be garbled when linux decodes it. English works without problems, and data can be sent and received across the two platforms, so compatibility is very good.

insert image description here
From the cmd command window on windows you can check both the public ip of my computer and its ip on the mobile hotspot LAN it is currently connected to. The client ip printed by the linux server is my windows computer's public ip, 112.224.163.210, not my LAN ip, 192.168.213.134, because the source address is translated (NAT) on the way out of the LAN.
insert image description here


Origin blog.csdn.net/erridjsis/article/details/130848048