[Network programming] Sockets

1. Source IP and destination IP

A desktop or laptop without an IP address cannot access the Internet. Because every host has an IP address, data sent from one host to another must carry both a source IP and a destination IP.
Both addresses are therefore included in the packet header.

But moving data from one host to another is not the goal in itself. The real communication happens between the programs running at the application layer.
And we know that more than one program can be running at the application layer.

This raises a problem:
a public IP address identifies a unique host, so data can be passed from one host to another. But a host runs many programs (processes), so how do we ensure that data sent by program A is received by program B?
In other words: what uniquely identifies a process on a host?

2. Port numbers

To identify a networked process on a host uniquely, the convention is to use a port number. It identifies both the server process and the client process.

The port number:

1️⃣ The port number is a 2-byte (16-bit) integer
2️⃣ The port number identifies a process and tells the operating system which process to hand incoming data to
3️⃣ On a given host, a port number can be occupied by only one process

From the above:
an IP address (identifies a unique host) plus a port number (identifies a unique process on that host) identifies one specific process on one specific host, which makes that process unique across the whole network.

So the essence of network communication is inter-process communication.
We said before that inter-process communication requires both processes to see the same resource, and here that resource is the network.
Communication is also essentially IO, because everything we do on the Internet comes down to two operations: 1. sending data out, 2. receiving data.

Here is another question to think about. The system already identifies a process by its pid, so why do we also need a port number?

1️⃣ The pid is assigned by the operating system, while the port belongs to the network side. Keeping them separate decouples the system from the network.
2️⃣ A port uniquely identifies a server and must not change arbitrarily, so that clients can always find the server, just as emergency numbers such as 110 and 120 never change. A pid, by contrast, changes every time the process starts.
3️⃣ Not every process provides or requests network services (such processes need no port), but every process needs a pid.

Although a port number can only be bound to one process, a process can be bound to multiple port numbers.
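
The two halves of that rule can be sketched with UDP sockets on the loopback address. This is only an illustration under stated assumptions: the helper names `bind_udp_loopback` and `bound_port` are made up for this example, and the second bind is expected to fail with `EADDRINUSE` only because no address-reuse socket options are set.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a UDP socket bound to 127.0.0.1:port.
// port == 0 asks the kernel to pick a free port.
// Returns the descriptor, or -1 with errno set by the failing call.
int bind_udp_loopback(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    // port in network byte order
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;  // keep bind's errno across close()
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

// Hypothetical helper: the port (host byte order) a socket is bound to, 0 on error.
uint16_t bound_port(int fd) {
    assert(fd >= 0);  // precondition: a valid descriptor
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) return 0;
    return ntohs(addr.sin_port);
}
```

Calling `bind_udp_loopback(0)` twice in one process yields two sockets on two different ports (one process, several ports). Calling `bind_udp_loopback(bound_port(a))` while `a` is still open is expected to return -1, because that port is already taken.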

Just as there is a source IP and a destination IP, there is also a source port and a destination port. When we send data we must also send our own IP and port, because the peer needs them to send a reply. So every message carries some extra data beyond the payload, and that extra data takes the form of protocol headers.
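
A minimal sketch of why the source address travels with the data: with UDP, `recvfrom` hands the receiver the sender's IP and port, and the receiver replies to exactly that address. The helper names (`udp_bind_any_port`, `udp_send_to_port`, `udp_echo_once`) are hypothetical, and everything runs on the loopback address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: UDP socket bound to 127.0.0.1 on a kernel-chosen port.
// Stores the chosen port (host byte order) in *port_out. Returns fd or -1.
int udp_bind_any_port(uint16_t *port_out) {
    assert(port_out != NULL);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);  // 0: let the kernel choose
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

// Hypothetical helper: send len bytes to 127.0.0.1:port (the destination IP + port).
ssize_t udp_send_to_port(int fd, uint16_t port, const void *data, size_t len) {
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return sendto(fd, data, len, 0, (struct sockaddr *)&dst, sizeof(dst));
}

// Receive one datagram and echo it back to whoever sent it.
// The reply address is the source IP + port reported by recvfrom.
// Stores the sender's port in *src_port. Returns bytes echoed, or -1.
ssize_t udp_echo_once(int fd, uint16_t *src_port) {
    char buf[512];
    struct sockaddr_in src;
    socklen_t src_len = sizeof(src);
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&src, &src_len);
    if (n < 0) return -1;
    if (src_port != NULL) *src_port = ntohs(src.sin_port);
    return sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&src, src_len);
}
```

The "server" here never needs to be told the client's address in advance: it learns it from the datagram it receives.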

Some people may ask how the client knows which IP and port to send to the first time. Don't worry about this: the server's address is normally built into the client software in advance.

3. TCP/UDP protocol

The socket interfaces we use sit on top of the transport layer. We do not bypass the transport layer to call the protocols below it directly.

The two main transport layer protocols are TCP and UDP.

TCP has the following characteristics:

1️⃣ Transport layer protocol
2️⃣ Connection-oriented (a connection must be established before any data is exchanged)
3️⃣ Reliable transmission (the protocol handles reliability for us internally)
4️⃣ Byte-stream-oriented

UDP has the following characteristics:

1️⃣ Transport layer protocol
2️⃣ Connectionless
3️⃣ Unreliable transmission
4️⃣ Datagram-oriented

Reliable and unreliable transmission:
Unreliable transmission means that things can go wrong in transit, for example packets are lost, data is delivered more than once (duplicates), or the network itself fails, and the protocol does nothing to detect or repair these problems.

So reliability is a problem that is solved at the transport layer, and TCP is the protocol that solves it.

But then why do we still have UDP, with its unreliable transmission?
Note that "reliable" here is a neutral term. Reliability has a cost, and a reliable protocol is usually more complicated to implement and maintain;
an unreliable protocol avoids that cost and is simpler to use.
So choose according to the scenario.
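
In the socket API this choice comes down to the `type` argument of `socket()`: `SOCK_STREAM` selects TCP and `SOCK_DGRAM` selects UDP for `AF_INET`. A minimal sketch, where the helper names `open_transport_socket` and `socket_type_of` are made up for illustration:

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: open an IPv4 socket for the chosen transport.
// reliable != 0 -> TCP (connection-oriented byte stream)
// reliable == 0 -> UDP (connectionless datagrams)
int open_transport_socket(int reliable) {
    int type = reliable ? SOCK_STREAM : SOCK_DGRAM;
    return socket(AF_INET, type, 0);  // protocol 0: the default for this type
}

// Ask the kernel which type an open socket has. Returns SOCK_* or -1.
int socket_type_of(int fd) {
    assert(fd >= 0);  // precondition: a valid descriptor
    int type = 0;
    socklen_t len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0) return -1;
    return type;
}
```

The rest of the program then differs accordingly: a TCP socket must be connected before data flows, while a UDP socket can send a datagram to any address right away.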

3.1 Network byte order

We know that byte order (endianness) matters when multi-byte data is stored in memory.
Little-endian: the least significant byte is stored at the lowest address.
Big-endian: the least significant byte is stored at the highest address (so the most significant byte comes first).
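
One way to see which order the current host uses is to look at the byte stored at the lowest address of a known value. A small sketch (the function names are made up for this example, and hosts with a mixed byte order are not handled):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Returns the byte stored at the lowest address of a 32-bit value.
unsigned char lowest_address_byte(uint32_t value) {
    unsigned char bytes[sizeof(value)];
    memcpy(bytes, &value, sizeof(value));  // copy the in-memory representation
    return bytes[0];
}

// Returns 1 on a little-endian host, 0 on a big-endian host.
int host_is_little_endian(void) {
    unsigned char first = lowest_address_byte(0x11223344u);
    // 0x44 is the least significant byte, 0x11 the most significant one.
    assert(first == 0x44 || first == 0x11);  // mixed-endian hosts not handled
    return first == 0x44;
}
```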

Now consider this situation: a big-endian machine sends data in big-endian order to a little-endian machine.
Across the network, the receiver has no way of knowing whether the data it received is big-endian or little-endian.

So there is a rule:
the data in the network is big endian.

If the sending host is big-endian, nothing needs to be done. If it is little-endian, it converts the data to big-endian before sending.
Receiving works the same way: a little-endian host converts incoming data from big-endian to its own order.

So how are addresses defined for a stream of network data?

The sending host sends the data in its send buffer in order of memory address, from low to high.
The receiving host stores the data in its receive buffer in order of memory address, from low to high.
That is, data sent first sits at the low address and data sent later sits at the high address.
TCP/IP specifies that network data uses big-endian byte order, that is, the most significant byte is at the lowest address. Every host follows this network byte order when sending and receiving, whether it is big-endian or little-endian itself. If the sending host is little-endian, it must convert the data to big-endian first; otherwise it can send the data as it is.

The work of converting data into big endian does not need to be done by ourselves. We can call the following library functions to convert network byte order and host byte order.

#include <arpa/inet.h>
// host byte order -> network byte order
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
// network byte order -> host byte order
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

h means host, n means network, l means a 32-bit long integer, and s means a 16-bit short integer.
The functions themselves take care of whether the host is big-endian or little-endian.
If the host is little-endian, these functions swap the bytes of the argument and return the result.
If the host is big-endian, these functions return the argument unchanged.
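
A short sketch of what this means in memory: after `htonl`, the most significant byte is at the lowest address on any host, so the bytes can be copied straight into a send buffer. The helper names `put_u32_network` and `get_u32_network` are made up for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Write a 32-bit host value into out[0..3] in network (big-endian) byte order.
void put_u32_network(uint32_t host_value, unsigned char out[4]) {
    assert(out != NULL);
    uint32_t net = htonl(host_value);  // byte swap on little-endian hosts only
    memcpy(out, &net, sizeof(net));
}

// Read a 32-bit value stored in network byte order and return it in host order.
uint32_t get_u32_network(const unsigned char in[4]) {
    assert(in != NULL);
    uint32_t net;
    memcpy(&net, in, sizeof(net));
    return ntohl(net);
}
```

For the value `0x11223344`, `put_u32_network` stores the bytes `11 22 33 44` in that order regardless of the host's own byte order, and `get_u32_network` recovers the original value.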

4. Sockets

We said earlier that IP + port identifies a unique process on the entire network. This IP + port pair is what we call a socket.

4.1 Common socket interfaces

// Create a socket file descriptor (TCP/UDP, client + server)
int socket(int domain, int type, int protocol);
// Bind a port number (TCP/UDP, server)
int bind(int socket, const struct sockaddr *address,
         socklen_t address_len);
// Start listening on the socket (TCP, server)
int listen(int socket, int backlog);
// Accept an incoming connection (TCP, server)
int accept(int socket, struct sockaddr* address,
           socklen_t* address_len);
// Establish a connection (TCP, client)
int connect(int sockfd, const struct sockaddr *addr,
            socklen_t addrlen);
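
As a rough sketch of how these calls fit together for TCP: the server side runs `socket`, `bind`, `listen` and later `accept`, and the client side runs `socket` and `connect`. The helper names below (`loopback_addr`, `tcp_listen_loopback`, `tcp_connect_loopback`), the backlog of 8, and the use of the loopback address are all assumptions made for this example. The `struct sockaddr_in` type it fills in is covered in section 4.2.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Fill an IPv4 loopback address; port is given in host byte order.
static void loopback_addr(struct sockaddr_in *addr, uint16_t port) {
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

// Server side: socket -> bind -> listen. port 0 lets the kernel choose.
// Returns the listening descriptor and stores the actual port in *port_out.
int tcp_listen_loopback(uint16_t port, uint16_t *port_out) {
    assert(port_out != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    loopback_addr(&addr, port);
    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

// Client side: socket -> connect. Returns the connected descriptor or -1.
int tcp_connect_loopback(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After `tcp_connect_loopback` succeeds, the server calls `accept` on the listening descriptor to obtain a new descriptor for that one connection, and both sides can then use `read` and `write` on their descriptors.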

Notice that several of these functions take a parameter of a structure type called sockaddr. What is it?

4.2 sockaddr structure

In fact, there are many kinds of sockets.
Three are common:

Network sockets
Raw sockets
Unix domain sockets

Network sockets are mainly used for communication between hosts and also support local communication, while Unix domain sockets can only communicate locally. Raw sockets can bypass the transport layer (TCP/UDP) and access lower-layer data directly.

The application scenarios of these sockets are completely different, so in principle each would need its own set of interfaces.
For convenience, the designers provided a single set of interfaces that covers all of these scenarios through different parameters.

Here are two specific socket address types: sockaddr_in and sockaddr_un

struct sockaddr_in {
    short int sin_family;           // address family, normally AF_INET
    unsigned short int sin_port;    // port number, in network byte order
    struct in_addr sin_addr;        // IP address
    unsigned char sin_zero[8];      // padding so that sizeof(struct sockaddr_in) is 16
};

struct sockaddr_un {
    sa_family_t sun_family;       /* AF_UNIX */
    char sun_path[108];    /* file name, including its path */
};

You can see that sockaddr_in and sockaddr_un serve two different communication scenarios. They are told apart by the 16-bit address family field at the start of each structure. The socket interfaces do not take either structure directly; they take struct sockaddr.

For example, for network communication the parameter type is const struct sockaddr *addr, but what we actually pass in is a pointer to a sockaddr_in structure (note the explicit cast).
Inside, every function treats the argument as a sockaddr, reads the first two bytes to decide which kind of communication it is, and then casts the pointer back to the concrete type.

We can regard sockaddr as a base class and sockaddr_in and sockaddr_un as derived classes, forming a polymorphic hierarchy.
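
The dispatch described above can be sketched in user code. The two functions below are hypothetical and only imitate what a socket call does with its argument: accept the generic `struct sockaddr *`, check `sa_family`, and cast back to the concrete type.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

// Returns the port (host byte order) if sa is an AF_INET address, -1 otherwise.
int sockaddr_port(const struct sockaddr *sa) {
    assert(sa != NULL);
    if (sa->sa_family != AF_INET) return -1;
    // The family field says this is really a sockaddr_in, so cast back.
    const struct sockaddr_in *in = (const struct sockaddr_in *)sa;
    return ntohs(in->sin_port);
}

// Returns the path if sa is an AF_UNIX address, NULL otherwise.
const char *sockaddr_unix_path(const struct sockaddr *sa) {
    assert(sa != NULL);
    if (sa->sa_family != AF_UNIX) return NULL;
    // The family field says this is really a sockaddr_un, so cast back.
    const struct sockaddr_un *un = (const struct sockaddr_un *)sa;
    return un->sun_path;
}
```

A caller fills in a `struct sockaddr_in` or `struct sockaddr_un`, casts its address to `struct sockaddr *`, and passes it to either function. Only the function matching the family returns a value. The same cast is what callers write when invoking `bind` or `connect`.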

5. Summary

1️⃣ IP + port identifies a unique process in the whole network.
2️⃣ A socket is a communication mechanism (an agreement between the two communicating parties), and a socket is represented as IP + port.
3️⃣ The main difference between TCP and UDP is reliable versus unreliable transmission, and "unreliable" is a neutral word.
4️⃣ Network byte order is simply defined to be big-endian.
5️⃣ sockaddr provides one unified interface that covers most network communication scenarios.




Origin blog.csdn.net/qq_66314292/article/details/130454865