Condensed Notes on Operating Systems (7) - Network System Structure

Main sources for these notes:

Axiu’s study notes (interviewguide.cn)

Kobayashi coding (xiaolincoding.com)

I/O multiplexing

socket model

When creating a socket, you can specify whether the network layer uses IPv4 or IPv6, and whether the transport layer uses TCP or UDP.

The bind() function binds an IP address and a port to the Socket. What is the purpose of binding these two?

  • The purpose of binding the port: so the kernel can hand an arriving packet to the corresponding application;
  • The purpose of binding the IP address: a machine can have multiple network cards, each with its own IP address. Only after we bind a card's address will the kernel deliver to us the packets it receives on that card;

There are two kinds of Socket: the one that listens for connections and the one actually used to transmit data:

  • One is called the listening Socket;
  • The other is called the connected Socket;
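
A minimal sketch of this flow in C (error handling omitted; the port 8080 and backlog 128 are arbitrary values chosen for illustration):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    // Network layer: IPv4 (AF_INET); transport layer: TCP (SOCK_STREAM)
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    // Bind an IP address and a port to this Socket
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); // or a specific NIC's IP
    addr.sin_port = htons(8080);              // the port identifies the application
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));

    // listen_fd is the "listening Socket"
    listen(listen_fd, 128);

    // accept() returns a new "connected Socket" used for the actual data transfer
    int conn_fd = accept(listen_fd, NULL, NULL);
    // ... read()/write() on conn_fd, not on listen_fd ...
    return 0;
}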

What is the purpose of a file descriptor?

Based on the Linux concept that everything is a file, Sockets also exist in the form of "files" in the kernel and have corresponding file descriptors.

Each process has a task_struct data structure, which contains a member pointing to the "file descriptor array". The array's subscript is the file descriptor, and each array element is a pointer into the kernel's list of all open files, so the kernel can find the corresponding open file through the file descriptor.

Each file then has an inode, and the inode of a Socket file points to the Socket structure in the kernel. This structure contains two queues, a send queue and a receive queue, each holding struct sk_buff entries strung together as a linked list.

sk_buff can represent the data packet at every layer: at the application layer the packet is called data, at the TCP layer a segment, at the IP layer a packet, and at the data link layer a frame.

Why are all data packets described by only one structure?

The protocol stack is layered: when the upper layer passes data down, a header is added, and when the lower layer passes data up, a header is removed. If each layer used its own structure, data would have to be copied multiple times when passing between layers, which would greatly reduce CPU efficiency. With a single sk_buff, adding or removing a header only adjusts pointers within the buffer, so no copy is needed.

How many clients can a single server theoretically connect to?

For IPv4, there are at most 2^32 client IP addresses and 2^16 client ports, so with the server's IP and port fixed, the theoretical maximum number of TCP connections for a single server is about 2^32 × 2^16 = 2^48.

This theoretical value is extremely generous, but a server can never actually carry that many connections; it is mainly limited by two things:

  • File descriptors. A Socket is really a file and corresponds to a file descriptor. Under Linux the number of file descriptors a single process may open is limited; the unmodified default is typically 1024, but it can be raised with ulimit (see the sketch after this list);
  • System memory. Each TCP connection has a corresponding data structure in the kernel, so each connection occupies a certain amount of memory;
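
For illustration, a process can query (and, up to its hard limit, raise) the descriptor limit from C via getrlimit()/setrlimit(), the same limit that ulimit adjusts:

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);   // query the per-process fd limit
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;       // raise the soft limit up to the hard limit
    setrlimit(RLIMIT_NOFILE, &rl);
    return 0;
}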

 C10K 

Handling 10,000 concurrent requests is the classic C10K problem: C is short for Client, and C10K means serving 10,000 clients concurrently on a single machine.

From the hardware side, a server with 2 GB of memory and a gigabit network card can in principle satisfy 10,000 concurrent requests, as long as each request uses less than 200 KB of memory (10,000 × 200 KB = 2 GB) and less than 100 Kbit/s of bandwidth (10,000 × 100 Kbit/s = 1 Gbit/s).

To actually build a C10K server, however, what matters is the server's network I/O model: an inefficient model drives up system overhead and pushes the machine further and further from C10K.

multi-process model

Parent process and child process

Each client is assigned a process to handle its requests. When a connection is established, accept() returns a connected Socket; fork() then creates a child process that copies the parent's file descriptors, memory address space, program counter, executing code, and so on. Because the parent's file descriptors are copied, the child can use the "connected Socket" directly to communicate with the client, and the child need not care about the listening socket, just as the parent need not care about the connected socket.

The parent process takes care of the child processes

Terminated children are reaped with the wait() and waitpid() functions, otherwise they linger as zombie processes (see the sketch below).
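
A minimal sketch of the multi-process model, assuming a listen_fd set up as in the earlier example; the SIGCHLD handler uses waitpid() with WNOHANG to reap exited children so they do not linger as zombies:

#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Reap all exited children without blocking (avoids zombie processes)
static void sigchld_handler(int sig) {
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

void serve(int listen_fd) {
    signal(SIGCHLD, sigchld_handler);
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0)
            continue;
        pid_t pid = fork();      // child inherits copies of the parent's fds
        if (pid == 0) {
            close(listen_fd);    // child doesn't care about the listening socket
            // ... handle the client on conn_fd ...
            close(conn_fd);
            _exit(0);
        }
        close(conn_fd);          // parent doesn't care about the connected socket
    }
}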

Disadvantages of the multi-process model

1. As the number of clients grows, more child processes are created, each occupying some system resources.

2. Process context switching is heavy: it covers not only user-space resources such as virtual memory, the stack, and global variables, but also kernel-space resources such as the kernel stack and registers.

multi-threaded model

Thread pool model

After the server completes the TCP connection with the client, it creates a thread with pthread_create(), passes the "connected Socket" file descriptor to the thread function, and communicates with the client inside that thread, achieving concurrent handling.

To avoid creating and destroying a thread for every single connection, a thread pool is used: several threads are created in advance, each newly connected Socket is put into a queue, and the pool's threads take connected Sockets out of the queue to handle them, as shown in the sketch below.
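
A minimal sketch of such a thread pool, assuming the queue holds connected-Socket descriptors (the queue capacity and thread count are arbitrary, and a full queue is not handled here):

#include <pthread.h>
#include <unistd.h>

#define QUEUE_CAP   1024
#define NUM_THREADS 4

static int queue[QUEUE_CAP];
static int head, tail, count;
static pthread_mutex_t mu        = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

// Main thread: push a freshly accepted "connected Socket" into the queue
void enqueue_conn(int conn_fd) {
    pthread_mutex_lock(&mu);
    queue[tail] = conn_fd;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&mu);
}

// Pool threads: block until a connected Socket is available, then handle it
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mu);
        while (count == 0)
            pthread_cond_wait(&not_empty, &mu);
        int conn_fd = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_mutex_unlock(&mu);
        // ... read()/write() with the client on conn_fd ...
        close(conn_fd);
    }
    return NULL;
}

// Create the threads once, up front, instead of one per connection
void start_pool(void) {
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_detach(t);
    }
}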

shortcoming

When each new TCP connection requires its own process or thread, reaching C10K means one machine must maintain 10,000 connections and hence 10,000 processes/threads.

I/O multiplexing

One process to maintain multiple Sockets 

If the process keeps the handling time of each request within 1 ms, it can process over a thousand requests per second; in effect, one process is reused across many requests.

How does select/poll/epoll obtain network events?

select/poll/epoll are the multiplexing system calls the kernel provides to user mode; through a single system call, a process can obtain multiple events from the kernel.

To obtain events, all connections (file descriptors) are first passed to the kernel; the kernel then returns the connections on which events occurred, and the requests for those connections are processed in user mode.

select/poll

select implements multiplexing

All connected Sockets are put into a file descriptor set, and calling select copies that set into the kernel. The kernel traverses the set, and when it detects an event it marks the Socket as readable or writable, then copies the whole set back to user mode. User mode must then traverse the set again to find the readable or writable Sockets and process them.

Disadvantages of select

The file descriptor set must be "traversed" twice, once in kernel mode and once in user mode, and "copied" twice: first from user space into kernel space, and then, after the kernel has modified it, back out to user space.

select uses fixed length

select uses a fixed-length BitsMap to represent the file descriptor set, so the number of descriptors it supports is limited. On Linux it is capped by FD_SETSIZE in the kernel, whose default maximum is 1024, so it can only monitor file descriptors 0 through 1023.
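
A sketch of the select() pattern just described, assuming conn_fds[] holds connected Sockets whose values are all below FD_SETSIZE; note that the set is rebuilt and copied on every iteration and traversed again after select returns:

#include <sys/select.h>
#include <unistd.h>

void select_loop(int conn_fds[], int n) {
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        int maxfd = -1;
        // Build the fd set in user mode; it is copied into the kernel
        for (int i = 0; i < n; i++) {
            FD_SET(conn_fds[i], &readfds);
            if (conn_fds[i] > maxfd)
                maxfd = conn_fds[i];
        }
        // The kernel traverses the set and marks the ready descriptors
        select(maxfd + 1, &readfds, NULL, NULL, NULL);

        // User mode traverses again to find which fds were marked readable
        for (int i = 0; i < n; i++) {
            if (FD_ISSET(conn_fds[i], &readfds)) {
                char buf[4096];
                read(conn_fds[i], buf, sizeof(buf));
                // ... process the request ...
            }
        }
    }
}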

poll implements multiplexing

poll no longer uses a BitsMap; instead it uses a dynamic array, organized as a linked list, which breaks through select's descriptor-count limit (though it is of course still bounded by the system's file descriptor limits).

However, poll is not essentially different from select: both use a "linear structure" to store the set of Sockets the process cares about, so both must traverse the set to find the readable or writable Sockets, which is O(n), and both must copy the set between user mode and kernel mode. As concurrency grows, the performance loss therefore grows rapidly.
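
The equivalent poll() sketch; struct pollfd arrays have no FD_SETSIZE cap, but the whole array is still copied into the kernel and scanned linearly on every call:

#include <poll.h>
#include <unistd.h>

void poll_loop(struct pollfd fds[], int n) {
    for (;;) {
        for (int i = 0; i < n; i++)
            fds[i].events = POLLIN;    // interested in readability

        // The whole array is copied to the kernel and scanned: O(n)
        poll(fds, n, -1);

        // User mode still scans the array to find the ready Sockets
        for (int i = 0; i < n; i++) {
            if (fds[i].revents & POLLIN) {
                char buf[4096];
                read(fds[i].fd, buf, sizeof(buf));
                // ... process the request ...
            }
        }
    }
}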

epoll

Basic writing method

Let's first review how epoll is used. In the code below, epoll_create creates an epoll object epfd, epoll_ctl adds the sockets to be monitored to epfd, and finally epoll_wait waits for data.

int s = socket(AF_INET, SOCK_STREAM, 0);
bind(s, ...);
listen(s, ...);

int epfd = epoll_create(1);              // create an epoll object

struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = s;
epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev);  // add each socket to be monitored to epfd

struct epoll_event events[128];
while (1) {
    int n = epoll_wait(epfd, events, 128, -1);  // block until events arrive
    for (int i = 0; i < n; i++) {
        // handle the socket in events[i].data.fd that received data
    }
}

epoll solves the select/poll problem very well through two aspects.

First, epoll uses a red-black tree in the kernel to track all the file descriptors the process wants checked; the sockets to be monitored are added to this red-black tree through epoll_ctl(), where add, delete, and modify operations are generally O(log n). select/poll have no kernel data structure like epoll's red-black tree holding all the sockets to be checked, so the entire socket set must be passed to the kernel on every call. Because epoll keeps all the sockets to be checked in its in-kernel red-black tree, each call only needs to pass in a single socket, which eliminates a great deal of data copying and memory allocation between kernel and user space.

Second, epoll uses an event-driven mechanism. The kernel maintains a linked list of ready events; when an event occurs on a socket, the kernel adds that socket to the ready list through a callback function. When the user calls epoll_wait(), only the file descriptors on which events occurred are returned, with no need to poll and scan the entire socket set the way select/poll do, which greatly improves detection efficiency.

 Edge triggering and level triggering

epoll supports two event-triggering modes: edge-triggered (ET) and level-triggered (LT).

These two terms are quite abstract, but their differences are actually easy to understand.

  • In edge-triggered mode, when a readable event occurs on the monitored Socket, the server is woken from epoll_wait only once, even if the process never calls read to drain the kernel buffer. Our program must therefore make sure it reads all of the data in the kernel buffer in one go;
  • In level-triggered mode, when a readable event occurs on the monitored Socket, the server keeps being woken from epoll_wait until the kernel buffer has been drained by read; the point is to keep telling us that data remains to be read;

With edge triggering, you are notified only once per I/O event and do not know how much data can be read or written, so upon notification you should read or write as much as possible to avoid losing the chance. We therefore read or write the file descriptor in a loop; but if the descriptor is blocking and there is nothing left to read or write, the process blocks inside the read/write call and the program cannot proceed. Edge-triggered mode is therefore generally paired with non-blocking I/O: the program keeps performing I/O until the system call (such as read or write) returns an error of type EAGAIN or EWOULDBLOCK.
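
A sketch of the read-until-EAGAIN loop for edge-triggered mode; the descriptor must first be made non-blocking with fcntl():

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Edge-triggered sockets must be non-blocking
void set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// On an ET notification, drain the kernel buffer in one go
void drain(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            // ... process n bytes ...
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;   // kernel buffer fully drained; wait for the next event
        } else {
            break;   // n == 0 (peer closed) or a real error
        }
    }
}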

Generally speaking, edge triggering is more efficient than level triggering, because it reduces the number of epoll_wait system calls, and system calls carry their own overhead, including context switches.

select/poll support only level-triggered mode. epoll defaults to level-triggered, but can be set to edge-triggered to suit the application scenario.

other

How much do you know about solutions for high server concurrency?

  • Separate application data from static resources: store static resources (images, videos, JS, CSS, etc.) on a dedicated static-resource server. When a client visits, static resources are served from the static-resource server and application data from the main server.

  • Client-side caching: the most efficient and least resource-hungry page is a purely static HTML page, so make the site's pages as static as possible, re-caching a page when it expires or its data is updated. Alternatively, generate a static page first and fetch the dynamic data with asynchronous Ajax requests.

  • Cluster and distributed deployment: a cluster means every server provides the same function and any one of them can serve a request, mainly acting to spread traffic; distributed means different services sit on different servers, so one request may involve several servers, each of which can speed up its part of the processing. Both approaches spread the computing load that one server would bear across many servers and speed up request handling.

  • Reverse proxy: when accessed, the server fetches resources or results from other servers and returns them to the client.

load balancing

The concurrency and data volume of a single machine are limited, so multiple servers are used to form a cluster to provide external services.

The mapping between requests and nodes (servers) is the load-balancing problem. The simplest approach is to introduce an intermediate load-balancing layer.

Nodes differ in hardware configuration, so we can introduce a weight value: give better-equipped nodes a higher weight, then distribute requests among the nodes in proportion to their weights, so that nodes with better hardware bear more requests. This algorithm is called weighted round-robin.

The weighted round-robin algorithm assumes that every node stores the same data. It cannot cope with "distributed systems (systems with data sharding)", because in a distributed system each node stores different data, and data must be routed to particular nodes.
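
A minimal sketch of weighted round-robin; the three nodes and their weights are hypothetical:

// Hypothetical example: three nodes with weights 5, 3, 2
static const int weights[] = {5, 3, 2};
static const int num_nodes = 3;

// Returns the node index for the k-th request: node i receives
// weights[i] out of every (5+3+2)=10 consecutive requests
int pick_node(int k) {
    int total = 0;
    for (int i = 0; i < num_nodes; i++)
        total += weights[i];
    int r = k % total;
    for (int i = 0; i < num_nodes; i++) {
        if (r < weights[i])
            return i;
        r -= weights[i];
    }
    return 0;  // unreachable when weights are positive
}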

Suppose we want to build a KV (key-value) cache system.

Hash algorithm: hashing the same key always yields the same value, which achieves the mapping effect. But if the number of nodes changes, i.e. the system scales out or in, all data whose mapping changed must be migrated, otherwise it can no longer be found.

Consistent hashing algorithm: it also uses modulo arithmetic, but whereas the plain hash algorithm takes the modulus over the number of nodes, consistent hashing takes the modulus over 2^32, a fixed value.

Consistent hashing maps both "storage nodes" and "data" onto a hash ring joined end to end; a data item is stored on the first node encountered clockwise from the position its hash maps to.

In the consistent hashing algorithm, adding or removing a node affects only the node's clockwise-adjacent successor on the hash ring; other data is unaffected. However, the algorithm does not guarantee that nodes are evenly distributed on the ring.

So instead of mapping real nodes onto the hash ring directly, virtual nodes are mapped onto the ring, and each virtual node maps to an actual node, forming a "two-layer" mapping. With more points on the ring, node distribution becomes relatively even, and when membership changes, the resulting load shift is shared among several nodes, so stability is higher.
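
A sketch of this two-layer mapping: virtual nodes are placed on a 2^32 ring, a key walks clockwise to the first virtual node, and that virtual node maps back to a real node. The FNV-1a hash, the node counts, and the naming scheme here are illustrative choices, not part of the original text:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_REAL   3
#define VNODES_PER 100   // virtual nodes per real node

typedef struct {
    uint32_t hash;   // position on the 2^32 ring
    int real_node;   // which real node this virtual node maps to
} VNode;

static VNode ring[NUM_REAL * VNODES_PER];
static int ring_size;

// Illustrative string hash (FNV-1a); any uniform 32-bit hash works
static uint32_t hash32(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

static int cmp_vnode(const void *a, const void *b) {
    uint32_t x = ((const VNode *)a)->hash, y = ((const VNode *)b)->hash;
    return (x > y) - (x < y);
}

// Place VNODES_PER virtual nodes per real node on the ring, then sort
void build_ring(void) {
    for (int n = 0; n < NUM_REAL; n++) {
        for (int v = 0; v < VNODES_PER; v++) {
            char name[64];
            snprintf(name, sizeof(name), "node-%d#vnode-%d", n, v);
            ring[ring_size].hash = hash32(name);
            ring[ring_size].real_node = n;
            ring_size++;
        }
    }
    qsort(ring, ring_size, sizeof(VNode), cmp_vnode);
}

// Find the first virtual node clockwise from the key's position
// (linear scan for clarity; a binary search would be used in practice)
int locate(const char *key) {
    uint32_t h = hash32(key);
    for (int i = 0; i < ring_size; i++)
        if (ring[i].hash >= h)
            return ring[i].real_node;
    return ring[0].real_node;  // wrapped around the ring
}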

Origin blog.csdn.net/shisniend/article/details/131868428