[Java Basics] The Principles of IO Implementation

1. The confusing concepts around IO

IO is short for Input and Output. In a broad sense there are many kinds of input and output around a computer: the mouse, the keyboard, a scanner, and so on. What we are going to discuss today, however, is input and output inside the computer itself, mainly involving hardware devices such as memory, the network card and the hard disk.

When it comes to IO models, most people have only a fuzzy picture in their heads. What exactly is the difference between "blocking", "non-blocking", "synchronous" and "asynchronous"? Many readers get confused, try to dig through related material to find the truth, and end up drowning in a sea of concepts.

Let me briefly explain why this happens. One very important reason is that the materials people read explain these concepts from different perspectives: some start from the underlying kernel, while others introduce the API directly at the Java level or at the level of a framework such as Netty, and this inevitably causes a certain amount of confusion.

So before we start, let's be clear about the perspective of this article: we will explain IO from the level of the underlying kernel. However much things change, the essentials stay the same; once we understand the underlying principles we can cope with whatever fancy abstractions a language layer puts on top.

2. User space and kernel space

Hardware layer (Hardware) 

This includes the CPU, memory, disk and network card that we are familiar with and that are involved in IO;

Kernel Space 

After the computer boots, the kernel program runs first. The private space occupied by the kernel program is the kernel space; code running there can use the CPU's full instruction set (privilege levels ring0-ring3) and access all memory, IO and hardware devices;

User Space

Each ordinary user process has its own separate user space, and user space can only access restricted resources (the CPU's "protected mode"), which means that a user-space process cannot directly manipulate hardware such as memory, network cards and disks;

Given the above, a natural question arises: what should a user-space process do when it wants to access or manipulate the disk or the network card?

For this reason the operating system opens up a single legitimate entry point into the kernel, the "System Call Interface", which is what we usually call system calls. System calls give upper-level users a set of APIs that can operate the underlying hardware. Through a system call a user process can enter the operating system kernel and thus indirectly complete operations on the underlying hardware. This access process is also the switch from user mode to kernel mode.

  • There are many common system calls, for example: memory mapping with mmap(), file operations such as open(), and IO reads and writes with read() and write(); a minimal example follows.
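As a rough illustration, here is a minimal C sketch of a user process going through the system call interface to read a file; the path /tmp/demo.txt is just a placeholder, and error handling is kept to a minimum:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // open() is a system call: the process traps into the kernel,
    // which checks permissions and returns a file descriptor.
    int fd = open("/tmp/demo.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // read() is another system call: the kernel copies data from its
    // buffers (filled from disk) into this user-space buffer.
    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n >= 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s\n", n, buf);
    }

    close(fd);  // close() releases the descriptor, again via the kernel
    return 0;
}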

3. IO models

1. BIO (Blocking IO)

Let's take a look at the Java pseudo code of the BIO model that everyone is familiar with:

ServerSocket serverSocket = new ServerSocket(8080);        // step1: create a ServerSocket listening on port 8080
while (true) {                                              // step2: the main thread enters an infinite loop
    Socket socket = serverSocket.accept();                  // step3: the thread blocks until a client connects

    BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    System.out.println("read data: " + reader.readLine());  // step4: read data from the socket

    PrintWriter print = new PrintWriter(socket.getOutputStream(), true);
    print.println("write data");                            // step5: write data to the socket
}

This code can be simply understood as the following steps:

  • Create a ServerSocket and listen on port 8080;
  • The main thread enters an infinite loop, blocking on serverSocket.accept() while waiting for client connections;
  • Read data: socket.read();
  • Write data: socket.write();

Problem

The three calls above, accept(...), read(...) and write(...), all block the calling thread. Since the code uses a single thread, the main thread simply gets stuck at whichever call is currently blocking.

Optimization

We need to know that "a blocked process does not consume CPU resources", so in a multi-core environment we can create multiple threads and hand the accepted requests off to them, thereby making effective use of the machine's multi-core resources. To avoid creating an excessive number of threads, we can go one step further and create a thread pool, using pooling to buffer the requests that cannot be processed for the moment. A rough sketch of the multi-threaded variant follows.
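A minimal C sketch of the multi-threaded variant, using one POSIX thread per accepted connection; a real server would hand the socket to a thread pool instead of creating a thread every time, and would add proper error handling:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

// Each connection is handled by its own thread, so a blocking
// read() here no longer stalls the accept loop in main().
static void *handle_client(void *arg) {
    int client_fd = *(int *)arg;
    free(arg);

    char buf[256];
    ssize_t n = read(client_fd, buf, sizeof(buf));   // blocks only this worker thread
    if (n > 0) {
        write(client_fd, buf, n);                    // echo the data back
    }
    close(client_fd);
    return NULL;
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);

    while (1) {
        int *client_fd = malloc(sizeof(int));
        *client_fd = accept(listen_fd, NULL, NULL);  // the main thread still blocks here
        if (*client_fd < 0) {
            free(client_fd);
            continue;
        }
        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, client_fd);
        pthread_detach(tid);                         // let the worker clean up after itself
    }
}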

2. "C10K" problem

"C10K" or "client 10k" is used to refer to a large number of clients;

BIO looks very simple, and in fact "BIO + thread pool" is a perfectly appropriate, even optimal, way to handle a small number of concurrent requests. But when facing a huge number of clients and requests, the drawbacks of multithreading gradually become apparent:

  • It relies heavily on threads, and threads are relatively expensive system resources (a thread's stack takes roughly 1 MB of space);
  • Frequent thread creation and destruction are costly, because they involve heavyweight system calls;
  • Context switching between threads is expensive: before a switch, the state of the previous task has to be saved so that it can be restored when we switch back. With a large number of threads, the time spent on context switching can even exceed the time spent actually executing, and the CPU load rises.

3. NIO non-blocking model

Now we finally arrive at the "non-blocking" IO described by Java NIO and the Netty framework. NIO stands for Non-Blocking IO (sometimes New IO). Since BIO may require a huge number of threads, NIO can be loosely understood as a way of serving a large number of client requests with a single thread or a small number of threads. To get there, the first step is to make the blocking calls non-blocking. Non-blocking behaviour has to be supported by the kernel, which must also expose the corresponding system call functions to user-space processes. So the "non-blocking" here should be understood at the system call API level; the real underlying IO operations still block, as we will see step by step later on.

In fact, the kernel already supports non-blocking operation. Take the blocking accept() example we just mentioned (tip: the system call behind Java's accept method is also called accept) and look at how the official documentation describes its non-blocking behaviour.

On Linux, run man 2 accept; the relevant part of the man page is summarized below:

The man page explains that accept() becomes non-blocking when the listening socket is marked non-blocking (for example with fcntl() and O_NONBLOCK; the related accept4() call also takes a "flags" argument such as SOCK_NONBLOCK for the returned socket). Once non-blocking, the thread keeps polling the call, and if no connection has arrived yet the call returns the special error code "EAGAIN" or "EWOULDBLOCK", telling the main program to keep polling.
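A minimal C sketch of such a polling accept loop, assuming listen_fd is an already bound and listening socket (error handling trimmed):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

void polling_accept_loop(int listen_fd) {
    // Mark the listening socket non-blocking so that accept() returns immediately.
    int flags = fcntl(listen_fd, F_GETFL, 0);
    fcntl(listen_fd, F_SETFL, flags | O_NONBLOCK);

    while (1) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd >= 0) {
            // A connection is ready: hand it off for reading and writing.
            close(client_fd);
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // No pending connection yet: the kernel tells us to come back later,
            // so the user process keeps polling (burning CPU if done this naively).
        } else {
            perror("accept");
            break;
        }
    }
}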

It is easy to imagine the general flow once the program is non-blocking. The biggest characteristic of the non-blocking mode is that the user process has to keep actively asking the kernel whether the data is ready!

Let's take a look at this calling process through a piece of pseudo code:

// loop forever
while (1) {
    // iterate over the fd set
    for (fdx in range(fd1, fdn)) {
        // if fdx has data ready
        if (null != fdx.data) {
            // read and handle it
            read(fdx) and handle(fdx);
        }
    }
}

This calling pattern also exposes the biggest drawback of the non-blocking mode: the user process has to keep switching into kernel mode to poll the connection status or to read and write data. Is there a way to simplify this polling for-loop in user space? There is: the IO multiplexing model we focus on next.

4. IO multiplexing model

The non-blocking model makes the user process poll system functions all the time and switch into kernel mode frequently. The optimization is actually fairly intuitive. Imagine a business scenario: system A calls system B's basic service to query the information of a single user. As the business grows, A's logic becomes more complicated and it now needs to look up the information of 100 users. Obviously A hopes that B will provide a batch query interface that takes a collection as its input parameter and passes all the data in one call, saving the cost of frequent cross-system calls.

Multiplexing follows almost the same idea, except that what goes into the "collection" are the events you register interest in: readable fds, writable fds, or fds whose connection status changes. The whole set is then handed over to the kernel, which watches them for you.

Now let's look at the multiplexing system calls you may have heard of: select(), poll() and epoll().

4.1 select()

The signature of select() is as follows:

/**
 * The select() system call
 *
 * Parameters:
 *     nfds       - the highest-numbered file descriptor plus 1
 *    *readfds    - set of fds checked for readability
 *    *writefds   - set of fds checked for writability
 *    *exceptfds  - set of fds checked for exceptional conditions (out-of-band data)
 *    *timeout    - pointer to a timeout struct
 */
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The description of select() in the official documentation:

DESCRIPTION

select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.

In other words, select() allows a program to monitor multiple fds, blocking and waiting until one or more of them reach the "ready" state.

Through select() the kernel gives the user process something like a batch interface, and the call itself blocks until some fd becomes ready. Let's look at how select() is implemented in a little more detail so that we can analyze its strengths and weaknesses. In the signature of select() it is easy to spot the parameter type fd_set. It is implemented with the bitmap algorithm: a fixed-size array (FD_SETSIZE fixes fd_set at 1024 bits), in which each element is a binary 0 or 1 and each position maps to one fd, marking whether that fd has a read or write event. For example, if fd == 5, then fd_set = 000001000 (reading from fd 0 on the left).

At the same time, fd_set defines four macros for handling the bitmap:

  • FD_ZERO(&set);   // initialize/clear the set so that it contains no fd
  • FD_SET(fd, &set);   // add fd to the set, i.e. set the bit at its position
  • FD_CLR(fd, &set);   // remove fd from the set, i.e. clear the bit at its position
  • FD_ISSET(fd, &set);   // check whether fd is in the set

The benefits of the bitmap are obvious: it is cheap to compute and compact in memory (one bit per fd, so a single byte covers 8 fds). The process of a user process calling select() can be sketched as follows:
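A minimal C sketch of a select() based echo loop, assuming listen_fd is an already bound and listening socket (error handling trimmed):

#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void select_loop(int listen_fd) {
    int clients[FD_SETSIZE];               // connected client fds, -1 means a free slot
    for (int i = 0; i < FD_SETSIZE; i++) clients[i] = -1;

    while (1) {
        fd_set readfds;
        FD_ZERO(&readfds);                 // fd_set is not reusable: rebuild it every round
        FD_SET(listen_fd, &readfds);
        int maxfd = listen_fd;

        for (int i = 0; i < FD_SETSIZE; i++) {
            if (clients[i] >= 0) {
                FD_SET(clients[i], &readfds);
                if (clients[i] > maxfd) maxfd = clients[i];
            }
        }

        // Hand the whole bitmap to the kernel and block until something is ready.
        int ready = select(maxfd + 1, &readfds, NULL, NULL, NULL);
        if (ready < 0) { perror("select"); break; }

        // New connection on the listening socket?
        if (FD_ISSET(listen_fd, &readfds)) {
            int client_fd = accept(listen_fd, NULL, NULL);
            for (int i = 0; client_fd >= 0 && i < FD_SETSIZE; i++) {
                if (clients[i] < 0) { clients[i] = client_fd; break; }
            }
        }

        // Still O(n): scan every fd to find out which ones are actually ready.
        for (int i = 0; i < FD_SETSIZE; i++) {
            if (clients[i] >= 0 && FD_ISSET(clients[i], &readfds)) {
                char buf[256];
                ssize_t n = read(clients[i], buf, sizeof(buf));
                if (n <= 0) { close(clients[i]); clients[i] = -1; }  // client gone
                else        { write(clients[i], buf, n); }           // echo back
            }
        }
    }
}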

Suppose fds is {1, 2, 3, 5, 7}: the corresponding bitmap "01110101" is handed to kernel space, where the fds are polled. When a read or write event occurs, the kernel re-marks the bitmap, stops blocking, and returns the whole set to user space. From this we can already see that the shortcomings of the select() system call are fairly obvious:

  • The complexity is O(n): the polling work is merely handed over to the kernel, so the complexity does not change, and once the call returns we still have to scan the set to find out which fds changed;
  • User mode still has to switch into kernel mode repeatedly until all the ready fds have been read, so the overall overhead remains large;
  • fd_set has a size limit, currently hard-coded to 1024;
  • fd_set is not reusable and must be reset before each call;

4.2 poll()

The signature of poll() is as follows:

/**
 * The poll() system call
 *
 * Parameters:
 *    *fds         - pointer to an array of pollfd structs
 *     nfds        - the number of descriptors to monitor
 *     timeout     - how long to wait, in milliseconds
 */
int poll(struct pollfd *fds, nfds_t nfds, int timeout);


/* The pollfd struct */
struct pollfd {
    int   fd;      // file descriptor
    short events;  // requested events
    short revents; // returned events
};

The description of poll() in the official documentation:

DESCRIPTION

poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.

In other words, poll() is very similar to select(): it also blocks and waits until one or more fds reach the "ready" state.

From the official description you can see that poll() and select() are very similar. The main difference is that poll() drops the bitmap and uses a custom struct, pollfd, instead: the fd is wrapped inside a pollfd, the events field registers the readable/writable events we are interested in (POLLIN, POLLOUT), and the array of pollfd structs is handed to the kernel. When a read or write event fires, we walk the pollfd array and check revents to determine whether a given fd has a read or write event.

As usual, the process of a user process calling poll() can be sketched as follows:
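A minimal C sketch of the same echo loop built on poll(), again assuming listen_fd is a listening socket; MAX_FDS is just a limit chosen here, not something poll() itself imposes:

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_FDS 1024   // our own array size, not a poll() restriction

void poll_loop(int listen_fd) {
    struct pollfd fds[MAX_FDS];
    int nfds = 1;
    fds[0].fd = listen_fd;
    fds[0].events = POLLIN;                // interested in readability

    while (1) {
        // The pollfd array is reusable: the kernel only overwrites revents.
        int ready = poll(fds, nfds, -1);   // -1 means block until something is ready
        if (ready < 0) { perror("poll"); break; }

        // Still O(n): scan every pollfd and check its revents field.
        for (int i = 0; i < nfds; i++) {
            if (!(fds[i].revents & POLLIN)) continue;

            if (fds[i].fd == listen_fd) {
                int client_fd = accept(listen_fd, NULL, NULL);
                if (client_fd >= 0 && nfds < MAX_FDS) {
                    fds[nfds].fd = client_fd;     // register the new client
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            } else {
                char buf[256];
                ssize_t n = read(fds[i].fd, buf, sizeof(buf));
                if (n <= 0) {                     // client closed or error
                    close(fds[i].fd);
                    fds[i] = fds[--nfds];         // compact the array
                } else {
                    write(fds[i].fd, buf, n);     // echo back
                }
            }
        }
    }
}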

Compared with select(), poll()'s main advantages come from the pollfd struct:

  • There is no 1024-entry bitmap size limit;
  • Results are reported through the revents field of the struct, so the pollfd array can be reused rather than rebuilt on every call;

However, the constant switching between user mode and kernel mode and the O(n) complexity are still there.

4.3 epoll()

epoll() is probably the most mainstream and widely used family of multiplexing calls today; Nginx and Redis, which we are all familiar with, make heavy use of this model.

Next we analyze it in detail. epoll is implemented as a "three-step" strategy: epoll_create(), epoll_ctl() and epoll_wait().

4.3.1 epoll_create()

/**
 * Returns a dedicated epoll file descriptor
 */
int epoll_create(int size);

Through epoll_create() the user process creates a region in kernel space (for ease of understanding, imagine it as a whiteboard) and receives back an fd that describes this region.

4.3.2 epoll_ctl()

/**
 * The epoll_ctl() system call
 *
 * Parameters:
 *     epfd       - the dedicated epoll fd returned by epoll_create()
 *     op         - the operation to perform: register (EPOLL_CTL_ADD), modify (EPOLL_CTL_MOD) or delete (EPOLL_CTL_DEL)
 *     fd         - the file descriptor to associate
 *     event      - pointer to an epoll_event struct describing the event of interest
 */
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

As we just said, epoll_create() creates a dedicated region, the "whiteboard". Through epoll_ctl() we can then register the events we are interested in on that "whiteboard", described by a custom epoll_event struct:

  • Register: EPOLL_CTL_ADD
  • Modify: EPOLL_CTL_MOD
  • Delete: EPOLL_CTL_DEL

4.3.3 epoll_wait()

/**
 * epoll_wait() returns the number of fds that are ready to read or write
 *
 * Parameters:
 *     epfd           - the dedicated epoll fd returned by epoll_create()
 *     events         - buffer that the kernel fills with the ready events
 *     maxevents      - the maximum number of events handled per call
 *     timeout        - how long to wait for IO events; -1 blocks, 0 returns immediately; -1 is the usual choice
 */
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll_wait() blocks and waits. When a hardware device such as the disk or the network card has data ready, it raises a hard interrupt to the CPU; the CPU then performs the copy work, moving the data from the device buffer into the kernel buffer, and the ready fds are placed on a ready list for user mode to consume. At that point the blocking in user mode ends: the call returns a definite number of readable/writable fds, and the process goes back to user mode to handle the data.

The overall flow can be understood through the following sketch:
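A minimal C sketch that strings the three steps together, assuming listen_fd is an already bound and listening socket and keeping error handling to a minimum:

#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

void epoll_loop(int listen_fd) {
    // Step 1: create the epoll instance (the "whiteboard") in kernel space.
    int epfd = epoll_create(1);   // the size argument is ignored since Linux 2.6.8 but must be > 0
    if (epfd < 0) { perror("epoll_create"); return; }

    // Step 2: register the listening socket for readable events.
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    while (1) {
        // Step 3: block until the kernel reports ready fds.
        // n is the exact number of ready events: no O(n) scan over all registered fds.
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n < 0) { perror("epoll_wait"); break; }

        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                // New connection: accept it and register it on the whiteboard too.
                int client_fd = accept(listen_fd, NULL, NULL);
                if (client_fd >= 0) {
                    ev.events = EPOLLIN;
                    ev.data.fd = client_fd;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
                }
            } else {
                // An existing client is readable: echo the data back.
                char buf[256];
                ssize_t len = read(fd, buf, sizeof(buf));
                if (len <= 0) {
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                } else {
                    write(fd, buf, len);
                }
            }
        }
    }
    close(epfd);
}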

epoll() essentially resolves the two problems that poll() still left open:

  • No more frequent switching between user mode and kernel mode just to register interest: the fds are registered once via epoll_ctl() instead of being passed in on every call;
  • O(1) complexity for finding the ready fds: the returned count is the exact number of readable/writable fds, so compared with looping over all n fds to check, the work is greatly reduced;

4. Synchronous and asynchronous

Careful readers may have noticed that this article has only been talking about "blocking" and "non-blocking"; the concepts of "synchronous" and "asynchronous" have not come up at all. In fact, in many scenarios synchronous/asynchronous and blocking/non-blocking are used almost as synonyms.

  • Blocking and non-blocking are best viewed from the system call API level, for example the select() and poll() calls introduced in this article;
  • Synchronous and asynchronous are better viewed from the application's perspective. When an application executes a piece of code synchronously, the result does not come back immediately, yet the underlying IO operation is not necessarily blocking; it may well be non-blocking.

So:

  • Blocking vs. non-blocking: when the data is not ready or the read/write has not finished, does the call sit and wait, or return immediately and rely on polling;
  • Synchronous vs. asynchronous: synchronous means the application itself performs the read/write; asynchronous means the operating system performs the read/write and notifies the application through a callback mechanism when it is done (a small sketch contrasting the two follows).
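A minimal C sketch contrasting the two, using an ordinary read() for the synchronous case and POSIX AIO (aio_read() from <aio.h>, link with -lrt on older glibc) for the asynchronous case; /tmp/demo.txt is just a placeholder file:

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("/tmp/demo.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    // Synchronous: the application itself performs the read and waits for the data.
    char sync_buf[128];
    ssize_t n = read(fd, sync_buf, sizeof(sync_buf));
    printf("sync read returned %zd bytes\n", n);

    // Asynchronous (POSIX AIO): the application only submits the request;
    // the system performs the read in the background and the application
    // is notified (here we simply poll the status for brevity).
    char async_buf[128];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = async_buf;
    cb.aio_nbytes = sizeof(async_buf);
    cb.aio_offset = 0;

    if (aio_read(&cb) == 0) {
        // The application is free to do other work here...
        while (aio_error(&cb) == EINPROGRESS) {
            /* waiting; real code would use a signal or a completion callback */
        }
        printf("async read returned %zd bytes\n", aio_return(&cb));
    }

    close(fd);
    return 0;
}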

By the way, here are two patterns you may often hear about: Reactor and Proactor.

  • Reactor pattern: active mode. The kernel notifies the application that an fd is ready, and the application then performs the read/write itself.
  • Proactor pattern: passive mode. The read/write is performed on the application's behalf, and the application is notified once the IO has completed.

5. Summary

This article has walked, from the bottom up, through the evolution from BIO to NIO, focusing on the IO multiplexing system calls select(), poll() and epoll() and the pros and cons of each. Technology keeps developing and evolving, and plenty of pain points remain. In follow-up posts I will introduce the related "zero-copy" technique, as well as Java NIO and the Netty framework.

 

Origin blog.csdn.net/qq_41893274/article/details/113063446