[Java IO Overview] NIO is the core (two BIO posts + three IO posts + one Reactor post to understand it thoroughly)

Java IO comes up in interviews, and almost always at the level of principles. NIO is the core of the chain NIO, Reactor, Netty, RPC (Dubbo, Thrift), and there is little to say about it at the usage level.
NIO is always asked about in interviews because Netty is built on NIO. If the interviewer asks about BIO or AIO, knowing the key points of each is enough.
Netty: Programming directly against Java NIO is not easy, and writing a highly concurrent, robust program with it is harder still. It takes strong programming skills plus expertise in several complex areas (network programming, multithreading, and concurrency). Netty packages that expertise so that even newcomers to network programming can use it.
Netty is an asynchronous, event-driven network application framework that supports the rapid development of maintainable, high-performance, protocol-oriented servers and clients. It encapsulates Java NIO and hides the complex low-level details.
1. First of all, it is a framework, a "semi-finished product" that cannot be used out of the box. You have to customize it, use it to develop your own application, and then run that (just like using Spring). If you want to develop your own high-performance RPC framework, where you define the RPC calling protocol, the data format, and the field order yourself and existing HTTP does not fit at all, then Netty is an excellent choice. A well-known example is Alibaba's Dubbo, an RPC framework that uses Netty as its underlying transport.
2. High performance and high reliability. With Netty, you do not need to program directly against NIO.

Three Java IO posts plus one Reactor post:

[Java IO] Use level: the use of BIO, the first article (non-interview focus, interviews use NIO)

[Java IO] Use level: the use of BIO, the second part (non-interview focus, interviews use NIO)

JavaIO 001 Linux five IO models

JavaIO 002 Java three IO models

JavaIO 003 Reactor three modes

Reactor finally got it

**1. Java's three IO models are built on the five IO models at the Linux level (related posts: Linux five IO models, Java three IO models).
2. The three Reactor modes are built on Java NIO; they are an encapsulation of NIO (related post: three modes from NIO to Reactor).
3. Netty is built on Reactor (NIO plus sockets becomes a network framework) (related posts: Netty source code analysis, and how Netty is implemented on top of Reactor).
4. An RPC framework can be built with Netty + Spring + ZooKeeper (related post: practice, implementing an RPC framework with Netty + Spring + ZooKeeper).
5. Many RPC frameworks use Netty for communication: Alibaba's Dubbo RPC framework, Taobao's RocketMQ message middleware, the game field, and Hadoop in the big data field.**

The relationship between serialization and RPC: RPC sends data over the network, so it needs a serialization technology for Java beans.

Five io models of Linux

Java three IO and Linux five IO and their corresponding relationship
1. In Java, there are three main IO models, namely blocking IO (BIO), non-blocking IO (NIO) and asynchronous IO (AIO).
2. In the Linux (UNIX) operating system, there are five IO models, namely: blocking IO model, non-blocking IO model, IO multiplexing model, signal-driven IO model, and asynchronous IO model.
3. The IO-related APIs provided in Java ultimately rely on IO operations at the operating system level.

Java BIO corresponds to Linux synchronous blocking IO
Java NIO corresponds to Linux IO multiplexing (a Selector is backed by select/poll/epoll)
Java AIO corresponds to Linux asynchronous IO

Golden finger: the five models, one sentence each, easy to recite
1. Synchronous blocking IO blocks all the way: the recvfrom system call blocks both while waiting for data and while the data is copied.
2. Synchronous non-blocking IO polls while waiting for data (recvfrom is issued repeatedly and returns immediately until data is ready), then blocks while the data is copied.
3. Signal-driven IO does not block while waiting for data (the process installs a SIGIO handler with the sigaction system call, which returns immediately; when the kernel delivers SIGIO, the handler issues recvfrom), then blocks while the data is copied.
4. Asynchronous IO blocks in neither phase (the process issues aio_read, which returns immediately; once the copy is complete the kernel delivers the signal specified in aio_read, and the program then processes the datagram).
5. IO multiplexing blocks in both phases, but on two different calls: first on select, which can wait on many descriptors at once, and then, after select reports a descriptor readable, on recvfrom while the data is copied.
Golden finger: IO multiplexing can be memorized alongside signal-driven IO. The notification differs (a select call that returns versus a delivered signal), but in both cases the process is told that data is ready and then issues recvfrom itself.

Java BIO NIO AIO

Interview language organization:
Recommended: comparison, theoretical language
Not recommended: pictures, source code, examples

Golden finger: BIO, NIO, and AIO in one sentence each:
The original server model is synchronous blocking BIO.
The improved server model is synchronous blocking BIO with a thread pool.
Since JDK 1.4, the server can use synchronous non-blocking NIO.
Since JDK 7, the server can use asynchronous non-blocking AIO.

Java BIO corresponds to Linux synchronous blocking IO
Java NIO corresponds to Linux IO multiplexing (a Selector is backed by select/poll/epoll)
Java AIO corresponds to Linux asynchronous IO

Golden finger: synchronous vs. asynchronous, blocking vs. non-blocking
Synchronous and asynchronous

Synchronous: after a call is initiated, it does not return until the callee has finished processing the request.

Asynchronous: after a call is initiated, the caller immediately receives an acknowledgment that the request was accepted, but not the result. The caller can go on to handle other requests in the meantime. The callee usually relies on events, callbacks, or similar mechanisms to notify the caller of the result.

The biggest difference between synchronous and asynchronous is that in the asynchronous case the caller does not wait for the result; the callee notifies the caller of the result through a callback mechanism.

Blocking and non-blocking

Blocking: after a request is initiated, the caller waits until the result comes back. The current thread is suspended, cannot perform other tasks, and continues only when the condition is ready.

Non-blocking: after a request is initiated, the caller does not have to wait for the result and can do other things first.
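
To make the synchronous/asynchronous distinction concrete, here is a minimal sketch in Java. It is only an illustration: `CompletableFuture` is chosen here as one way to express "return now, notify later", and the class `CallStyles` and the method `slowLookup` are hypothetical names, not something from the posts above.

```java
import java.util.concurrent.CompletableFuture;

class CallStyles {
    // Hypothetical slow operation standing in for the callee.
    static String slowLookup(String key) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-of-" + key;
    }

    // Synchronous call: does not return until the callee has produced the result.
    static String callSync(String key) {
        return slowLookup(key);
    }

    // Asynchronous call: returns a future at once; the result is delivered later.
    static CompletableFuture<String> callAsync(String key) {
        return CompletableFuture.supplyAsync(() -> slowLookup(key));
    }
}
```

With `callAsync("a").thenAccept(System.out::println)` the caller registers a callback and carries on. Calling `callAsync("a").join()` instead makes the caller block on an asynchronous operation, which shows that blocking and non-blocking describe the caller's waiting behavior rather than the callee.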

BIO:
1. BIO, NIO, and AIO all come down to the same three operations: accept, read, and write.
2. In BIO, accept, read, and write are all synchronous blocking operations, and that cannot be changed. The only way to improve BIO is a thread pool, even though threads are expensive to create and destroy (especially on Linux, where a thread is implemented as a lightweight process).
In BIO, accept, read, and write are all synchronous blocking operations: this is the synchronous blocking I/O mode, one request for one response, where every read and write blocks a thread until it completes. A server using the BIO communication model usually has an independent Acceptor thread that listens for client connections. It typically calls accept() in a while(true) loop and waits for a client to connect. Once a connection request arrives, a communication socket is established and reads and writes are performed on it. While that is happening the thread cannot accept connection requests from other clients and can only wait for the current client's operations to finish. Multiple client connections can, however, be supported through multithreading.
Multithreading is pseudo-asynchronous IO: for the BIO communication model to handle multiple client requests at the same time, it must use multithreading (mainly because the three operations involved, ServerSocket.accept() and read() and write() on the socket's streams, are all synchronous and blocking). After receiving a client connection request, the server creates a new thread for that client to handle the link. When processing is complete, it returns a response to the client through the output stream, and the thread is destroyed. This is a typical request-response communication model. If a connection does nothing, its thread is pure overhead, but this can be improved with a thread pool, which also lowers the cost of creating and recycling threads. A FixedThreadPool effectively caps the maximum number of threads, keeps the system's limited resources under control, and realizes the pseudo-asynchronous I/O model of M (the number of client requests) : N (the number of threads processing them), where M can be much larger than N. The "Pseudo-asynchronous IO" section below describes this in detail.
Multithreading and pseudo-asynchronous IO are expensive, but there is no alternative: in the Java virtual machine, threads are precious resources, and creating and destroying them is costly. Thread switching is also costly. On an operating system like Linux in particular, a thread is implemented as a lightweight process, and creating and destroying threads are heavyweight system calls. If concurrent access grows, the number of threads expands rapidly, which may cause problems such as thread stack overflow and failure to create new threads, eventually bringing the process down so that it can no longer provide service.
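
The thread-per-connection BIO model described above can be sketched as follows. This is a minimal illustration, not code from the original posts: the class name `BioEchoServer`, the line-based echo protocol, and the "echo: " prefix are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class BioEchoServer {
    // Starts the Acceptor thread; closing the ServerSocket stops it.
    static Thread start(ServerSocket server) {
        Thread acceptor = new Thread(() -> {
            while (true) {
                try {
                    Socket client = server.accept();                  // blocks until a client connects
                    Thread worker = new Thread(() -> handle(client)); // one new thread per connection
                    worker.setDaemon(true);
                    worker.start();
                } catch (IOException e) {
                    break; // accept() fails once the server socket is closed
                }
            }
        }, "bio-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
        return acceptor;
    }

    // Runs on the per-connection thread: blocking read, blocking write, line by line.
    static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) { // blocks until a line arrives or the peer closes
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            // the connection was dropped; the thread simply ends
        }
    }
}
```

Every connected client pins one thread inside `readLine()` even while it sends nothing, which is exactly the cost the paragraph above describes.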

Pseudo-asynchronous IO:
Thread pool: To solve the problem that synchronous blocking I/O needs one thread per connection, the thread model was later optimized. The backend uses a thread pool to handle requests from many clients, giving a ratio of M clients to at most N pool threads, where M can be much larger than N. The pool allocates threads flexibly, and its maximum size can be capped so that massive concurrent access does not exhaust threads.
Pseudo-asynchronous IO: A pseudo-asynchronous I/O communication framework can be built from a thread pool and a task queue. When a new client connects, the client's Socket is wrapped in a Task (which implements the java.lang.Runnable interface) and handed to the backend thread pool. The thread pool maintains a message queue and N active threads that process the tasks in the queue. Because both the queue size and the maximum number of threads can be set, resource usage is bounded, and no number of concurrent clients can cause resource exhaustion and downtime.
Pseudo-asynchronous IO is still BIO: the pseudo-asynchronous I/O communication framework uses a thread pool, so it avoids the thread exhaustion caused by creating a separate thread for each request. However, because the underlying model is still synchronous blocking BIO, it does not solve the problem fundamentally.
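
A minimal sketch of the bounded thread pool behind pseudo-asynchronous IO, assuming `ThreadPoolExecutor` with an `ArrayBlockingQueue` as the task queue. The class `PseudoAsyncPool` and its `create` method are hypothetical names, and rejecting excess tasks with `AbortPolicy` is one possible choice, not something the text prescribes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PseudoAsyncPool {
    // Hypothetical factory: at most maxThreads workers plus queueSize waiting tasks.
    static ThreadPoolExecutor create(int maxThreads, int queueSize) {
        return new ThreadPoolExecutor(
                maxThreads, maxThreads,                // fixed number of worker threads (N)
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),   // bounded task queue
                new ThreadPoolExecutor.AbortPolicy()); // reject new tasks once the queue is full
    }
}
```

In a BIO accept loop, the server would call `pool.execute(task)` with a Runnable wrapping the accepted Socket instead of starting a new thread. With N workers and a queue of size Q, at most N + Q connections are held at once; further submissions fail with `RejectedExecutionException` rather than exhausting the machine.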

NIO:
The N in NIO can be understood as Non-blocking, not just New. It supports buffer-oriented, channel-based I/O.
NIO provides SocketChannel and ServerSocketChannel, two socket channel implementations that correspond to Socket and ServerSocket in the traditional BIO model. Both channels support blocking and non-blocking modes. (When the server starts, the server channel is set to non-blocking during its startup steps, and each client channel obtained from accept is also set to non-blocking. The server has exactly one server channel; every accepted client connection adds one client channel, so n client connections mean n client channels. When a client disconnects, its client channel is closed and its SelectionKey is cancelled in the selector.) Blocking mode is used just like traditional IO: it is simpler, but performance and reliability are poor. Non-blocking mode is the opposite. For low-load, low-concurrency applications, synchronous blocking I/O gives faster development and better maintainability; for high-load, high-concurrency (network) applications, use NIO's non-blocking mode.

Interviewer: What is the difference between NIO and IO?
Core: start from the fact that NIO is synchronous non-blocking while traditional IO streams are synchronous blocking. Then go through the improvements NIO brings, using its three core components.
First, IO streams are blocking and NIO is non-blocking.
1.1 The streams of Java IO are blocking. When a thread calls accept(), read(), or write(), the thread is blocked until a connection arrives, some data is read, or the data is completely written. The thread cannot do anything else during this time.
1.2 Java NIO allows non-blocking IO operations.
Non-blocking read: a thread asks a channel to read data into a buffer and gets back only what is currently available (possibly nothing), so it can go on to do other things and come back to process the data once it is in the buffer.
Non-blocking write: a thread asks a channel to write some data but does not have to wait until it is completely written. The thread can do other things in the meantime.
Second, NIO element one, Buffer: IO is stream-oriented, while NIO is buffer-oriented.
A Buffer is an object that holds data to be written or data that has just been read. Adding the Buffer object to the NIO class library reflects an important difference between the new library and the original I/O. In stream-oriented I/O, data is written directly to, or read directly from, a Stream object. Although the stream library also has classes whose names begin with Buffered, they are only wrappers around a stream and still read from the stream into a buffer, whereas NIO reads directly into a Buffer and operates on it.
In the NIO library, all data is handled through buffers.
When reading data, it is read directly into a buffer.
When writing data, it is first written into a buffer.
Any access to data in NIO goes through a buffer.
The most commonly used buffer is ByteBuffer, which provides a set of operations for manipulating a byte array. There are other buffers besides ByteBuffer: every Java primitive type except boolean has a corresponding buffer.
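
A short sketch of how a ByteBuffer moves between writing and reading. The class `BufferDemo`, the method `roundTrip`, and the capacity of 64 bytes are assumptions made for the example; `put`, `flip`, `get`, and `clear` are standard `java.nio.ByteBuffer` calls.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferDemo {
    // Writes text into a buffer, flips it, and reads the same bytes back.
    // Text longer than 64 UTF-8 bytes would overflow this small demo buffer.
    static String roundTrip(String text) {
        ByteBuffer buf = ByteBuffer.allocate(64);       // position = 0, limit = capacity = 64
        buf.put(text.getBytes(StandardCharsets.UTF_8)); // filling: position advances
        buf.flip();                                     // limit = old position, position = 0
        byte[] out = new byte[buf.remaining()];         // remaining() = limit - position
        buf.get(out);                                   // draining: position advances to limit
        buf.clear();                                    // ready to be filled again
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

Forgetting `flip()` between filling and draining is the classic mistake: the read would then start at the current position and return the unused tail of the buffer instead of the data.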

Third, NIO element two, Channel: NIO reads and writes through channels.
A channel is bidirectional, readable and writable, whereas a stream reads or writes in one direction only. Whether reading or writing, a channel only ever interacts with a Buffer. Thanks to the Buffer, a channel can read and write asynchronously (that is, in non-blocking mode).

Fourth, NIO element three, Selector: NIO has selectors, IO does not.
A selector lets a single thread handle multiple channels, so fewer threads are needed to serve them. Switching between threads is expensive for the operating system.
A selector is therefore useful for improving the efficiency of the system.
Channels are registered with the selector.

Remember, all IOs in NIO start from Channel.

Read data from the channel: create a buffer, and then request the channel to read the data.

Write data to the channel: create a buffer, fill it with data, and ask the channel to write the data.
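
The two operations above can be sketched with a FileChannel, chosen only because it needs no network peer. Note that a FileChannel is always blocking; it is used here to show the buffer-in, buffer-out shape of channel IO, not non-blocking mode. The class `ChannelDemo` and the temporary file name are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelDemo {
    // Writes text through a channel, then reads it back through the same channel.
    static String writeThenRead(String text) throws IOException {
        Path tmp = Files.createTempFile("channel-demo", ".txt");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Write: fill a buffer, then ask the channel to write it.
            ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (out.hasRemaining()) {
                ch.write(out);
            }
            // Read: create a buffer, then ask the same (bidirectional) channel to fill it.
            ch.position(0);
            ByteBuffer in = ByteBuffer.allocate((int) ch.size());
            while (in.hasRemaining() && ch.read(in) != -1) {
                // keep reading until the buffer is full or end of file is reached
            }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The same channel object serves both directions, which is the contrast with streams drawn above, and in both directions the data passes through a ByteBuffer.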


From BIO to NIO
ServerSocket becomes ServerSocketChannel
NIO three elements: Buffer, Channel, Selector
Buffer: data is processed once it is in the buffer; a non-blocking read that returns immediately corresponds to the polling recvfrom of Linux synchronous non-blocking IO
Channel: ServerSocket becomes ServerSocketChannel, Socket becomes SocketChannel
Selector: selects the channels that are ready

Note on the three elements: the thread that accepts connections has no use for a Buffer; only threads that perform IO operations (read/write) need one.

Note: here, all events that occur are handed to a single thread for processing. If that is not fast enough, a thread pool can be used to process events.
For example, Reactor has three modes: single thread, thread pool, and master-slave thread pool.
Golden finger: with the select model a single thread serves many connections, as single-threaded Redis does.
while (selector.select() > 0) { // blocks here while no channel is ready, which saves CPU; returns as soon as at least one channel is ready
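
For reference, a complete select loop might look like the following. This is a minimal sketch under stated assumptions rather than the code of the original post: the class name `NioEchoServer`, the echo behavior, the 1024-byte buffer, and binding to 127.0.0.1 are all choices made for the example. It loops on `selector.isOpen()` instead of `select() > 0`, because `select()` can legitimately return 0 after a wakeup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class NioEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NioEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.configureBlocking(false);                    // the server channel is non-blocking
        server.register(selector, SelectionKey.OP_ACCEPT);  // interested in "accept ready"
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void close() throws IOException {
        selector.close(); // also wakes up a thread blocked in select()
        server.close();
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false); // client channels are non-blocking too
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel());
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // the selector was closed: the loop ends
        }
    }

    private static void echo(SocketChannel client) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            if (client.read(buf) < 0) { // peer closed the connection
                client.close();         // closing the channel also cancels its key
                return;
            }
            buf.flip();
            while (buf.hasRemaining()) {
                client.write(buf); // simplification: a real server would register OP_WRITE instead of spinning
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }
}
```

One thread serves every connection here, and it blocks only in `select()`, never on an individual socket.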

Why are people reluctant to develop directly on JDK native NIO? As the code above shows, it is hard to use. Besides the complex programming and the difficult programming model, it has the following much-criticized problem:
On Linux, the JDK's NIO is implemented on top of epoll, and its notorious empty-polling bug can drive CPU usage to 100%.
Netty's arrival fixed, to a large extent, these hard-to-bear problems of JDK native NIO. The three Reactor modes below show how the encapsulation of NIO is done.

Asynchronous IO is implemented on events and callbacks: the application call returns immediately after the operation is issued instead of blocking, and when the background processing completes, the operating system notifies the corresponding thread to carry out the follow-up work.
AIO is short for Asynchronous IO. Although NIO offers non-blocking network operations, NIO's IO behavior is still synchronous: the business thread is notified when an IO operation is ready, and then that thread performs the IO operation itself, so the IO operation itself is synchronous.
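
The callback style of AIO can be sketched with `AsynchronousServerSocketChannel` and `CompletionHandler` from `java.nio.channels` (available since JDK 7). The class `AioEchoServer`, the echo behavior, and the 1024-byte buffer are assumptions made for this illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

class AioEchoServer {
    private final AsynchronousServerSocketChannel server;

    AioEchoServer(int port) throws IOException {
        server = AsynchronousServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", port));
        acceptNext(); // returns immediately; the handler runs when a client connects
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void close() throws IOException {
        server.close();
    }

    private void acceptNext() {
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override public void completed(AsynchronousSocketChannel client, Void att) {
                acceptNext();     // re-arm accept for the next connection
                readNext(client); // start the first asynchronous read
            }
            @Override public void failed(Throwable exc, Void att) {
                // typically the server channel was closed
            }
        });
    }

    private void readNext(AsynchronousSocketChannel client) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        client.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer b) {
                if (n < 0) { closeQuietly(client); return; } // peer closed
                b.flip(); // the data is already in the buffer when this callback runs
                writeAll(client, b);
            }
            @Override public void failed(Throwable exc, ByteBuffer b) { closeQuietly(client); }
        });
    }

    private void writeAll(AsynchronousSocketChannel client, ByteBuffer b) {
        client.write(b, b, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer buf) {
                if (buf.hasRemaining()) writeAll(client, buf); // partial write: continue
                else readNext(client);                         // echo done: wait for more data
            }
            @Override public void failed(Throwable exc, ByteBuffer buf) { closeQuietly(client); }
        });
    }

    private static void closeQuietly(AsynchronousSocketChannel ch) {
        try { ch.close(); } catch (IOException ignored) { }
    }
}
```

The contrast with NIO is in `completed`: with a Selector the thread is told a channel is ready and then reads it, whereas here the callback runs after the read has already filled the buffer.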

Three modes from NIO to Reactor

Interview language organization: (understand it first; it is then easy to recite, and you will be ready for the interviewer's follow-up questions)
Three elements
The Reactor model is an event-driven threading model, usually implemented on top of I/O multiplexing. It comes in three forms: the single-threaded Reactor model, the multi-threaded Reactor model, and the master-slave multi-threaded Reactor model. It has three roles with different responsibilities: the Dispatcher distributes events, the Acceptor handles client connections, and the Handler handles non-connection events (for example, read and write events).
1. Reactor single-threaded model
1. Principle
In the single-threaded Reactor model, all operations are completed in the same Reactor thread. The Dispatcher forwards each event to a role according to its type: connection events go to the Acceptor, and read and write events go to the corresponding Handlers.
2. Implementation
In an NIO implementation, the accept event is registered with the Selector, which is polled for ready events. An "accept ready" event is dispatched to the Acceptor role; a "write ready" event is dispatched to the Handler role responsible for writing; a "read ready" event is dispatched to the Handler role responsible for reading. Everything is handled in one thread.
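
The three roles of the single-threaded model can be sketched by attaching the responsible role to each SelectionKey, so that dispatching is just "run whatever is attached". This is an illustrative sketch only: the class `SingleThreadReactor`, the nested `Acceptor` and `Handler` classes, and the echo behavior are assumptions, not the implementation from the original post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class SingleThreadReactor implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SingleThreadReactor(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.configureBlocking(false);
        // The attachment is the role that handles events on this key: here, the Acceptor.
        server.register(selector, SelectionKey.OP_ACCEPT, new Acceptor());
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void close() throws IOException {
        selector.close();
        server.close();
    }

    // Dispatcher: waits for ready events and forwards each to the role attached to its key.
    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    ((Runnable) key.attachment()).run(); // Acceptor or Handler, same thread
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // the selector was closed: the reactor stops
        }
    }

    // Acceptor: handles "accept ready" and gives each new channel its own Handler.
    private final class Acceptor implements Runnable {
        @Override public void run() {
            try {
                SocketChannel client = server.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ, new Handler(client));
                }
            } catch (IOException e) {
                // a failed accept affects only that connection attempt
            }
        }
    }

    // Handler: handles "read ready" by echoing the bytes back (write is done inline for brevity).
    private static final class Handler implements Runnable {
        private final SocketChannel client;
        Handler(SocketChannel client) { this.client = client; }
        @Override public void run() {
            try {
                ByteBuffer buf = ByteBuffer.allocate(1024);
                if (client.read(buf) < 0) { client.close(); return; } // closing cancels the key
                buf.flip();
                while (buf.hasRemaining()) client.write(buf);
            } catch (IOException e) {
                try { client.close(); } catch (IOException ignored) { }
            }
        }
    }
}
```

Because the Acceptor and every Handler run on the Dispatcher's thread, one slow Handler delays all other connections. The multi-threaded variants exist to address exactly that limitation.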
2. Reactor multi-threaded model
1. Principle
In the multi-threaded Reactor model, the Dispatcher forwards each event to a role according to its type. Connection events go to the Acceptor and are processed by a single thread; read and write events go to the corresponding Handlers and are processed by a thread pool.
2. Implementation
In an NIO implementation, the accept event is registered with the Selector, which is polled for ready events. An "accept ready" event is dispatched to the Acceptor role and handled by a single thread; a "write ready" event is dispatched to the Handler role responsible for writing and processed by the thread pool; a "read ready" event is dispatched to the Handler role responsible for reading and processed by the thread pool.
3. Master-slave Reactor multi-threaded model
1. Principle
In the master-slave Reactor multi-threaded model, the Acceptor in the Main-Reactor accepts the client connection request, creates a SocketChannel, and registers it with the Selector of one thread in the Sub-Reactor thread pool. The actual read and write events are then handled by that thread pool.
2. Implementation
The accept event is registered with the Main-Reactor's Selector, which is polled for "accept ready" events. An "accept ready" event is dispatched to the Acceptor role, which creates a new SocketChannel and hands it to one thread of the Sub-Reactor thread pool. That thread registers the SocketChannel for read and write events with its own Selector, and when "read ready" or "write ready" events occur, they are processed by the Sub-Reactor thread pool.
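
A minimal sketch of the master-slave arrangement, assuming the common layout in which the main Reactor only accepts connections and each accepted SocketChannel is registered with a Sub-Reactor's own Selector. The class names `MainSubReactor` and `SubReactor`, the `adopt` method, the round-robin assignment, and the echo behavior are all hypothetical choices for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class MainSubReactor implements Runnable {
    // Sub-Reactor: owns one Selector and handles read/write for the channels handed to it.
    static final class SubReactor implements Runnable {
        private final Selector selector;
        private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();

        SubReactor() throws IOException { selector = Selector.open(); }

        // Called from the main Reactor thread; registration happens on this Sub-Reactor's thread.
        void adopt(SocketChannel ch) {
            pending.add(ch);
            selector.wakeup();
        }

        @Override public void run() {
            try {
                while (selector.isOpen()) {
                    selector.select();
                    SocketChannel ch;
                    while ((ch = pending.poll()) != null) {
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    }
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        echo((SocketChannel) key.channel());
                    }
                }
            } catch (IOException | ClosedSelectorException e) {
                // selector closed: this Sub-Reactor stops
            }
        }

        private static void echo(SocketChannel ch) {
            try {
                ByteBuffer buf = ByteBuffer.allocate(1024);
                if (ch.read(buf) < 0) { ch.close(); return; }
                buf.flip();
                while (buf.hasRemaining()) ch.write(buf);
            } catch (IOException e) {
                try { ch.close(); } catch (IOException ignored) { }
            }
        }
    }

    private final Selector acceptSelector;
    private final ServerSocketChannel server;
    private final SubReactor[] subs;
    private int next; // round-robin index, used only by the main Reactor thread

    MainSubReactor(int port, int subCount) throws IOException {
        acceptSelector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.configureBlocking(false);
        server.register(acceptSelector, SelectionKey.OP_ACCEPT);
        subs = new SubReactor[subCount];
        for (int i = 0; i < subCount; i++) {
            subs[i] = new SubReactor();
            Thread t = new Thread(subs[i], "sub-reactor-" + i);
            t.setDaemon(true);
            t.start();
        }
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    void close() throws IOException {
        acceptSelector.close();
        server.close();
        for (SubReactor s : subs) s.selector.close();
    }

    // Main Reactor loop: only accepts, then hands each connection to a Sub-Reactor.
    @Override public void run() {
        try {
            while (acceptSelector.isOpen()) {
                acceptSelector.select();
                acceptSelector.selectedKeys().clear();
                SocketChannel client;
                while ((client = server.accept()) != null) {
                    subs[next].adopt(client);
                    next = (next + 1) % subs.length;
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector or server channel closed: the main Reactor stops
        }
    }
}
```

The pending queue plus `wakeup()` is a deliberate design choice: each channel is registered on the thread that owns the Selector, so registration never contends with a `select()` call in progress on another thread.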


Origin blog.csdn.net/qq_36963950/article/details/108020214