I finally understand IO, NIO, Buffer and Socket: notes for absolute beginners, based on the 2021 edition of the Shang Silicon Valley tutorial

Foreword

This article is a bit of a hodgepodge. It combines the IO/NIO course from Shang Silicon Valley, lectures from Mashibing Education, various online learning materials (all sources are credited; this is non-commercial), some background knowledge of computer networks, and some of my own understanding. The sole purpose of these notes is to understand what IO and NIO are. Let's go!

Readers with a weak foundation need a lot of background knowledge before tackling this topic, or they will not be able to follow it. One feature of this article is that it introduces as much of that background as possible, so that readers from non-CS majors, non-professionals and struggling students can follow along.

This article has a companion code repository on Gitee to make things easier to understand: https://gitee.com/da-ji/java-se-demos

1. Preliminary knowledge

The theoretical basis comes from the following three articles (all from the WeChat official account "Code Farmer's Deserted Island Survival"), which explain IO/NIO clearly from the bottom up.
They form a series, so read them from the beginning:

Prerequisite 1: Processes and Threads

1. After reading this article, you still don’t understand threads and thread pools in high concurrency

After reading the article linked above, the key point is this: the CPU simply takes the memory address of the next instruction from a register (the program counter) and executes that instruction. A program written in a high-level language is ultimately turned into such machine instructions in memory, which the CPU then executes one after another.

The main function runs in the main thread, which is the entry point of the program and is created when the process starts. You can also create more threads; they are created while the program is running, after the main thread has started. By then the process's address space already exists, so a new thread can use it directly. This is one reason (among others) why textbooks say that creating a thread is faster than creating a process.

Although all threads share the address space of the process, each thread has its own stack. (In JVM terms, think of each thread's own JVM stack: every function call pushes a stack frame holding the data the function produces while it runs, such as its parameters, local variables and return address.)


Creating a thread therefore requires help from the operating system, which allocates resources such as a stack for it, so creating and destroying threads has a real cost. Thread pools were introduced to reduce this cost: threads are created once and then reused for many tasks.

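To make the thread-pool idea concrete, here is a minimal sketch using the JDK's standard ExecutorService. It is not code from the linked article; ThreadPoolDemo and runTasks are illustrative names. Ten tasks share four threads that are created once and then reused:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPoolDemo {
    // Runs `tasks` small jobs on a pool of `threads` reusable threads
    // and returns the name of the thread that ran each job.
    static List<String> runTasks(int tasks, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads); // threads are created once
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // submit() hands the job to an idle pooled thread instead of creating a new one
                futures.add(pool.submit(() -> Thread.currentThread().getName()));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> f : futures) {
                names.add(f.get()); // waits for that job to finish
            }
            return names;
        } finally {
            pool.shutdown(); // let the pooled threads exit once the queue is empty
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> names = runTasks(10, 4);
        System.out.println(names.size() + " tasks ran on " + new HashSet<>(names).size() + " distinct threads");
    }
}
```

Because the threads outlive individual tasks, the creation and destruction cost is paid once per thread rather than once per task.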

Prerequisite 2: What happens under the hood during IO

2. What happens to the program when reading the file?

Key points from the article:

First, note that IO is not only the interaction between disk and memory (reading and writing files); sending and receiving messages over a computer network is also IO. The essence of IO is copying data from one side to the other (a file's data is copied into memory; computer A's packets are copied to computer B over the network).

Every kind of IO is extremely slow compared with the CPU. How can we deal with that? There are several approaches: use additional threads, use NIO, or use buffers.

The concept of a buffer:

The essence of all IO is working with buffers: to send data out, we put it into a buffer for the system to write; to receive data, we read what the system has placed in a buffer.

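The concept of zero copy: moving data (for example, from a file to a network socket) without copying it back and forth between kernel space and user space.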
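In Java, one API that permits zero copy is FileChannel.transferTo. Below is a minimal sketch (TransferToDemo and copy are illustrative names). Whether the JVM really skips the user-space copy (for example by using sendfile on Linux) depends on the platform and the channel types; the Javadoc only says that many operating systems can transfer the bytes directly from the filesystem cache.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {
    // Copies src to dst without reading the bytes into a Java byte[] first
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long done = 0;
            // transferTo may move fewer bytes than requested, so loop until everything is copied
            while (done < size) {
                done += in.transferTo(done, size - done, out);
            }
            return done;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("zc-src", ".txt");
        Path dst = Files.createTempFile("zc-dst", ".txt");
        Files.write(src, "zero copy".getBytes(StandardCharsets.UTF_8));
        System.out.println(copy(src, dst) + " bytes copied");
        Files.delete(src);
        Files.delete(dst);
    }
}
```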

3. I finally get it: a thorough explanation of I/O multiplexing

2. IO streams

https://zhuanlan.zhihu.com/p/25418336

Decorator pattern:

Many IO stream APIs are applications of the decorator pattern:

The best-known use of the decorator pattern in Java is the design of the Java I/O standard library. For example, FilterInputStream (a subclass of InputStream), FilterOutputStream (a subclass of OutputStream), BufferedReader and FilterReader (subclasses of Reader), and BufferedWriter, FilterWriter and PrintWriter (subclasses of Writer) are all decorator classes: each wraps another object of the same base type and adds behavior to it.
The following code uses the decorator BufferedReader to add buffering to a FileReader (try-with-resources closes both readers afterwards):

try (BufferedReader in = new BufferedReader(new FileReader("filename.txt"))) {
    String s = in.readLine();
}


So what is the decorator pattern?

https://mp.weixin.qq.com/s/hLLWmC61FwvtV4VZC57X5g
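To see the pattern in the IO library itself, here is a small sketch of a hand-written decorator. CountingInputStream is a hypothetical class, not part of the JDK: it extends FilterInputStream, wraps any InputStream and adds byte counting, and it can itself be wrapped by another decorator such as BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical decorator: counts how many bytes pass through the wrapped stream
class CountingInputStream extends FilterInputStream {
    private long count = 0;

    CountingInputStream(InputStream in) {
        super(in); // FilterInputStream keeps the decorated stream in the field `in`
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    long getCount() {
        return count;
    }
}

public class DecoratorDemo {
    static long countBytes(byte[] data) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        // Decorators stack: BufferedInputStream adds buffering on top of the counter
        try (InputStream in = new BufferedInputStream(counter)) {
            byte[] chunk = new byte[4];
            while (in.read(chunk) != -1) {
                // just drain the stream
            }
        }
        return counter.getCount();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes("hello decorator".getBytes(StandardCharsets.UTF_8))); // 15
    }
}
```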

3. Problems with IO (blocking IO) and why NIO (non-blocking IO) was introduced


4. What is Buffer (concept)?

The concept of a buffer was introduced in Prerequisite 2 above: the essence of all IO is working with buffers.

The buffer classes are described in detail later; java.nio.Buffer is NIO's unified abstraction of this buffer concept.

For classic IO, the buffer can be a BufferedReader or simply a byte array: byte[] buffer = new byte[1024]. NIO buffers are covered in Chapter 8 of this article.
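A minimal sketch of the classic byte-array buffer (ByteArrayBufferCopy and copy are illustrative names): data moves from input to output in chunks of at most 1024 bytes, and only the n bytes actually read are written each time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteArrayBufferCopy {
    // Copies everything from in to out through a 1024-byte buffer; returns the byte count
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int n;
        // read() fills part or all of the buffer and returns how many bytes it stored, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(new byte[3000]), out)); // 3000
    }
}
```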

5. NIO channels and IO streams

IO is stream-oriented, while NIO's Channel is buffer-oriented.

Both do the same job (moving data); they just do it differently.

Through a channel we can operate on a data source without caring about its physical structure. The data source can be many things, for example a file or a network socket. In most applications, channels correspond one-to-one with file descriptors or sockets. A Channel is used to transfer data efficiently between a byte buffer and the entity on the other side of the channel (usually a file or a socket).

Note that we are talking about bytes, not characters. In other words, a channel can transfer any kind of file.

Supplementary Knowledge: Bytes and Characters

In short: byte streams can handle any kind of data (any file, network transmission), while character streams can only handle plain text.

Supplement:
Byte: a unit for measuring the amount of data, used in computing to measure storage capacity. A byte is (almost universally) eight bits.
Character: a letter, digit, Chinese character or symbol used in computers, such as 'A', 'B', '$', '&'.
In UTF-8, an English letter takes one byte and a common Chinese character takes three bytes.
In UTF-16 (often loosely called "Unicode" encoding), an English letter and a common Chinese character both take two bytes.
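The byte counts above can be checked with String.getBytes. This sketch (EncodingSizes and byteCount are illustrative names) uses U+4E2D as the sample Chinese character and UTF-16BE for the UTF-16 case, because Java's plain UTF_16 encoder also emits a two-byte byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    static final String ENGLISH = "A";
    // U+4E2D, a common Chinese character, written as an escape so the source file stays ASCII
    static final String CHINESE = "\u4e2d";

    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    English: " + byteCount(ENGLISH, StandardCharsets.UTF_8));    // 1
        System.out.println("UTF-8    Chinese: " + byteCount(CHINESE, StandardCharsets.UTF_8));    // 3
        // UTF_16BE rather than UTF_16: the latter would add a 2-byte byte-order mark
        System.out.println("UTF-16BE English: " + byteCount(ENGLISH, StandardCharsets.UTF_16BE)); // 2
        System.out.println("UTF-16BE Chinese: " + byteCount(CHINESE, StandardCharsets.UTF_16BE)); // 2
    }
}
```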

The two most basic byte stream classes in Java are InputStream and OutputStream, while character streams are Writer and Reader.

A Channel is bidirectional: it can both read and write. Taking FileChannel as an example, there are two typical scenarios:
A. Read data from a file into a Buffer through a FileChannel
B. Write data from a Buffer into a FileChannel, which then writes it to the file
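Both scenarios can be shown with a single FileChannel opened for reading and writing. A minimal sketch, assuming a writable temporary file (FileChannelDemo and writeThenRead are illustrative names):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileChannelDemo {
    // Writes text through a FileChannel, then reads it back through the same channel
    static String writeThenRead(Path file, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            // Scenario B: Buffer -> FileChannel -> file
            ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (out.hasRemaining()) {
                ch.write(out); // a single write() is not guaranteed to drain the buffer
            }

            // Scenario A: file -> FileChannel -> Buffer
            ch.position(0); // move the channel back to the start of the file
            ByteBuffer in = ByteBuffer.allocate((int) ch.size());
            while (in.hasRemaining() && ch.read(in) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            in.flip(); // switch the buffer from being filled to being drained
            return StandardCharsets.UTF_8.decode(in).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel-demo", ".txt");
        System.out.println(writeThenRead(tmp, "hello channel"));
        Files.delete(tmp);
    }
}
```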

There are many kinds of streams, covering everything from files to the network (as mentioned above, many of them are applications of the decorator pattern).
Likewise, there are many kinds of channels, covering everything from files to the network, such as FileChannel, SocketChannel, ServerSocketChannel and DatagramChannel.

Finally, whether you use a Channel or a Stream, under the hood both call down, layer by layer, to functions provided by the operating system. Taking file IO as an example, both FileInputStream and FileChannel eventually reach a native call into the OS, such as ReadFile on Windows (read on Linux).

Code: see the JavaSE Demos repository on Gitee (link in the foreword).

6. What is a socket?

A socket, put plainly, is an abstraction that wraps up TCP's connection handling: the three-way handshake that establishes a connection and the four-way handshake ("four waves") that releases it.

When two computers want to communicate, the side that reads data first creates a socket and then simply reads from it. How the three-way handshake is performed, how the connection is released with four waves, which port is listened on, and so on, can all be set up with simple configuration on the socket; the socket does the work for us.

As Ma Shibing puts it: a socket establishes a four-tuple for us. [source IP + port] and [destination IP + port] together identify a connection uniquely. How the connection is built and how it is released does not concern us; just leave it to the socket!

A socket is the abstraction layer (between the application layer and the transport layer) through which applications communicate using the TCP/IP protocol suite; it is a set of interfaces. In design-pattern terms, Socket is a facade: it hides the complex TCP/IP protocol suite behind the Socket interface. To the user it is just a simple set of interfaces, and the socket organizes the data so that it conforms to the protocol.
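A minimal sketch of a blocking Socket exchange on the loopback interface, assuming port 0 (let the OS pick a free port) is acceptable; SocketEchoDemo and echoOnce are illustrative names. No handshake code appears anywhere: ServerSocket and Socket take care of it, and the printed local and remote addresses are the four-tuple mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketEchoDemo {
    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true, so println() sends the line immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts a one-shot echo server, sends `message` from a client, returns the reply
    static String echoOnce(String message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) { // port 0: the OS picks a free port
            Thread serverThread = new Thread(() -> {
                // accept() blocks until a client has completed the TCP handshake
                try (Socket conn = server.accept();
                     BufferedReader in = reader(conn);
                     PrintWriter out = writer(conn)) {
                    out.println(in.readLine()); // readLine() blocks until a whole line arrives
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            // The client only names the server's address and port; the socket does the handshaking
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                // The four-tuple that identifies this connection
                System.out.println(client.getLocalSocketAddress() + " <-> " + client.getRemoteSocketAddress());
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello socket"));
    }
}
```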

Normal Socket and NIO Socket

Code: see the JavaSE Demos repository on Gitee (link in the foreword).

The comparison of the two in this part is quoted from: www.cnblogs.com/blogtech/p/10142212.html

7. Channel: socket channels (that is, SocketChannel and related classes)

Note that a socket channel is not the same as the plain Socket described above; it is a kind of Channel.

a. ServerSocketChannel

It is a Channel-based socket listener. Because it adds channel semantics, it can run in non-blocking mode.

Code: see the JavaSE Demos repository on Gitee (link in the foreword).
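To illustrate what non-blocking mode means for a listener, here is a minimal sketch (NonBlockingAcceptDemo and demo are illustrative names; binding to 127.0.0.1 on a free port is an assumption). After configureBlocking(false), accept() returns null immediately when no connection is pending instead of waiting.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingAcceptDemo {
    // Returns {first accept() returned null, a later accept() returned a connected channel}
    static boolean[] demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            server.configureBlocking(false);                    // from now on accept() never waits

            // Nobody has connected yet, so accept() returns null at once instead of blocking
            boolean firstWasNull = server.accept() == null;

            // A blocking client connects; the kernel completes the handshake and queues the connection
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                SocketChannel accepted = null;
                while (accepted == null) {
                    accepted = server.accept(); // poll; a real server would do other work between polls
                }
                boolean connected = accepted.isConnected();
                accepted.close();
                return new boolean[] {firstWasNull, connected};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = demo();
        System.out.println("first accept() null? " + r[0] + ", later accepted? " + r[1]);
    }
}
```

Busy-polling like this wastes CPU; in practice the channel is registered with a Selector instead, as described in Chapter 9.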

b. SocketChannel

It is usually used together with ServerSocketChannel. The ServerSocketChannel above is only a listener, whereas a SocketChannel is a channel connected to a TCP network socket and carries the actual data.

Code: see the JavaSE Demos repository on Gitee (link in the foreword).

As mentioned above, reading from and writing to a Channel is buffer-oriented. SocketChannel is no exception: it must be used together with a buffer.
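A minimal sketch of reading and writing a SocketChannel through ByteBuffers. For brevity it is single-threaded, keeps both channels in the default blocking mode, and assumes the message fits in 1024 bytes; SocketChannelDemo and its helper methods are illustrative names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SocketChannelDemo {
    // Sends msg from a client SocketChannel to the server side and back, all through ByteBuffers
    static String echoOnce(String msg) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress()); // connect
                 SocketChannel conn = server.accept()) {                              // server side of it
                writeFully(client, ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
                client.shutdownOutput(); // tell the server no more data is coming

                ByteBuffer request = readUntilEof(conn); // server reads the request...
                writeFully(conn, request);               // ...and echoes it back
                conn.shutdownOutput();

                ByteBuffer reply = readUntilEof(client);
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        }
    }

    static void writeFully(SocketChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    // Reads until the peer shuts down its output (or the buffer is full), then flips for reading
    static ByteBuffer readUntilEof(SocketChannel ch) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (buf.hasRemaining() && ch.read(buf) != -1) {
            // keep reading
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(echoOnce("ping over channels"));
    }
}
```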

c. DatagramChannel

SocketChannel above is for TCP sockets, whereas DatagramChannel is for UDP. DatagramChannel is therefore connectionless.

Because DatagramChannel is connectionless, each datagram is a self-contained entity with its own destination address and data payload, independent of other datagrams. Unlike stream-oriented sockets, a DatagramChannel can send individual datagrams to different destinations. Likewise, a DatagramChannel can receive packets from arbitrary addresses, and each arriving datagram carries information about where it came from (its source address).

Code: see the JavaSE Demos repository on Gitee (link in the foreword).
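A minimal DatagramChannel sketch on the loopback interface (DatagramDemo and sendAndReceive are illustrative names). The sender never connects; it names the destination in each send() call, and receive() reports where the datagram came from. As with any UDP, delivery is not guaranteed, though loss on loopback is unlikely.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

public class DatagramDemo {
    static String sendAndReceive(String msg) throws IOException {
        try (DatagramChannel receiver = DatagramChannel.open();
             DatagramChannel sender = DatagramChannel.open()) {
            receiver.bind(new InetSocketAddress("127.0.0.1", 0)); // any free UDP port
            // No connection: each send() carries its own destination address
            sender.send(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)), receiver.getLocalAddress());

            ByteBuffer buf = ByteBuffer.allocate(1024);
            // receive() (blocking by default) fills buf and returns the datagram's source address
            SocketAddress from = receiver.receive(buf);
            System.out.println("datagram from " + from);
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndReceive("udp hello"));
    }
}
```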

8. Java NIO Buffer in detail

Code: see the JavaSE Demos repository on Gitee (link in the foreword).

Introduction to Java NIO Buffer

The Buffer here refers to java.nio.Buffer, not the general computer-science concept of a buffer discussed earlier, although NIO's buffers are, of course, one implementation of that concept.

Buffer has many subclasses (ByteBuffer, CharBuffer, ShortBuffer, IntBuffer, LongBuffer, FloatBuffer and DoubleBuffer); the most commonly used is ByteBuffer.

Basic usage of Buffer

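A common way to use a Buffer follows four steps: write data into it, call flip(), read data out of it, then call clear() (or compact()) before writing again. The sketch below (BufferBasicUsage and readAll are illustrative names) applies these steps to a channel read through a deliberately small 8-byte buffer; the (char) cast assumes ASCII input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

public class BufferBasicUsage {
    // Reads an entire channel through a tiny buffer that is reused on every pass
    static String readAll(ReadableByteChannel channel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8);
        StringBuilder sb = new StringBuilder();
        // 1. write data into the buffer (channel.read puts bytes into buf)
        while (channel.read(buf) != -1) {
            buf.flip();                      // 2. switch the buffer to read mode
            while (buf.hasRemaining()) {
                sb.append((char) buf.get()); // 3. read data out (ASCII only in this sketch)
            }
            buf.clear();                     // 4. make the buffer writable again
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "buffers are reused in a loop".getBytes(StandardCharsets.US_ASCII);
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(text))) {
            System.out.println(readAll(ch));
        }
    }
}
```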

A Buffer has three important attributes: capacity, position and limit.
capacity: the fixed number of elements the buffer can hold.
position: the index of the next element to be read or written.
limit: the first index that should not be read or written. While writing it normally equals capacity; after flip() it marks the end of the data that was written.

The methods that operate on these three attributes include rewind(), clear(), compact(), mark() and reset().
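The effect of these methods on position and limit can be traced directly. The sketch below (BufferAttributesDemo, state and trace are illustrative names) records the three attributes after each call; the comments give the values that follow from the documented Buffer behavior.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferAttributesDemo {
    static String state(Buffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    // Records position/limit/capacity after each step
    static List<String> trace() {
        List<String> states = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(10);
        states.add(state(buf));                        // pos=0 lim=10 cap=10

        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        states.add(state(buf));                        // pos=3 lim=10 cap=10

        buf.flip();                                    // limit = old position, position = 0
        states.add(state(buf));                        // pos=0 lim=3 cap=10

        buf.get();                                     // reads the value 1
        states.add(state(buf));                        // pos=1 lim=3 cap=10

        buf.mark();                                    // remember position 1
        buf.get();
        buf.get();                                     // position is now 3
        buf.reset();                                   // back to the marked position
        states.add(state(buf));                        // pos=1 lim=3 cap=10

        buf.rewind();                                  // position = 0, limit unchanged: re-read from the start
        states.add(state(buf));                        // pos=0 lim=3 cap=10

        buf.get();                                     // consume one byte
        buf.compact();                                 // move the 2 unread bytes to the front, ready for writing
        states.add(state(buf));                        // pos=2 lim=10 cap=10

        buf.clear();                                   // position = 0, limit = capacity; data is not erased
        states.add(state(buf));                        // pos=0 lim=10 cap=10
        return states;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```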

Buffer allocation, reading and writing data, and flip() (switching from writing to reading)
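A short sketch of allocation and the write/flip/read cycle (BufferAllocateWrap and its methods are illustrative names). allocate() creates a new backing array, while wrap() reuses an existing one, so writes through the buffer show up in the array. putFlipGet assumes the text fits in 32 bytes; otherwise put() throws BufferOverflowException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferAllocateWrap {
    // Writes text into an allocated buffer, flips it, and reads it back out
    static String putFlipGet(String text) {
        ByteBuffer buf = ByteBuffer.allocate(32);          // new heap buffer: position=0, limit=capacity=32
        buf.put(text.getBytes(StandardCharsets.US_ASCII)); // writing advances position
        buf.flip();                                        // limit = position, position = 0: now readable
        byte[] out = new byte[buf.remaining()];            // remaining() == limit - position
        buf.get(out);                                      // bulk read
        return new String(out, StandardCharsets.US_ASCII);
    }

    // wrap() does not copy: the buffer and the array share the same storage
    static byte[] wrapShares() {
        byte[] backing = "nio".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer wrapped = ByteBuffer.wrap(backing);
        wrapped.put(0, (byte) 'N'); // absolute put: changes index 0 without moving position
        return backing;             // the array itself now holds "Nio"
    }

    public static void main(String[] args) {
        System.out.println(putFlipGet("hello"));
        System.out.println(new String(wrapShares(), StandardCharsets.US_ASCII));
    }
}
```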

Buffer operations

a. Buffer slicing

In NIO, besides allocating or wrapping a buffer object, you can also create a sub-buffer from an existing buffer, that is, cut out a piece of the existing buffer as a new buffer. The existing buffer and the sub-buffer share data at the level of the underlying array; the sub-buffer is effectively a view window onto part of the existing buffer. A sub-buffer is created by calling the slice() method, and it covers the region between the existing buffer's current position and its limit.

Code: BufferDemo2 - b01() in the JavaSE Demos repository on Gitee (link in the foreword).
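A minimal slicing sketch (BufferSliceDemo and sliceAndModify are illustrative names): the slice covers indices 3 to 6 of the original buffer, and multiplying its elements by 10 changes the original as well, because both share one array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferSliceDemo {
    static byte[] sliceAndModify() {
        ByteBuffer buf = ByteBuffer.allocate(10);
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put((byte) i);                            // buf holds 0..9
        }
        buf.position(3);
        buf.limit(7);
        ByteBuffer slice = buf.slice();                   // view over indices 3..6 of buf
        for (int i = 0; i < slice.capacity(); i++) {
            slice.put(i, (byte) (slice.get(i) * 10));     // index 0 of the slice is index 3 of buf
        }
        buf.clear();                                      // reset position/limit; contents untouched
        return buf.array();                               // changes made through the slice are visible here
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sliceAndModify())); // [0, 1, 2, 30, 40, 50, 60, 7, 8, 9]
    }
}
```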

b. Read-only buffer

Read-only buffers are very simple: you can read from them, but you cannot write to them. Any regular buffer can be turned into a read-only buffer by calling its asReadOnlyBuffer() method. This returns a buffer that is identical to the original and shares its data, except that it is read-only. If the contents of the original buffer change, the contents of the read-only buffer change too.

Any attempt to modify the contents of a read-only buffer throws a ReadOnlyBufferException. Read-only buffers are useful for protecting data: when you pass a buffer to a method of some object, you cannot know whether that method will modify the buffer's data, and passing a read-only buffer guarantees that it will not. You can only convert regular buffers into read-only buffers, not read-only buffers into writable ones.

Code: BufferDemo2 - b02() in the JavaSE Demos repository on Gitee (link in the foreword).
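A minimal sketch of both properties described above (ReadOnlyBufferDemo and demo are illustrative names): the read-only view sees changes made to the original, and writing through the view throws ReadOnlyBufferException.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

public class ReadOnlyBufferDemo {
    // Returns {read-only view sees writes to the original, write through the view is rejected}
    static boolean[] demo() {
        ByteBuffer original = ByteBuffer.allocate(4);
        ByteBuffer readOnly = original.asReadOnlyBuffer(); // shares content with the original

        original.put(0, (byte) 42);                        // change the original...
        boolean seesChange = readOnly.get(0) == 42;        // ...and the read-only view reflects it

        boolean rejected;
        try {
            readOnly.put(0, (byte) 1);                     // any write through the view
            rejected = false;
        } catch (ReadOnlyBufferException e) {
            rejected = true;                               // ...is refused with this exception
        }
        return new boolean[] {seesChange, rejected};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("sees change: " + r[0] + ", write rejected: " + r[1]);
    }
}
```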

c. Direct buffer

A direct buffer is a buffer whose memory is allocated in a special way to speed up I/O. The JDK documentation puts it this way: given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it tries to avoid copying the buffer's contents to (or from) an intermediate buffer before (or after) each call to one of the underlying operating system's native I/O operations. To allocate a direct buffer, call allocateDirect() instead of allocate(); otherwise it is used exactly like a normal buffer.

Code: BufferDemo2 - b03() in the JavaSE Demos repository on Gitee (link in the foreword).
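A minimal sketch showing that a direct buffer is used exactly like a heap buffer (DirectBufferDemo and copy are illustrative names). The in-memory channels from Channels.newChannel only keep the example self-contained; the speed-up the JDK documentation describes applies to channels backed by the operating system, such as FileChannel or SocketChannel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class DirectBufferDemo {
    // Copies data channel-to-channel through a direct buffer; the loop is the same as with allocate()
    static byte[] copy(byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024); // memory outside the Java heap
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(data));
             WritableByteChannel out = Channels.newChannel(sink)) {
            while (in.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
                buf.clear();
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(ByteBuffer.allocateDirect(16).isDirect()); // true
        System.out.println(ByteBuffer.allocate(16).isDirect());       // false
        System.out.println(copy(new byte[5000]).length);              // 5000
    }
}
```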

d. Memory mapped file IO

Memory-mapped file I/O is a method of reading and writing file data that can be significantly faster than regular stream-based or channel-based I/O. Memory-mapped file I/O is accomplished by making the data in the file appear as the contents of a memory array, which at first sounds like nothing more than reading the entire file into memory, but that's not the case. In general, only the part of the file that is actually read or written is mapped into memory.

Code: BufferDemo2 - b04() in the JavaSE Demos repository on Gitee (link in the foreword).
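A minimal sketch of memory-mapped IO, assuming a small, non-empty ASCII file (MappedFileDemo and capitalizeInPlace are illustrative names). FileChannel.map returns a MappedByteBuffer; writing into it changes the file without any explicit write() call, and force() asks the OS to flush the change to storage.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileDemo {
    // Upper-cases the first byte of an ASCII file by writing to a memory mapping
    static String capitalizeInPlace(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            byte first = map.get(0);                         // looks like reading an array...
            map.put(0, (byte) Character.toUpperCase(first)); // ...and this writes into the file's pages
            map.force();                                     // flush the change to the storage device
        }
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped-demo", ".txt");
        Files.write(tmp, "hello mapped file".getBytes(StandardCharsets.US_ASCII));
        System.out.println(capitalizeInPlace(tmp)); // Hello mapped file
        tmp.toFile().deleteOnExit(); // some platforms refuse to delete a file that is still mapped
    }
}
```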

9. NIO Selector (multiplexer)

Code: see the JavaSE Demos repository on Gitee (link in the foreword).

A brief description of the Selector concept: a Selector lets a single thread monitor many channels and find out which of them are ready for an I/O operation (accepting a connection, completing a connection, reading or writing).

The basic idea of a multiplexer:

See Chapter 6: Normal Socket and NIO Socket.

Simply put, when communicating through a Selector, the server's channel (ServerSocketChannel) and the clients' channels (SocketChannel) are registered with the Selector together with the events they are interested in; each registration is represented by a SelectionKey.
The Selector is then polled in a while loop (similar to event listening), and when a read or write event actually occurs, different processing logic runs depending on the key.

Summary of NIO programming steps:

Step 1: Create a Selector
Step 2: Create a ServerSocketChannel and bind the listening port
Step 3: Put the channel into non-blocking mode
Step 4: Register the channel with the Selector and listen for connection (accept) events
Step 5: Call the Selector's select() method (in a loop) to wait for channels to become ready
Step 6: Call selectedKeys() to get the set of ready channels
Step 7: Iterate over the ready set, check the type of each ready event, and carry out the business logic
Step 8: Depending on the business, decide whether to register for further events, and repeat from Step 3
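The eight steps above can be sketched as a tiny echo server. This is a simplified illustration, not the repository's code: SelectorEchoDemo and its methods are illustrative names, it serves a single client and then stops, and it writes replies directly instead of registering for OP_WRITE. The step numbers in the comments refer to the list above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorEchoDemo {
    // Steps 5-8: the event loop; stops after one client has disconnected
    static void serveOneClient(Selector selector) throws IOException {
        boolean done = false;
        while (!done) {
            selector.select();                                              // Step 5: wait for a ready channel
            Iterator<SelectionKey> it = selector.selectedKeys().iterator(); // Step 6: the ready set
            while (it.hasNext()) {                                          // Step 7: handle each event
                SelectionKey key = it.next();
                it.remove(); // the selector never clears this set on its own
                if (key.isAcceptable()) {
                    SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                    client.configureBlocking(false);                        // Step 8: new channel repeats Steps 3-4
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) == -1) {                           // client closed its side
                        key.cancel();
                        client.close();
                        done = true;
                    } else {
                        buf.flip();
                        while (buf.hasRemaining()) {
                            client.write(buf); // simplified: a real server would wait for OP_WRITE
                        }
                    }
                }
            }
        }
    }

    static String echoViaSelector(String msg) throws Exception {
        try (Selector selector = Selector.open();                          // Step 1
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));            // Step 2
            server.configureBlocking(false);                               // Step 3
            server.register(selector, SelectionKey.OP_ACCEPT);             // Step 4
            Thread loop = new Thread(() -> {
                try {
                    serveOneClient(selector);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            loop.start();

            // A plain blocking client to exercise the server
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                ByteBuffer out = ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8));
                while (out.hasRemaining()) {
                    client.write(out);
                }
                client.shutdownOutput(); // lets the server see end-of-stream (read() == -1)
                ByteBuffer reply = ByteBuffer.allocate(1024);
                while (reply.hasRemaining() && client.read(reply) != -1) {
                    // read until the server closes the connection
                }
                loop.join();
                reply.flip();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoViaSelector("hello selector"));
    }
}
```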

10. [Summary] NIO, BIO, AIO

BIO (synchronous blocking)

BIO is traditional IO. The B in BIO (Blocking I/O) stands for blocking: in this synchronous blocking I/O mode, reading or writing data blocks the thread until the operation completes.

The characteristic of BIO is that once the server receives a request from a client, it starts processing it and responding. Processing here includes the business logic, setting up the socket, the three-way handshake and the four-way teardown, and so on. The key point is: until that request has been handled, no other request can get in; the others can only wait for it to finish. That is why it is called synchronous blocking IO.

We can improve BIO with multiple threads or a thread pool: hand each request to its own thread. But even then, the underlying IO is still blocking; we have only added parallel processing.
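A minimal sketch of BIO improved with a thread pool (BioThreadPoolServer and its methods are illustrative names; the pool size of 4 is arbitrary). One thread blocks in accept() and hands each connection to a worker, which then blocks in readLine() for that client only. The pool bounds how many requests are handled at once, which is exactly the limitation discussed in the NIO section below.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BioThreadPoolServer {
    // Each connection is handled with ordinary blocking IO, but on a pooled worker thread
    static void handle(Socket conn) {
        try (Socket c = conn;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("echo: " + in.readLine()); // blocks this worker only, not the accepting thread
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Serves `clients` connections and returns the replies the clients received
    static List<String> runDemo(int clients) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // at most 4 requests handled at once
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    for (int i = 0; i < clients; i++) {
                        Socket conn = server.accept();   // the acceptor blocks here
                        pool.submit(() -> handle(conn)); // hand the connection to a worker
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            acceptor.start();

            List<String> replies = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                     BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println("client " + i);
                    replies.add(in.readLine());
                }
            }
            acceptor.join();
            return replies;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        runDemo(3).forEach(System.out::println);
    }
}
```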

NIO (synchronous non-blocking)

See Chapter 3 of this article for details: problems with IO (blocking IO) and why NIO was introduced.

NIO is a synchronous non-blocking I/O model. As mentioned above, a thread pool can improve BIO, but the number of threads in a pool is limited after all. For example, if the pool has 100 threads and all of them are busy, the 101st request still cannot be processed and must block and wait.

The core of NIO's approach is the Selector (multiplexer). IO calls do not block; instead, the events of interest are registered with the Selector, and an IO operation is performed only when it can actually proceed, that is, when the channel is ready and the buffer is in place. This way a single Selector thread can poll and handle many requests. In Java, each connection is represented by a Channel, so a small number of threads with a Selector can handle a large number of Channels.

Classic IO is stream-oriented, while NIO is buffer-oriented.

Why NIO is called non-blocking:

Thanks to multiplexing, as long as no registered event (read or write) is ready, the selector simply keeps polling; an actual IO call is made only once an event shows that the channel is ready. This avoids blocking to a large extent.

So NIO does not block the way classic IO does: after putting data into a buffer, the thread can move on to other work instead of waiting for the other end to drain the buffer.

AIO (asynchronous non-blocking)

A stands for async (asynchronous). This involves the concepts of synchronous versus asynchronous execution, as well as the callback mechanism.

Asynchronous IO is based on events and callbacks: the application's call returns immediately instead of blocking. When the background processing completes, the operating system triggers a callback that notifies the corresponding thread to carry out the follow-up work.
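Java exposes this model through the NIO.2 asynchronous channels. A minimal sketch with AsynchronousFileChannel and a CompletionHandler callback, assuming a small file that one read() call returns in full (AioReadDemo and readAsync are illustrative names). Depending on the platform, the JDK may implement file AIO with an internal thread pool rather than native OS support.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AioReadDemo {
    static String readAsync(Path file) throws Exception {
        CompletableFuture<String> result = new CompletableFuture<>();
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            // read() returns at once; the handler below is called back when the data is ready
            ch.read(buf, 0, buf, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer attachment) {
                    attachment.flip();
                    result.complete(StandardCharsets.UTF_8.decode(attachment).toString());
                }

                @Override
                public void failed(Throwable exc, ByteBuffer attachment) {
                    result.completeExceptionally(exc);
                }
            });
            System.out.println("read() has returned; this thread is free to do other work");
            // Waiting here only lets the demo return the text before the channel is closed
            return result.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("aio-demo", ".txt");
        Files.write(tmp, "hello aio".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAsync(tmp));
        Files.delete(tmp);
    }
}
```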


Source: blog.csdn.net/weixin_44757863/article/details/121306876