How to solve the I/O bottleneck under high concurrency?

We all know that in the era of big data, I/O is far slower than memory, which makes I/O-related performance issues especially prominent.

In many application scenarios, I/O reads and writes have become a major system performance bottleneck that cannot be ignored.

What is I/O?

I/O is the main channel for machines to obtain and exchange information, and streams are the main method for performing I/O operations.

In computing, a stream represents an ordered transfer of information. For a specific machine or application, the information obtained from outside is usually called the input stream (InputStream), and the information sent out of the machine or application is called the output stream (OutputStream).

Together they are called input/output streams (I/O streams).

When machines or programs exchange information or data, they usually first convert the object or data into a specific form of stream.

Then, through the transmission of the stream, the data reaches the designated machine or program. At the destination, the stream is converted back into object data.

Therefore, streams can be viewed as a means of carrying data, facilitating the exchange and transmission of data.
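To make the convert-transmit-convert-back idea concrete, here is a minimal sketch that turns an int and a String into a byte stream with DataOutputStream and restores them with DataInputStream. It assumes an in-memory byte array stands in for the file or network connection; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class StreamRoundTrip {
    // Sender side: converts the data into a byte stream.
    static byte[] encode(int id, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);   // 4 bytes, big-endian
            out.writeUTF(name); // 2-byte length prefix + modified UTF-8 bytes
        }
        return bytes.toByteArray();
    }

    // Receiver side: converts the byte stream back into data.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            String name = in.readUTF();
            return id + ":" + name;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode(42, "order");
        System.out.println(wire.length + " bytes -> " + decode(wire)); // prints: 11 bytes -> 42:order
    }
}
```

Both ends must agree on the order and types of the fields; the byte stream itself carries no structure beyond what the writer put into it.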

Java's I/O classes are located in the java.io package. Among them, InputStream, OutputStream, Reader and Writer are the four basic classes of the I/O package.

InputStream and OutputStream handle byte streams, while Reader and Writer handle character streams. The chart below illustrates this:

+--------------------+
|    InputStream     |
+---------+----------+
          ^
          |
+---------+----------+
|  FileInputStream   |
+--------------------+


+--------------------+
|    OutputStream    |
+---------+----------+
          ^
          |
+---------+----------+
|  FileOutputStream  |
+--------------------+


+--------------------+
|       Reader       |
+---------+----------+
          ^
          |
+---------+----------+
|     FileReader     |
+--------------------+


+--------------------+
|       Writer       |
+---------+----------+
          ^
          |
+---------+----------+
|     FileWriter     |
+--------------------+

Whether for file reads and writes or for network transmission and reception, the smallest unit of stored information is always the byte. So why are I/O stream operations divided into byte stream operations and character stream operations?

We know that converting characters to bytes requires encoding, and this process can be time-consuming.

If we don't know the encoding, it is easy to run into problems such as garbled characters. Therefore, the I/O package provides character stream classes that work directly with characters, making character-based operations convenient in everyday work.
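The following minimal sketch shows why the encoding matters. InputStreamReader turns bytes into characters using the charset you pass it, and decoding the same bytes with the wrong charset yields garbled text. It assumes UTF-8 text held in memory; the class name CharsetDemo is hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decodes raw bytes into a String with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // "café": 5 bytes in UTF-8, because the accented e takes 2 bytes
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8));      // decoded correctly
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1)); // garbled: wrong charset
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, is what keeps the byte-to-character conversion predictable across machines.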

Byte streams.

InputStream and OutputStream are the abstract base classes of byte streams. Several subclasses derive from these two abstract classes, each designed for a different type of operation.

Depending on the specific requirements, you can choose different subclasses to implement the corresponding functions.

• If you need to read and write files, use FileInputStream and FileOutputStream. They read data from files and write data to files.
• If you want to read from and write to a byte array in memory, use ByteArrayInputStream and ByteArrayOutputStream.
• If you want to add buffering to improve performance, use BufferedInputStream and BufferedOutputStream. These classes wrap another stream and introduce a buffer during reading and writing, which reduces the number of actual I/O operations and improves efficiency.
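As a minimal sketch of how these subclasses combine, the code below reads from a ByteArrayInputStream through a BufferedInputStream and writes into a ByteArrayOutputStream through a BufferedOutputStream. In-memory arrays stand in for files so the example runs anywhere; passing a FileInputStream and FileOutputStream to the same method would give the file-based version. The class name ByteStreamCopy is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteStreamCopy {
    // Copies all bytes from in to out through buffered wrappers.
    static void copy(InputStream in, OutputStream out) throws IOException {
        try (BufferedInputStream bin = new BufferedInputStream(in);
             BufferedOutputStream bout = new BufferedOutputStream(out)) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = bin.read(chunk)) != -1) {
                bout.write(chunk, 0, n);
            }
        } // closing bout flushes its remaining buffered bytes into out
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[10_000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(source), sink);
        System.out.println(sink.size()); // prints 10000
    }
}
```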

Character streams.

Reader and Writer are the abstract base classes of character streams. Several subclasses also derive from them, each designed for a different type of operation, as shown in the figure below:

Reader (abstract)
  +-- InputStreamReader
  |     +-- FileReader
  +-- CharArrayReader


Writer (abstract)
  +-- OutputStreamWriter
  |     +-- FileWriter
  +-- CharArrayWriter
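The character-stream classes follow the same pattern as the byte streams, except that they move char values instead of bytes. Here is a small sketch, assuming the text already sits in memory, that copies characters from a CharArrayReader into a CharArrayWriter in chunks. The class name CharStreamCopy is hypothetical.

```java
import java.io.CharArrayReader;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class CharStreamCopy {
    // Copies characters (not bytes) from a Reader to a Writer in chunks.
    static void copy(Reader in, Writer out) throws IOException {
        char[] chunk = new char[256];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        char[] text = "line one\nline two\n".toCharArray();
        CharArrayWriter writer = new CharArrayWriter();
        try (Reader reader = new CharArrayReader(text)) {
            copy(reader, writer);
        }
        System.out.print(writer.toString()); // prints the two lines unchanged
    }
}
```

Because copy() accepts any Reader and Writer, the same method works unchanged with FileReader and FileWriter.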

I/O performance issues.

We know that I/O operations can be divided into disk I/O operations and network I/O operations.

Disk I/O reads data from a disk into memory and persists data from memory back to a physical disk.

Network I/O receives data from the network into memory and sends data from memory back out to the network.

However, whether it is disk I/O or network I/O, traditional I/O runs into significant performance problems.

# 1. Multiple memory copies.

In traditional I/O, we use an InputStream to read data from the source into a buffer, and then use an OutputStream to write the data to an external device such as a disk or the network.

Before continuing, you can view the specific process of input operations in the operating system, as shown in the following figure:

[Figure: the process of an input operation in the operating system]

• The JVM issues a read() system call, sending a read request to the kernel.
• The kernel sends a read command to the hardware and waits for the data to be ready.
• The kernel copies the data into its own buffer.
• The kernel copies the data into a user-space buffer, and the read() system call returns.

During this process, data is first copied from the external device to kernel space and then from kernel space to user space.

This results in two memory copies, plus switches between user mode and kernel mode around each system call. This unnecessary copying and context switching adds overhead and ultimately reduces I/O performance.
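The copy path described above is visible in an ordinary stream copy loop. In the sketch below (the temporary files and class name are illustrative), each read() fills a byte[] on the Java heap, so the data is copied from a kernel buffer into user space, and each write() copies it back from user space into the kernel before it reaches the device. The comments mark these copies; the exact buffering inside the kernel depends on the operating system.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class TraditionalCopy {
    // Copies src to dst with classic stream I/O; returns the number of bytes copied.
    static long copy(Path src, Path dst) throws IOException {
        long total = 0;
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dst.toFile())) {
            byte[] buf = new byte[8192]; // user-space buffer on the Java heap
            int n;
            // read(): device -> kernel buffer -> copied into buf (kernel to user space)
            while ((n = in.read(buf)) != -1) {
                // write(): buf copied back into a kernel buffer -> device
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".bin"); // illustrative demo files
        Path dst = Files.createTempFile("dst", ".bin");
        Files.write(src, new byte[20_000]);
        System.out.println(copy(src, dst) + " bytes copied"); // prints: 20000 bytes copied
        Files.delete(src);
        Files.delete(dst);
    }
}
```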

# 2. Blocking.

In traditional I/O, InputStream.read() is a blocking call that is typically invoked inside a while loop. Each call waits until data is available (or the end of the stream is reached) before returning.

This means that if no data arrives, the read operation waits indefinitely and the calling thread stays blocked.

When there are few connection requests, this approach works well and responds quickly.

However, when handling a large number of connection requests, it becomes necessary to create a large number of listening threads. In this case, if the thread waits for data that is not ready, it will be blocked and enter the wait state.
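A minimal sketch of this thread-per-connection model follows, assuming a simple line-based echo protocol on a loopback port; the class name, port handling, and protocol are illustrative rather than taken from the original article. Every accepted connection gets its own thread, and that thread sits blocked in readLine() whenever its client has not sent anything.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class BlockingEchoServer {
    // Accept loop: starts one new thread per accepted connection.
    static void serve(ServerSocket server) throws IOException {
        while (true) {
            Socket client = server.accept();          // blocks until a client connects
            new Thread(() -> handle(client)).start(); // thread-per-connection
        }
    }

    // Echoes each line back; the thread blocks in readLine() while the client is idle.
    static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            // Connection failed or was reset; let this handler thread end.
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts a server on a free loopback port, sends one line, and returns the echo.
    static String roundTrip(String message) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    serve(server);
                } catch (IOException closed) {
                    // accept() fails once the server socket is closed
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();
            try (Socket c = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 BufferedReader in = reader(c);
                 PrintWriter out = writer(c)) {
                out.println(message);
                return in.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello")); // prints hello
    }
}
```

With thousands of mostly idle clients, this design needs thousands of threads, nearly all of them parked in readLine(), which is exactly the cost described in this section.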

When many threads repeatedly block and wake up, they compete for CPU time, causing frequent CPU context switches. This increases the performance overhead of the system.

This is why in scenarios with high concurrency requirements, traditional blocking I/O can become inefficient due to the high cost of thread management and context switching.

Asynchronous programming and non-blocking I/O techniques are often used to alleviate these problems and improve system efficiency.

How to optimize I/O operations?

# 1. Use buffering.

Using buffering is an effective way to optimize read and write stream operations, reducing frequent disk or network accesses, thereby improving performance. Here are some ways to use buffering to optimize read and write stream operations:

• Use buffered streams: Java provides classes such as BufferedReader and BufferedWriter that wrap other input and output streams and add a buffer to read and write operations. Data is then read or written in batches, which reduces the number of actual I/O operations.
• Specify the buffer size: When creating a buffered stream, you can specify the buffer size. Choosing a size that suits the data volume and performance requirements can further optimize reads and writes.
• Use java.nio: The Java NIO (New I/O) library provides more flexible and efficient buffer management. Buffer classes such as ByteBuffer give you finer control over memory and data.
• Read or write multiple items at once: With the appropriate API, you can read or write several data items in one call, reducing the number of I/O operations.
• Combine operations: If you need to perform consecutive reads or writes, consider combining them into larger operations to reduce system call overhead.
• Flush in time: For output streams, calling flush() at the right time pushes buffered data to the underlying stream instead of leaving it in the buffer.
• Use try-with-resources: In Java 7 and above, try-with-resources ensures that streams are closed automatically and resources are released when the operation completes, avoiding resource leaks.

The following is an example code snippet for using buffering for file reading and writing:

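import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
     BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // Process the line
        writer.write(line);
        writer.newLine(); // Append a line separator
    }
} catch (IOException e) {
    e.printStackTrace();
}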
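The buffer-size and flush() points above can be observed directly. This sketch gives a BufferedOutputStream an explicit 8 KB buffer over an in-memory sink: a small write stays in the buffer, and nothing reaches the sink until flush() is called. The class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class FlushDemo {
    // Returns {bytes visible in the sink before flush, bytes visible after flush}.
    static int[] writeThenFlush(byte[] data, int bufferSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            out.write(data);         // smaller than the buffer, so it stays buffered
            int before = sink.size();
            out.flush();             // push the buffered bytes to the underlying stream
            int after = sink.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = writeThenFlush(new byte[100], 8192);
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
        // prints: before flush: 0, after flush: 100
    }
}
```

A larger buffer means fewer writes to the underlying stream, at the cost of data sitting in memory longer, which is why output that must be seen promptly (for example, a protocol response) should be flushed explicitly.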

# 2. Use DirectBuffer to reduce memory copying.

Using a direct buffer (DirectBuffer) is a technique for reducing memory copies in I/O operations, especially in Java NIO (New I/O).

A direct buffer uses off-heap (native) memory, which allows more efficient data transfer between Java code and native code, including the operating system's I/O calls.

This can be particularly beneficial in I/O operations involving large amounts of data.

Here's how to use a direct buffer to reduce memory copying:

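1. Allocate a direct buffer: Instead of a ByteBuffer backed by an array on the Java heap, call ByteBuffer.allocateDirect() to allocate the buffer in native (off-heap) memory.
2. Wrap existing native memory: Native code can expose an existing native memory region to Java as a direct buffer through the JNI function NewDirectByteBuffer, which takes the memory address and capacity. Note that ByteBuffer.wrap() wraps a Java byte array and produces a heap buffer, not a direct one.
3. Use with channel I/O: When using NIO channels (FileChannel, SocketChannel, etc.), data can be read directly into a direct buffer or written directly from one. With a heap buffer, the JDK generally copies the data through a temporary direct buffer first.
4. Use with JNI: If you work with native code through the Java Native Interface (JNI), a direct buffer lets the native code access and manipulate the data in place (for example, via GetDirectBufferAddress) without expensive memory copies.
5. Pay attention to memory release: The native memory behind a direct buffer is released only after the buffer object is garbage-collected, and the total amount is limited by -XX:MaxDirectMemorySize. Allocating many short-lived direct buffers can therefore exhaust direct memory, so allocate them sparingly and reuse them. There is no public method to free a direct buffer explicitly; cleaner() belongs to the internal sun.nio.ch.DirectBuffer interface and is not accessible by default since Java 9.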
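Points 1, 2 and 5 can be checked with the public ByteBuffer API. In this minimal sketch, allocateDirect() returns a direct buffer while wrap() returns a heap buffer, and a single direct buffer is allocated once and reused for every read from a channel. An in-memory channel from Channels.newChannel() stands in for a file or socket, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.Arrays;

class DirectBufferDemo {
    // Sums all bytes from a channel, reusing one direct buffer for every read.
    static long sumBytes(ReadableByteChannel channel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(4096); // off-heap, allocated once
        long sum = 0;
        while (channel.read(buf) != -1) {
            buf.flip();                   // switch from filling to draining
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;  // treat each byte as unsigned
            }
            buf.clear();                  // reuse the same buffer for the next read
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(ByteBuffer.allocateDirect(16).isDirect()); // true
        System.out.println(ByteBuffer.wrap(new byte[16]).isDirect()); // false: heap buffer
        byte[] data = new byte[10_000];
        Arrays.fill(data, (byte) 1);
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data))) {
            System.out.println(sumBytes(ch)); // 10000
        }
    }
}
```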

Here is a simplified example that uses a direct ByteBuffer for efficient I/O:

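import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel channel = FileChannel.open(Paths.get("data.bin"), StandardOpenOption.READ)) {
    int bufferSize = 4096; // Adjust as needed
    ByteBuffer directBuffer = ByteBuffer.allocateDirect(bufferSize);
    int bytesRead;
    while ((bytesRead = channel.read(directBuffer)) != -1) {
        directBuffer.flip(); // Switch the buffer from filling to reading
        // Process the data in the direct buffer
        // ...
        directBuffer.clear(); // Prepare the buffer for the next read
    }
} catch (IOException e) {
    e.printStackTrace();
}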

# 3. Avoid blocking and optimize I/O operations.

Avoiding blocking and optimizing I/O operations are key to improving system performance and responsiveness. Here are some ways to achieve these goals:

1. Use non-blocking I/O: Non-blocking I/O, such as Java NIO, lets the program continue with other tasks while waiting for data to be ready. This is achieved through selectors, which allow a single thread to handle multiple channels.
2. Use asynchronous I/O: Asynchronous I/O lets a program submit an I/O operation and be notified when it completes. Java NIO.2 (Java 7+) provides support for asynchronous I/O. This reduces thread blocking and lets other tasks run while the I/O completes.
3. Use thread pools: Manage thread resources with thread pools instead of creating a new thread for each connection. This reduces the overhead of creating and destroying threads.
4. Use event-driven models: Event-driven designs such as the Reactor pattern, and frameworks such as Netty, manage connections and I/O events efficiently and enable non-blocking I/O.
5. Separate CPU-intensive work from I/O: Run CPU-intensive tasks separately from I/O operations so that threads waiting on I/O do not hold up computation. The separation can be done with multiple threads or multiple processes.
6. Batch processing: Combine multiple small I/O operations into a larger batch operation to reduce per-operation overhead and improve efficiency.
7. Use buffers: Use buffers to reduce frequent disk or network access and improve performance. This applies to both file I/O and network I/O.
8. Regular maintenance and optimization: Regularly monitor and tune resources such as disks, networks, and databases so that they keep performing well.
9. Use specialized frameworks: Choose frameworks with efficient non-blocking and asynchronous I/O, such as Netty and Vert.x.
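Point 1 can be illustrated without any network setup by using a java.nio.channels.Pipe, whose source end is a selectable channel. In this minimal sketch, a non-blocking read() returns 0 immediately when no data is available instead of waiting, and a Selector tells the thread when the channel becomes readable. A real server would register many SocketChannels with the same selector; the class, method and field names here are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class NonBlockingDemo {
    // What an early non-blocking read returned, and the message received afterwards.
    static final class Result {
        final int bytesBeforeData;
        final String message;

        Result(int bytesBeforeData, String message) {
            this.bytesBeforeData = bytesBeforeData;
            this.message = message;
        }
    }

    static Result run(String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);                 // reads no longer wait for data
            source.register(selector, SelectionKey.OP_READ); // be notified when readable

            // Nothing has been written yet, so a non-blocking read returns 0 at once.
            int early = source.read(ByteBuffer.allocate(16));

            sink.write(ByteBuffer.wrap(payload));            // a producer sends data

            ByteBuffer received = ByteBuffer.allocate(payload.length);
            while (received.hasRemaining()) {
                selector.select();                           // wait until a channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        source.read(received);
                    }
                }
                selector.selectedKeys().clear();             // mark the ready keys as handled
            }
            received.flip();
            return new Result(early, StandardCharsets.UTF_8.decode(received).toString());
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = run("hello");
        System.out.println("early read: " + r.bytesBeforeData + ", then received: " + r.message);
        // prints: early read: 0, then received: hello
    }
}
```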

Depending on your application scenario and requirements, you can implement one or more of these methods to avoid blocking, optimize I/O operations, and enhance system performance and responsiveness.
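For the asynchronous I/O approach, NIO.2 offers AsynchronousFileChannel. This sketch submits a read that returns a Future right away, so the calling thread is free to do other work before it calls get() to collect the result. The temporary file and class name are illustrative; a CompletionHandler callback is the alternative when you do not want to wait on the Future at all.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class AsyncReadDemo {
    // Reads up to 1 KB from the start of the file with a single asynchronous read.
    static String readAsync(Path file) throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            Future<Integer> pending = ch.read(buf, 0); // returns immediately
            // ... the thread could do other work here ...
            pending.get();                             // wait only when the result is needed
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("async", ".txt"); // illustrative demo file
        Files.write(file, "hello async".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAsync(file)); // prints: hello async
        Files.delete(file);
    }
}
```

Larger files would issue further reads at increasing positions until the returned count is -1.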

# 4. Channel.

As discussed earlier, traditional I/O relied on InputStream and OutputStream, which operate on streams in units of bytes.

In the case of high concurrency and large data, this method can easily lead to blocking, resulting in performance degradation.

In addition, copying output data from user space to kernel space and then to the output device adds system performance overhead.

To mitigate this, traditional I/O later introduced buffering, which reduces the number of system calls and therefore how often a thread has to block.

Buffered I/O uses buffer blocks as its smallest unit. Even with buffering, however, overall performance is still less than ideal.

Then came NIO (New I/O), which operates on blocks of data held in buffers.

On the basis of buffering, it introduces two components: "channel" and "selector". These additions make non-blocking I/O operations possible.

NIO is well suited to situations with a large number of I/O connection requests. Together, these three components (buffer, channel, and selector) improve the overall performance of I/O.
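To connect the channel idea back to the copy problem described earlier, here is a minimal sketch that copies one file to another with FileChannel.transferTo(). Because the transfer runs between two channels, the data does not pass through a byte[] on the Java heap, and the transferTo Javadoc notes that many operating systems can move bytes directly from the filesystem cache to the target channel. Whether that optimization applies depends on the platform and channel types. The temporary files and class name are illustrative.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelCopy {
    // Copies src to dst channel-to-channel; returns the number of bytes transferred.
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("channel-src", ".bin"); // illustrative demo files
        Path dst = Files.createTempFile("channel-dst", ".bin");
        Files.write(src, new byte[50_000]);
        System.out.println(copy(src, dst) + " bytes transferred"); // prints: 50000 bytes transferred
        Files.delete(src);
        Files.delete(dst);
    }
}
```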


Origin blog.csdn.net/weixin_37604985/article/details/132550999