Optimizing the compression of 20 MB of files from 30 seconds to 1 second

The requirement: the back end receives 10 photos from the front end, processes them, packs them into a zip archive, and sends the archive back over a network stream. I had never compressed files in Java before, so I found an example online and adapted it. It worked, but as the file sizes grew the time consumption also grew sharply, and in the end it took 30 seconds to compress 20 MB of files. The code for compressing the files is as follows.

public static void zipFileNoBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // start time
        long beginTime = System.currentTimeMillis();

        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = input.read()) != -1) {
                    zipOut.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Here I took a 2 MB image and compressed it ten times in a loop. The printed result is as follows; it took about 30 seconds.

fileSize:20M
consum time:29599

First optimization pass - from 30 seconds to 2 seconds

The first thing that came to mind was to use a buffer, BufferedInputStream. FileInputStream's read() method reads only one byte at a time, as its source-code documentation explains.

/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException;

read() is a native method that interacts with the underlying operating system to read data from disk, and calling a native method for every single byte is very expensive. For example, if we have 30,000 bytes of data, using FileInputStream alone means 30,000 native calls to fetch it, whereas with a buffer (assuming the buffer is large enough to hold all 30,000 bytes) only one call is needed: on the first read(), the buffer pulls a whole block of data from disk into memory, and then hands it back one byte at a time.

> BufferedInputStream internally wraps a byte array for storing data; the default size is 8192 bytes.

The optimized code is as follows

public static void zipFileBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
            BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut)) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(JPG_FILE))) {
                // flush any bytes of the previous entry still sitting in the buffer,
                // so they are not written into the next zip entry by mistake
                bufferedOutputStream.flush();
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = bufferedInputStream.read()) != -1) {
                    bufferedOutputStream.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The output is as follows

------Buffer
fileSize:20M
consum time:1808

It can be seen that this is a big efficiency improvement over the first version, which used FileInputStream directly.
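
As a side note, a common further refinement (not from the original article, just a sketch here) is to read and write in chunks with a byte[] instead of one byte at a time, which removes the per-byte call overhead entirely. It reuses the ZIP_FILE, JPG_FILE, FILE_NAME and printInfo pieces of the original example.

public static void zipFileChunked() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // start time
        long beginTime = System.currentTimeMillis();
        byte[] chunk = new byte[8192];
        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int len;
                while ((len = input.read(chunk)) != -1) {
                    // write only the bytes actually read in this chunk
                    zipOut.write(chunk, 0, len);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}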

Second optimization pass - from 2 seconds to 1 second

Using a buffer already met my needs, but in the spirit of putting what I have learned into practice, I decided to try optimizing it further with NIO.

Use Channel

Why use a Channel? Because NIO introduced Channel and ByteBuffer. Precisely because their structure is closer to the way the operating system performs I/O, they are significantly faster than traditional IO. A Channel is like a mine full of coal, and a ByteBuffer is the truck dispatched to that mine: all of our interaction with the data goes through the ByteBuffer.

In NIO, three classes can produce a FileChannel: FileInputStream, FileOutputStream, and RandomAccessFile (which can both read and write).
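
As a quick illustration (a sketch only; the file names here are hypothetical), a FileChannel can be obtained from each of them like this:

// Three ways to obtain a FileChannel; the file names are only examples
try (FileChannel readChannel = new FileInputStream("photo.jpg").getChannel();
        FileChannel writeChannel = new FileOutputStream("copy.jpg").getChannel();
        FileChannel readWriteChannel = new RandomAccessFile("data.bin", "rw").getChannel()) {
    // readChannel can only read, writeChannel can only write,
    // and readWriteChannel can do both
} catch (IOException e) {
    e.printStackTrace();
}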

The source code is as follows

public static void zipFileChannel() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
            WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            try (FileChannel fileChannel = new FileInputStream(JPG_FILE).getChannel()) {
                zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
                fileChannel.transferTo(0, FILE_SIZE, writableByteChannel);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

You can see that no ByteBuffer is used to move the data here; instead the transferTo method is used, which connects the two channels directly.

 * This method is potentially much more efficient than a simple loop
 * that reads from this channel and writes to the target channel.  Many
 * operating systems can transfer bytes directly from the filesystem cache
 * to the target channel without actually copying them.

This is the description from the source code. It roughly means that using transferTo is more efficient than a loop that reads from one Channel and then writes to another: many operating systems can transfer the bytes directly from the filesystem cache to the target Channel without an actual copy stage.

> The copy stage here means copying data from kernel space to user space.
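
For contrast, the "simple loop" that transferTo replaces would look roughly like the sketch below, reusing fileChannel and writableByteChannel from the example above and shuttling the data through a ByteBuffer:

// manual channel-to-channel copy via a ByteBuffer, for comparison with transferTo
ByteBuffer byteBuffer = ByteBuffer.allocate(8192);
while (fileChannel.read(byteBuffer) != -1) {    // fill the buffer from the file
    byteBuffer.flip();                          // switch the buffer to draining mode
    while (byteBuffer.hasRemaining()) {
        writableByteChannel.write(byteBuffer);  // drain the buffer into the target channel
    }
    byteBuffer.clear();                         // switch back to filling mode
}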

It can be seen that the speed has been improved somewhat compared to using buffers.

------Channel
fileSize:20M
consum time:1416

Kernel space and user space

So why is copying from kernel space to user space slow? First we need to understand what kernel space and user space are. To protect the core resources of the system, common operating systems divide execution into four privilege rings, with privileges increasing toward the inner rings. Ring 0 is called kernel space and can access key resources; Ring 3 is called user space.

> User mode and kernel mode: a thread running in kernel space is in kernel mode, and a thread running in user space is in user mode.

So what happens when an application (applications all run in user mode) needs to access core resources? It must go through the interfaces exposed by the kernel; such a call is known as a system call. For example, when our application needs to access a file on disk, it invokes the open system call, the kernel accesses the file on disk, and the file contents are returned to the application. The general flow is: application (user space) → system call → kernel reads the disk → data is copied back to user space.

Direct and indirect buffers

Reading a file from disk therefore takes quite a detour. Is there a simpler way to let our application operate on the disk file directly, without the kernel relaying the data? Yes: create a direct buffer.

  • Non-direct buffer: a non-direct buffer goes through the kernel middleman described above; every transfer has to be relayed through kernel space.

  • Direct buffer: a direct buffer does not need kernel space as a relay for copying data. Instead, it requests a region of physical memory directly; that region is mapped into both the kernel address space and the user address space, and the application and the disk exchange data through this directly allocated physical memory (see the sketch right after this list).
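
In code, the two kinds of buffer are created as follows (a minimal sketch, not part of the original example):

// non-direct buffer: backed by a byte[] on the JVM heap
ByteBuffer heapBuffer = ByteBuffer.allocate(1024);
System.out.println(heapBuffer.isDirect());    // false

// direct buffer: memory allocated outside the JVM heap
ByteBuffer directBuffer = ByteBuffer.allocateDirect(1024);
System.out.println(directBuffer.isDirect());  // true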

Since direct buffers are so fast, why don't we use them everywhere? Because they have the following disadvantages:

  1. They are unsafe.
  2. They consume more resources, because the memory is allocated outside the JVM heap. Reclaiming that memory depends on the garbage collector, and we cannot control when collection happens.
  3. Once data has been written into the physical-memory buffer, the program loses control over it: when the data is finally written to disk is decided by the operating system, and the application can no longer intervene.

> To sum up, the transferTo method in effect works through a directly allocated buffer, which is why the performance improves so much.

Use memory-mapped files

Another NIO feature is the memory-mapped file. Why are memory-mapped files fast? The reason is the same as above: a direct buffer is opened in memory, and the data is accessed directly through it. The source code is as follows.

// Version 4: use a memory-mapped file
public static void zipFileMap() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
            WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {

            zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));

            // map the file into memory; close the RandomAccessFile once the mapping exists
            try (RandomAccessFile randomAccessFile = new RandomAccessFile(JPG_FILE_PATH, "r")) {
                MappedByteBuffer mappedByteBuffer = randomAccessFile.getChannel()
                        .map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);

                writableByteChannel.write(mappedByteBuffer);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The output is as follows

---------Map
fileSize:20M
consum time:1305

You can see that the speed is similar to that of the Channel version.

Use Pipe

A Java NIO Pipe is a one-way data connection between two threads. A Pipe has a source channel and a sink channel: the source channel is used to read data, and the sink channel is used to write data. As the source-code comment explains, a thread writing bytes to the pipe may block until another thread reads those bytes from the source channel; likewise, if there is no data to read, the reading thread blocks until the writer writes data or the channel is closed.

 Whether or not a thread writing bytes to a pipe will block until another
 thread reads those bytes

That is exactly the behaviour I want here. The source code is as follows.

// Version 5: use a Pipe
public static void zipFilePip() {

    long beginTime = System.currentTimeMillis();
    try(WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // run the zipping task asynchronously
        CompletableFuture.runAsync(() -> runTask(pipe));

        // get the read (source) channel
        ReadableByteChannel readableByteChannel = pipe.source();
        ByteBuffer buffer = ByteBuffer.allocate(((int) FILE_SIZE)*10);
        while (readableByteChannel.read(buffer) >= 0) {
            buffer.flip();
            out.write(buffer);
            buffer.clear();
        }
    }catch (Exception e){
        e.printStackTrace();
    }
    printInfo(beginTime);

}

// asynchronous task: writes the zip entries into the pipe's sink channel
public static void runTask(Pipe pipe) {

    try(ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
            WritableByteChannel out = Channels.newChannel(zos)) {
        System.out.println("Begin");
        for (int i = 0; i < 10; i++) {
            zos.putNextEntry(new ZipEntry(i+SUFFIX_FILE));

            // transfer the image straight into the pipe's sink channel
            try (FileChannel jpgChannel = new FileInputStream(new File(JPG_FILE_PATH)).getChannel()) {
                jpgChannel.transferTo(0, FILE_SIZE, out);
            }
        }
    }catch (Exception e){
        e.printStackTrace();
    }
}

Summary

  • Opportunities to learn are everywhere; sometimes a simple optimization leads you to study all kinds of knowledge in depth. So when studying, don't settle for a superficial understanding: know the technique, and also understand why it works.
  • Unity of knowledge and action: after learning something, try to apply it. That is how the knowledge really sticks.
