Optimizing the compression of 20M of files from 30 seconds to 1 second

The requirement was this: the front end uploads 10 photos, and the back end processes them, compresses them into a single archive, and sends it out over the network as a stream. I had not worked with file compression in Java before, so I found an example online, adapted it slightly, and it worked. But as the images sent by the front end grew larger, the time taken rose sharply, and a final test showed that compressing 20M of files took 30 seconds. The compression code is as follows.

public static void zipFileNoBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                // read() returns a single byte, so this loop runs once per byte
                while ((temp = input.read()) != -1) {
                    zipOut.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

For the test I took a 2M image and compressed it ten times in a loop. The printed result is below; it takes about 30 seconds.

fileSize:20M
consum time:29599 

First optimization: from 30 seconds to 2 seconds

The first idea was to use the buffered stream BufferedInputStream, because the read() method of FileInputStream reads only one byte per call. The source code says as much.

/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException; 

This is a native method that interacts with the operating system to read data from the disk. Calling a native method for every single byte is very time-consuming. For example, with 30,000 bytes of data, FileInputStream needs 30,000 native calls to read them, whereas with a buffer (assuming it is large enough to hold all 30,000 bytes) a single call is enough. The first read() call on the buffered stream pulls the data from the disk straight into memory, and later calls simply return it one byte at a time.

BufferedInputStream wraps a byte array that holds the data; its default size is 8192 bytes.

The optimized code is as follows.
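To make the difference in call counts concrete, here is a minimal sketch that counts how many read calls actually reach the underlying stream. `ReadCountDemo`, `CountingInputStream`, and `callsToSource` are hypothetical names invented for this illustration, and an in-memory `ByteArrayInputStream` stands in for the file, so it shows call counts only, not timings.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadCountDemo {

    /** Hypothetical helper: counts every read call that reaches the wrapped stream. */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        int calls;

        CountingInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Reads size bytes one at a time and returns how many calls hit the source. */
    static int callsToSource(int size, boolean buffered) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + callsToSource(30_000, false));
        System.out.println("buffered calls:   " + callsToSource(30_000, true));
    }
}
```

With the default 8192-byte buffer, the buffered run should reach the source only a handful of times, while the unbuffered run reaches it once per byte.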

public static void zipFileBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut)) {
        // start time
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (BufferedInputStream bufferedInputStream =
                         new BufferedInputStream(new FileInputStream(JPG_FILE))) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = bufferedInputStream.read()) != -1) {
                    bufferedOutputStream.write(temp);
                }
                // push buffered bytes into the current entry before the next one starts
                bufferedOutputStream.flush();
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Output

------Buffer
fileSize:20M
consum time:1808 

Compared with the first version, which used FileInputStream directly, efficiency has improved considerably.
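As a variation on the buffered approach, the sketch below places the output buffer underneath the ZipOutputStream and copies in 8192-byte chunks, then reads the archive back to check each entry. This is not the article's code: `BufferedZipDemo`, `zip`, `entrySizes`, and the entry names are assumptions made for the example, and in-memory streams stand in for the image file and the zip file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class BufferedZipDemo {

    /** Zips the same payload `copies` times, with the output buffer below the zip stream. */
    static byte[] zip(byte[] payload, int copies) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(new BufferedOutputStream(sink))) {
            byte[] chunk = new byte[8192];
            for (int i = 0; i < copies; i++) {
                zipOut.putNextEntry(new ZipEntry("entry-" + i));
                try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(payload))) {
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        zipOut.write(chunk, 0, n);
                    }
                }
                zipOut.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // the try block has closed and flushed every stream by this point
        return sink.toByteArray();
    }

    /** Reads the archive back and returns the uncompressed size of each entry. */
    static List<Long> entrySizes(byte[] archive) {
        List<Long> sizes = new ArrayList<>();
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] chunk = new byte[8192];
            while (zipIn.getNextEntry() != null) {
                long size = 0;
                int n;
                while ((n = zipIn.read(chunk)) != -1) {
                    size += n;
                }
                sizes.add(size);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        byte[] archive = zip(new byte[20_000], 10);
        System.out.println("entry sizes: " + entrySizes(archive));
    }
}
```

Because every write goes through the ZipOutputStream itself, no bytes can sit in a buffer above it when the next entry begins.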

Second optimization: from 2 seconds to 1 second

Buffered streams already met my needs, but in the spirit of applying what I had learned, I wanted to see whether NIO could improve things further.

Using Channel

Why use Channel? Because NIO introduced Channel and ByteBuffer. Their structure is closer to the way the operating system performs I/O, so they are noticeably faster than traditional IO. A Channel is like a mine containing coal, and a ByteBuffer is the truck sent into the mine. In other words, all our interaction with the data goes through the ByteBuffer.

Three classes can produce a FileChannel: FileInputStream, FileOutputStream, and RandomAccessFile, which can both read and write.

The source code is as follows.
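A small sketch of obtaining a FileChannel from each of the three classes may help here. `FileChannelSources`, `roundTrip`, and `writeFully` are hypothetical names, and the temporary file and its contents are made up for the example.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileChannelSources {

    /** Writes via FileOutputStream, appends via RandomAccessFile, reads via FileInputStream. */
    static String roundTrip() {
        try {
            Path path = Files.createTempFile("channel-demo", ".txt");
            try {
                // write-only channel
                try (FileChannel out = new FileOutputStream(path.toFile()).getChannel()) {
                    writeFully(out, "hello");
                }
                // read-write channel, positioned at the end to append
                try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
                     FileChannel rw = raf.getChannel()) {
                    rw.position(rw.size());
                    writeFully(rw, " nio");
                }
                // read-only channel
                try (FileChannel in = new FileInputStream(path.toFile()).getChannel()) {
                    ByteBuffer buffer = ByteBuffer.allocate((int) in.size());
                    while (buffer.hasRemaining() && in.read(buffer) != -1) {
                        // keep reading until the buffer is full or the file ends
                    }
                    buffer.flip();
                    return StandardCharsets.UTF_8.decode(buffer).toString();
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeFully(FileChannel channel, String text) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

Closing a channel obtained through getChannel() also closes the stream it came from, which is why the streams are not closed separately.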

public static void zipFileChannel() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            try (FileChannel fileChannel = new FileInputStream(JPG_FILE).getChannel()) {
                zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
                // hand the whole file to the zip channel without an explicit ByteBuffer
                fileChannel.transferTo(0, FILE_SIZE, writableByteChannel);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

As you can see, this version does not use a ByteBuffer to move the data. It uses the transferTo method instead, which connects the two channels directly.

 * This method is potentially much more efficient than a simple loop
 * that reads from this channel and writes to the target channel.  Many
 * operating systems can transfer bytes directly from the filesystem cache
 * to the target channel without actually copying them.

This is the description in the source code. It roughly means that transferTo is more efficient than a loop that reads from one Channel and writes to another: the operating system can transfer bytes directly from the filesystem cache to the target Channel without an actual copy stage.

> The copy stage is the transfer of data from kernel space to user space.

As you can see, the speed has improved somewhat compared with using a buffer.

------Channel
fileSize:20M
consum time:1416 
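One detail worth knowing about transferTo is that its return value is the number of bytes actually transferred, which may be fewer than requested. The sketch below wraps the call in a loop as a defensive measure. `TransferToDemo`, `transferAll`, and `copyViaTransferTo` are hypothetical names, and a temporary file with an in-memory target stands in for the image and the zip stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToDemo {

    /** Repeats transferTo until the whole file has been handed to the target. */
    static long transferAll(FileChannel source, WritableByteChannel target) throws IOException {
        long position = 0;
        long size = source.size();
        while (position < size) {
            long moved = source.transferTo(position, size - position, target);
            if (moved <= 0) {
                break; // no progress; stop instead of spinning forever
            }
            position += moved;
        }
        return position;
    }

    /** Writes data to a temporary file and copies it back out through transferTo. */
    static byte[] copyViaTransferTo(byte[] data) {
        try {
            Path path = Files.createTempFile("transfer-demo", ".bin");
            try {
                Files.write(path, data);
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (FileChannel source = FileChannel.open(path, StandardOpenOption.READ);
                     WritableByteChannel target = Channels.newChannel(sink)) {
                    transferAll(source, target);
                }
                return sink.toByteArray();
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] copy = copyViaTransferTo(new byte[100_000]);
        System.out.println("copied bytes: " + copy.length);
    }
}
```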

Kernel space and user space

So why is moving data from kernel space to user space slow? First we need to understand what kernel space and user space are. To protect core system resources, commonly used operating systems divide execution into four privilege rings, with privilege increasing toward the inside. Ring 0 is called kernel space and is used to access critical resources; Ring 3 is called user space.

> User mode and kernel mode: a thread running in kernel space is in kernel mode, and a thread running in user space is in user mode.

So what happens when an application (applications always run in user mode) needs to access core resources? It has to call an interface exposed by the kernel, known as a system call. For example, suppose our application needs to access a file on disk. The application invokes the open system call, the kernel accesses the file on disk, and the file contents are returned to the application.

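Direct and non-direct buffers

Since reading a file from disk takes this much effort, is there a simpler way for our application to operate on disk files directly, without the kernel relaying the data? Yes: create a direct buffer.

  • Non-direct buffer: as described above, the kernel acts as the middleman, and every access has to be relayed through it.

  • Direct buffer: a direct buffer does not need kernel space as an intermediate stop for copying data. Instead, a block of physical memory is requested directly and mapped into both the kernel address space and the user address space, and data moves between the application and the disk through this block of memory.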
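In Java the two kinds of buffer are created with `ByteBuffer.allocate` and `ByteBuffer.allocateDirect`. The sketch below only shows how to create each kind and tell them apart; `BufferKinds` and `describe` are hypothetical names, and the 1024-byte capacity is arbitrary.

```java
import java.nio.ByteBuffer;

final class BufferKinds {

    /** Reports whether a buffer is direct (off-heap) or non-direct (on the JVM heap). */
    static String describe(ByteBuffer buffer) {
        return buffer.isDirect() ? "direct" : "non-direct";
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);          // backed by a byte[] on the heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);  // allocated outside the heap

        System.out.println("allocate:       " + describe(heap));
        System.out.println("allocateDirect: " + describe(direct));
    }
}
```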

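Since direct buffers are so fast, why not use them everywhere? Direct buffers have the following drawbacks:

  1. They are unsafe.
  2. They cost more, because the memory is not allocated inside the JVM heap. Reclaiming it depends on the garbage collector, and we cannot control when collection happens.
  3. Once data is written to the physical memory buffer, the program loses control over it: when the data is finally written to disk is decided by the operating system alone, and the application can no longer intervene.

> In summary, using the transferTo method in effect sets up a direct buffer, which is why performance improves so much.

Using memory-mapped files

Another feature introduced by NIO is the memory-mapped file. Why are memory-mapped files fast? The reason is the same as above: a direct buffer is set up in memory and the program interacts with the data directly. The source code is as follows.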

// Version 4: use a memory-mapped file
public static void zipFileMap() {
    // start time
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            // open read-only and close the file once the entry has been written
            try (RandomAccessFile file = new RandomAccessFile(JPG_FILE_PATH, "r")) {
                // map the file into memory
                MappedByteBuffer mappedByteBuffer = file.getChannel()
                        .map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
                writableByteChannel.write(mappedByteBuffer);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The output is as follows

---------Map
fileSize:20M
consum time:1305 

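As you can see, the speed is about the same as with Channel.

Using Pipe

A Java NIO pipe is a one-way data connection between two threads. A Pipe has a source channel and a sink channel: the source channel is used to read data, and the sink channel is used to write data. Going by the description in the source code, a writing thread may block until a reading thread reads the data from the channel, and if there is no data to read, the reading thread blocks until a writing thread writes some, until the channel is closed.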
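The mapping step on its own can be sketched as follows. `MappedReadDemo` and `readViaMapping` are hypothetical names, and a temporary file stands in for the image. Cleanup is deferred to JVM exit because, on some platforms, a file cannot be deleted while a mapping of it is still alive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedReadDemo {

    /** Writes data to a temporary file, maps it read-only, and copies it back out. */
    static byte[] readViaMapping(byte[] data) {
        try {
            Path path = Files.createTempFile("mapped-demo", ".bin");
            // A live mapping can keep the file in use, so delete at JVM exit instead of now.
            path.toFile().deleteOnExit();
            Files.write(path, data);
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                byte[] copy = new byte[mapped.remaining()];
                mapped.get(copy); // reads straight from the mapped region
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] copy = readViaMapping("mapped file contents".getBytes());
        System.out.println("read " + copy.length + " bytes through the mapping");
    }
}
```

The mapping stays valid after the channel is closed, which is why the buffer in the article's version can still be written out after the file has been opened only briefly.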

 Whether or not a thread writing bytes to a pipe will block until another
 thread reads those bytes

This is the effect I was after. The source code is as follows.

// Version 5: use Pipe
public static void zipFilePip() {
    long beginTime = System.currentTimeMillis();
    try (WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // asynchronous task
        CompletableFuture.runAsync(() -> runTask(pipe));
        // get the read channel
        ReadableByteChannel readableByteChannel = pipe.source();
        ByteBuffer buffer = ByteBuffer.allocate(((int) FILE_SIZE) * 10);
        while (readableByteChannel.read(buffer) >= 0) {
            buffer.flip();
            out.write(buffer);
            buffer.clear();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    printInfo(beginTime);
}

// asynchronous task
public static void runTask(Pipe pipe) {
    try (ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
         WritableByteChannel out = Channels.newChannel(zos)) {
        System.out.println("Begin");
        for (int i = 0; i < 10; i++) {
            zos.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            // try-with-resources closes the channel even if transferTo throws
            try (FileChannel jpgChannel = new FileInputStream(new File(JPG_FILE_PATH)).getChannel()) {
                jpgChannel.transferTo(0, FILE_SIZE, out);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
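Stripped of the zip logic, the Pipe mechanics can be sketched like this: one thread writes into the sink, the calling thread reads from the source until the sink is closed. `PipeDemo` and `sendThroughPipe` are hypothetical names, and the 16-byte read buffer is deliberately small so that the read loop runs more than once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class PipeDemo {

    /** Sends a message from a writer thread to the calling thread through a Pipe. */
    static String sendThroughPipe(String message) {
        try {
            Pipe pipe = Pipe.open();

            Thread writer = new Thread(() -> {
                // closing the sink signals end-of-stream to the reader
                try (Pipe.SinkChannel sink = pipe.sink()) {
                    ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                    while (out.hasRemaining()) {
                        sink.write(out);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            ByteArrayOutputStream received = new ByteArrayOutputStream();
            try (Pipe.SourceChannel source = pipe.source()) {
                ByteBuffer in = ByteBuffer.allocate(16);
                while (source.read(in) != -1) {
                    in.flip();
                    received.write(in.array(), 0, in.limit());
                    in.clear();
                }
            }
            writer.join();
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThroughPipe("bytes travelling from sink to source through a pipe"));
    }
}
```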

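Summary

  • There is something to learn everywhere. Sometimes a simple optimization leads you deep into many different areas of knowledge. So when learning, do not settle for a superficial understanding: know not only the technique but also why it is done that way.
  • Unity of knowledge and action: after learning something, try to apply it at least once. That is how it sticks.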



Origin www.cnblogs.com/qwangxiao/p/11366940.html