For large file copies, try NIO memory mapping

In a recent project I needed to implement file copying. When it comes to reading and writing files in Java, it is natural to think of the IO streams InputStream and OutputStream. A quick search online, however, shows that there are many ways to copy a file: besides plain IO there is NIO, the utility class provided by Apache, and the copy method that ships with the JDK.

IO copy

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class IOFileCopy {

    private static final int BUFFER_SIZE = 1024;

    public static void copyFile(String source, String target) {
        long start = System.currentTimeMillis();
        try (InputStream in = new FileInputStream(new File(source));
             OutputStream out = new FileOutputStream(new File(target))) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int len;
            // read() returns -1 once the end of the file is reached
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }

            System.out.println(String.format("IO file copy cost %d ms", System.currentTimeMillis() - start));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The traditional IO file read process can be divided into the following steps:

  • The kernel reads the data from disk into a kernel buffer. The disk controller moves the data from the disk into the kernel buffer via DMA, so this step does not depend on the CPU

  • The data is copied from the kernel buffer into the user-space buffer of the user process

  • The user process reads the data from the user-space buffer

image.png

NIO copy

NIO can copy files in two ways: through a channel (FileChannel), or through a memory-mapped file

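import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class NIOFileCopy {

    public static void copyFile(String source, String target) {
        long start = System.currentTimeMillis();
        try (FileChannel input = new FileInputStream(new File(source)).getChannel();
             FileChannel output = new FileOutputStream(new File(target)).getChannel()) {
            long size = input.size();
            long position = 0;
            // transferFrom may transfer fewer bytes than requested, so loop until done
            while (position < size) {
                long transferred = output.transferFrom(input, position, size - position);
                if (transferred <= 0) {
                    break; // the source ended earlier than expected
                }
                position += transferred;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println(String.format("NIO file copy cost %d ms", System.currentTimeMillis() - start));
    }
}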

Memory mapped file:

A virtual address in kernel space and a virtual address in user space are mapped to the same physical address, so the DMA hardware can fill a buffer that is visible to both the kernel and the user process. The user process reads the file contents directly from memory, and the application only has to deal with memory; nothing is copied back and forth between buffers, which greatly improves the efficiency of the copy. The memory used by a memory-mapped file lies outside the Java heap

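import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class NIOFileCopy2 {

    public static void copyFile(String source, String target) {
        long start = System.currentTimeMillis();
        try (FileInputStream fis = new FileInputStream(new File(source));
             FileOutputStream fos = new FileOutputStream(new File(target))) {
            FileChannel sourceChannel = fis.getChannel();
            FileChannel targetChannel = fos.getChannel();
            // A single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB)
            MappedByteBuffer mappedByteBuffer = sourceChannel.map(FileChannel.MapMode.READ_ONLY, 0, sourceChannel.size());
            // write() may not drain the buffer in one call, so loop until it is empty
            while (mappedByteBuffer.hasRemaining()) {
                targetChannel.write(mappedByteBuffer);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println(String.format("NIO memory-mapped file copy cost %d ms", System.currentTimeMillis() - start));
        // The copy is deleted after timing, presumably so the test can be rerun;
        // remove these two lines if the copy should be kept
        File targetFile = new File(target);
        targetFile.delete();
    }
}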
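
Since a single call to FileChannel#map can map at most Integer.MAX_VALUE bytes, a file larger than about 2 GB cannot be copied with one mapping as in the class above. The following is only a sketch of one way around that, mapping the source in pieces. The class name ChunkedMappedCopy and the 64 MB CHUNK_SIZE are assumptions made for illustration, not part of the original test.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedMappedCopy {

    // Hypothetical chunk size (64 MB); any value up to Integer.MAX_VALUE works.
    static final long CHUNK_SIZE = 64L * 1024 * 1024;

    static void copyFile(Path source, Path target) throws IOException {
        copyFile(source, target, CHUNK_SIZE);
    }

    static void copyFile(Path source, Path target, long chunkSize) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // Map only the next piece of the source file.
                long length = Math.min(chunkSize, size - position);
                MappedByteBuffer chunk = in.map(FileChannel.MapMode.READ_ONLY, position, length);
                // write() may not drain the buffer in one call.
                while (chunk.hasRemaining()) {
                    out.write(chunk);
                }
                position += length;
            }
        }
    }
}
```

A smaller chunk size keeps less of the file mapped at any one time, which may help with the memory usage concern for large files.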


NIO memory-mapped file copy can be divided into the following steps

image.png

Files#copy method

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FilesCopy {

    public static void copyFile(String source, String target) {
        long start = System.currentTimeMillis();
        try {
            File sourceFile = new File(source);
            File targetFile = new File(target);
            // Without options, Files.copy fails if the target file already exists
            Files.copy(sourceFile.toPath(), targetFile.toPath());
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println(String.format("Files file copy cost %d ms", System.currentTimeMillis() - start));
    }
}
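
Called without options, Files.copy throws FileAlreadyExistsException when the target already exists, and it does not create missing parent directories, so a second run of a copy test can fail. A minimal sketch of a more forgiving variant follows. The class name FilesCopyReplace is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class FilesCopyReplace {

    static void copyFile(Path source, Path target) throws IOException {
        // Create the target directory if it does not exist yet.
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Overwrite the target instead of failing when it already exists.
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```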

FileUtils#copyFile method

FileUtils requires a dependency to be added before it can be used

  • Dependency

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
  • FileUtils#copyFile wrapper class: FileUtilsCopy.java

    import java.io.File;
    import java.io.IOException;
    import org.apache.commons.io.FileUtils;

    public class FileUtilsCopy {

        public static void copyFile(String source, String target) {
            long start = System.currentTimeMillis();
            try {
                FileUtils.copyFile(new File(source), new File(target));
            } catch (IOException e) {
                e.printStackTrace();
            }

            System.out.println(String.format("FileUtils file copy cost %d ms", System.currentTimeMillis() - start));
        }
    }

Performance Comparison

With so many implementations available, we naturally want to pick the one with the best performance

Test environment:

  • Windows 10
  • 6-core CPU
  • JDK 1.8

Test code: PerformTest.java

public class PerformTest {

    private static final String source1 = "input/test1.txt";
    private static final String source2 = "input/test2.txt";
    private static final String source3 = "input/test3.txt";
    private static final String source4 = "input/test4.txt";
    private static final String target1 = "output/test1.txt";
    private static final String target2 = "output/test2.txt";
    private static final String target3 = "output/test3.txt";
    private static final String target4 = "output/test4.txt";

    public static void main(String[] args) {
        IOFileCopy.copyFile(source1, target1);
        NIOFileCopy.copyFile(source2, target2);
        FilesCopy.copyFile(source3, target3);
        FileUtilsCopy.copyFile(source4, target4);
    }
}

The test was run five times in total, with file sizes of 9 KB, 23 KB, 239 KB, 1.77 MB and 12.7 MB

image.png

Note: The units are milliseconds
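
A single timed run is noisy, so it can help to generate test files of a known size and average several runs. The helper below is only a sketch of that idea and is not the harness used for the results above. The names CopyBenchmark, Copier, createTestFile and averageMillis are invented for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

class CopyBenchmark {

    // Hypothetical callback type so that any copy implementation can be timed.
    interface Copier {
        void copy(Path source, Path target) throws IOException;
    }

    // Creates a file of exactly sizeInBytes bytes filled with pseudo-random data.
    static Path createTestFile(Path dir, long sizeInBytes) throws IOException {
        Path file = Files.createTempFile(dir, "copy-test-", ".bin");
        byte[] block = new byte[8192];
        Random random = new Random(0);
        try (OutputStream out = Files.newOutputStream(file)) {
            long remaining = sizeInBytes;
            while (remaining > 0) {
                random.nextBytes(block);
                int n = (int) Math.min(block.length, remaining);
                out.write(block, 0, n);
                remaining -= n;
            }
        }
        return file;
    }

    // Runs the copy several times and returns the average duration in milliseconds.
    static long averageMillis(Copier copier, Path source, Path target, int runs) throws IOException {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            Files.deleteIfExists(target); // start every run without an existing target
            long start = System.nanoTime();
            copier.copy(source, target);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / runs / 1_000_000;
    }
}
```

For example, `CopyBenchmark.averageMillis((s, t) -> Files.copy(s, t), source, target, 5)` would time Files#copy over five runs. The operating system's file cache still affects the numbers, so the results are best read as relative comparisons.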

From the results (A > B means A is faster than B):

  • When the file is very small => IO > NIO [memory mapping] > NIO [channel] > Files#copy > FileUtils#copyFile

  • When the file is small => NIO [memory mapping] > IO > NIO [channel] > Files#copy > FileUtils#copyFile

  • When the file is large => NIO [memory mapping] >> NIO [channel] > IO > Files#copy > FileUtils#copyFile

When the file is very small, plain IO is more efficient than NIO: NIO's underlying implementation is more complex, so its advantage does not show. NIO memory mapping also takes time to initialize, so it has no advantage over plain IO when copying very small files

If efficiency is the goal, NIO memory-mapped files are a good choice for copying, but when mapping large files pay special attention to system memory usage. Recommendation: use memory mapping for large file copies. The FileChannel#map Javadoc puts it this way:

For most operating systems, mapping a file into memory is more
expensive than reading or writing a few tens of kilobytes of data via
the usual {@link #read read} and {@link #write write} methods.  From the
standpoint of performance it is generally only worth mapping relatively
large files into memory

In other words, on the vast majority of operating systems the overhead of memory mapping is greater than that of ordinary IO when only a small amount of data is involved
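
The advice above can be sketched as a copy routine that picks its strategy by file size. This is an illustrative sketch rather than the code used in the test: the class name SizeAwareCopy and the 64 KB MAPPING_THRESHOLD are assumptions, and the threshold in particular should be tuned with your own measurements.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SizeAwareCopy {

    // Hypothetical cut-off (64 KB); not a value taken from the test results.
    static final long MAPPING_THRESHOLD = 64 * 1024;

    static boolean shouldUseMapping(long fileSize) {
        return fileSize >= MAPPING_THRESHOLD;
    }

    static void copyFile(Path source, Path target) throws IOException {
        if (shouldUseMapping(Files.size(source))) {
            mappedCopy(source, target);
        } else {
            streamCopy(source, target);
        }
    }

    // Plain stream copy for small files.
    static void streamCopy(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
    }

    // Memory-mapped copy for larger files, mapped in pieces of at most Integer.MAX_VALUE bytes.
    static void mappedCopy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                long length = Math.min(Integer.MAX_VALUE, size - position);
                MappedByteBuffer chunk = in.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (chunk.hasRemaining()) {
                    out.write(chunk);
                }
                position += length;
            }
        }
    }
}
```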

The test results also show that the copy methods provided by the utility class and by the JDK are not very efficient. If you are not chasing performance they are still fine to use; after all, every line of code you do not have to write is a line saved, and writing code is never as much fun as slacking off

This is my last article before the New Year. I am afraid there will be too many greetings on New Year's Eve for you to notice mine, so I am wishing you all a prosperous Year of the Rat here in advance

image.png


Origin juejin.im/post/5e2523686fb9a030051f013e