The fastest way to check whether two files have the same contents in .NET Core - continued

In the previous blog post, I tried several methods to find the fastest way to compare two files in .NET Core. The article prompted a lot of discussion among readers after it was published, and I am sincerely grateful for your support.

Some readers questioned the result I reported for the ReadOnlySpan method: it was abnormally fast, almost beyond the limits of disk IO speed. I have reflected seriously on this. Because the ReadOnlySpan method ran last, it benefited from the disk cache, which greatly sped up the comparison. As soon as I discovered this, I withdrew the earlier post, redesigned the whole test program so that each comparison method is tested in a more rigorous and fair way, and wrote this post to set the record straight.

In addition, while re-testing I took readers' suggestions fully into account and improved the following points:

  • All tests use BenchmarkDotNet

    To get more professional and fair results, the test code was refactored to use BenchmarkDotNet.

  • Try different buffer sizes

    To fully test the effect of buffer size on speed, the byte-array buffer is tested at three sizes:

    • 4096 * 10
    • 4096 * 100
    • 4096 * 1000
  • Try asynchronous IO methods

    To observe the effect of asynchronous IO on speed.

  • Clear the disk cache before running

    This is, of course, the most important point of this post. A Win32 API call is used to clear the disk cache for every comparison run (see the notes at the beginning of the run method in the code; if the code needs to run on a platform other than Windows, you will need to implement the cache clearing yourself).
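
The setup described in the list above could be sketched roughly as follows. This is only an illustration, not the code from the repository: the class name, the `Path1`/`Path2` fields and the `ReadFull` helper are hypothetical, and the cache-clearing step is left as a placeholder because the post does not say which Win32 API is called. It assumes the BenchmarkDotNet NuGet package is referenced.

```csharp
using System;
using System.IO;
using BenchmarkDotNet.Attributes;

public class FileCompareBenchmarks
{
    // Hypothetical paths: point these at the two files to compare.
    public static string Path1 = "file1.bin";
    public static string Path2 = "file2.bin";

    // The three buffer sizes from the list above, in bytes.
    [Params(4096 * 10, 4096 * 100, 4096 * 1000)]
    public int buffer_size;

    [IterationSetup]
    public void ClearDiskCache()
    {
        // Placeholder (assumption): the original code calls a Win32 API here
        // so that no method benefits from data cached by an earlier run.
    }

    [Benchmark]
    public bool CompareByByteArray()
    {
        var buffer1 = new byte[buffer_size];
        var buffer2 = new byte[buffer_size];
        using (var fs1 = File.OpenRead(Path1))
        using (var fs2 = File.OpenRead(Path2))
        {
            if (fs1.Length != fs2.Length) return false;
            while (true)
            {
                int n1 = ReadFull(fs1, buffer1);
                int n2 = ReadFull(fs2, buffer2);
                if (n1 != n2) return false;
                if (n1 == 0) return true; // both files fully read, no difference found
                for (int i = 0; i < n1; i++)
                    if (buffer1[i] != buffer2[i]) return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

A class like this would typically be run from a Release build with `BenchmarkRunner.Run<FileCompareBenchmarks>()`; BenchmarkDotNet then executes each benchmark once per `buffer_size` value.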

For the principles behind each comparison method, see the previous post. I won't repeat them here and will go straight to the results:


BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.401
  [Host]     : .NET Core 2.2.6 (CoreCLR 4.6.27817.03, CoreFX 4.6.27818.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.2.6 (CoreCLR 4.6.27817.03, CoreFX 4.6.27818.02), 64bit RyuJIT

|                     Method | buffer_size |    Mean |    Error |   StdDev |  Median |
|--------------------------- |------------ |--------:|---------:|---------:|--------:|
|               CompareByMD5 |       40960 | 3.294 s | 0.0608 s | 0.0539 s | 3.311 s |
|          CompareByMD5Async |       40960 | 4.723 s | 0.0137 s | 0.0128 s | 4.720 s |
|           CompareByToInt64 |       40960 | 4.883 s | 0.0140 s | 0.0131 s | 4.886 s |
|         CompareByByteArray |       40960 | 4.713 s | 0.0059 s | 0.0052 s | 4.714 s |
|    CompareByByteArrayAsync |       40960 | 4.687 s | 0.0070 s | 0.0066 s | 4.688 s |
|            CompareByString |       40960 | 5.491 s | 0.1066 s | 0.0997 s | 5.483 s |
|     CompareBySequenceEqual |       40960 | 5.185 s | 0.1028 s | 0.1337 s | 5.180 s |
|          CompareByWin32API |       40960 | 4.334 s | 0.0209 s | 0.0195 s | 4.331 s |
|      CompareByReadOnlySpan |       40960 | 4.316 s | 0.0209 s | 0.0195 s | 4.313 s |
| CompareByReadOnlySpanAsync |       40960 | 4.699 s | 0.0235 s | 0.0220 s | 4.695 s |
|               CompareByMD5 |      409600 | 3.329 s | 0.0639 s | 0.0808 s | 3.334 s |
|          CompareByMD5Async |      409600 | 4.727 s | 0.0192 s | 0.0179 s | 4.720 s |
|           CompareByToInt64 |      409600 | 4.881 s | 0.0111 s | 0.0104 s | 4.879 s |
|         CompareByByteArray |      409600 | 3.017 s | 0.0583 s | 0.0798 s | 3.014 s |
|    CompareByByteArrayAsync |      409600 | 3.038 s | 0.0935 s | 0.1370 s | 2.996 s |
|            CompareByString |      409600 | 5.086 s | 0.0871 s | 0.0815 s | 5.075 s |
|     CompareBySequenceEqual |      409600 | 5.019 s | 0.0978 s | 0.0915 s | 4.998 s |
|          CompareByWin32API |      409600 | 3.048 s | 0.1061 s | 0.1263 s | 3.017 s |
|      CompareByReadOnlySpan |      409600 | 3.079 s | 0.0862 s | 0.1264 s | 3.045 s |
| CompareByReadOnlySpanAsync |      409600 | 2.976 s | 0.0484 s | 0.0452 s | 2.988 s |
|               CompareByMD5 |     4096000 | 3.456 s | 0.0850 s | 0.2410 s | 3.369 s |
|          CompareByMD5Async |     4096000 | 4.766 s | 0.0412 s | 0.0385 s | 4.762 s |
|           CompareByToInt64 |     4096000 | 5.003 s | 0.0789 s | 0.0659 s | 4.998 s |
|         CompareByByteArray |     4096000 | 2.558 s | 0.0505 s | 0.1055 s | 2.607 s |
|    CompareByByteArrayAsync |     4096000 | 2.500 s | 0.0492 s | 0.0766 s | 2.508 s |
|            CompareByString |     4096000 | 6.024 s | 0.0655 s | 0.0613 s | 6.020 s |
|     CompareBySequenceEqual |     4096000 | 4.949 s | 0.0793 s | 0.0742 s | 4.931 s |
|          CompareByWin32API |     4096000 | 2.582 s | 0.0511 s | 0.0881 s | 2.620 s |
|      CompareByReadOnlySpan |     4096000 | 2.677 s | 0.0503 s | 0.0420 s | 2.666 s |
| CompareByReadOnlySpanAsync |     4096000 | 2.460 s | 0.0492 s | 0.0657 s | 2.458 s |

"buffer_size" is the size of the byte-array buffer, in bytes.

"Mean" is the average time taken; lower is better.

The files tested this time are 500MB in size. From the data we can observe the following:

  • CompareByMD5 does not use the byte-array buffer, so its results are essentially stable across all groups. It is even the fastest method in the 40960 group.
  • CompareByString is the worst in every group, followed by CompareBySequenceEqual (second worst in the two smaller groups and nearly so in the largest), so I had no motivation to write asynchronous versions of them.
  • The methods that use the byte-array buffer generally get faster as the buffer grows.
  • The asynchronous methods start to come out ahead in the groups with larger buffers (starting at 409600), but overall the difference is not significant.
  • CompareByReadOnlySpan also performs very well, which gives me some peace of mind after my mistake in the previous post.
  • CompareByByteArray is quite competitive in every group, nearly matching CompareByReadOnlySpan. But when the disk cache is hit, CompareByReadOnlySpan, which had been neck and neck with it, takes off as if it were cheating.
  • CompareByWin32API is also very good, but it can only be used on Windows.
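
For readers who have not seen the previous post, a span-based comparison in the spirit of CompareByReadOnlySpan might look roughly like this. It is a minimal sketch under my own assumptions, not the benchmarked code: the class name, method signature and `ReadFull` helper are hypothetical. The only part taken for granted is that `ReadOnlySpan<byte>.SequenceEqual` compares two memory blocks in one call.

```csharp
using System;
using System.IO;

public static class SpanFileComparer
{
    // Returns true when both files have identical contents.
    public static bool CompareByReadOnlySpan(string path1, string path2, int bufferSize)
    {
        var buffer1 = new byte[bufferSize];
        var buffer2 = new byte[bufferSize];
        using (var fs1 = File.OpenRead(path1))
        using (var fs2 = File.OpenRead(path2))
        {
            if (fs1.Length != fs2.Length) return false;
            while (true)
            {
                int n1 = ReadFull(fs1, buffer1);
                int n2 = ReadFull(fs2, buffer2);
                if (n1 != n2) return false;
                if (n1 == 0) return true; // both files fully read, no difference found

                // Wrap only the bytes actually read and compare them in one call.
                var span1 = new ReadOnlySpan<byte>(buffer1, 0, n1);
                var span2 = new ReadOnlySpan<byte>(buffer2, 0, n2);
                if (!span1.SequenceEqual(span2)) return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

The span comparison replaces the per-byte loop of a plain byte-array approach; the file reads themselves are the same, which fits the observation that the two methods are close when the data really comes from disk.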

In conclusion:

  • The least effort is MD5, and its speed is acceptable.
  • The most straightforward is ByteArray: the code is easy to understand and the speed is good.
  • The most practical is CompareByReadOnlySpan: it is fast and also benefits from the disk cache.
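
To illustrate why MD5 is the least effort, here is a minimal sketch of a hash-based comparison. The class and method signature are hypothetical and this is not necessarily how the benchmarked CompareByMD5 is written; it relies only on the standard `MD5.Create()` and `ComputeHash(Stream)` calls.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class Md5FileComparer
{
    // Returns true when both files produce the same MD5 hash.
    public static bool CompareByMD5(string path1, string path2)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash1;
            byte[] hash2;
            // ComputeHash reads each stream to the end using its own internal
            // buffering, so no caller-supplied byte-array buffer is involved.
            using (var fs1 = File.OpenRead(path1)) hash1 = md5.ComputeHash(fs1);
            using (var fs2 = File.OpenRead(path2)) hash2 = md5.ComputeHash(fs2);
            return Enumerable.SequenceEqual(hash1, hash2);
        }
    }
}
```

Note that this approach always reads both files completely, even when they differ in the first byte, whereas the buffer-based methods can stop at the first mismatch.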

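The code is on GitHub. Clearing the disk cache requires administrator privileges, so Visual Studio needs to be run as administrator.

I hope this post finally reaches the correct conclusion about file comparison methods. Readers are welcome to comment!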

Origin www.cnblogs.com/waku/p/11447819.html