In the previous blog post, I tried several methods to find the fastest way to compare two files in .NET Core. The post sparked a lot of discussion among readers, and I am sincerely grateful for your support.
Some readers questioned the result of the ReadOnlySpan method I finished with: it was abnormally fast, almost beyond the limit of disk IO speed. I have reflected on this seriously. The final ReadOnlySpan run benefited from the disk cache, which greatly accelerated the comparison. As soon as I discovered this, I took down the earlier post, redesigned the entire test program so that every comparison method is tested in a more rigorous and fair way, and wrote this post to set the record straight.
While retesting, I took readers' suggestions fully into account and improved the following points:
Use BenchmarkDotNet for all tests
To get more professional and fair results, I refactored the test code to use BenchmarkDotNet.
Try different buffer sizes
To fully test how the buffer size affects speed, the byte array buffer comes in three sizes:
- 4096 * 10
- 4096 * 100
- 4096 * 1000
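As a rough illustration of how the three buffer sizes could be fed to BenchmarkDotNet, here is a minimal sketch. The class name `FileCompareBenchmarks`, the temporary files, and the 8 MB placeholder data are my own assumptions for the example; the real test program is in the repository and may be organized differently.

```csharp
using System;
using System.IO;
using BenchmarkDotNet.Attributes;

public class FileCompareBenchmarks
{
    // Hypothetical temp files; the post does not say where its test files live.
    private string _path1;
    private string _path2;

    // The three buffer sizes from the list: 40960, 409600 and 4096000 bytes.
    // BenchmarkDotNet runs every [Benchmark] method once per value.
    [Params(4096 * 10, 4096 * 100, 4096 * 1000)]
    public int buffer_size;

    [GlobalSetup]
    public void Setup()
    {
        // 8 MB of identical random data, a small stand-in for real test files.
        var data = new byte[8 * 1024 * 1024];
        new Random(42).NextBytes(data);
        _path1 = Path.GetTempFileName();
        _path2 = Path.GetTempFileName();
        File.WriteAllBytes(_path1, data);
        File.WriteAllBytes(_path2, data);
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        File.Delete(_path1);
        File.Delete(_path2);
    }

    // Reads until the buffer is full or the stream ends, because
    // Stream.Read may return fewer bytes than requested.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    [Benchmark]
    public bool CompareByByteArray()
    {
        var b1 = new byte[buffer_size];
        var b2 = new byte[buffer_size];
        using (var fs1 = File.OpenRead(_path1))
        using (var fs2 = File.OpenRead(_path2))
        {
            while (true)
            {
                int n1 = ReadFull(fs1, b1);
                int n2 = ReadFull(fs2, b2);
                if (n1 != n2) return false;   // different lengths
                if (n1 == 0) return true;     // both streams ended, no difference found
                for (int i = 0; i < n1; i++)
                    if (b1[i] != b2[i]) return false;
            }
        }
    }
}
```

Running it would be a matter of calling `BenchmarkRunner.Run<FileCompareBenchmarks>()` from `Main` in a Release build. Note that this sketch does not clear the disk cache between runs, so on its own it would measure cached reads.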
Try asynchronous IO methods
To observe the effect of asynchronous IO on speed.
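For readers who have not used asynchronous file IO, an asynchronous comparison might look roughly like the sketch below. The names `AsyncFileCompare` and `ReadFullAsync` are made up for this example, and issuing the two reads concurrently is my own choice; the actual `CompareByByteArrayAsync` in the test program may differ.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileCompare
{
    // Fills the buffer unless the stream ends first; ReadAsync may return fewer bytes.
    private static async Task<int> ReadFullAsync(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = await stream.ReadAsync(buffer, total, buffer.Length - total).ConfigureAwait(false);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    // Kept outside the async method so the spans never live across an await.
    private static bool BlocksEqual(byte[] a, byte[] b, int count)
    {
        return new ReadOnlySpan<byte>(a, 0, count).SequenceEqual(new ReadOnlySpan<byte>(b, 0, count));
    }

    public static async Task<bool> CompareByByteArrayAsync(string path1, string path2, int bufferSize)
    {
        // useAsync: true asks the OS for asynchronous (overlapped) IO on the handle.
        using (var fs1 = new FileStream(path1, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        using (var fs2 = new FileStream(path2, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        {
            if (fs1.Length != fs2.Length) return false;

            var b1 = new byte[bufferSize];
            var b2 = new byte[bufferSize];
            while (true)
            {
                // Start both reads before awaiting so they can overlap.
                Task<int> t1 = ReadFullAsync(fs1, b1);
                Task<int> t2 = ReadFullAsync(fs2, b2);
                int n1 = await t1.ConfigureAwait(false);
                int n2 = await t2.ConfigureAwait(false);

                if (n1 != n2) return false;
                if (n1 == 0) return true;
                if (!BlocksEqual(b1, b2, n1)) return false;
            }
        }
    }
}
```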
Clear the disk cache before each run
This is, of course, the most important point of this post. A Win32 API call is used to clear the disk cache after every run, so that the next run starts cold (see the method that runs at the start of the code; if the code needs to run on platforms other than Windows, you will need to implement cache clearing yourself).
The principles behind each comparison method are covered in the previous post, so I won't repeat them here and will go straight to the results:
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.401
[Host] : .NET Core 2.2.6 (CoreCLR 4.6.27817.03, CoreFX 4.6.27818.02), 64bit RyuJIT
DefaultJob : .NET Core 2.2.6 (CoreCLR 4.6.27817.03, CoreFX 4.6.27818.02), 64bit RyuJIT
Method | buffer_size | Mean | Error | StdDev | Median |
---|---|---|---|---|---|
CompareByMD5 | 40960 | 3.294 s | 0.0608 s | 0.0539 s | 3.311 s |
CompareByMD5Async | 40960 | 4.723 s | 0.0137 s | 0.0128 s | 4.720 s |
CompareByToInt64 | 40960 | 4.883 s | 0.0140 s | 0.0131 s | 4.886 s |
CompareByByteArray | 40960 | 4.713 s | 0.0059 s | 0.0052 s | 4.714 s |
CompareByByteArrayAsync | 40960 | 4.687 s | 0.0070 s | 0.0066 s | 4.688 s |
CompareByString | 40960 | 5.491 s | 0.1066 s | 0.0997 s | 5.483 s |
CompareBySequenceEqual | 40960 | 5.185 s | 0.1028 s | 0.1337 s | 5.180 s |
CompareByWin32API | 40960 | 4.334 s | 0.0209 s | 0.0195 s | 4.331 s |
CompareByReadOnlySpan | 40960 | 4.316 s | 0.0209 s | 0.0195 s | 4.313 s |
CompareByReadOnlySpanAsync | 40960 | 4.699 s | 0.0235 s | 0.0220 s | 4.695 s |
CompareByMD5 | 409600 | 3.329 s | 0.0639 s | 0.0808 s | 3.334 s |
CompareByMD5Async | 409600 | 4.727 s | 0.0192 s | 0.0179 s | 4.720 s |
CompareByToInt64 | 409600 | 4.881 s | 0.0111 s | 0.0104 s | 4.879 s |
CompareByByteArray | 409600 | 3.017 s | 0.0583 s | 0.0798 s | 3.014 s |
CompareByByteArrayAsync | 409600 | 3.038 s | 0.0935 s | 0.1370 s | 2.996 s |
CompareByString | 409600 | 5.086 s | 0.0871 s | 0.0815 s | 5.075 s |
CompareBySequenceEqual | 409600 | 5.019 s | 0.0978 s | 0.0915 s | 4.998 s |
CompareByWin32API | 409600 | 3.048 s | 0.1061 s | 0.1263 s | 3.017 s |
CompareByReadOnlySpan | 409600 | 3.079 s | 0.0862 s | 0.1264 s | 3.045 s |
CompareByReadOnlySpanAsync | 409600 | 2.976 s | 0.0484 s | 0.0452 s | 2.988 s |
CompareByMD5 | 4096000 | 3.456 s | 0.0850 s | 0.2410 s | 3.369 s |
CompareByMD5Async | 4096000 | 4.766 s | 0.0412 s | 0.0385 s | 4.762 s |
CompareByToInt64 | 4096000 | 5.003 s | 0.0789 s | 0.0659 s | 4.998 s |
CompareByByteArray | 4096000 | 2.558 s | 0.0505 s | 0.1055 s | 2.607 s |
CompareByByteArrayAsync | 4096000 | 2.500 s | 0.0492 s | 0.0766 s | 2.508 s |
CompareByString | 4096000 | 6.024 s | 0.0655 s | 0.0613 s | 6.020 s |
CompareBySequenceEqual | 4096000 | 4.949 s | 0.0793 s | 0.0742 s | 4.931 s |
CompareByWin32API | 4096000 | 2.582 s | 0.0511 s | 0.0881 s | 2.620 s |
CompareByReadOnlySpan | 4096000 | 2.677 s | 0.0503 s | 0.0420 s | 2.666 s |
CompareByReadOnlySpanAsync | 4096000 | 2.460 s | 0.0492 s | 0.0657 s | 2.458 s |

The "buffer_size" column is the size of the byte array buffer, in bytes.
The "Mean" column is the average time taken; lower is better.
The files tested this time are 500 MB in size. From the data we can observe the following:
- CompareByMD5 does not use the byte array buffer, so its results are basically stable across all groups; it is even the fastest method in the 40960 group.
- CompareByString is the worst in every group, and the second worst is CompareBySequenceEqual, so I had no incentive to write asynchronous versions of them.
- The methods that use the byte array buffer basically get faster as the buffer grows.
- The asynchronous methods start to come out ahead in the larger buffer groups (from 409600 on), but overall the difference is not significant.
- CompareByReadOnlySpan also performs very well, which gives me some peace of mind about the mistaken previous post.
- CompareByByteArray is quite competitive in every group, almost neck and neck with CompareByReadOnlySpan. But when the disk cache is hit, the speed of CompareByReadOnlySpan takes off as if it were cheating.
- CompareByWin32API is also very good, but it can only be used on Windows.
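To make the ReadOnlySpan approach mentioned above concrete, here is a minimal sketch of a block-wise comparison that hands each pair of blocks to `SequenceEqual`. The class name `SpanFileCompare` and the helper `ReadFull` are assumptions for the example; the real `CompareByReadOnlySpan` is in the previous post and the repository and may differ in detail.

```csharp
using System;
using System.IO;

public static class SpanFileCompare
{
    // Reads until the buffer is full or the stream ends, because
    // Stream.Read may return fewer bytes than requested.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    public static bool CompareByReadOnlySpan(string path1, string path2, int bufferSize)
    {
        var b1 = new byte[bufferSize];
        var b2 = new byte[bufferSize];
        using (var fs1 = File.OpenRead(path1))
        using (var fs2 = File.OpenRead(path2))
        {
            // Files of different length can never be equal, so skip the reads.
            if (fs1.Length != fs2.Length) return false;

            while (true)
            {
                int n1 = ReadFull(fs1, b1);
                int n2 = ReadFull(fs2, b2);
                if (n1 != n2) return false;
                if (n1 == 0) return true; // both streams ended, no difference found

                // Compare only the bytes that were actually read.
                var s1 = new ReadOnlySpan<byte>(b1, 0, n1);
                var s2 = new ReadOnlySpan<byte>(b2, 0, n2);
                if (!s1.SequenceEqual(s2)) return false;
            }
        }
    }
}
```

Compared with a hand-written byte-by-byte loop, the only change is the inner comparison: the span version delegates the whole block to the runtime's `SequenceEqual` instead of checking one byte at a time.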
In conclusion:
- The least effort is MD5; its speed is acceptable
- The simplest is ByteArray: the code is easy to understand and the speed is good
- The most practical is CompareByReadOnlySpan: it is fast and also makes good use of the disk cache
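For completeness, the MD5 route from the first conclusion can be sketched in a few lines. This is only an assumed shape (the class name `Md5FileCompare` is hypothetical), not necessarily the code used in the benchmark. It hashes each file through `ComputeHash(Stream)`, so no caller-supplied buffer is involved, which fits the observation that its timings barely move with the buffer size.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class Md5FileCompare
{
    // Computes the MD5 digest of a whole file by streaming it through the hasher.
    private static byte[] HashFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var fs = File.OpenRead(path))
        {
            return md5.ComputeHash(fs);
        }
    }

    public static bool CompareByMD5(string path1, string path2)
    {
        byte[] h1 = HashFile(path1);
        byte[] h2 = HashFile(path2);
        // Two 16-byte digests; the files are treated as equal when the digests match.
        return h1.AsSpan().SequenceEqual(h2.AsSpan());
    }
}
```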
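The code is on GitHub. Clearing the disk cache requires administrator privileges, so Visual Studio needs to be run as administrator.
I hope this post finally reaches the right conclusions about file comparison methods, and readers are very welcome to comment!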