The CRC battle between Redis author Antirez and Contributor Mattsta

Hello everyone, this is yes.

Yesterday my cousin told me that a junior student had asked him why Redis uses CRC16(key) mod 16384 to work out which slot a key belongs to. As I understood it, CRC is normally used for error checking: the data is treated as a binary polynomial, and the checksum is the remainder of a modulo-2 division by a generator polynomial. Still, there seems to be nothing wrong with using it as a hash function (readers who are not familiar with CRC may want to look it up first).
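To make the question concrete, here is a minimal sketch of the slot calculation. It assumes the CRC-16 variant Redis Cluster uses is CRC-16/XMODEM (polynomial 0x1021, initial value 0), and it ignores hash tags (the `{...}` part of a key), so treat it as an illustration rather than the Redis code. The function names are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bit-by-bit CRC-16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:              # top bit set: shift, then XOR in the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots (hash tags are not handled in this sketch)."""
    return crc16_xmodem(key) % 16384


print(hex(crc16_xmodem(b"123456789")))    # standard CRC check input
print(key_slot(b"foo"))
```

The modulo is what turns a 16-bit checksum into a slot number; the CRC itself is only being used as a cheap, well-distributed hash.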

So I went to see whether the author of Redis, antirez, had written anything about it. What I found instead was an article written in 2014 by a guy named mattsta. The article is a lot of fun: he thinks the CRC implementation in Redis is too crude, and he has an enhanced implementation, CRCSpeed, that is about four times as fast, so he wrote the article to walk through the analysis.

After reading it I immediately checked the source of my local 5.0 copy and found that the CRC code had not adopted his enhanced version; it was still the old implementation. Then I looked at the latest 6.0 release and found that CRC-64 had been switched to the CRCSpeed implementation. The commit was dated April 28, 2020, and the committer was antirez. That made me even more curious: 2014 to 2020 is quite a gap, so what happened?

Then I looked at mattsta's GitHub. He turned out to be a Redis Contributor as well, so I traced the history of the whole affair, and it kept getting more interesting. I can't wait to read his article again together with everyone and walk through the whole story.

But let me thank my cousin first; without him I would never have come across this article. Cousin, come and say hello to everyone!

"Hello everyone! I am the cousin." Alright, cousin, you can go now.

The cousin left the group chat cursing: "So I'm just a tool, then."

From here on I will translate with my poor English; please correct me if I get anything wrong.

Fancy CRCing You Here

The title of this guy's article really has flavor: Fancy CRCing You Here. I even asked a classmate with level-8 English to translate it for me.

The first sentence of the article is very considerate, too.

Roughly translated: people who are short (on time) can click the link and jump straight to the results. Well, I am definitely not short. Then he dropped the following:

Many people copy Redis's CRC-64 implementation into their own projects, and a few copy its CRC-16 as well. The implementation works, but it could be done better.

I looked on GitHub: 2353 projects use Redis's CRC-64 implementation, and 325 projects use Redis's CRC-16 implementation.

This guy feels sorry for them: alas, they deserve a better CRC implementation. Then he fired his first salvo.

Roughly:

  • Redis decided to ignore improvements in throughput and latency because they prefer to code like it's 1999.

  • The price is that their outdated legacy design is 40 times slower. Great job, everyone. (I don't know whether he made a typo or deliberately wrote 40 instead of 4 here; I'm only guessing :). The hyperlink jumps straight to the CRC-64 implementation written by antirez. Blunt and direct, I like it a lot.)

  • Hey, it would be great if someone wrote a replacement for Redis and memcached right now.

Then he got down to business.

What’s Wrong

He first points out that CRC is inherently not parallelizable, because each iteration depends on the result of the previous one. Then he lists where CRC is used in Redis.

CRC-64 is used in three places:

  • Adding a checksum when migrating keys across instances (and verifying that checksum on the receiving side)

  • Adding a checksum to RDB output, for both replication and persistence (optional; it can be disabled in the configuration because it is slow)

  • Memory testing

CRC-16 is used in one place:

  • As the hash function that assigns a key to a cluster slot

Then mattsta let rip:

Roughly translated: the Redis implementation is extremely simple (I love that word "extremely"). It is a plain table-lookup method that loops over the input one byte at a time, so the time complexity is O(N).
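For readers who have not seen the byte-at-a-time table method, here is a rough Python sketch of the idea. I am assuming the parameters of the Redis CRC-64 variant (polynomial 0xad93d23594c935a9, bit-reflected, initial value 0, no final XOR); the helper names are mine, and this is an illustration rather than the Redis source.

```python
POLY = 0xAD93D23594C935A9                 # CRC-64 polynomial in normal (MSB-first) form


def reflect64(value: int) -> int:
    """Reverse the bit order of a 64-bit value."""
    return int(f"{value:064b}"[::-1], 2)


POLY_REFLECTED = reflect64(POLY)          # the reflected form is used with right shifts


def make_table() -> list:
    """Precompute the CRC of every possible single byte (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc64(data: bytes, crc: int = 0) -> int:
    """One table lookup per input byte: O(N) with N loop iterations."""
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


print(hex(crc64(b"123456789")))
```

Each iteration needs the `crc` produced by the previous one, which is exactly the dependency that makes CRC hard to parallelize.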

Right after that, he explains how to do it better.

What’s Better

Roughly translated: I poked around the Internet to see how other people implement CRC-64, but most of them just copied Redis (ugh, annoying).

Then mattsta found that someone named Mark had posted a high-speed CRC-64 implementation on Stack Overflow. He benchmarked Mark's implementation against the Redis one and found Mark's to be about four times as fast (the article says 400%): 1.6 GB/s versus 400 MB/s.

But! The CRC-64 that Mark implemented and the one in Redis are different variants. Yes, the CRC-64 algorithm has many variants, so mattsta briefly turned his guns on CRC itself.

Forgive my ignorance: I always thought "pretty" went best with "girl", but it turns out it can go with "awful" too. Lesson learned. Um, pretty f....

Then mattsta says that Mark's implementation cannot be used directly, but it is worth studying.

What’s Improved

mattsta first showed the implementation of Redis, which is to loop through each byte, and then look up the table.

Then he posted the implementation of the quick version.

It can be seen that it is also a table lookup method, but instead of processing 1 byte at a time, 8 bytes are processed.

mattsta used a description here that I find very funny: "tiger prepping to devour your entrails". The new version of the code looks like a tiger getting ready to eat your guts! And hold on, that phrase is a hyperlink too. I clicked it and it really was a picture of a tiger!

Hahaha, let me laugh for a while. This guy is really funny!

Then he explains that the algorithm is fast because it looks values up in a multi-dimensional array, so each loop iteration can process 8 bytes.

So for 500 MB of input (500 million bytes), the Redis version needs 500 million loop iterations, while the new version needs only 500 / 8 = 62.5 million.

This technique, slicing-by-8, was published by Intel researchers in 2006. It uses 8 lookup tables, each with 256 entries, to build a CRC-64 implementation that can process 8 bytes at a time. Simply put, it trades space for time.
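Here is a small Python sketch of the slicing-by-8 idea, checked against the byte-at-a-time version. It assumes the Redis CRC-64 parameters (reflected polynomial 0x95ac9329ac4bc9b5, initial value 0, no final XOR) and a little-endian read of each 8-byte block. The names are my own, and it is meant to show the table trick rather than to reproduce crcspeed line by line.

```python
POLY_REFLECTED = 0x95AC9329AC4BC9B5       # bit-reversed form of 0xad93d23594c935a9


def make_tables() -> list:
    """Build 8 tables of 256 entries; table k covers a byte followed by k zero bytes."""
    base = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        base.append(crc)
    tables = [base]
    for _ in range(7):
        prev = tables[-1]
        # Advance every entry by one more (zero) byte using the base table.
        tables.append([base[v & 0xFF] ^ (v >> 8) for v in prev])
    return tables


TABLES = make_tables()


def crc64_bytewise(data: bytes, crc: int = 0) -> int:
    for byte in data:
        crc = TABLES[0][(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def crc64_slice8(data: bytes, crc: int = 0) -> int:
    full = len(data) - len(data) % 8      # number of bytes covered by whole 8-byte blocks
    for i in range(0, full, 8):
        word = crc ^ int.from_bytes(data[i:i + 8], "little")
        crc = 0
        for k in range(8):                # 8 independent lookups, XORed together
            crc ^= TABLES[7 - k][(word >> (8 * k)) & 0xFF]
    return crc64_bytewise(data[full:], crc)   # leftover tail, one byte at a time


sample = bytes(range(256)) * 3 + b"tail"
print(crc64_slice8(sample) == crc64_bytewise(sample))
```

The loop over blocks runs `len(data) // 8` times instead of `len(data)` times, and the 8 tables cost 8 x 256 x 8 bytes = 16 KB of memory. In pure Python this will not show the real speedup; the point is only that both functions return the same value.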

That gave him the inspiration: he wanted to build a fast CRC implementation that produces the same results as the Redis one.

Result

Roughly translated: after a year of hard work, mattsta finally produced a fast CRC-64 implementation that matches Redis. As a bonus it also works for CRC-16, and it gets rid of the pile of static lookup tables in the old source: the tables can be generated dynamically when needed instead of being dragged along to bloat the code.
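To illustrate what "generate the table when needed" can look like, here is a minimal sketch for CRC-16. It assumes the CRC-16/XMODEM parameters (polynomial 0x1021, initial value 0) and uses `functools.lru_cache` as a stand-in for "build once on first use". This is my own illustration, not how crcspeed does it.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def crc16_table() -> tuple:
    """Build the 256-entry CRC-16 table on first call; later calls reuse the cached tuple."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
        table.append(crc)
    return tuple(table)


def crc16(data: bytes, crc: int = 0) -> int:
    """Byte-at-a-time CRC-16 using the lazily generated table."""
    table = crc16_table()
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ table[((crc >> 8) ^ byte) & 0xFF]
    return crc


print(hex(crc16(b"123456789")))
```

A dozen lines of generator code replace 256 hard-coded constants, at the cost of a tiny one-time computation at startup.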

Let's take a look at the old code. There really is a big pile of table constants; I only captured a short section.

And this is crcspeed, the fast CRC implementation written by mattsta. It is not only faster, it also cleans up the code.

Then mattsta follows up with: don't take my word for it, let the data speak (Tsundere.jpg).

Measured on mattsta's own laptop, crcspeed beats the Redis implementation on elapsed time, throughput and CPU cycles per byte.

Real-World Impact

mattsta also explains what crcspeed can bring to Redis.

Roughly translated: Redis forks a child process to generate an RDB file, which relies on copy-on-write, so memory growth depends on the write load while the child is alive. The sooner the RDB is finished and the forked child exits, the less COW memory is used. Since CRC-64 is computed as a checksum while the RDB is generated, a faster CRC-64 means a faster RDB and less memory spent on COW copies.

And:

When CRC-16 is used to map keys to slots, if users are doing something weird, such as using a 300 MB key, a fast CRC-16 can cut the slot-assignment overhead to roughly a quarter (the article's figure is 400%)!

"If users are doing wacky things": I agree with this very much. Whether inside or outside the company, you must assume users will call the interface you implement in the most malicious way possible.

mattsta says this is a win on many fronts at once: more effective and more efficient!

I thought the article was just about over at this point, but no.

Minor Notes

You can see that mattsta did not want to reinvent the wheel, but there really was no wheel to use! So he had to build one himself, and this one is a brand-new wheel.

Resources Consulted

Then he lists some of the resources he consulted. First of all, he thanks the article "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS".

Let's learn the proper way to thank your reference material.

See? This is the proper way to say thanks! Now let's take a look at the article he is thanking.

To be fair, and speaking as a pure bystander: it really is good. For a plain txt file it is well written, well formatted and interesting. Every aspect of its style, layout and tone has been considered with expert care.

Praise alone is not enough for mattsta; he has to add some thoughts of his own.

Roughly: to some extent, the Internet has lost its ability to preserve well-written, well-formatted, informative guides, answers to common questions, and approachable research papers. We should strive to take back that part of the world. What is to blame for this loss? Over-reliance on style? CSS? JavaScript? PHP?

Warning: the best language in the world has been mentioned!

Did you see that? That is how you say thanks. mattsta wants pure substance, no bells and whistles!

The guide also convinced mattsta that he was not up to implementing a CRC-64 algorithm from scratch, so he actually relied on pycrc for that.

Then he adds, yeah yeah yeah, the Linux kernel does it this way too.

He also found that the paper introducing slicing-by-8 used to be on Intel's website but has since disappeared. So he first took a jab at Intel and, while he was at it, complained about silly Google Code: every time he opened the PDF there, it downloaded automatically instead of displaying in the browser.

This guy is really to my taste, hahaha. mattsta's article is not over yet; what follows is implementation detail. Interested readers can look at it later, and there is a link at the end of this post. I won't keep following along.

Now let's ask why mattsta, a Redis Contributor, wrote this article at all. Shouldn't he simply have opened a PR? Is there some fatal flaw in the algorithm that made antirez refuse it? Let's dig in!

Track the ins and outs of things

First, the article was written on 2014-12-22.

mattsta made his first commit to redis.io in December 2013.

Then mattsta started contributing to Redis itself. On 2014-04-01 he opened a related issue and attached his own comparison.

The issue got no response from the team; lonely as snow. Only a single "golden retriever" cheered him on. The issue is still open at the time of writing.

Later that same year, on 2014-11-23, mattsta created the crcspeed repository and committed his implementation.

He opened the PR on December 22, 2014, the same day the article was written! And the PR came after the article. At first I had assumed the article was written in anger after the PR had gone unaccepted for a long time.

You can see that a team member replied a day later: I don't know whether this will be merged (I think it should be), but damn, this is a great improvement! Awesome!

On 2014-12-23, mattsta added some further explanation to the PR. There was no response.

On 2015-01-10, mattsta pushed another round of updates to the PR.

Finally, on 2015-02-25, antirez replied.

antirez said: this is very interesting, but I would like to see a fixed, reproducible test case showing a gain of more than 5%. Even a synthetic test is fine, as long as the improvement is clearly visible in Redis itself. I believe a test on the cluster's CRC16 could easily demonstrate the effect. There is no rush to merge a faster implementation right now, but if one day you put such a test together, I will be very grateful.

Then he added a label to the PR: review-and-merge.

He also added a PS: generally speaking, it is very important to prove that something really improves performance. I am making an exception here because I think the standalone test is indeed much faster, and I believe that even if Redis does not use it right away, sooner or later we will benefit from it.

Put simply: mattsta, you need to run a Redis-specific test that proves this really improves Redis's performance before I can merge it, but I will make an exception, acknowledge your work and give you a label! (Without actually merging, though.)

In other words, antirez did approve of mattsta's implementation, but because mattsta had not provided a Redis-specific test, the PR could not be merged yet.

The PR has stayed that way ever since: never updated, and still open.

mattsta did not say anything further, and his contributions to Redis stopped after early 2015. CRC-16 still uses the old implementation. CRC-64 was changed by antirez on 2020-04-28, six years later, and it uses mattsta's crcspeed.

Look back again

To recap: mattsta opened the issue on 2014-04-01, got no response, did the research on his own, dug up a lot of material, finally implemented crcspeed, published an article and opened a PR on the same day. It then took nearly two months to get a reply from antirez: without a substantial test on Redis the PR would not be merged, but the work was acknowledged.

My personal guess is that mattsta was still a little angry. This is such a general-purpose thing, I gave you a side-by-side benchmark! I explained the principle so clearly! It is obviously fine, so what more do you want me to test? If you won't merge it, forget it! (Tsundere.jpg again.)

antirez sees it from a different angle; he is the real father of Redis. You are right, I agree with you, but you have to show me substantive proof of how much you have improved my Redis.

In fact I can understand both sides; their roles are different. So now we finally know the ins and outs of the whole story. To close, here is a handsome photo of mattsta. His hair looks nice and thick.

That is what this article is about. Really, I was just following the gossip to find out why an apparently correct PR from mattsta, a Contributor, was never merged. As for CRC itself, I don't really care, hahahaha.

Of course, mattsta's appetite for research is worth learning from, as are his funny descriptions and colorful thank-yous. And antirez's rigor about PRs is worth emulating too.

Links

https://matt.sh/redis-crcspeed

https://github.com/mattsta?tab=repositories


I am yes, going from a little bit to a whole lot. See you in the next article.


Origin blog.csdn.net/yessimida/article/details/107889474