How do you query 1 billion records in 0.003 ms? And how do you get a query down to 0.03 μs?

There is no query in the world that cannot be optimized; if there is one, the budget is simply not big enough.
                                                  -- Anonymous


This is the author's third post on 8eqbang and the last in this series. Last time we saw that switching to binary storage made lookups about four times faster, bringing a single query down to 0.12 ms. Can we go faster still? Yes, and with further optimization, faster again.


Until now, our approach has always been binary search, and that is exactly where the remaining optimization lies. Binary search runs in O(log N) time. That is fast, but a single query still needs about 31 key comparisons. We can bring the time complexity down to O(1) with a lookup table: a QQ number is itself an integer, so if we enlarge the table to 4 billion slots, the QQ number can serve directly as the index, which amounts to a perfect hash. The space required is 4,000,000,000 × 9 B / 1024 / 1024 / 1024 ≈ 33.5 GB, which matches the actual size of the file.
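The lookup-table idea can be sketched in a few lines of Python. This is only an illustration, not the author's code: the post gives the 9-byte slot size but not its layout, so the split into a 4-byte QQ number plus a 5-byte phone number, the names `DirectTable` and `table_size_gib`, and the "all zeros means empty" convention are all assumptions made for the example.

```python
RECORD_SIZE = 9  # bytes per slot, matching the size estimate in the text


def table_size_gib(slots: int, record_size: int = RECORD_SIZE) -> float:
    """Size of a direct-address table, in bytes divided by 1024^3."""
    return slots * record_size / 1024**3


class DirectTable:
    """Direct-address table: slot i holds the record for QQ number i.

    Assumed (hypothetical) layout: 4-byte QQ number + 5-byte phone number,
    little-endian. An all-zero phone field means "no record".
    """

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.buf = bytearray(slots * RECORD_SIZE)

    def put(self, qq: int, phone: int) -> None:
        if not 0 <= qq < self.slots:
            raise IndexError("QQ number outside the table")
        off = qq * RECORD_SIZE
        self.buf[off:off + RECORD_SIZE] = (
            qq.to_bytes(4, "little") + phone.to_bytes(5, "little")
        )

    def get(self, qq: int):
        if not 0 <= qq < self.slots:
            return None
        # O(1): the key itself is the index, so no comparisons are needed.
        off = qq * RECORD_SIZE
        phone = int.from_bytes(self.buf[off + 4:off + RECORD_SIZE], "little")
        return phone or None
```

With a small table, `t = DirectTable(1000); t.put(123, 13800138000)` followed by `t.get(123)` returns the stored number (a made-up placeholder), and `table_size_gib(4_000_000_000)` reproduces the roughly 33.5 GB estimate for the full-size table.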


The following figure shows the query speed test results for the lookup-table method. With the table, a single query takes only 3.08 microseconds, far below the previous 0.12 milliseconds (120 microseconds). That is a speedup of nearly 40 times, and the only cost is an extra 33.5 - 5.86 = 27.64 GB of disk space.
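On disk, a table lookup comes down to one seek and one fixed-size read. The sketch below shows that access pattern on a tiny demo file. It is not the author's implementation: the function names, the 4-byte QQ plus 5-byte phone layout, and the zero-filled empty slots are assumptions for illustration.

```python
RECORD_SIZE = 9  # bytes per slot (assumed: 4-byte QQ number + 5-byte phone)


def build_table_file(path, entries, slots):
    """Write a zero-filled table, then place each record at qq * RECORD_SIZE."""
    with open(path, "wb") as f:
        f.write(bytes(slots * RECORD_SIZE))  # fine for a demo-sized table
        for qq, phone in entries.items():
            f.seek(qq * RECORD_SIZE)
            f.write(qq.to_bytes(4, "little") + phone.to_bytes(5, "little"))


def lookup_on_disk(f, qq):
    """One seek plus one 9-byte read; no other records are examined."""
    if qq < 0:
        return None
    f.seek(qq * RECORD_SIZE)
    raw = f.read(RECORD_SIZE)
    if len(raw) < RECORD_SIZE:  # offset lies past the end of the file
        return None
    phone = int.from_bytes(raw[4:], "little")
    return phone or None  # an all-zero phone field means "no record"
```

The file is opened once in binary mode (`open(path, "rb")`) and the handle is reused for every query. The speedup quoted above is simply 120 μs / 3.08 μs ≈ 39.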

We all know that O(1) is usually the best complexity we can hope for, so can the query speed be improved any further? We might as well be bold. The core of this method is random access, so RAM is the natural place to keep the 8e data.


The figure below shows the result of loading the binary file into memory and querying it there. A single query takes only about 1 microsecond, and that is still binary search, without the lookup table. By the same reasoning, we can estimate that on a computer with 64 GB of memory a single lookup with the table method would take about 27 nanoseconds. Such a machine could perform roughly 37 million lookups per second (1 s / 27 ns) against the 700 million records, so just 10 memory sticks would be enough to cover the query needs of the whole country.
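For reference, an in-memory binary search over packed fixed-width records can be sketched as follows, together with the throughput arithmetic. This is a minimal illustration under assumptions, not the code the post refers to: the record layout (4-byte QQ number plus 5-byte phone number, sorted by QQ number) and all function names are hypothetical.

```python
RECORD_SIZE = 9  # bytes per record (assumed: 4-byte QQ number + 5-byte phone)


def pack_sorted(entries):
    """Pack {qq: phone} into one bytes object, sorted by QQ number."""
    buf = bytearray()
    for qq in sorted(entries):
        buf += qq.to_bytes(4, "little") + entries[qq].to_bytes(5, "little")
    return bytes(buf)


def search_in_memory(buf, qq):
    """Binary search over fixed-width records held entirely in RAM."""
    lo, hi = 0, len(buf) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        off = mid * RECORD_SIZE
        key = int.from_bytes(buf[off:off + 4], "little")
        if key < qq:
            lo = mid + 1
        elif key > qq:
            hi = mid
        else:
            return int.from_bytes(buf[off + 4:off + RECORD_SIZE], "little")
    return None  # QQ number not present


def queries_per_second(latency_ns):
    """Sequential lookups per second at a given per-query latency."""
    return 1e9 / latency_ns
```

`queries_per_second(27)` gives roughly 37 million, in line with the estimate above. Pure Python will not come close to the quoted latencies; the sketch only shows the access pattern.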

Attached is the full code for the lookup-table method and the in-memory binary search:

 

 

Origin blog.csdn.net/ik666/article/details/126807855