Why is ClickHouse so fast?

Everyone who has used ClickHouse knows that it is very fast. But why is ClickHouse (CH) so fast? Is it vectorized execution, columnar storage, or something else? Let's take a look. (These notes summarize "ClickHouse Principle Analysis and Application Practice".)

 

1. Hardware

ClickHouse performs GROUP BY in memory and loads the data into a hash table. CH also pays close attention to the CPU's L3 cache, because a single L3 cache miss costs 70-100 ns of latency, and those misses add up: on a 32-thread CPU they may waste around 500 million operations per second. It is attention to details like this that lets CH reach a data scan rate of 175 million per second in benchmark queries.
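As a rough illustration of in-memory GROUP BY over a hash table, here is a minimal Python sketch. The function name group_by_sum and the two-column input are assumptions made for this example; ClickHouse's real aggregation uses specialized C++ hash tables chosen per key type, which this does not reproduce.

```python
def group_by_sum(keys, values):
    """Sum `values` per distinct key, like SELECT key, sum(value) GROUP BY key."""
    totals = {}  # Python's dict is a hash table: key -> running aggregate
    for key, value in zip(keys, values):
        totals[key] = totals.get(key, 0) + value
    return totals


if __name__ == "__main__":
    cities = ["Paris", "Oslo", "Paris", "Rome", "Oslo"]
    visits = [10, 3, 5, 7, 1]
    print(group_by_sum(cities, visits))
```

The whole aggregation state lives in one hash table, so the smaller and more cache-friendly that table is, the fewer cache misses each lookup pays for.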

 

2. Algorithm

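For string search, CH picks the algorithm to fit the case: for a constant search pattern it uses the Volnitsky algorithm; for a non-constant pattern it uses the CPU's SIMD vector instructions for brute-force optimization; for regular expression matching it uses the re2 and hyperscan libraries.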
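To give a feel for the constant-pattern case, here is a simplified Python sketch of the idea behind the Volnitsky algorithm: index every bigram of the pattern once, then jump through the text in steps of (pattern length - 1) and verify only where a bigram matches. The function name volnitsky_find is hypothetical, and a Python dict stands in for the fixed-size hash table a real implementation would use, so treat this as an illustration rather than ClickHouse's code.

```python
def volnitsky_find(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    m = len(needle)
    if m < 2 or len(haystack) < m:
        # Too short for bigram indexing: fall back to the built-in search.
        return haystack.find(needle)

    # Map each bigram of the needle to the offsets where it starts.
    table = {}
    for i in range(m - 1):
        table.setdefault(needle[i:i + 2], []).append(i)

    # Any occurrence of the needle covers m - 1 consecutive bigram
    # positions, so sampling one position every m - 1 cannot miss it.
    step = m - 1
    for pos in range(m - 2, len(haystack) - 1, step):
        offsets = table.get(haystack[pos:pos + 2], ())
        # Larger offsets mean earlier candidate starts; check those first
        # so the leftmost occurrence wins.
        for i in reversed(offsets):
            start = pos - i
            if haystack.startswith(needle, start):
                return start
    return -1


if __name__ == "__main__":
    print(volnitsky_find("hello clickhouse", "click"))
```

Because the pattern is constant, the bigram table is built once and reused for every row, which is why this approach pays off for constant patterns in particular.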

 

3. Optimization for specific scenarios

CH uses different algorithms in different scenarios. For example, the deduplication function uniqCombined selects its algorithm according to the amount of data: when the data set is small it stores the values in an array; when it is medium-sized it uses a hash set; when it is large it switches to the HyperLogLog algorithm.
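The array, hash set, HyperLogLog progression can be sketched in Python as follows. The class name AdaptiveDistinctCounter, the thresholds ARRAY_LIMIT and SET_LIMIT, the register count and the choice of hash are all assumptions picked for this example; they are not the values uniqCombined actually uses.

```python
import hashlib
import math


class AdaptiveDistinctCounter:
    ARRAY_LIMIT = 16    # hypothetical threshold: array -> hash set
    SET_LIMIT = 1024    # hypothetical threshold: hash set -> HyperLogLog
    P = 12              # HyperLogLog uses 2**P registers (assumed size)

    def __init__(self):
        self.mode = "array"
        self.items = []        # list in "array" mode, set in "set" mode
        self.registers = None  # list of ints in "hll" mode

    @staticmethod
    def _hash64(value) -> int:
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _hll_add(self, value):
        h = self._hash64(value)
        index = h >> (64 - self.P)              # top P bits pick a register
        rest = h & ((1 << (64 - self.P)) - 1)   # remaining 64 - P bits
        # Rank = position of the first 1 bit in `rest`, counted from the left.
        rank = (64 - self.P) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def add(self, value):
        if self.mode == "array":
            if value not in self.items:         # linear scan, fine when tiny
                self.items.append(value)
                if len(self.items) > self.ARRAY_LIMIT:
                    self.items = set(self.items)
                    self.mode = "set"
        elif self.mode == "set":
            self.items.add(value)
            if len(self.items) > self.SET_LIMIT:
                self.registers = [0] * (1 << self.P)
                for item in self.items:
                    self._hll_add(item)
                self.items = None
                self.mode = "hll"
        else:
            self._hll_add(value)

    def count(self) -> int:
        if self.mode != "hll":
            return len(self.items)              # exact in the first two modes
        m = 1 << self.P
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return round(m * math.log(m / zeros))
        return round(raw)


if __name__ == "__main__":
    counter = AdaptiveDistinctCounter()
    for n in range(50_000):
        counter.add(n)
    print(counter.mode, counter.count())
```

The first two modes give exact counts at the cost of memory that grows with the data; the HyperLogLog mode gives an approximate count in fixed memory, which is the trade-off the size-based switch is built around.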

 

4. Vectorization

CH uses vectorized execution. SIMD is widely used in scenarios such as text conversion, data filtering, data decompression, and JSON conversion. Compared with plain scalar CPU code, brute-force optimization that exploits the CPU's vector registers is an overwhelming advantage.
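Python cannot issue SIMD instructions directly, so the sketch below only imitates the idea: it treats one 64-bit integer as eight byte-sized lanes and lowercases all eight ASCII characters with a handful of whole-word operations (a technique sometimes called SIMD within a register). The names lower8 and to_lower are made up for this example, and real SIMD code would use CPU vector registers and instructions instead.

```python
LANES = 8
ONES = int.from_bytes(b"\x01" * LANES, "little")  # 0x01 in every lane
HIGH = ONES * 0x80                                # high bit of every lane
LOW7 = ONES * 0x7F                                # low 7 bits of every lane


def lower8(word: int) -> int:
    """Lowercase eight ASCII bytes packed into one 64-bit integer."""
    heptets = word & LOW7
    # Adding a bias pushes a lane's high bit to 1 exactly when the byte
    # passes the threshold; lanes never overflow into their neighbours.
    ge_a = heptets + ONES * (0x80 - ord("A"))      # byte >= 'A'
    gt_z = heptets + ONES * (0x80 - ord("Z") - 1)  # byte >  'Z'
    # Uppercase = in ['A', 'Z'] and originally a 7-bit ASCII byte.
    is_upper = ge_a & ~gt_z & ~word & HIGH
    # Move the 0x80 flag down to 0x20, the ASCII case bit, and set it.
    return word | (is_upper >> 2)


def to_lower(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), LANES):
        chunk = data[i:i + LANES]
        word = int.from_bytes(chunk.ljust(LANES, b"\0"), "little")
        out += lower8(word).to_bytes(LANES, "little")[:len(chunk)]
    return bytes(out)


if __name__ == "__main__":
    print(to_lower(b"ClickHouse IS Fast!"))
```

The point of the exercise is that there is no per-character branch: every lane is processed by the same few operations at once, which is the property real SIMD text conversion relies on.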

 

5. Continuous testing and continuous improvement

A good product has to hold up in all kinds of scenarios. Because CH has the natural advantage of coming from Yandex, it is routinely tested against real data and tried out in many different scenarios. This supports a fast release cadence, with roughly one update per month.

Origin blog.csdn.net/sileiH/article/details/113702750