Database optimization techniques: vectorized execution and compiled execution

0 Introduction

Recently I started working with ClickHouse, a columnar database developed by Yandex in Russia. It is a high-performance OLAP database. According to some tests, its queries run tens to hundreds of times faster than MySQL's. A very important reason for this performance is that it implements vectorized execution and compiled execution. Below are a few articles that explain these two optimizations.

1 Volcano model

Before discussing the optimizations, it is worth reading the article Introduction to the Volcano Model. The disadvantage of the Volcano model is that it processes one tuple at a time (tuple-at-a-time): every tuple requires a next() call on each operator node in the plan, which amounts to a large number of virtual function calls and results in low CPU utilization.
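To make the per-tuple overhead concrete, here is a minimal sketch of a tuple-at-a-time pipeline. The class names (Operator, Scan, Filter, Project) and the sample table are hypothetical and only for illustration; they are not taken from ClickHouse or from the linked article.

```python
from typing import Callable, Iterable, Optional, Tuple

Row = Tuple[int, ...]


class Operator:
    """Base operator: next() returns one row, or None when the input is exhausted."""

    def next(self) -> Optional[Row]:
        raise NotImplementedError


class Scan(Operator):
    """Leaf operator that reads rows from an in-memory table."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._it = iter(rows)

    def next(self) -> Optional[Row]:
        return next(self._it, None)


class Filter(Operator):
    """Keeps pulling from its child until a row satisfies the predicate."""

    def __init__(self, child: Operator, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project(Operator):
    """Transforms each row it pulls from its child."""

    def __init__(self, child: Operator, fn: Callable[[Row], Row]) -> None:
        self.child = child
        self.fn = fn

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else self.fn(row)


def run(root: Operator) -> list:
    """Drive the plan from the root: one chain of next() calls per output row."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


# Roughly: SELECT b * 2 FROM table WHERE a % 2 = 0
table = [(1, 10), (2, 20), (3, 30), (4, 40)]
plan = Project(Filter(Scan(table), lambda r: r[0] % 2 == 0), lambda r: (r[1] * 2,))
result = run(plan)
```

Every row that reaches the output has passed through one dynamically dispatched next() call per operator, which is the overhead the paragraph above describes.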

2 Vectorized execution and compiled execution

This article gives a very good analysis of vectorized execution and compiled execution.
The Volcano model is a pull model, whereas compiled execution implements a push model. It uses LLVM to generate intermediate code for the query, pushes tuples through the plan from the bottom up while keeping them in CPU registers, and each operator processes the data directly in the registers.
Vectorized execution is easier to understand. It still uses a pull model similar to the Volcano model; the only difference is that each operator's next() function returns a batch of data per call instead of a single tuple. Processing a batch also makes it possible to use SIMD (Single Instruction, Multiple Data), where a single CPU instruction operates on 2, 4, 8 or more data items at once, which greatly improves efficiency.
These two methods are usually treated as mutually exclusive, but some systems combine both. I will study how ClickHouse implements them later.
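As a rough illustration of the compiled approach, the sketch below generates a single fused loop for a scan, filter and projection. Real engines emit LLVM IR and native code; here Python source and the built-in compile() and exec() functions stand in for that step. The function name compile_query and the column names a and b are assumptions made for this example.

```python
def compile_query(predicate_src: str, projection_src: str):
    """Generate one fused loop for scan -> filter -> project and compile it.

    The row values a and b live in local variables for the whole loop body,
    which plays the role that CPU registers play in a natively compiled query.
    """
    source = (
        "def query(rows):\n"
        "    out = []\n"
        "    for a, b in rows:\n"
        f"        if {predicate_src}:\n"
        f"            out.append({projection_src})\n"
        "    return out\n"
    )
    namespace = {}
    # Stand-in for LLVM code generation: turn the generated source into a function.
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"], source


# Roughly: SELECT b * 2 FROM table WHERE a % 2 = 0
query, source = compile_query("a % 2 == 0", "b * 2")
table = [(1, 10), (2, 20), (3, 30), (4, 40)]
result = query(table)
```

The generated function contains no operator objects and no next() calls: the operators have been merged into one loop, and each row is pushed through the filter and projection in place.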
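A minimal sketch of the batch-at-a-time idea follows. The operator names (VecScan, VecFilter, VecMap), the batch size and the calls counter are hypothetical. Plain Python lists do not use SIMD; in a native engine the tight loop over each contiguous batch is the part that the compiler or hand-written intrinsics can map to SIMD instructions.

```python
from typing import Callable, List, Optional

Batch = List[int]  # a batch holds many values of one column


class VecScan:
    """Leaf operator: returns up to batch_size column values per next() call."""

    def __init__(self, column: List[int], batch_size: int) -> None:
        self.column = column
        self.batch_size = batch_size
        self.pos = 0
        self.calls = 0  # number of next() calls, kept only for illustration

    def next(self) -> Optional[Batch]:
        self.calls += 1
        if self.pos >= len(self.column):
            return None
        batch = self.column[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch


class VecFilter:
    """Applies the predicate to a whole batch in one tight loop."""

    def __init__(self, child, predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def next(self) -> Optional[Batch]:
        while (batch := self.child.next()) is not None:
            kept = [v for v in batch if self.predicate(v)]
            if kept:  # skip batches in which nothing qualified
                return kept
        return None


class VecMap:
    """Applies a function to every value of a batch in one tight loop."""

    def __init__(self, child, fn: Callable[[int], int]) -> None:
        self.child = child
        self.fn = fn

    def next(self) -> Optional[Batch]:
        batch = self.child.next()
        return None if batch is None else [self.fn(v) for v in batch]


column = list(range(1, 9))  # the values 1..8
scan = VecScan(column, batch_size=4)
plan = VecMap(VecFilter(scan, lambda v: v % 2 == 0), lambda v: v * 10)

result = []
while (batch := plan.next()) is not None:
    result.extend(batch)
```

With a batch size of 4, the scan is called 3 times for 8 values (two batches plus the final end-of-input call), whereas a tuple-at-a-time scan would be called 9 times. The cost of each next() call is amortized over the whole batch.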

Origin blog.csdn.net/MoonWisher_liang/article/details/115336286