A Detailed Explanation of the Data Warehouse Vectorized Execution Engine

This article is shared from the Huawei Cloud Community article "GaussDB (DWS) Vectorized Execution Engine Detailed Explanation" by yd_212508532.

Preface

  • Applicable version: [Baseline function]

Most traditional row execution engines use a tuple-at-a-time execution model. Under this model, the CPU spends much of its time traversing the execution tree rather than processing data, so effective CPU utilization is low. OLAP workloads involve a huge number of function calls, which makes this overhead very large. To solve this problem, GaussDB (DWS) added a vectorized execution engine. The vectorized engine uses a batch-at-a-time execution model, which greatly reduces the cost of traversing execution nodes. The vectorized engine also pairs naturally with column storage, because the underlying scan nodes can load column data directly as vectors. Column storage plus a vectorized execution engine is one of the golden keys to OLAP performance!
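To make the traversal overhead concrete, the following minimal Python sketch counts how often operators are invoked. It pulls 10,000 rows through a three-operator plan, once with a batch size of 1 (tuple-at-a-time) and once with a batch size of 1,000 (batch-at-a-time). This is not GaussDB code. The plan shape and all function names are hypothetical.

```python
BATCH_SIZE = 1000  # rows per batch, matching the batch size the article cites


class CallCounter:
    """Counts operator invocations across the whole plan."""

    def __init__(self):
        self.calls = 0


def run_pipeline(num_rows, batch_size, counter):
    """Pull num_rows rows through Scan -> Filter -> Project.

    batch_size=1 models a tuple-at-a-time row engine; a larger batch_size
    models a batch-at-a-time vectorized engine. Returns rows produced.
    """
    data = list(range(num_rows))

    def scan():
        for start in range(0, num_rows, batch_size):
            counter.calls += 1
            yield data[start:start + batch_size]

    def filter_even(child):
        for batch in child:
            counter.calls += 1
            yield [v for v in batch if v % 2 == 0]

    def project(child):
        for batch in child:
            counter.calls += 1
            yield [v * 10 for v in batch]

    return sum(len(batch) for batch in project(filter_even(scan())))


row_mode, batch_mode = CallCounter(), CallCounter()
print(run_pipeline(10_000, 1, row_mode), row_mode.calls)               # 5000 30000
print(run_pipeline(10_000, BATCH_SIZE, batch_mode), batch_mode.calls)  # 5000 30
```

Both modes produce the same rows. Only the number of operator invocations changes, and that per-call overhead is what batch-at-a-time execution targets.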

About row-store and column-store tables

A row-store table stores tuples row by row in pages. It is mostly used in TP scenarios, where data is updated frequently through many inserts, deletes, and updates, and where query results involve many columns of the table.

Figure: Row-store table storage layout

A column-store table stores data by column, with each column's data kept in its own file. It is mostly used in AP scenarios, for these reasons:

  • Tables have many columns, but queries access only a few of them, so reading only the needed columns reduces IO operations.
  • Data within a column is homogeneous, which improves the compression ratio.
  • Operating on batches of column data yields a high CPU cache hit rate.

Figure: Column-store table storage layout
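The first two advantages can be illustrated with a toy model. The following Python sketch uses a hypothetical four-column table, not any real on-disk format. It counts how many values each layout must read to answer a query that touches two columns. It also run-length encodes one homogeneous column. RLE is used here only as a simple stand-in; the sketch does not claim that GaussDB uses this codec.

```python
# Hypothetical 4-column table.
columns = ["id", "region", "amount", "day"]
rows = [
    (1, "north", 10.0, "2024-01-01"),
    (2, "north", 12.5, "2024-01-01"),
    (3, "south", 7.0, "2024-01-02"),
    (4, "south", 9.5, "2024-01-02"),
]

# Row store: each tuple keeps all of its column values together.
row_store = rows

# Column store: one list (think "one file") per column.
col_store = {name: [r[i] for r in rows] for i, name in enumerate(columns)}


def values_read_row_store(needed_columns):
    # A row store reads whole tuples, so every column is read
    # no matter which ones the query needs.
    return len(row_store) * len(columns)


def values_read_col_store(needed_columns):
    # A column store reads only the files of the needed columns.
    return sum(len(col_store[c]) for c in needed_columns)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(e) for e in encoded]


# e.g. SELECT region, SUM(amount) ... GROUP BY region touches 2 of 4 columns.
print(values_read_row_store(["region", "amount"]))  # 16
print(values_read_col_store(["region", "amount"]))  # 8
print(run_length_encode(col_store["region"]))       # [('north', 2), ('south', 2)]
```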

Execution framework

The executor is the hub connecting the optimizer and the storage engine. It takes the execution plan tree generated by the optimizer as input, accesses data from the storage engine, and runs the execution operators specified by the plan to process the data. The executor uses the pipeline model: the row executor processes one tuple at a time, and the column executor processes one batch at a time. Upper nodes drive lower nodes, so data flows up the execution tree. The executor provides execution operators for many kinds of data processing. The figure below shows the top-down control flow and the bottom-up data flow.

Figure: Pipeline model of the executor
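The pull-based pipeline described above, often called the Volcano iterator model, can be sketched in a few lines of Python. Each node exposes a `next()` method that returns a batch, or `None` at end of data. Calls travel down the tree (control flow), and batches travel back up (data flow). The class names are hypothetical and do not correspond to GaussDB source code. Setting `batch_size=1` turns the same code into a tuple-at-a-time pipeline.

```python
class Scan:
    """Leaf node: reads rows from 'storage' in chunks of batch_size."""

    def __init__(self, table, batch_size):
        self.table, self.batch_size, self.pos = table, batch_size, 0

    def next(self):
        if self.pos >= len(self.table):
            return None  # end of data
        batch = self.table[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return batch


class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def next(self):
        # Control flows down: keep asking the child until some rows qualify.
        while (batch := self.child.next()) is not None:
            out = [r for r in batch if self.predicate(r)]
            if out:
                return out  # data flows up
        return None


class Project:
    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def next(self):
        batch = self.child.next()
        return None if batch is None else [self.fn(r) for r in batch]


def execute(root):
    """Driver: repeatedly pull from the root node until it is exhausted."""
    results = []
    while (batch := root.next()) is not None:
        results.extend(batch)
    return results


plan = Project(Filter(Scan(list(range(1, 11)), batch_size=4), lambda r: r > 5),
               lambda r: r * 2)
print(execute(plan))  # [12, 14, 16, 18, 20]
```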

Executor processing is divided into three steps:

  1. Executor initialization: build the executor's global state information (estate), recursively traverse every node of the plan tree, and initialize each node's execution state information (planstate).
  2. Executor execution: the row engine and the vectorized engine have separate entry points. Starting from the root of the plan tree, execution recurses down to the leaf nodes to fetch a tuple/batch. After the operators at each level process it, a result tuple/batch is returned. This repeats until no tuples/batches remain.
  3. Executor cleanup: release the executor's global state information and clean up the execution state of each plan node.

Figure: Execution process of the executor
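The three steps can be sketched as a small Python lifecycle. The sketch keeps the immutable plan tree separate from a parallel tree of runtime state objects. It builds the state tree recursively at initialization, pulls rows until exhaustion, and tears the state down recursively. The names `Plan`, `PlanState`, `EState`, `init_node`, `exec_node`, and `end_node` are hypothetical and only echo the estate/planstate terms above. In PostgreSQL-derived executors, the corresponding phases are commonly entered through `ExecutorStart`, `ExecutorRun`, and `ExecutorEnd`.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Immutable plan node produced by the optimizer (hypothetical shape)."""
    kind: str
    children: list = field(default_factory=list)
    rows: list = field(default_factory=list)  # only used by "Scan" leaves


@dataclass
class PlanState:
    """Per-node runtime state created during executor initialization."""
    plan: Plan
    children: list
    pos: int = 0
    closed: bool = False


@dataclass
class EState:
    """Executor-wide global state."""
    nodes_initialized: int = 0
    nodes_closed: int = 0


def init_node(plan, estate):
    # Step 1: recursively build a PlanState for every plan node.
    estate.nodes_initialized += 1
    return PlanState(plan, [init_node(c, estate) for c in plan.children])


def exec_node(state):
    # Step 2: return the next row, or None when exhausted, recursing to leaves.
    if state.plan.kind == "Scan":
        if state.pos >= len(state.plan.rows):
            return None
        state.pos += 1
        return state.plan.rows[state.pos - 1]
    if state.plan.kind == "Append":
        for child in state.children:
            row = exec_node(child)
            if row is not None:
                return row
        return None
    raise ValueError(f"unsupported node kind: {state.plan.kind}")


def end_node(state, estate):
    # Step 3: recursively release per-node state, children first.
    for child in state.children:
        end_node(child, estate)
    state.closed = True
    estate.nodes_closed += 1


plan = Plan("Append", [Plan("Scan", rows=[1, 2]), Plan("Scan", rows=[3])])
estate = EState()
root = init_node(plan, estate)
out = []
while (row := exec_node(root)) is not None:
    out.append(row)
end_node(root, estate)
print(out, estate.nodes_initialized, estate.nodes_closed)  # [1, 2, 3] 3 3
```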

Column executor

The problem with the row executor is that most CPU time goes into traversing the plan tree rather than actually processing data, so effective CPU utilization is low. Column-store tables serve distinct workloads, and they need a matching vectorized engine to deliver their full performance benefit in OLAP scenarios. The basic idea behind the column executor is therefore to process one column of data at a time.

Like the row executor, the vectorized execution engine follows the pipeline model. However, each operator processes and passes data one batch at a time (1000 rows), which improves the CPU cache hit rate and reduces IO read operations. The figure below shows VectorBatch, the data flow structure of the column executor.

Figure: VectorBatch, the data flow structure of the column executor
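The actual VectorBatch layout is only shown in the figure. The following Python sketch assumes a plausible shape: a batch holds one vector per column plus a shared row count, and each column vector pairs a value array with a parallel NULL-flag array. All field and class names here are hypothetical. The 1000-row capacity comes from the text above.

```python
from dataclasses import dataclass

BATCH_CAPACITY = 1000  # rows per batch, as stated in the article


@dataclass
class ColumnVector:
    """One column of a batch: values plus a parallel NULL-flag array."""
    values: list
    is_null: list


@dataclass
class VectorBatch:
    """Hypothetical batch layout: column vectors sharing one row count."""
    columns: list
    row_count: int

    @classmethod
    def from_rows(cls, rows, num_cols):
        if len(rows) > BATCH_CAPACITY:
            raise ValueError("too many rows for one batch")
        cols = []
        for c in range(num_cols):
            vals = [r[c] for r in rows]
            cols.append(ColumnVector(
                values=[0 if v is None else v for v in vals],  # 0 is a NULL placeholder
                is_null=[v is None for v in vals],
            ))
        return cls(cols, len(rows))


def column_sum(batch, col):
    """Column-at-a-time aggregation: one loop over a single column's arrays."""
    vec = batch.columns[col]
    return sum(v for v, null in zip(vec.values, vec.is_null) if not null)


batch = VectorBatch.from_rows([(1, 10), (2, None), (3, 30)], num_cols=2)
print(batch.row_count, column_sum(batch, 1))  # 3 40
```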

Mixed row/column execution: adapter operators

Some scenarios on column-store tables are not supported by the vectorized execution engine, such as string_to_array, listagg, and string_agg.
GaussDB can automatically switch between the row engine and the column engine, using adapter operators such as VecToRow and RowToVec to convert data between the two formats.

Figure: Automatic switching between the row and column engines

For column-store data, a system with only a row engine usually has to reconstruct the column data into tuples so the engine can process them row by row. This tuple deform process hurts query performance on column-store data.
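The two conversion directions can be sketched as a pair of Python generators. The names `row_to_vec` and `vec_to_row` are hypothetical and only mirror the RowToVec and VecToRow operators listed later in the article. For simplicity, a batch is modeled as a dict of column name to value list. The per-row reassembly inside `vec_to_row` is the tuple deform work described above.

```python
def row_to_vec(rows, column_names, batch_size=1000):
    """RowToVec-style adapter: regroup row tuples into column batches."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {name: [r[i] for r in chunk] for i, name in enumerate(column_names)}


def vec_to_row(batches, column_names):
    """VecToRow-style adapter: rebuild row tuples from each column batch.

    This per-row reassembly is the 'tuple deform' cost paid when
    column-store data has to be handed to a row-only operator.
    """
    for batch in batches:
        n = len(batch[column_names[0]])
        for i in range(n):
            yield tuple(batch[name][i] for name in column_names)


names = ["id", "name"]
rows = [(1, "a"), (2, "b"), (3, "c")]
batches = list(row_to_vec(rows, names, batch_size=2))
print(batches)  # [{'id': [1, 2], 'name': ['a', 'b']}, {'id': [3], 'name': ['c']}]
print(list(vec_to_row(batches, names)) == rows)  # True
```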

Vectorized execution engine performance

We compared how the row engine and the column engine compute the same expression, x*(1-y). The column engine's CStore Scan operator took 85% less time than the row engine's Seq Scan operator.

Figure: Row/column engine performance comparison
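One source of that difference is how many times the expression tree for x*(1-y) is walked. The following Python sketch counts expression-node evaluations in two modes. Row-style evaluation walks the tree once per tuple. Vectorized evaluation walks it once per batch and loops over arrays inside each node. The tuple-based tree encoding is a hypothetical simplification. The sketch shows call counts only and makes no timing claims.

```python
# Expression tree for x * (1 - y).
EXPR = ("mul", ("col", "x"), ("sub", ("const", 1), ("col", "y")))


def eval_row(expr, row, stats):
    """Row-engine style: walk the whole tree once for every tuple."""
    stats["calls"] += 1
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    left, right = eval_row(expr[1], row, stats), eval_row(expr[2], row, stats)
    return left * right if op == "mul" else left - right


def eval_batch(expr, batch, stats):
    """Vectorized style: walk the tree once per batch, loop inside each node."""
    stats["calls"] += 1
    op = expr[0]
    n = len(next(iter(batch.values())))
    if op == "col":
        return batch[expr[1]]
    if op == "const":
        return [expr[1]] * n
    left, right = eval_batch(expr[1], batch, stats), eval_batch(expr[2], batch, stats)
    if op == "mul":
        return [a * b for a, b in zip(left, right)]
    return [a - b for a, b in zip(left, right)]


batch = {"x": [10, 20, 30], "y": [0.5, 0.25, 0.0]}
rows = [dict(zip(batch, vals)) for vals in zip(*batch.values())]

row_stats, vec_stats = {"calls": 0}, {"calls": 0}
row_result = [eval_row(EXPR, r, row_stats) for r in rows]
vec_result = eval_batch(EXPR, batch, vec_stats)
print(row_result == vec_result, row_stats["calls"], vec_stats["calls"])  # True 15 5
```

With 1000-row batches, the row-style count grows to 5 node evaluations per row, while the vectorized count stays at 5 per batch.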

Vector computing processes multiple values at once. This reduces function calls and context switches, and it makes full use of the CPU cache and vectorized (SIMD) instructions to improve performance.

Performance advantages of the vectorized execution engine:

  • Processing one batch at a time reads more data per call and reduces the number of IO reads.
  • Because a batch contains many records, the CPU cache hit rate increases.
  • Fewer function calls are made during pipeline execution.
  • Paired with column-store tables, it reduces tuple deform, that is, the time spent reconstructing tuples from column-store data.

Comparison of row and column executor operators

The execution operators of the vectorized engine are similar to those of the row engine: there are control operators, scan operators, materialization operators, and join operators. They are also represented as nodes, inherit from the row execution nodes, and execute recursively. The main nodes include CStoreScan (sequential scan), CStoreIndexScan (index scan), CStoreIndexHeapScan (fetching tuples via a bitmap), VecMaterial (materialization), VecSort (sorting), and VecHashJoin (vectorized hash join). The sections below introduce these execution operators one by one.

Scan operators

Scan operators read data from a table, fetching one tuple per call (one batch in the column engine) as input to the upper node. They appear at the leaf nodes of the query plan tree. Besides tables, they can also scan function result sets, linked-list structures, and subquery result sets. The table below lists the more common scan operators.

Operator (row / column) | Meaning | Typical scenario
SeqScan / CStoreScan | Sequential scan | The most basic scan operator, used to scan physical tables (sequential scan without index assistance)
IndexScan / CStoreIndexScan | Index scan | An index exists on the attributes involved in the selection condition
IndexOnlyScan / CStoreIndexOnlyScan | Return tuples directly from the index | The index columns fully cover the result set columns
BitmapScan (BitmapIndexScan, BitmapHeapScan) / CStoreIndexHeapScan (CStoreIndexAnd, CStoreIndexOr, CStoreIndexCtidScan) | Fetch tuples via a bitmap | BitmapIndexScan scans using the index on the attribute and returns the result as a bitmap; BitmapHeapScan fetches tuples using the bitmap output by BitmapIndexScan
TidScan | Fetch tuples by tuple TID | 1. WHERE conditions such as CTID = tid or CTID IN (tid1, tid2, …); 2. UPDATE/DELETE … WHERE CURRENT OF cursor
SubqueryScan / VecSubqueryScan | Subquery scan | Scans tuples using another query plan tree (subplan) as the scan object
FunctionScan | Function scan | FROM function_name
ValuesScan | Scan a VALUES list | Scans the set of tuples given by a VALUES clause
ForeignScan / VecForeignScan | Foreign table scan | Querying a foreign table
CteScan / VecCteScan | CTE scan | Scans subqueries defined with a WITH clause in a SELECT query
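The bitmap scan row above can be sketched as a small Python flow. Each index scan produces a bitmap of matching row positions. The bitmaps are combined with AND/OR, which corresponds to the BitmapAnd/BitmapOr control operators. Finally, the heap scan fetches the marked rows in physical order. Row positions stand in for real TIDs, and Python ints serve as bitmaps. The index structure and function names are hypothetical.

```python
def build_index(table, column):
    """Toy secondary index: value -> list of row positions (stand-ins for TIDs)."""
    index = {}
    for pos, row in enumerate(table):
        index.setdefault(row[column], []).append(pos)
    return index


def bitmap_index_scan(index, value):
    """BitmapIndexScan-style step: mark every matching position in a bitmap."""
    bitmap = 0
    for pos in index.get(value, []):
        bitmap |= 1 << pos
    return bitmap


def bitmap_heap_scan(table, bitmap):
    """BitmapHeapScan-style step: fetch marked rows in physical order."""
    return [table[pos] for pos in range(len(table)) if (bitmap >> pos) & 1]


table = [
    {"id": 1, "city": "Beijing", "dept": "sales"},
    {"id": 2, "city": "Shenzhen", "dept": "sales"},
    {"id": 3, "city": "Beijing", "dept": "dev"},
    {"id": 4, "city": "Beijing", "dept": "sales"},
]
by_city = build_index(table, "city")
by_dept = build_index(table, "dept")

city_bm = bitmap_index_scan(by_city, "Beijing")  # positions 0, 2, 3
dept_bm = bitmap_index_scan(by_dept, "sales")    # positions 0, 1, 3

# WHERE city = 'Beijing' AND dept = 'sales'  (BitmapAnd)
and_ids = [r["id"] for r in bitmap_heap_scan(table, city_bm & dept_bm)]
# WHERE city = 'Beijing' OR dept = 'sales'   (BitmapOr)
or_ids = [r["id"] for r in bitmap_heap_scan(table, city_bm | dept_bm)]
print(and_ids, or_ids)  # [1, 4] [1, 2, 3, 4]
```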

Join operators

Join operators correspond to the join operation in relational algebra. Taking t1 join t2 as an example, the main join types are inner join, left join, right join, full join, semi join, and anti join. They are implemented by NestLoop, HashJoin, and MergeJoin.

Operator (row / column) | Meaning | Typical scenario
NestLoop / VecNestLoop | Nested loop join: a brute-force join that scans the inner table for each outer row | Inner Join, Left Outer Join, Semi Join, Anti Join
MergeJoin / VecMergeJoin | Merge join (ordered input): the inner and outer inputs are sorted, the start and end of each matching range are located, and matching tuples are joined in a single pass | Equijoins: Inner Join, Left Outer Join, Right Outer Join, Full Outer Join, Semi Join, Anti Join
HashJoin / VecHashJoin | Hash join: a hash table is built on the hash values of the join column, so equal values always land in the same hash bucket | Equijoins: Inner Join, Left Outer Join, Right Outer Join, Full Outer Join, Semi Join, Anti Join
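The hash join row above can be sketched in Python as a build phase and a batch-wise probe phase. The build phase places inner rows into hash buckets by join key. The probe phase looks up each outer row's bucket and compares the actual keys, because different keys can share a bucket. The bucket count, batch shape, and function names are hypothetical. This shows only an inner join, not the full set of join types in the table.

```python
NUM_BUCKETS = 8  # hypothetical, tiny bucket count for illustration


def bucket_of(key):
    return hash(key) % NUM_BUCKETS


def hash_join(inner_rows, outer_batches, inner_key, outer_key):
    """Inner equijoin: build on the inner input, probe one outer batch at a time."""
    # Build phase: equal join keys always hash to the same bucket.
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in inner_rows:
        buckets[bucket_of(row[inner_key])].append(row)

    # Probe phase: find the bucket, then compare real keys to skip collisions.
    for batch in outer_batches:
        out = []
        for row in batch:
            key = row[outer_key]
            for match in buckets[bucket_of(key)]:
                if match[inner_key] == key:
                    out.append({**row, **match})
        if out:
            yield out


depts = [{"dept_id": 1, "dept": "sales"}, {"dept_id": 2, "dept": "dev"}]
emps = [
    [{"emp": "a", "dept_ref": 1}, {"emp": "b", "dept_ref": 3}],  # outer batch 1
    [{"emp": "c", "dept_ref": 2}, {"emp": "d", "dept_ref": 9}],  # outer batch 2
]
result = [[(r["emp"], r["dept"]) for r in b]
          for b in hash_join(depts, emps, "dept_id", "dept_ref")]
print(result)  # [[('a', 'sales')], [('c', 'dev')]]
```

Key 9 falls into the same bucket as key 1 here (9 % 8 == 1), so the key comparison in the probe loop is what keeps "d" out of the result.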

Materialization operators

Materialization operators are nodes that can cache tuples. During execution, many extended physical operators must first obtain all their input tuples before they can operate, for example aggregate functions and sorting without index assistance. These operators rely on materialization operators to cache the tuples.

Operator (row / column) | Meaning | Typical scenario
Material / VecMaterial | Materialize | Caches the child node's results
Sort / VecSort | Sort | ORDER BY clause, join operations, grouping operations, set operations, together with Unique
Group / VecGroup | Grouping | GROUP BY clause
Agg / VecAggregation | Execute aggregate functions | 1. Aggregate functions such as COUNT/SUM/AVG/MAX/MIN; 2. DISTINCT clause; 3. UNION deduplication; 4. GROUP BY clause
WindowAgg / VecWindowAgg | Window functions | WINDOW clause
Unique / VecUnique | Deduplication (input already sorted) | 1. DISTINCT clause; 2. UNION deduplication
Hash | HashJoin helper node | Builds a hash table for HashJoin
SetOp / VecSetOp | Set operations | INTERSECT/INTERSECT ALL, EXCEPT/EXCEPT ALL
LockRows | Row-level locking | SELECT … FOR SHARE/UPDATE
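Aggregation is a clear case where the whole input must be consumed before any output exists. The following Python sketch computes a GROUP BY with COUNT(*) and SUM over column batches using a hash table of per-group states. The result is emitted only after the last batch arrives. The batch shape (a dict of column lists) and the function name are hypothetical. This is a simplification and is not how VecAggregation is implemented.

```python
def vec_hash_agg(batches, key_col, val_col):
    """GROUP BY key_col computing COUNT(*) and SUM(val_col) over column batches."""
    groups = {}  # key -> [count, sum]: the state that must be kept for all input
    for batch in batches:
        for k, v in zip(batch[key_col], batch[val_col]):
            state = groups.setdefault(k, [0, 0])
            state[0] += 1
            state[1] += v
    # Results are only available after every input batch has been consumed.
    return {k: tuple(s) for k, s in groups.items()}


batches = [
    {"region": ["north", "south", "north"], "amount": [10, 7, 5]},
    {"region": ["south", "east"], "amount": [3, 4]},
]
print(vec_hash_agg(batches, "region", "amount"))
# {'north': (2, 15), 'south': (2, 10), 'east': (1, 4)}
```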

Control operators

Control operators are a type of node used to handle special situations and implement special execution processes.

Operator (row / column) | Meaning | Typical scenario
Result / VecResult | Direct computation | 1. No table scan is involved; 2. An INSERT statement with a single VALUES clause; 3. Append/MergeAppend is the plan root node (projection push-up)
ModifyTable | Upper node for INSERT/UPDATE/DELETE | INSERT/UPDATE/DELETE
Append / VecAppend | Append | 1. UNION (ALL); 2. Inheritance tables
MergeAppend | Append (ordered input) | 1. UNION (ALL); 2. Inheritance tables
RecursiveUnion | Handles UNION subqueries defined recursively in a WITH clause | WITH RECURSIVE … SELECT … statements
BitmapAnd | Bitmap logical AND | BitmapScan for multi-dimensional index scans
BitmapOr | Bitmap logical OR | BitmapScan for multi-dimensional index scans
Limit / VecLimit | Handles the LIMIT clause | OFFSET … LIMIT …
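In a batch-at-a-time engine, OFFSET and LIMIT boundaries rarely line up with batch boundaries. A vectorized limit operator therefore has to slice the first and last batches it emits, and it should stop pulling from its child once enough rows have been produced. The following Python sketch shows one way to do this. It is hypothetical and does not claim to reflect VecLimit's actual implementation.

```python
def vec_limit(batches, offset, limit):
    """OFFSET/LIMIT over an iterator of batches (lists of rows)."""
    to_skip, remaining = offset, limit
    if remaining <= 0:
        return
    for batch in batches:
        if to_skip >= len(batch):
            to_skip -= len(batch)  # the whole batch falls inside OFFSET
            continue
        out = batch[to_skip:to_skip + remaining]  # slice at both boundaries
        to_skip = 0
        remaining -= len(out)
        yield out
        if remaining == 0:
            return  # stop pulling from the child


pulled = []


def child():
    """Child operator producing 4 batches of 4 rows, recording each pull."""
    for start in range(0, 16, 4):
        pulled.append(start)
        yield list(range(start, start + 4))


print(list(vec_limit(child(), offset=3, limit=6)))  # [[3], [4, 5, 6, 7], [8]]
print(len(pulled))  # 3 (the fourth batch was never requested)
```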

Other operators

Other operators include the Stream operator and operators such as RemoteQuery.

Operator (row / column) | Meaning | Typical scenario
Stream | Data exchange across nodes | Executing a distributed query plan that exchanges data between nodes
Partition Iterator | Partition iterator | Partitioned table scan, iterating over each partition
VecToRow / RowToVec | Column to row / row to column | Mixed row/column execution
DfsScan / DfsIndexScan | HDFS table (index) scan | HDFS table scan
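The table does not say how a Stream operator chooses a destination node. Hash redistribution is one common data-exchange pattern in distributed query engines, and the following Python sketch assumes it. Each sender node routes every row to the receiver chosen by hashing the distribution key, so rows with equal keys end up on the same node for a later join or aggregation. The function name and data shape are hypothetical.

```python
def redistribute(rows_per_node, key, num_nodes):
    """Hash-redistribute rows so that equal keys end up on the same node."""
    targets = [[] for _ in range(num_nodes)]
    for node_rows in rows_per_node:  # each sending node
        for row in node_rows:
            targets[hash(row[key]) % num_nodes].append(row)  # receiver chosen by hash
    return targets


node0 = [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}]
node1 = [{"k": 1, "v": "c"}, {"k": 4, "v": "d"}]
targets = redistribute([node0, node1], "k", 2)
print([[r["v"] for r in t] for t in targets])  # [['b', 'd'], ['a', 'c']]
```

Integer keys are used because Python's hash of a small int is the int itself, which keeps the example deterministic.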

The evolution of GaussDB vectorization

After the first-generation vectorized engine, GaussDB developed higher-performance vectorized engines: the Sonic vectorized engine and the Turbo vectorized engine.
To improve OLAP execution performance, GaussDB continues to evolve along the path of column storage, a vectorized execution engine, and batch computation:

  • The Stream operator and the distributed execution framework support data flow between multiple nodes.
  • SMP provides multi-threaded parallelism within a node, making full use of idle hardware resources.
  • LLVM technology, a new code generation framework and JIT (just-in-time) compiler, eliminates the tuple deform bottleneck.
  • The Sonic vectorized engine further vectorizes the HashAgg and HashJoin operators and uses different array types to store and compute each column's data according to its data type.
  • The new-generation Turbo vectorized engine vectorizes most operators. Building on the Sonic engine, it adds NULL optimization, large-integer optimization, Stream optimization, Sort optimization, and more to further improve performance.
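The idea of choosing a different array type for each column's data type can be illustrated with Python's standard `array` module. The following sketch stores fixed-width numeric columns as packed arrays of machine values and falls back to a generic list for everything else. The type-to-typecode mapping is an assumption made for illustration. It does not describe how the Sonic engine actually lays out its arrays.

```python
from array import array

# Hypothetical mapping from SQL column types to packed array typecodes.
TYPECODES = {"bigint": "q", "double": "d"}


def make_column(sql_type, values):
    """Store a column in a type-specialized container.

    Fixed-width numeric types get a packed array of contiguous machine
    values; any other type falls back to a generic list of objects.
    """
    code = TYPECODES.get(sql_type)
    return array(code, values) if code else list(values)


ids = make_column("bigint", [1, 2, 3])
prices = make_column("double", [1.5, 2.5, 4.0])
names = make_column("text", ["a", "b", "c"])
print(ids.itemsize, prices.itemsize, type(names).__name__)  # 8 8 list
print(sum(prices))  # 8.0
```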

Summary

This article introduced the GaussDB vectorized execution engine, covering its framework, its principles, an overview of each operator, and its performance improvements.

 


Source: my.oschina.net/u/4526289/blog/11054711