In-depth analysis of KaiwuDB aggregation operators

 

1. AST abstract syntax tree

Execute a simple SQL statement: SELECT avg(b) FROM NATION GROUP BY b. NATION is a small table with only 25 records; the query performs an average aggregation on column b. The analyzer parses the SQL statement in the example above and produces the AST, as shown in the figure below.

2. Logical plan

The AST is then converted into a tree-structured plan, called the logical query plan. Each syntax element in the abstract syntax tree is converted into a logical query unit, such as scanNode, sortNode, groupNode, etc.

The logical plan in this example is very simple: a scan node (Scan) and an aggregation node (Group By). Running EXPLAIN SELECT avg(b) FROM NATION GROUP BY b; displays the plan as follows:

3. Physical plan

The (DistSQLPlanner).PlanAndRun method converts the logical plan into a physical plan, recursively calling the createPlanForNode method to generate each physical operator, which is then handed to the executor for execution. While generating the physical plan, KaiwuDB decides whether to produce a distributed execution plan based on the distribution of the underlying KV data and the estimated size of the returned data set. This example is a physical plan for local execution.


Logical plan nodes and physical plan nodes do not correspond one-to-one, but in this example, Scan and Group in the logical plan correspond to the TableReader operator and the aggregation operator in the physical plan, respectively.

4. Execution

Finally, the (DistSQLPlanner).Run method is called to execute the physical plan. The execution engine uses the Volcano model, and each layer of execution operators obtains a record by calling the Next method of the next layer.

Aggregation is implemented in two ways. The specific execution process is as follows:

(1) HashAggregator

During Hash Aggregate computation, we need to maintain a hash table whose key is the Group-By column of the aggregate. Taking the average function avg as an example, the values are the intermediate results of the aggregate function: sum and count. In the example of computing avg(b) with Group-By column b, we look up the value keyed on column b, namely sum(b) and count(b).

During computation, each input row only needs its key computed; the corresponding value is then looked up in the hash table and updated.

// Next is part of the RowSource interface.
func (ag *hashAggregator) Next() (sqlbase.EncDatumRow, *distsqlpb.ProducerMetadata) {
  for ag.State == runbase.StateRunning {
    var row sqlbase.EncDatumRow
    var meta *distsqlpb.ProducerMetadata
    switch ag.runningState {
    case aggAccumulating:
      ag.runningState, row, meta = ag.accumulateRows()
    case aggEmittingRows:
      ag.runningState, row, meta = ag.emitRow()
    default:
      log.Fatalf(ag.Ctx, "unsupported state: %d", ag.runningState)
    }

    if row == nil && meta == nil {
      continue
    }
    return row, meta
  }
  return nil, ag.DrainHelper()
}

Here, ag.runningState is aggAccumulating while input is still being read and the final aggregate result has not yet been computed. Once all data has been read, the state is set to aggEmittingRows to output the results. The plan above is a typical HashAgg example logical plan.

(2) OrderAggregator operator

OrderAggregate requires its input to be ordered on the Group-By column. During computation, whenever a value from a new group is read, or all input has been consumed, the final aggregate result of the previous group is computed. Because its input guarantees that rows of the same group arrive contiguously, OrderAggregate (also known as Stream Aggregate) can return a group's result upward as soon as that group's rows have been processed, unlike HashAggregate, which must process all input before it can return correct results.

When the upper-layer operator only needs part of the results, such as Limit, it can cut off OrderAggregate's remaining useless computation early once it has obtained the required number of rows. When there is an index on the Group-By column, reading data through the index guarantees the input is ordered on that column; rows of the same group then arrive at the OrderAggregate operator contiguously, avoiding an extra sort. To use OrderAgg, build an index on the Group-By column, e.g., CREATE INDEX ON NATION (b), and you can see the plan change.

The following is a class diagram of the HashAgg and OrderAgg aggregation methods:

5. Optimization method

To improve the execution efficiency of aggregation operators, KaiwuDB proposes the following two parallel aggregation methods:

(1) HashAggregator parallelism

The parallel design idea of HashAggregator is that after parallel computation, the results are hash-combined directly from the intermediate statistics produced during computation and summarized by final workers. HashAggExec handles all aggregate functions; it is built from the Aggregator plan, and when Next() is called it reads all data from Src and updates all entries in partialAggFuncs until all goroutines have completed.

The specific modification is to build a ParallelWorker operator in the upper-layer Processor (where newAggregator was originally built) and let it construct the newAggregator instances: parallel HashAggregator sub-operators are built according to the configured concurrency, and each constructed newAggregator takes a TableReader block to read and compute. The computed results are not sent directly but are passed into a pipeline. The top-level ParallelWorker operator traverses each pipeline and performs a final HashAggregator pass over all the data obtained. Finally, all results are sent.

(2) OrderAggregator operator parallelism

The overall parallel design of OrderAggregator is basically the same as HashAggregator. However, since OrderAggregator's input is already ordered, the data must be split into blocks and distributed to the sub-operators in order. The computed results are combined from the intermediate statistics produced during computation, and the final workers summarize them in the order of the allocated blocks. The remaining parts are the same as HashAggregator, and all results are sent at the end.

The following figure is a parallel operation flow chart using HashAggregator as an example:

 


Reprinted from: my.oschina.net/u/5148943/blog/10123193