1. Introduction
This article explains how a query statement's logical plan is turned into a physical plan. The operations the query must perform, and the tables it touches, are stored in the PlanNodes of the logical plan. The physical plan is built by analyzing each PlanNode, converting it into the corresponding operator (Processor), and then connecting the operators through Streams.
To make the rest of the article easier to follow, here is a brief introduction to three concepts:
- Physical plan: the plan obtained by choosing an algorithm for each operator of the logical query plan and choosing the order in which these operators execute.
- Processor: an element of the physical plan. It stores information about an operator the statement needs to execute: which operator it is and, in a distributed system, which node it is sent to for execution.
- Stream: information stored with the Processors that records the execution order of the operators and the nodes on which they run.
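The three concepts can be modeled in a few lines. This is a minimal Python sketch, assuming a plan is simply a list of Processors plus a list of Streams that refer to Processors by index. The class and field names (`Processor`, `Stream`, `PhysicalPlan`, `node_id`, `core`) are hypothetical and are not the database's actual Go types.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    node_id: int   # node that runs this operator (illustrative field)
    core: str      # operator kind, e.g. "TableReader" or "HashJoiner"


@dataclass
class Stream:
    source: int    # index of the producing Processor in PhysicalPlan.processors
    dest: int      # index of the consuming Processor


@dataclass
class PhysicalPlan:
    processors: list = field(default_factory=list)
    streams: list = field(default_factory=list)

    def add_processor(self, proc):
        """Append a Processor and return its index, which Streams refer to."""
        self.processors.append(proc)
        return len(self.processors) - 1

    def connect(self, src, dst):
        """Record that the output of processor `src` feeds processor `dst`."""
        self.streams.append(Stream(src, dst))

    def cross_node_streams(self):
        """Streams whose two ends run on different nodes, i.e. data sent over the network."""
        return [s for s in self.streams
                if self.processors[s.source].node_id != self.processors[s.dest].node_id]


# Two table readers on nodes 1 and 2 feed a joiner on node 1.
plan = PhysicalPlan()
r1 = plan.add_processor(Processor(1, "TableReader"))
r2 = plan.add_processor(Processor(2, "TableReader"))
j = plan.add_processor(Processor(1, "HashJoiner"))
plan.connect(r1, j)
plan.connect(r2, j)
```

In this toy plan, only the stream from the reader on node 2 crosses a node boundary. That distinction is exactly the node information a Stream carries.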
2. Construction of the physical plan
We will use the following SQL statement to show how a physical plan is generated:
select max(height),class from heights join students on heights.id=students.id group by class having class in(1,2) order by max(height) desc limit 2;
The PlanNode tree of this query is shown in the following figure:
In this PlanNode tree, the bottom layer consists of two scanNodes. They perform full table scans of heights and students respectively and return their results to the joinNode above them. The joinNode joins the two tables into a virtual table that contains all columns of both tables.
The renderNode above the join queries this virtual table and keeps only the height and class columns. The groupNode groups rows by class and computes max(height) for each group. The sortNode then sorts the groupNode's aggregation results by max(height) in descending order, and the limitNode keeps the first two rows. Finally, the topmost renderNode outputs the max(height) and class columns.
The PlanNode tree is then passed to the createPlanForNode function, which generates the physical plan. This function is recursive: it builds the physical plan for each node according to that node's PlanNode type.
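The recursion can be pictured as a dispatch on the node type. Each builder plans its children first and then appends its own stage. This is a minimal Python sketch, not the real createPlanForNode. `PlanNode`, `BUILDERS`, and the stage labels ("Joiner", "Render", and so on) are hypothetical, and the join builder stands in for the MergePlans/AddJoinStage steps described later. The tree mirrors the query above, with the HAVING filter omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    kind: str
    children: list = field(default_factory=list)


def plan_scan(node):
    # Leaf: a scanNode becomes a table reader stage.
    return ["TableReader"]


def plan_join(node):
    # Plan both inputs recursively, then add the join stage on top.
    left = create_plan_for_node(node.children[0])
    right = create_plan_for_node(node.children[1])
    return left + right + ["Joiner"]


def plan_unary(stage):
    # Builder for single-input nodes: plan the child, then append `stage`.
    def build(node):
        return create_plan_for_node(node.children[0]) + [stage]
    return build


BUILDERS = {
    "scanNode": plan_scan,
    "joinNode": plan_join,
    "renderNode": plan_unary("Render"),
    "groupNode": plan_unary("Aggregator"),
    "sortNode": plan_unary("Sorter"),
    "limitNode": plan_unary("Limit"),
}


def create_plan_for_node(node):
    """Dispatch on the PlanNode type; each builder recurses into its children."""
    try:
        builder = BUILDERS[node.kind]
    except KeyError:
        raise NotImplementedError(f"unsupported node: {node.kind}") from None
    return builder(node)


join = PlanNode("joinNode", [PlanNode("scanNode"), PlanNode("scanNode")])
tree = PlanNode("renderNode", [
    PlanNode("limitNode", [
        PlanNode("sortNode", [
            PlanNode("groupNode", [
                PlanNode("renderNode", [join])])])])])
stages = create_plan_for_node(tree)
```

Because children are planned before their parent, the stages come out bottom-up: both table readers first, then the join, and the topmost render last.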
For the query above, the recursion first reaches the scanNodes and calls their construction function, createTableReaders. This function works in four steps:
- It creates a TableReader spec through initTableReaderSpec.
- It obtains the operator's filter and limit from the scanNode passed down from the logical plan.
- It builds the physical plan's filter with MakeExpression().
- It passes the filter and limit into the post-processing spec (post).
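The sketch below shows the filter and limit ending up in a post-processing stage that the table reader applies to each row it reads. It is a minimal Python model, assuming a filter can be represented as a row predicate. `PostProcessSpec`, `TableReaderSpec`, `make_expression`, and `run_reader` are hypothetical stand-ins for initTableReaderSpec, MakeExpression(), and the post spec, and "limit 0 means no limit" is this sketch's own convention. The `class IN (1, 2)` filter and the sample rows are illustrative only; in the example query, the scans themselves are full table scans.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PostProcessSpec:
    filter: Optional[Callable[[dict], bool]] = None  # row predicate built from the scanNode's filter
    limit: int = 0                                   # 0 means "no limit" in this sketch


@dataclass
class TableReaderSpec:
    table: str
    post: PostProcessSpec


def make_expression(column, allowed):
    """Hypothetical stand-in for MakeExpression(): compile `column IN allowed`."""
    allowed = set(allowed)
    return lambda row: row[column] in allowed


def create_table_reader(table, filter_expr=None, limit=0):
    # The filter and limit taken from the logical plan go into the post spec.
    return TableReaderSpec(table=table, post=PostProcessSpec(filter_expr, limit))


def run_reader(spec, rows):
    """Apply the post-processing stage to rows read from the table."""
    out = []
    for row in rows:
        if spec.post.filter is not None and not spec.post.filter(row):
            continue
        out.append(row)
        if spec.post.limit and len(out) == spec.post.limit:
            break  # stop reading once the limit is reached
    return out


rows = [{"id": 1, "class": 1}, {"id": 2, "class": 3},
        {"id": 3, "class": 2}, {"id": 4, "class": 1}]
spec = create_table_reader("students", make_expression("class", [1, 2]), limit=2)
result_ids = [r["id"] for r in run_reader(spec, rows)]
```

Pushing the filter and limit into the reader's post-processing means unneeded rows are dropped where the data is read, before they reach any later Processor.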
Finally, the function checks planCtx.isLocal to decide whether the read is distributed:
- If it is distributed, build a SpanPartition array and read the table's data from each node;
- If it is local, just read the local data.
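The local/distributed split can be sketched as grouping key spans by the node that holds their data. This is a minimal Python model, assuming each span lies entirely in one range and that a `leaseholder_of` lookup is available. Both `partition_spans` and `leaseholder_of` are hypothetical; a real planner must also split spans at range boundaries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpanPartition:
    node_id: int
    spans: list


def partition_spans(spans, leaseholder_of, is_local, gateway_node):
    """Assign key spans to nodes.

    spans: list of (start_key, end_key) tuples
    leaseholder_of: maps a span to the node holding its data (hypothetical lookup)
    """
    if is_local:
        # Local plan: a single partition on the gateway node reads every span.
        return [SpanPartition(gateway_node, list(spans))]
    by_node = defaultdict(list)
    for span in spans:
        by_node[leaseholder_of(span)].append(span)
    # One partition per node, so each node's table reader reads its own data.
    return [SpanPartition(n, s) for n, s in sorted(by_node.items())]


spans = [("a", "f"), ("f", "m"), ("m", "z")]
owners = {("a", "f"): 1, ("f", "m"): 2, ("m", "z"): 1}
distributed = partition_spans(spans, owners.get, is_local=False, gateway_node=1)
local = partition_spans(spans, owners.get, is_local=True, gateway_node=1)
```

In the distributed case, node 1 gets two spans and node 2 gets one, so two table readers are planned. In the local case, one reader on the gateway covers everything.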
The specific process is shown in the figure below:
Planning the left and right scanNodes yields leftPlan and rightPlan. MergePlans() then merges the two into a single plan, combining the Processor and Stream information of both sides.
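Merging two plans mostly means concatenating their Processors and renumbering the right side so that its Streams still point at the right Processors. This is a minimal Python sketch, assuming Streams and output routers refer to Processors by index. `Plan`, `merge_plans`, and `result_routers` are hypothetical names, not the actual MergePlans() signature.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    processors: list = field(default_factory=list)      # (core, node_id) pairs
    streams: list = field(default_factory=list)         # (source_idx, dest_idx) pairs
    result_routers: list = field(default_factory=list)  # indices of processors producing output


def merge_plans(left, right):
    """Concatenate two plans, shifting every index on the right side."""
    offset = len(left.processors)
    merged = Plan(
        processors=left.processors + right.processors,
        streams=left.streams + [(s + offset, d + offset) for s, d in right.streams],
    )
    left_routers = list(left.result_routers)
    right_routers = [r + offset for r in right.result_routers]
    # The routers tell the next stage (the join) where each side's output comes from.
    return merged, left_routers, right_routers


left = Plan([("TableReader", 1), ("TableReader", 2)], [], [0, 1])
right = Plan([("TableReader", 3), ("Noop", 3)], [(0, 1)], [1])
merged, left_routers, right_routers = merge_plans(left, right)
```

Keeping the left and right routers separate is what lets the join stage connect each side's outputs to the correct input of the join Processors.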
Whether the join runs in a distributed way is decided in the same way as above. Finally, the planner checks whether leftMergeOrd.Columns is nil:
- If so, build a hash join spec (hashjoinspec);
- If not, build a merge join spec (mergejoinspec).
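The nil check amounts to this: a merge join needs both inputs ordered on the equality columns, so when no such ordering is known, a hash join is planned instead. This is a minimal Python sketch of that decision, with `None` playing the role of a nil `leftMergeOrd.Columns`. `JoinSpec` and `build_join_spec` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JoinSpec:
    kind: str                              # "hash" or "merge"
    left_eq_cols: List[int]
    right_eq_cols: List[int]
    left_ordering: Optional[List[int]] = None
    right_ordering: Optional[List[int]] = None


def build_join_spec(left_eq_cols, right_eq_cols, left_merge_ord, right_merge_ord):
    """left_merge_ord is None (Go: nil) when the inputs are not known to be
    sorted on the equality columns; only then is a hash join planned."""
    if left_merge_ord is None:
        return JoinSpec("hash", left_eq_cols, right_eq_cols)
    return JoinSpec("merge", left_eq_cols, right_eq_cols, left_merge_ord, right_merge_ord)


# Join on heights.id = students.id, with id assumed to be column 0 on both sides.
unordered = build_join_spec([0], [0], None, None)
ordered = build_join_spec([0], [0], [0], [0])
```

The hash join works on unordered input by building a hash table on one side. The merge join can stream both sides at once when their order is already guaranteed.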
Next, AddJoinStage() adds the join Processors to the specified nodes and connects the left and right outputs to them, which completes the processing of the joinNode. The basic flow is shown in the figure below. The renderNode, groupNode, sortNode, and remaining nodes are then processed in turn in the same way, each adding its operator information to the physical plan.
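A join stage spread over several nodes needs every left and right output connected to every join Processor. Each row is then routed by a hash of its join key, so matching rows from both tables meet at the same Processor. This is a minimal Python sketch of that wiring. `Plan`, `add_join_stage`, and `route` are hypothetical, and `zlib.crc32` is used only because it gives a deterministic hash. The real AddJoinStage() may wire a single-node join differently.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Plan:
    processors: list = field(default_factory=list)  # (core, node_id) pairs
    streams: list = field(default_factory=list)     # (source_idx, dest_idx, side)


def add_join_stage(plan, nodes, left_routers, right_routers):
    """Add one join processor per node and connect every left and right
    output to every joiner (a hash-partitioned shuffle)."""
    joiners = []
    for node in nodes:
        plan.processors.append(("Joiner", node))
        joiners.append(len(plan.processors) - 1)
    for j in joiners:
        for r in left_routers:
            plan.streams.append((r, j, "left"))
        for r in right_routers:
            plan.streams.append((r, j, "right"))
    return joiners


def route(key, num_joiners):
    """Pick the joiner for a row from its equality-column value, so rows with
    equal keys from both sides are sent to the same joiner."""
    return crc32(str(key).encode()) % num_joiners


# Processors 0-1 read heights and 2-3 read students; join on nodes 1 and 2.
plan = Plan([("TableReader", 1), ("TableReader", 2), ("TableReader", 1), ("TableReader", 3)])
joiners = add_join_stage(plan, [1, 2], left_routers=[0, 1], right_routers=[2, 3])
```

With two readers per side and two joiners, the stage adds 2 × (2 + 2) = 8 streams. This all-to-all fan-out is why a distributed join is more expensive than a local one.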