Building Efficient Physical Plans: From Logical Query to Operator Implementation

I. Introduction

This article explains how the physical plan for a query statement is generated from its logical plan. The operations the statement must execute and the tables it involves are stored in the PlanNodes of the logical plan. Building the physical plan means analyzing each PlanNode, converting it into the corresponding operator (Processor), and then connecting the Processors through Streams.

To make the rest of the article easier to follow, here is a brief introduction to three concepts:

  • Physical Plan: the plan obtained by choosing an implementation algorithm for each operator of the logical query plan and by choosing the order in which these operators execute.
  • Processor: the unit produced by physical planning. It stores which operator the statement needs to execute and, in a distributed system, which node that operator must be sent to for execution.
  • Stream: the connection information stored with the Processors. It marks the execution order of the operators and the nodes on which they execute.
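To make these three concepts concrete, here is a minimal Go sketch. All type and field names (NodeID, Processor, Stream, PhysicalPlan, AddProcessor, Connect) are assumptions made for illustration, not the actual definitions in the code base, and the streams are kept in a separate slice only to keep the sketch short.

```go
package main

import "fmt"

// NodeID identifies a node in the cluster (hypothetical type for this sketch).
type NodeID int

// Processor records which operator runs and on which node it runs.
type Processor struct {
	Node NodeID
	Op   string // for example "tableReader" or "sorter"
}

// Stream connects the output of one processor to the input of another.
// Src and Dst are indexes into PhysicalPlan.Processors.
type Stream struct {
	Src, Dst int
}

// PhysicalPlan is the set of processors plus the streams that order them.
type PhysicalPlan struct {
	Processors []Processor
	Streams    []Stream
}

// AddProcessor appends proc to the plan and returns its index.
func (p *PhysicalPlan) AddProcessor(proc Processor) int {
	p.Processors = append(p.Processors, proc)
	return len(p.Processors) - 1
}

// Connect records that the rows produced by src flow into dst.
func (p *PhysicalPlan) Connect(src, dst int) {
	p.Streams = append(p.Streams, Stream{Src: src, Dst: dst})
}

func main() {
	var plan PhysicalPlan
	reader := plan.AddProcessor(Processor{Node: 1, Op: "tableReader"})
	sorter := plan.AddProcessor(Processor{Node: 1, Op: "sorter"})
	plan.Connect(reader, sorter)
	fmt.Printf("%d processors, %d streams\n", len(plan.Processors), len(plan.Streams))
}
```

Referring to processors by index keeps a Stream small and makes it easy to combine two plans later, at the cost of having to shift indexes when plans are merged.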

II. Construction of the Physical Plan

We will walk through the generation of a physical plan using the following SQL statement:

select max(height), class from heights join students on heights.id = students.id group by class having class in (1, 2) order by max(height) desc limit 2;

The PlanNode tree of the query statement is shown in the following figure:

In this PlanNode tree, the bottom layer consists of two scanNodes, which perform full table scans of heights and students respectively and return their results to the joinNode on the layer above. The joinNode joins the two tables to produce a virtual table that contains all columns of both tables.

The renderNode above the joinNode reads this virtual table and selects the height and class columns. The groupNode groups the rows by the class column and performs the max aggregation on height. The sortNode then sorts the rows produced by the groupNode by the max aggregate, and the limitNode applies the limit to the sorted result. Finally, the topmost renderNode reads the result and selects the max(height) and class columns. This completes the description of the PlanNode tree.

Next, the PlanNode tree is converted into a physical plan by the createPlanForNode function. This function is recursive and builds the corresponding part of the physical plan according to the type of each PlanNode.
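The recursion by node type can be sketched in Go as follows. This is a simplified illustration, not the real createPlanForNode: the node structs, the buildPlan and addStage helpers, and the operator names are assumptions, and the "plan" here is only a list of operator names in the order they are added.

```go
package main

import "fmt"

// planNode is a node of the logical plan. The concrete types mirror the node
// names used in the article, but their fields are simplified assumptions.
type planNode interface{}

type scanNode struct{ table string }
type joinNode struct{ left, right planNode }
type renderNode struct{ source planNode }
type groupNode struct{ source planNode }
type sortNode struct{ source planNode }
type limitNode struct{ source planNode }

// buildPlan recurses to the leaves first and appends one operator per node on
// the way back up, so the result lists operators in bottom-up order.
func buildPlan(n planNode) ([]string, error) {
	switch t := n.(type) {
	case *scanNode:
		return []string{"tableReader(" + t.table + ")"}, nil
	case *joinNode:
		left, err := buildPlan(t.left)
		if err != nil {
			return nil, err
		}
		right, err := buildPlan(t.right)
		if err != nil {
			return nil, err
		}
		ops := append(left, right...)
		return append(ops, "joiner"), nil
	case *renderNode:
		return addStage(t.source, "render")
	case *groupNode:
		return addStage(t.source, "aggregator")
	case *sortNode:
		return addStage(t.source, "sorter")
	case *limitNode:
		return addStage(t.source, "limit")
	default:
		return nil, fmt.Errorf("unsupported plan node %T", n)
	}
}

// addStage plans the source subtree and then appends a single operator.
func addStage(source planNode, op string) ([]string, error) {
	ops, err := buildPlan(source)
	if err != nil {
		return nil, err
	}
	return append(ops, op), nil
}

func main() {
	// The PlanNode tree of the example query, built from the bottom up.
	var root planNode = &joinNode{
		left:  &scanNode{table: "heights"},
		right: &scanNode{table: "students"},
	}
	root = &renderNode{source: root}
	root = &groupNode{source: root}
	root = &sortNode{source: root}
	root = &limitNode{source: root}
	root = &renderNode{source: root}

	ops, err := buildPlan(root)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, op := range ops {
		fmt.Println(op)
	}
}
```

The type switch keeps every node type's construction logic in one place, and the default branch turns an unknown node type into an error instead of a silent omission.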

For the query above, the recursion first reaches the scanNodes and calls their construction function, createTableReaders. This function creates a tablereader spec through initTableReaderSpec, obtains the operator's filter and limit from the PlanNode passed down from the logical plan, builds the physical plan's filter through MakeExpression(), and passes the filter and limit into the post.

Finally, the isLocal field of planCtx is checked to decide whether this is a distributed read plan.

  • If it is distributed, a SpanPartition array is built and the table's data is read from each node;

  • If it is not, only the local data is read.
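The scan planning steps above can be sketched in Go as follows. This is a sketch under stated assumptions: Span, SpanPartition, PostProcessSpec, TableReaderSpec, PlannedReader, partitionSpans, and buildReaders are simplified stand-ins invented for this example, the owner callback is a hypothetical replacement for the real lookup of which node holds a key range, and the filter string is made up.

```go
package main

import "fmt"

type NodeID int

// Span is a key range [Start, End) of a table (simplified assumption).
type Span struct{ Start, End string }

// SpanPartition groups the spans that a single node should read.
type SpanPartition struct {
	Node  NodeID
	Spans []Span
}

// PostProcessSpec carries the filter and limit a reader applies to its rows.
type PostProcessSpec struct {
	Filter string
	Limit  uint64 // 0 means "no limit" in this sketch
}

// TableReaderSpec describes one table reader.
type TableReaderSpec struct {
	Table string
	Spans []Span
	Post  PostProcessSpec
}

// PlannedReader is a reader spec together with the node that runs it.
type PlannedReader struct {
	Node NodeID
	Spec TableReaderSpec
}

// partitionSpans returns a single partition on the gateway node for a local
// plan; otherwise it groups the spans by the node that owns them.
func partitionSpans(spans []Span, isLocal bool, gateway NodeID, owner func(Span) NodeID) []SpanPartition {
	if isLocal {
		return []SpanPartition{{Node: gateway, Spans: spans}}
	}
	var parts []SpanPartition
	index := map[NodeID]int{} // node -> position in parts
	for _, sp := range spans {
		n := owner(sp)
		i, ok := index[n]
		if !ok {
			i = len(parts)
			index[n] = i
			parts = append(parts, SpanPartition{Node: n})
		}
		parts[i].Spans = append(parts[i].Spans, sp)
	}
	return parts
}

// buildReaders creates one reader per partition; all readers share the post.
func buildReaders(table string, post PostProcessSpec, parts []SpanPartition) []PlannedReader {
	readers := make([]PlannedReader, 0, len(parts))
	for _, p := range parts {
		spec := TableReaderSpec{Table: table, Spans: p.Spans, Post: post}
		readers = append(readers, PlannedReader{Node: p.Node, Spec: spec})
	}
	return readers
}

func main() {
	spans := []Span{{Start: "a", End: "f"}, {Start: "f", End: "m"}, {Start: "m", End: "z"}}
	owner := func(s Span) NodeID {
		if s.Start < "m" {
			return 1
		}
		return 2
	}
	post := PostProcessSpec{Filter: "height > 150", Limit: 0}
	for _, local := range []bool{true, false} {
		readers := buildReaders("heights", post, partitionSpans(spans, local, 1, owner))
		fmt.Printf("isLocal=%v: %d reader(s)\n", local, len(readers))
	}
}
```

With the sample data, the local plan produces one reader on the gateway node, while the distributed plan produces one reader for each node that owns part of the table.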

The specific process is shown in the figure below:

After the plans for the left and right scanNodes have been built, the results are leftPlan and rightPlan. MergePlans() is then executed on leftPlan and rightPlan to merge the two plans, combining their processor and stream information.
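One way to picture this merge is the Go sketch below. It assumes that streams refer to processors by slice index, so the right plan's indexes must be shifted by the number of processors in the left plan. The Plan type, its ResultRouters field, and mergePlans are hypothetical simplifications, not the real MergePlans() signature.

```go
package main

import "fmt"

type Processor struct {
	Node int
	Op   string
}

// Stream connects two processors by their indexes in Plan.Processors.
type Stream struct{ Src, Dst int }

// Plan is a simplified physical plan. ResultRouters holds the indexes of the
// processors whose output is the result of the plan.
type Plan struct {
	Processors    []Processor
	Streams       []Stream
	ResultRouters []int
}

// mergePlans concatenates the processors and streams of both plans. Indexes
// from the right plan are shifted by the size of the left plan so that they
// still point at the same processors after the merge.
func mergePlans(left, right Plan) (merged Plan, leftRouters, rightRouters []int) {
	offset := len(left.Processors)

	merged.Processors = append(append([]Processor{}, left.Processors...), right.Processors...)
	merged.Streams = append([]Stream{}, left.Streams...)
	for _, s := range right.Streams {
		merged.Streams = append(merged.Streams, Stream{Src: s.Src + offset, Dst: s.Dst + offset})
	}

	leftRouters = append([]int{}, left.ResultRouters...)
	for _, r := range right.ResultRouters {
		rightRouters = append(rightRouters, r+offset)
	}
	return merged, leftRouters, rightRouters
}

func main() {
	leftPlan := Plan{
		Processors:    []Processor{{Node: 1, Op: "tableReader(heights)"}, {Node: 2, Op: "tableReader(heights)"}},
		ResultRouters: []int{0, 1},
	}
	rightPlan := Plan{
		Processors:    []Processor{{Node: 1, Op: "tableReader(students)"}},
		ResultRouters: []int{0},
	}
	merged, l, r := mergePlans(leftPlan, rightPlan)
	fmt.Println(len(merged.Processors), l, r)
}
```

The merged plan is not yet connected: the two returned router lists only tell the next step which processors produce the left and right inputs of the join.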

Whether the join is executed in a distributed manner is decided in much the same way as above. Finally, the planner checks whether leftMergeOrd.Columns is nil.

  • If it is, a hashjoinspec is built;

  • If it is not, a mergejoinspec is built.
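The choice between the two join specs can be sketched in Go as follows. A merge join needs both inputs sorted on the join columns, so an empty merge ordering leaves the hash join as the only option. Ordering, JoinKind, and chooseJoiner are hypothetical names for this sketch, and the check uses len(...) == 0, which also covers the nil case described above.

```go
package main

import "fmt"

// Ordering lists the columns on which an input is known to be sorted
// (simplified assumption: column indexes only, no sort direction).
type Ordering struct{ Columns []int }

// JoinKind names the join operator chosen for the physical plan.
type JoinKind string

const (
	HashJoin  JoinKind = "hashJoiner"
	MergeJoin JoinKind = "mergeJoiner"
)

// chooseJoiner picks a hash join when there is no usable merge ordering and a
// merge join otherwise.
func chooseJoiner(leftMergeOrd Ordering) JoinKind {
	if len(leftMergeOrd.Columns) == 0 {
		return HashJoin
	}
	return MergeJoin
}

func main() {
	fmt.Println(chooseJoiner(Ordering{}))                  // prints "hashJoiner"
	fmt.Println(chooseJoiner(Ordering{Columns: []int{0}})) // prints "mergeJoiner"
}
```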

AddjoinStage() is then executed to add the joinProcessor to the specified nodes and to connect the left and right outputs to these Processors, which completes the processing of the joinNode. The basic flow is shown in the figure below. The renderNode, groupNode, sortNode, and the remaining nodes are then processed in turn, and the corresponding operator information is added to the physical plan.
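The join stage described above can be sketched in Go as follows: one join processor is added per chosen node, and every left and right output is connected to every join processor. The types, the DstInput field, and the addJoinStage method are assumptions for illustration and do not reproduce the real AddjoinStage() signature; how rows are routed between the streams is left out.

```go
package main

import "fmt"

type Processor struct {
	Node int
	Op   string
}

// Stream connects processor Src to input slot DstInput of processor Dst.
// In this sketch slot 0 is the left side of a join and slot 1 the right side.
type Stream struct{ Src, Dst, DstInput int }

type Plan struct {
	Processors []Processor
	Streams    []Stream
}

// addJoinStage adds one join processor on each node and connects every left
// and right output to every join processor. It returns the indexes of the new
// processors, which become the result routers of the plan.
func (p *Plan) addJoinStage(nodes []int, op string, leftRouters, rightRouters []int) []int {
	joiners := make([]int, 0, len(nodes))
	for _, n := range nodes {
		p.Processors = append(p.Processors, Processor{Node: n, Op: op})
		joiners = append(joiners, len(p.Processors)-1)
	}
	for _, j := range joiners {
		for _, l := range leftRouters {
			p.Streams = append(p.Streams, Stream{Src: l, Dst: j, DstInput: 0})
		}
		for _, r := range rightRouters {
			p.Streams = append(p.Streams, Stream{Src: r, Dst: j, DstInput: 1})
		}
	}
	return joiners
}

func main() {
	plan := Plan{Processors: []Processor{
		{Node: 1, Op: "tableReader(heights)"},
		{Node: 2, Op: "tableReader(students)"},
	}}
	joiners := plan.addJoinStage([]int{1, 2}, "hashJoiner", []int{0}, []int{1})
	fmt.Println("join processors:", joiners)
	for _, s := range plan.Streams {
		fmt.Printf("%d -> %d (input %d)\n", s.Src, s.Dst, s.DstInput)
	}
}
```

The later stages for the renderNode, groupNode, and sortNode can be attached in the same way, each taking the processor indexes returned by the previous stage as its inputs.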


Origin my.oschina.net/u/5148943/blog/10092169