PostgreSQL technology insider (8) source code analysis - projection operator and expression calculation

In the previous live broadcast of the Postgres technology insider series, we introduced the implementation principles and underlying details of the Postgres projection operator and expression calculation. This article is based on the content of that broadcast. The author is currently a kernel R&D engineer at HashData.

projection

A relational algebra operation that selects from a relation R the columns whose attributes are contained in the attribute set A.
PG also uses it to introduce additional columns (such as computed expressions).

Quick quiz
create table t (a int, b int);
Which of the following SQL statements trigger the projection logic?
select * from t;
select a, b from t;
select b, a from t;
select a as b, b as a from t;
select a from t;
select a, b, b from t;
select a + 1, b from t;
select ctid, * from t;

Feel free to leave a comment with your answer.

PG's implementation of projection

The overall implementation can be broken down into three main parts:

1. Determine whether projection is required

Projection is required if and only if the descriptor of the scanned relation does not match the targetlist of the plan node. The tlist_matches_tupdesc function implements this check.

static bool
tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc)
{
 int   numattrs = tupdesc->natts;
 int   attrno;
 ListCell   *tlist_item = list_head(tlist);

 /* Check the tlist attributes */
 for (attrno = 1; attrno <= numattrs; attrno++)
 {
  Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
  Var     *var;

  if (tlist_item == NULL)
   return false;  /* tlist too short */
  var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
  if (!var || !IsA(var, Var))
   return false;  /* tlist item not a Var */
  /* if these Asserts fail, planner messed up */
  Assert(var->varno == varno);
  Assert(var->varlevelsup == 0);
  if (var->varattno != attrno)
   return false;  /* out of order */
  if (att_tup->attisdropped)
   return false;  /* table contains dropped columns */
  if (att_tup->atthasmissing)
   return false;  /* table contains cols with missing values */

  /*
   * Note: usually the Var's type should match the tupdesc exactly, but
   * in situations involving unions of columns that have different
   * typmods, the Var may have come from above the union and hence have
   * typmod -1.  This is a legitimate situation since the Var still
   * describes the column, just not as exactly as the tupdesc does. We
   * could change the planner to prevent it, but it'd then insert
   * projection steps just to convert from specific typmod to typmod -1,
   * which is pretty silly.
   */
  if (var->vartype != att_tup->atttypid ||
   (var->vartypmod != att_tup->atttypmod &&
    var->vartypmod != -1))
   return false;  /* type mismatch */

  tlist_item = lnext(tlist, tlist_item);
 }

 if (tlist_item)
  return false;   /* tlist too long */

 return true;
}

Its core logic is to traverse the targetlist and check whether it matches the descriptor of the scanned table. This is relatively simple, so we will not discuss it further here.
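The check can be illustrated with a minimal sketch. The types ToyTarget and ToyColumn and the function toy_tlist_matches are hypothetical simplifications introduced only for illustration; they leave out the type and typmod comparison and the atthasmissing test that the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TargetEntry whose expression may be a Var. */
typedef struct ToyTarget {
    bool is_plain_var;  /* false for computed expressions such as a + 1 */
    int  attno;         /* 1-based column number referenced by the Var */
} ToyTarget;

/* Hypothetical stand-in for one attribute of a tuple descriptor. */
typedef struct ToyColumn {
    bool is_dropped;
} ToyColumn;

/* Returns true only when the target list is exactly "column 1, column 2, ...". */
static bool
toy_tlist_matches(const ToyTarget *tlist, size_t ntargets,
                  const ToyColumn *cols, size_t ncols)
{
    assert(tlist != NULL && cols != NULL);

    if (ntargets != ncols)
        return false;               /* tlist too short or too long */

    for (size_t i = 0; i < ncols; i++)
    {
        if (!tlist[i].is_plain_var)
            return false;           /* tlist item not a Var */
        if (tlist[i].attno != (int) (i + 1))
            return false;           /* out of order */
        if (cols[i].is_dropped)
            return false;           /* table contains dropped columns */
    }
    return true;
}
```

In this toy model, a target list equivalent to select a, b from t matches the descriptor of t, while select b, a from t (out of order) and select a + 1, b from t (not a plain Var) do not, so the latter two would need a projection.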

2. Construct the information needed for the projection

ProjectionInfo *
ExecBuildProjectionInfo(List *targetList,
      ExprContext *econtext,
      TupleTableSlot *slot,
      PlanState *parent,
      TupleDesc inputDesc)
{
 /* Insert EEOP_*_FETCHSOME steps as needed */
 ExecInitExprSlots(state, (Node *) targetList);
 /* Now compile each tlist column */
 foreach(lc, targetList)
 {
  /* Consider the case where the projected column is a plain Var */
  if (tle->expr != NULL &&
   IsA(tle->expr, Var) &&
   ((Var *) tle->expr)->varattno > 0)
  {
   if (inputDesc == NULL)
    isSafeVar = true; /* can't check, just assume OK */
   else if (attnum <= inputDesc->natts)
   {
    Form_pg_attribute attr = TupleDescAttr(inputDesc, attnum - 1);

    /*
     * If user attribute is dropped or has a type mismatch, don't
     * use ASSIGN_*_VAR.  Instead let the normal expression
     * machinery handle it (which'll possibly error out).
     */
    if (!attr->attisdropped && variable->vartype == attr->atttypid)
    {
     isSafeVar = true;
    }
   }
  }

  /* In the simple case a single EEOP_ASSIGN_*_VAR step is enough */
  if (isSafeVar)
  {
   /* Fast-path: just generate an EEOP_ASSIGN_*_VAR step */
   switch (variable->varno)
   {
    case INNER_VAR:
     /* get the tuple from the inner node */
     scratch.opcode = EEOP_ASSIGN_INNER_VAR;
     break;

    case OUTER_VAR:
     /* get the tuple from the outer node */
     scratch.opcode = EEOP_ASSIGN_OUTER_VAR;
     break;

     /* INDEX_VAR is handled by default case */

    default:
     /* get the tuple from the relation being scanned */
     scratch.opcode = EEOP_ASSIGN_SCAN_VAR;
     break;
   }

   /*
    * Core logic: append the execution step this projected column needs.
    * At run time the steps are simply executed one after another, which
    * avoids the cost of recursive function calls.
    */
   /* attnum and resultnum are read when the projection is evaluated */
   scratch.d.assign_var.attnum = attnum - 1;
   scratch.d.assign_var.resultnum = tle->resno - 1;
   ExprEvalPushStep(state, &scratch);
  }
  else
  {
   /*
    * Columns that contain an expression to compute, or that refer to a
    * system column, are handled by the regular expression machinery.
    */
   /*
    * Otherwise, compile the column expression normally.
    *
    * We can't tell the expression to evaluate directly into the
    * result slot, as the result slot (and the exprstate for that
    * matter) can change between executions.  We instead evaluate
    * into the ExprState's resvalue/resnull and then move.
    */
   ExecInitExprRec(tle->expr, state,
       &state->resvalue, &state->resnull);

   /*
    * Move the computed value from resvalue/resnull into the result
    * slot. (EEOP_ASSIGN_TMP_MAKE_RO is used instead when the value
    * could be an expanded datum.)
    */
   scratch.opcode = EEOP_ASSIGN_TMP;
   scratch.d.assign_tmp.resultnum = tle->resno - 1;
   ExprEvalPushStep(state, &scratch);
  }
       
    }
}

In this section, we mainly annotate the fast-path logic in the code above; the rest of the logic is explained later.

The core logic of this code is to build the execution process of the projection as an array of steps by calling ExprEvalPushStep, and to tag each step with an opcode so that the execution phase can dispatch to the matching handler for each opcode. The execution process described below makes this easier to understand.
Compared with traditional expression evaluation, which walks the expression tree recursively, the advantage of this approach is that it reduces recursive function calls at run time.
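The idea of compiling a target list into a flat step array can be sketched as follows. ToyStep, ToyState, toy_push_step and toy_build_projection are hypothetical names used only for illustration; the growth policy (start at 16 entries, then double) is an assumption modeled on how ExprEvalPushStep is commonly described, and plain malloc/realloc stand in for PG's memory-context allocators.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum ToyOpcode { TOY_ASSIGN_SCAN_VAR, TOY_DONE } ToyOpcode;

typedef struct ToyStep {
    ToyOpcode opcode;
    int attnum;     /* 0-based source column in the scan tuple */
    int resultnum;  /* 0-based destination column in the result */
} ToyStep;

typedef struct ToyState {
    ToyStep *steps;
    int steps_len;
    int steps_alloc;
} ToyState;

/* Append a copy of *s to the step array, growing the array when it is full. */
static void
toy_push_step(ToyState *state, const ToyStep *s)
{
    if (state->steps_alloc == 0)
    {
        state->steps_alloc = 16;
        state->steps = malloc(sizeof(ToyStep) * state->steps_alloc);
    }
    else if (state->steps_len == state->steps_alloc)
    {
        state->steps_alloc *= 2;
        state->steps = realloc(state->steps,
                               sizeof(ToyStep) * state->steps_alloc);
    }
    assert(state->steps != NULL);
    state->steps[state->steps_len++] = *s;
}

/* One assign step per output column (1-based attnos), then a DONE step. */
static void
toy_build_projection(ToyState *state, const int *attnos, int ncols)
{
    ToyStep done = { TOY_DONE, 0, 0 };

    for (int i = 0; i < ncols; i++)
    {
        ToyStep scratch = { TOY_ASSIGN_SCAN_VAR, attnos[i] - 1, i };

        toy_push_step(state, &scratch);
    }
    toy_push_step(state, &done);
}
```

For a zero-initialized ToyState and attnos = {2, 1} (the shape of select b, a from t), the array ends up holding two assign steps followed by the terminating step. The caller releases state->steps with free when done.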

3. Execute the projection operator

ExecProject is the entry function of the executor's projection operator. Its key call is ExecEvalExprSwitchContext; the logic related to expression evaluation in PG is implemented through this function.

static inline TupleTableSlot *
ExecProject(ProjectionInfo *projInfo)
{
 ExprContext *econtext = projInfo->pi_exprContext;
 ExprState  *state = &projInfo->pi_state;
 TupleTableSlot *slot = state->resultslot; /* the projection result; not yet computed at this point */
 bool  isnull;

 /* Run the expression, discarding scalar result from the last column. */
 (void) ExecEvalExprSwitchContext(state, econtext, &isnull);

 return slot;
}

First, let's introduce the framework of PG expression evaluation. Projection evaluation is itself implemented as expression evaluation: it calls ExecEvalExprSwitchContext, which in turn calls ExecInterpExpr.

The entire execution process is based on a dispatch mechanism implemented with macros, which executes the previously constructed evaluation steps in sequence. During execution, ExprState stores intermediate results and other execution state. The specific code is as follows:

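// Label marking the implementation of the step for a given opcode; used as a goto target
#define EEO_CASE(name)  CASE_##name:
// Dispatch to the implementation of the current step
#define EEO_DISPATCH()  goto *((void *) op->opcode)
// Look up the address of an opcode's implementation in dispatch_table
#define EEO_OPCODE(opcode) ((intptr_t) dispatch_table[opcode])
// When the current step is finished, move on to the next step to execute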
#define EEO_NEXT() \
 do { \
  op++; \
  EEO_DISPATCH(); \
 } while (0)

static Datum
ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
{
 op = state->steps; // holds all steps; the macros keep advancing the step currently being executed
 resultslot = state->resultslot; // holds the result that is finally returned
 innerslot = econtext->ecxt_innertuple;
 outerslot = econtext->ecxt_outertuple;
 scanslot = econtext->ecxt_scantuple;
  
 EEO_DISPATCH();

  EEO_CASE(EEOP_DONE)
  {
   goto out;
  }
  
  EEO_CASE(EEOP_SCAN_FETCHSOME)
  {
   CheckOpSlotCompatibility(op, scanslot);

   slot_getsomeattrs(scanslot, op->d.fetch.last_var);

   EEO_NEXT();
  }

    EEO_CASE(EEOP_ASSIGN_SCAN_VAR)
  {
   int   resultnum = op->d.assign_var.resultnum;
   int   attnum = op->d.assign_var.attnum;

   /*
    * We do not need CheckVarSlotCompatibility here; that was taken
    * care of at compilation time.  But see EEOP_INNER_VAR comments.
    */
   resultslot->tts_values[resultnum] = scanslot->tts_values[attnum];
   resultslot->tts_isnull[resultnum] = scanslot->tts_isnull[attnum];

   EEO_NEXT();
  }
out:
   *isnull = state->resnull;
   return state->resvalue;
}

In this way, the projected columns have been computed, and the final tuple is stored in state->resultslot for use by upper-level operators.
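The dispatch loop can be sketched in portable C with a switch statement in place of the computed goto. This is only an illustration under simplifying assumptions: ToyStep, ToySlot and toy_interp are hypothetical names, values are plain long integers rather than Datum, and only two opcodes exist.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_COLS 8

typedef enum ToyOpcode { TOY_ASSIGN_SCAN_VAR, TOY_DONE } ToyOpcode;

typedef struct ToyStep {
    ToyOpcode opcode;
    int attnum;     /* 0-based source column */
    int resultnum;  /* 0-based output column */
} ToyStep;

/* Hypothetical stand-in for the tts_values/tts_isnull arrays of a slot. */
typedef struct ToySlot {
    long values[TOY_MAX_COLS];
    bool isnull[TOY_MAX_COLS];
} ToySlot;

/* Execute the steps one after another until the DONE step is reached. */
static void
toy_interp(const ToyStep *steps, const ToySlot *scanslot, ToySlot *resultslot)
{
    const ToyStep *op = steps;

    for (;;)
    {
        switch (op->opcode)     /* plays the role of EEO_DISPATCH() */
        {
            case TOY_ASSIGN_SCAN_VAR:
                assert(op->attnum >= 0 && op->attnum < TOY_MAX_COLS);
                assert(op->resultnum >= 0 && op->resultnum < TOY_MAX_COLS);
                resultslot->values[op->resultnum] = scanslot->values[op->attnum];
                resultslot->isnull[op->resultnum] = scanslot->isnull[op->attnum];
                op++;           /* plays the role of EEO_NEXT() */
                break;

            case TOY_DONE:
                return;
        }
    }
}
```

A computed goto jumps straight from the end of one handler to the next handler, whereas the switch version returns to the top of the loop on every step; the sketch trades that optimization for portability.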

expression calculation

Below we introduce the implementation of expression calculation. It reuses the same logic as the projection column evaluation described above, that is, evaluation goes through the same dispatch mechanism.

Significant differences include:

1) Calculation logic is usually more complex and requires multiple steps to complete, essentially evaluating the expression tree in an iterative manner;

2) The expression can be pre-calculated, and the constant part can be evaluated at the optimizer stage to avoid repeated evaluation during the iteration process.

Let's use an example to study how the evaluation steps corresponding to the expression tree are constructed. First, let's take a look at how PG represents the expression tree in memory. Take the query explain select (SQRT(POWER(i,i))) from generate_series(1,5) i; as an example. In the FuncExpr tree below, the correspondence with the expression (SQRT(POWER(i,i))) can be seen very clearly.

FuncExpr [funcid=1344 funcresulttype=701 funcretset=false funcvariadic=false funcformat=COERCE_EXPLICIT_CALL is_tablefunc=false]
        FuncExpr [funcid=1368 funcresulttype=701 funcretset=false funcvariadic=false funcformat=COERCE_EXPLICIT_CALL is_tablefunc=false]
                FuncExpr [funcid=316 funcresulttype=701 funcretset=false funcvariadic=false funcformat=COERCE_IMPLICIT_CAST is_tablefunc=false]
                        Var [varno=1 varattno=1 vartype=23 varnosyn=1 varattnosyn=1]
                FuncExpr [funcid=316 funcresulttype=701 funcretset=false funcvariadic=false funcformat=COERCE_IMPLICIT_CAST is_tablefunc=false]
                        Var [varno=1 varattno=1 vartype=23 varnosyn=1 varattnosyn=1]

Generally speaking, such an expression tree is evaluated recursively: the lower-level expressions are computed first, then the higher-level ones. The overall process resembles a post-order traversal of the tree.

To improve execution efficiency, PG instead evaluates iteratively at execution time, in the same way as projection evaluation. Therefore, during executor initialization it walks the tree in a post-order fashion and appends the step for each subexpression to the array of evaluation steps.

The call stack for building the evaluation step is as follows:

-> ExecBuildProjectionInfo
 -> ExecInitExprSlots // EEOP_*_FETCHSOME
     ->ExprEvalPushStep
    for targetList:
      -> ExecInitExprRec(tle->expr, )
          scratch.resvalue = resv // result of the current step; the caller passes in, as a pointer, the address where we should store it
          case T_FuncExpr:
            -> ExecInitFunc // current function [funcid=1344]
                fcinfo = scratch->d.func.fcinfo_data
                for args: // this level has one argument
                    -> ExecInitExprRec(arg, state, &fcinfo->args[argno].value) // resv - where to store the result of the node into
                        case T_FuncExpr:
                          -> ExecInitFunc // current function [funcid=1368]
                              for args: // this level has two arguments
                                  -> ExecInitExprRec()
                                      case T_FuncExpr:
                                          -> ExecInitFunc // current function [funcid=316]
                                              for args: // this level has one argument: Var [varno=1 varattno=1 vartype=23 varnosyn=1 varattnosyn=1]
                                                  -> ExecInitExprRec()
                                                      case T_Var:
                                                          // regular user column
                                                          scratch.d.var.attnum = variable->varattno - 1;
                                                          -> ExprEvalPushStep()
                                      ExprEvalPushStep()
                          ExprEvalPushStep()
            ExprEvalPushStep()   
   ExprEvalPushStep()
 scratch.opcode = EEOP_DONE;
 ExprEvalPushStep()                                              

The main body of the whole process is a recursive traversal of the expression tree. The interior of the tree usually consists of T_FuncExpr or other kinds of expression nodes. Each node has several arguments, and an argument may itself be an expression. Once the evaluation steps of all child nodes have been generated, the step for the current node is appended to the array. The leaf nodes are usually T_Var or T_Const, and they are handled in the same way as in projection.

This section focuses on how steps are constructed for the T_FuncExpr type, and on the non-fast-path expression evaluation logic that was not covered earlier. It mainly involves two functions: ExecInitExprRec and ExecInitFunc.

ExecInitExprRec is the key function of expression evaluation and is also where the recursion happens. The code runs different logic depending on the expression type; each branch calls ExecInitExprRec recursively where needed and then pushes the current step into the steps array with ExprEvalPushStep. One very important assignment is scratch.resvalue = resv: the address where the current step stores its value is passed in by the caller as a pointer, so the upper-level expression can read the evaluation result of the subexpression. This is what links the entire recursive calculation together.

ExecInitFunc handles expressions of the function type. Because this logic is complex, it is written as a separate function. Its main job is to traverse the arguments of the current function and initialize the evaluation steps of each one by calling ExecInitExprRec; the result of each subexpression can then be obtained through &fcinfo->args[argno].value. When that is done, the evaluation step of the current function is pushed into the steps array.
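The interplay between the post-order walk and the result pointers can be sketched as follows. Everything here is a hypothetical simplification for illustration: ToyNode, ToyStep, ToyState and the toy_* functions are invented names, values are plain doubles, the functions toy_mul and toy_half merely mirror the two-argument and one-argument shape of the example, and a fixed array of argument cells stands in for the separately allocated fcinfo.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*ToyFn)(const double *args);

typedef struct ToyNode {
    ToyFn fn;                       /* NULL marks a Var leaf */
    int nargs;
    const struct ToyNode *args[2];
} ToyNode;

typedef enum { STEP_VAR, STEP_FUNC, STEP_DONE } ToyOp;

typedef struct ToyStep {
    ToyOp op;
    double *resvalue;               /* where this step stores its result */
    ToyFn fn;                       /* STEP_FUNC only */
    const double *args;             /* STEP_FUNC only: cells filled by child steps */
} ToyStep;

typedef struct ToyState {
    ToyStep steps[16];
    int nsteps;
    double argcells[16];            /* stable storage for function arguments */
    int ncells;
    double resvalue;                /* result of the whole expression */
} ToyState;

static double toy_mul(const double *a)  { return a[0] * a[1]; }
static double toy_half(const double *a) { return a[0] / 2.0; }

/* Post-order walk: emit the steps of all children, then the node's own step. */
static void
toy_init_expr_rec(const ToyNode *node, ToyState *state, double *resv)
{
    ToyStep scratch = { STEP_VAR, resv, NULL, NULL };

    if (node->fn != NULL)
    {
        double *cells = &state->argcells[state->ncells];

        assert(state->ncells + node->nargs <= 16);
        state->ncells += node->nargs;
        /* each child is told to write its result into one argument cell */
        for (int i = 0; i < node->nargs; i++)
            toy_init_expr_rec(node->args[i], state, &cells[i]);
        scratch.op = STEP_FUNC;
        scratch.fn = node->fn;
        scratch.args = cells;
    }
    assert(state->nsteps < 15);     /* leave room for STEP_DONE */
    state->steps[state->nsteps++] = scratch;
}

static void
toy_compile(const ToyNode *root, ToyState *state)
{
    ToyStep done = { STEP_DONE, NULL, NULL, NULL };

    state->nsteps = 0;
    state->ncells = 0;
    toy_init_expr_rec(root, state, &state->resvalue);
    state->steps[state->nsteps++] = done;
}

/* Run the flat step array; var_value plays the role of the scanned column. */
static double
toy_eval(ToyState *state, double var_value)
{
    for (const ToyStep *op = state->steps;; op++)
    {
        switch (op->op)
        {
            case STEP_VAR:
                *op->resvalue = var_value;
                break;
            case STEP_FUNC:
                *op->resvalue = op->fn(op->args);
                break;
            case STEP_DONE:
                return state->resvalue;
        }
    }
}
```

Compiling the tree half(mul(i, i)) yields the steps Var, Var, mul, half and the terminating step, in that order; no recursion is left at evaluation time because each step already knows where to read its inputs and where to write its output.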

The actual evaluation process of the above example is as follows. Under the dispatch mechanism described earlier, the following steps are executed in sequence:

    EEO_CASE(EEOP_SCAN_FETCHSOME)

    EEO_CASE(EEOP_SCAN_VAR) // scan i
    EEO_CASE(EEOP_FUNCEXPR_STRICT) // i4tod

    EEO_CASE(EEOP_SCAN_VAR) // scan i
    EEO_CASE(EEOP_FUNCEXPR_STRICT) // i4tod
    
    EEO_CASE(EEOP_FUNCEXPR_STRICT) // dpow
    EEO_CASE(EEOP_FUNCEXPR_STRICT) // dsqrt
    
    EEO_CASE(EEOP_ASSIGN_TMP) // assign the computed result to resultslot
    
   EEO_CASE(EEOP_DONE)

constant precomputation optimization

The constant parts of an expression tree can be computed in the optimization stage to avoid repeated evaluation. Again we use an example to discuss this. In the following query, if POWER(2,3) is replaced by 8 in the optimization stage, the execution stage no longer has to compute POWER(2,3) 5 times, once per row.

select i+POWER(2,3) from generate_series(1,5) i;

The call stack is as follows:

-> preprocess_expression
 -> eval_const_expressions
     -> eval_const_expressions_mutator
         -> expression_tree_mutator (case T_List)
             -> eval_const_expressions_mutator
                 -> expression_tree_mutator (case T_TargetEntry)
        -> eval_const_expressions_mutator (case T_OpExpr)
         -> simplify_function
                             // recursively call eval_const_expressions_mutator on the argument list
                             -> args = expand_function_arguments()
          -> args = (List *) expression_tree_mutator(args,eval_const_expressions_mutator)
                             -> evaluate_function // try to pre-evaluate a function call 
           -> evaluate_expr // pre-evaluate a constant expression
            // initialize the execution state of the expression node
                                        -> ExecInitExpr
                                         -> ExecInitExprSlots() // Insert EEOP_*_FETCHSOME steps as needed
                                            -> ExecInitExprRec() // fill the execution steps into the ExprState->steps array
                                             case T_FuncExpr:
             -> ExecInitFunc // mainly sets up evaluation of the arguments and puts the steps into the state's step list
                                                     foreach(lc, args)
                                                        if IsA(arg, Const)
                                                             fcinfo->args[argno].value = con->constvalue
                                                            else
                                                             ExecInitExprRec() // recursively evaluate the expression argument
                                                    -> ExprEvalPushStep
                                        -> const_val = ExecEvalExprSwitchContext
                                         -> evalfunc()
                                             op = state->steps
                                                resultslot = state->resultslot
                                                outerslot = econtext->ecxt_outertuple
                                                EEO_DISPATCH() // goto op->opcode

The core logic of the code is to traverse the expression tree through the eval_const_expressions_mutator and expression_tree_mutator functions. When an OpExpr is encountered, simplify_function()->evaluate_function() is called to simplify the constants in the sub-expression tree. evaluate_function checks whether any of the subexpressions is non-constant; if they are all constants (and the function itself is safe to pre-evaluate, for example because it is immutable), the simplification can proceed.

The essence of the simplification is to move the evaluation that would happen in the executor stage forward to the optimization stage: first generate the node's execution state and the array of evaluation steps; then call ExecEvalExprSwitchContext to execute the steps in sequence; finally generate a Const node through makeConst to replace the original complex expression subtree.
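The effect of this pre-evaluation can be illustrated with a small bottom-up folding sketch. It is not how PG does it internally: ToyExpr and toy_fold_constants are hypothetical names, the tree is rewritten in place rather than copied by a mutator, integer addition and multiplication stand in for arbitrary immutable functions, and the constant subtree is computed directly instead of being compiled into steps and run through the executor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD, TOY_MUL } ToyTag;

typedef struct ToyExpr {
    ToyTag tag;
    long value;              /* TOY_CONST only */
    struct ToyExpr *left;    /* operators only */
    struct ToyExpr *right;   /* operators only */
} ToyExpr;

/*
 * Fold constant subtrees in place.
 * Returns true when the node is a constant after folding.
 */
static bool
toy_fold_constants(ToyExpr *node)
{
    bool lconst;
    bool rconst;
    long result;

    assert(node != NULL);
    if (node->tag == TOY_CONST)
        return true;
    if (node->tag == TOY_VAR)
        return false;        /* value is only known at execution time */

    /* simplify both inputs first, even if one turns out to be non-constant */
    lconst = toy_fold_constants(node->left);
    rconst = toy_fold_constants(node->right);
    if (!(lconst && rconst))
        return false;

    /* all inputs are constants: evaluate now and replace the node */
    if (node->tag == TOY_ADD)
        result = node->left->value + node->right->value;
    else
        result = node->left->value * node->right->value;

    node->tag = TOY_CONST;
    node->value = result;
    node->left = NULL;
    node->right = NULL;
    return true;
}
```

For a tree shaped like i + (2 * 4), the multiplication node is replaced by the constant 8 while the addition stays an operator node, because its left input is a variable.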

So far, we have systematically introduced how PG implements projection and expression calculation. Projection is needed in most cases, and one of the few ways to optimize it may be to push down the projection of upper-level operators, which we will not discuss here.

Origin blog.csdn.net/m0_54979897/article/details/130829818