Calcite分析 - Rule - 代码天地

Calcite源码分析，参考：

http://matt33.com/2019/03/07/apache-calcite-process-flow/

https://matt33.com/2019/03/17/apache-calcite-planner/

Rule作为Calcite查询优化的核心，

具体看几个有代表性的Rule，看看是如何实现的

最简单的例子，Join结合律，JoinAssociateRule

首先所有的Rule都继承RelOptRule类

/**
 * Planner rule that changes a join based on the associativity rule.
 *
 * <p>((a JOIN b) JOIN c) &rarr; (a JOIN (b JOIN c))</p>
 *
 * <p>We do not need a rule to convert (a JOIN (b JOIN c)) &rarr;
 * ((a JOIN b) JOIN c) because we have
 * {@link JoinCommuteRule}.
 *
 * @see JoinCommuteRule
 */
public class JoinAssociateRule extends RelOptRule {

RelOptRule用于transform expression

它维护的Operand tree表明，该rule可以适用于何种树结构，具体看下面的例子

/**
 * A <code>RelOptRule</code> transforms an expression into another. It has a
 * list of {@link RelOptRuleOperand}s, which determine whether the rule can be
 * applied to a particular section of the tree.
 *
 * <p>The optimizer figures out which rules are applicable, then calls
 * {@link #onMatch} on each of them.</p>
 */
public abstract class RelOptRule {
 
  /**
   * Root of operand tree.
   */
  private final RelOptRuleOperand operand;

  /** Factory for a builder for relational expressions.
   *
   * <p>The actual builder is available via {@link RelOptRuleCall#builder()}. */
  public final RelBuilderFactory relBuilderFactory;

  /**
   * Flattened list of operands.
   */
  public final List<RelOptRuleOperand> operands;

  //~ Constructors -----------------------------------------------------------

  /**
   * Creates a rule with an explicit description.
   *
   * @param operand     root operand, must not be null
   * @param description Description, or null to guess description
   * @param relBuilderFactory Builder for relational expressions
   */
  public RelOptRule(RelOptRuleOperand operand,
      RelBuilderFactory relBuilderFactory, String description) {
    this.operand = Objects.requireNonNull(operand);
    this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description;
    this.operands = flattenOperands(operand);
    assignSolveOrder();
  }

比如对于Join结合律，调用super，即RelOptRule的构造函数

  /**
   * Creates a JoinAssociateRule.
   */
  public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {
    super(
        operand(Join.class,
            operand(Join.class, any()),
            operand(RelSubset.class, any())),
        relBuilderFactory, null);
  }

operand也是一个树形结构，Top的Operand的类型是Join，他有两个children，其中一个也是join，另一个是RelSubset

他们的children是any()

  /**
   * Creates a list of child operands that signifies that the operand matches
   * any number of child relational expressions.
   *
   * @return List of child operands that signifies that the operand matches
   *   any number of child relational expressions
   */
  public static RelOptRuleOperandChildren any() {
    return RelOptRuleOperandChildren.ANY_CHILDREN;
  }

然后要理解Rule如果被使用的，来看看HepPlaner里面的代码

因为HepPlaner逻辑比较简单，就是遍历所有的HepRelVertex，看看RelOptRule是否可以匹配，如果匹配就应用Rule

所以会调用到applyRule，输入包含Rule和Vertex

  private HepRelVertex applyRule(
      RelOptRule rule,
      HepRelVertex vertex,
      boolean forceConversions) {

    final List<RelNode> bindings = new ArrayList<>();
    final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();
    boolean match =
        matchOperands(
            rule.getOperand(),
            vertex.getCurrentRel(),
            bindings,
            nodeChildren);

    if (!match) {
      return null;
    }

    HepRuleCall call =
        new HepRuleCall(
            this,
            rule.getOperand(),
            bindings.toArray(new RelNode[0]),
            nodeChildren,
            parents);

    // Allow the rule to apply its own side-conditions.
    if (!rule.matches(call)) {
      return null;
    }

    fireRule(call);

    if (!call.getResults().isEmpty()) {
      return applyTransformationResults(
          vertex,
          call,
          parentTrait);
    }
    return null;
  }

1. 首先调用，matchOperands，看看是否match，

  private boolean matchOperands(
      RelOptRuleOperand operand,  //Rule中的Operand
      RelNode rel,  //vertex中的RelNode
      List<RelNode> bindings,
      Map<RelNode, List<RelNode>> nodeChildren) {
    if (!operand.matches(rel)) { //先比较top node和operand是否match
      return false;
    }
    bindings.add(rel); //如果match，把top relnode加入bindings
    //接着来比较child是否match
    //child有几种类型：
    //Any，随意，直接返回true
    //Unordered，无序，对于每个operand只要有任何一个child RelNode可满足即可
    //Default，Some，Operand和RelNode严格有序匹配
    List<HepRelVertex> childRels = (List) rel.getInputs();
    switch (operand.childPolicy) {
    case ANY:
      return true;
    case UNORDERED:
      // For each operand, at least one child must match. If
      // matchAnyChildren, usually there's just one operand.
      for (RelOptRuleOperand childOperand : operand.getChildOperands()) {
        boolean match = false;
        for (HepRelVertex childRel : childRels) {
          match =
              matchOperands( //递归调用matchOperands
                  childOperand,
                  childRel.getCurrentRel(),
                  bindings,
                  nodeChildren);
          if (match) {
            break;
          }
        }
        if (!match) {
          return false;
        }
      }
      final List<RelNode> children = new ArrayList<>(childRels.size());
      for (HepRelVertex childRel : childRels) {
        children.add(childRel.getCurrentRel());
      }
      nodeChildren.put(rel, children);
      return true;
    default:
      int n = operand.getChildOperands().size();
      if (childRels.size() < n) {
        return false;
      }
      //一一按顺序对应match
      for (Pair<HepRelVertex, RelOptRuleOperand> pair
          : Pair.zip(childRels, operand.getChildOperands())) {
        boolean match =
            matchOperands(
                pair.right,
                pair.left.getCurrentRel(),
                bindings,
                nodeChildren);
        if (!match) {
          return false;
        }
      }
      return true;
    }
  }

逻辑是先比较自身，然后再递归比较children

比较函数，

可以看出，从类型，Trait，Predicate上来比较，是否匹配

 /**
   * Returns whether a relational expression matches this operand. It must be
   * of the right class and trait.
   */
  public boolean matches(RelNode rel) {
    if (!clazz.isInstance(rel)) {
      return false;
    }
    if ((trait != null) && !rel.getTraitSet().contains(trait)) {
      return false;
    }
    return predicate.test(rel);
  }

child的类型分为，

/**
 * Policy by which operands will be matched by relational expressions with
 * any number of children.
 */
public enum RelOptRuleOperandChildPolicy {
  /**
   * Signifies that operand can have any number of children.
   */
  ANY,

  /**
   * Signifies that operand has no children. Therefore it matches a
   * leaf node, such as a table scan or VALUES operator.
   *
   * <p>{@code RelOptRuleOperand(Foo.class, NONE)} is equivalent to
   * {@code RelOptRuleOperand(Foo.class)} but we prefer the former because
   * it is more explicit.</p>
   */
  LEAF,

  /**
   * Signifies that the operand's children must precisely match its
   * child operands, in order.
   */
  SOME,

  /**
   * Signifies that the rule matches any one of its parents' children.
   * The parent may have one or more children.
   */
  UNORDERED,
}

不同的类型按照注释中的规则进行match

2. 如果match，继续会把当前结果封装成HepRuleCall，继承自RelOptRuleCall

是RelOptRule的一次invocation，即会记录调用中用到的上下文数据和结果

/**
 * A <code>RelOptRuleCall</code> is an invocation of a {@link RelOptRule} with a
 * set of {@link RelNode relational expression}s as arguments.
 */
public abstract class RelOptRuleCall {

  /**
   * Generator for {@link #id} values.
   */
  private static int nextId = 0;

  //~ Instance fields --------------------------------------------------------

  public final int id;
  protected final RelOptRuleOperand operand0; //Rule的Operand树的root
  protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs
  public final RelOptRule rule;
  public final RelNode[] rels; //match到的所有RelNodes
  private final RelOptPlanner planner;
  private final List<RelNode> parents; //对于Top RelNodes而言的parents

3. 调用rule.matches(call)

默认实现是，return true，即不检查，这里允许Rule加一下特定的match检测

4. 调用fireRule(call)

而其中主要是调用，ruleCall.getRule().onMatch(ruleCall);

所以做的事情是需要在Rule的onMatch里面定义的

下面看下JoinAssociateRule的实现，

  public void onMatch(final RelOptRuleCall call) {
    final Join topJoin = call.rel(0);
    final Join bottomJoin = call.rel(1);
    final RelNode relA = bottomJoin.getLeft();
    final RelNode relB = bottomJoin.getRight();
    final RelSubset relC = call.rel(2);
    final RelOptCluster cluster = topJoin.getCluster();
    final RexBuilder rexBuilder = cluster.getRexBuilder();

    if (relC.getConvention() != relA.getConvention()) {
      // relC could have any trait-set. But if we're matching say
      // EnumerableConvention, we're only interested in enumerable subsets.
      return;
    }

    //        topJoin
    //        /     \
    //   bottomJoin  C
    //    /    \
    //   A      B

    final int aCount = relA.getRowType().getFieldCount();
    final int bCount = relB.getRowType().getFieldCount();
    final int cCount = relC.getRowType().getFieldCount();
    final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
    final ImmutableBitSet bBitSet =
        ImmutableBitSet.range(aCount, aCount + bCount);

这部分都是在初始化，上面好理解，下面算这些count是干啥？

其实是在算RexInputRef，

RexNode和RelNode的共同点是，他们都代表一个树

差别是他们代表的东西不同，RelNode代表关系代数算子组成的数，而RexNode表示表达式树

比如RexLiteral表示常量，RexVariable表示变量，RexCall表示操作来连接Literal和Variable

RexVariable往往表示输入的某个field，为了效率，这里只会记录field的id，即这个RexInputRef

/**
 * Variable which references a field of an input relational expression.
 *
 * <p>Fields of the input are 0-based. If there is more than one input, they are
 * numbered consecutively. For example, if the inputs to a join are</p>
 *
 * <ul>
 * <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li>
 * <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li>
 * </ul>
 *
 * <p>then the fields are:</p>
 *
 * <ul>
 * <li>Field #0: EMPNO</li>
 * <li>Field #1: ENAME</li>
 * <li>Field #2: DEPTNO (from EMP)</li>
 * <li>Field #3: DEPTNO2 (from DEPT)</li>
 * <li>Field #4: DNAME</li>
 * </ul>
 *
 * <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the
 * field DEPTNO2.</p>
 */
public class RexInputRef extends RexSlot {

而RexInputRef的index不是固定的，是按顺序排的，

所以按上面把fieldcount都算出来，就可以排出对应的index

接着做的一堆都是为了调整conditions的位置和相对应的index

    // Goal is to transform to
    //
    //       newTopJoin
    //        /     \
    //       A   newBottomJoin
    //               /    \
    //              B      C

    // Split the condition of topJoin and bottomJoin into a conjunctions. A
    // condition can be pushed down if it does not use columns from A.
    final List<RexNode> top = new ArrayList<>();
    final List<RexNode> bottom = new ArrayList<>();
    JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
    JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
        bottom);

    // Mapping for moving conditions from topJoin or bottomJoin to
    // newBottomJoin.
    // target: | B | C      |
    // source: | A       | B | C      |
    final Mappings.TargetMapping bottomMapping =
        Mappings.createShiftMapping(
            aCount + bCount + cCount,
            0, aCount, bCount,
            bCount, aCount + bCount, cCount);
    final List<RexNode> newBottomList = new ArrayList<>();
    new RexPermuteInputsShuttle(bottomMapping, relB, relC)
        .visitList(bottom, newBottomList);
    RexNode newBottomCondition =
        RexUtil.composeConjunction(rexBuilder, newBottomList);

1. 把原先TopJoin和BottomJoin里面的conditions，分成和A相关的和无关的

因为调整完以后，和A相关的conditions要放到TopJoin，而和A无关的需要放到BottomJoin

aBitSet里面记录了所有A的fields的index，

  /**
   * Splits a condition into conjunctions that do or do not intersect with
   * a given bit set.
   */
  static void split(
      RexNode condition,
      ImmutableBitSet bitSet,
      List<RexNode> intersecting,
      List<RexNode> nonIntersecting) {
    for (RexNode node : RelOptUtil.conjunctions(condition)) {//把conjunction的条件拆分开
      ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields
      if (bitSet.intersects(inputBitSet)) {
        intersecting.add(node);
      } else {
        nonIntersecting.add(node);
      }
    }
  }

如何找出condition所用到的fields？

如下，在递归的过程做回把所有碰到的inputRef记录下来

    /**
     * Returns a bit set describing the inputs used by an expression.
     */
    public static ImmutableBitSet bits(RexNode node) {
      return analyze(node).inputBitSet.build();
    }

  /** Returns an input finder that has analyzed a given expression. */
    public static InputFinder analyze(RexNode node) {
      final InputFinder inputFinder = new InputFinder();
      node.accept(inputFinder); //Visitor模式
      return inputFinder;
    }

   //如果RexNode是InputRef，就记录下该Ref的index
    public Void visitInputRef(RexInputRef inputRef) {
      inputBitSet.set(inputRef.getIndex());
      return null;
    }
  
  //如果RexNode是Call，继续递归accept该Call的相关Operands
  public R visitCall(RexCall call) {
    if (!deep) {
      return null;
    }

    R r = null;
    for (RexNode operand : call.operands) {
      r = operand.accept(this);
    }
    return r;
  }

2. 由于树结构变了，会导致整个field的index发生改变

对于原先的，bottomJoin，A的field从0开始，B是从aCount开始
而对于变换后的，BottomJoin，B的field从0开始，C是从bCount开始

所以需要完成Ref的修正，

createShiftMapping，生成的mapping，包含原先的index，和当前新的index

第一个参数是index的范围，一共aCount + bCount + cCount，所以index需要小于这个值

然后，只有B，C的index需要调整，A的index不需要调整

其中，B的原先是从aCount开始，当前从0开始，一共bCount个fields；C的原先是从aCount + bCount开始，当前从bCount开始，一共cCount个fields

接着，RexPermuteInputsShuttle做具体的更新

/**
 * Shuttle which applies a permutation to its input fields.
 *
 * @see RexPermutationShuttle
 * @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode)
 */
public class RexPermuteInputsShuttle extends RexShuttle {
  //~ Instance fields --------------------------------------------------------

  private final Mappings.TargetMapping mapping;
  private final ImmutableList<RelDataTypeField> fields;

  //对于每个InputRef，根据Mapping更新index
  @Override public RexNode visitInputRef(RexInputRef local) {
    final int index = local.getIndex();
    int target = mapping.getTarget(index);
    return new RexInputRef(
        target,
        local.getType());
  }

最后，构建新的join，

    RexNode newBottomCondition =
        RexUtil.composeConjunction(rexBuilder, newBottomList);

    final Join newBottomJoin =
        bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
            relC, JoinRelType.INNER, false);

    // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
    // Field ordinals do not need to be changed.
    RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
    @SuppressWarnings("SuspiciousNameCombination")
    final Join newTopJoin =
        topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
            newBottomJoin, JoinRelType.INNER, false);

    call.transformTo(newTopJoin);

Calcite分析 - Rule

猜你喜欢