How Spark optimizes predicate pushdown when a Filter contains non-deterministic conditions

The optimization code

The relevant rule is CombineFilters in Spark's Catalyst optimizer:

val applyLocally: PartialFunction[LogicalPlan, LogicalPlan] = {
  // The query execution/optimization does not guarantee the expressions are evaluated in order.
  // We only can combine them if and only if both are deterministic.
  case Filter(fc, nf @ Filter(nc, grandChild)) if nc.deterministic =>
    // Split the upper Filter's condition into deterministic and non-deterministic conjuncts.
    val (combineCandidates, nonDeterministic) =
      splitConjunctivePredicates(fc).partition(_.deterministic)
    // Merge the deterministic predicates that the lower Filter does not already contain.
    val mergedFilter = (ExpressionSet(combineCandidates) --
      ExpressionSet(splitConjunctivePredicates(nc))).reduceOption(And) match {
      case Some(ac) =>
        Filter(And(nc, ac), grandChild)
      case None =>
        nf
    }
    // Keep any non-deterministic predicates in a separate Filter on top of the merged Filter.
    nonDeterministic.reduceOption(And).map(c => Filter(c, mergedFilter)).getOrElse(mergedFilter)
}
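
For context on the helpers used above: splitConjunctivePredicates comes from PredicateHelper and flattens a conjunction into its individual predicates, while ExpressionSet compares expressions semantically, so the subtraction drops predicates the lower Filter already checks. A minimal sketch of what the splitting amounts to (illustrative, not copied verbatim from the Spark source):

import org.apache.spark.sql.catalyst.expressions.{And, Expression}

// Recursively flatten a conjunction: ((p1 AND p2) AND p3) => Seq(p1, p2, p3).
def splitConjunctivePredicates(condition: Expression): Seq[Expression] = condition match {
  case And(left, right) =>
    splitConjunctivePredicates(left) ++ splitConjunctivePredicates(right)
  case other => other :: Nil
}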

Implementation principle

Given two adjacent Filters, the rule splits the upper Filter's condition into its conjuncts, merges the deterministic expressions (those not already present in the lower Filter) into the lower Filter, and keeps the non-deterministic expressions in a separate Filter on top. Pushing a non-deterministic predicate such as rand() further down would change which rows it is evaluated on, so it must stay above the deterministic Filter. A runnable sketch follows.
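
To see the rule in action, the following sketch (an assumption-laden example, not taken from the original post) builds two adjacent Filters with the DataFrame API and prints the optimized logical plan. It assumes a local SparkSession named spark, and uses a non-empty relation because an empty LocalRelation (as in the Demo below, which looks like an optimizer test setup) would be collapsed by other rules; the printed plan may therefore differ slightly in formatting from the Demo.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

val spark = SparkSession.builder().master("local[*]").appName("CombineFiltersDemo").getOrCreate()
import spark.implicits._

val df = Seq((1, 2, 3)).toDF("a", "b", "c")

val query = df
  .filter(!$"a".isin(1, 3, 5))           // lower Filter: deterministic only
  .filter($"a" === 7 && rand(10) > 0.1)  // upper Filter: deterministic + non-deterministic

// Expected shape: Filter(rand(10) > 0.1) on top, with (a = 7) merged into the lower Filter.
println(query.queryExecution.optimizedPlan)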

Demo

Before optimization

Filter ((a#0 = 7) AND (rand(10) > 0.1))
+- Filter NOT a#0 IN (1,3,5)
   +- LocalRelation <empty>, [a#0, b#1, c#2]

After optimization

Filter (rand(10) > 0.1)
+- Filter (NOT a#0 IN (1,3,5) AND (a#0 = 7))
   +- LocalRelation <empty>, [a#0, b#1, c#2]
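
The same rewrite can be observed from SQL, where the subquery becomes the lower Filter and the outer WHERE the upper one. A rough sketch, reusing the hypothetical spark and df from the snippet above and a hypothetical temp view name t; the exact plan text may vary across Spark versions:

df.createOrReplaceTempView("t")

val q = spark.sql(
  """SELECT * FROM (SELECT * FROM t WHERE a NOT IN (1, 3, 5)) sub
    |WHERE a = 7 AND rand(10) > 0.1""".stripMargin)

// After optimization, (a = 7) is combined with the NOT IN condition,
// while rand(10) > 0.1 stays in its own Filter above it.
println(q.queryExecution.optimizedPlan)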


Reposted from blog.csdn.net/wankunde/article/details/116717997