Spark Source Code Analysis: Catalyst (Draft)

The Optimizer is a RuleExecutor over LogicalPlan; its rules are grouped into batches:
object Optimizer extends RuleExecutor[LogicalPlan] {
  val batches =
    Batch("ConstantFolding", Once,
      ConstantFolding,
      BooleanSimplification,
      SimplifyFilters,
      SimplifyCasts) ::
    Batch("Filter Pushdown", Once,
      CombineFilters,
      PushPredicateThroughProject,
      PushPredicateThroughInnerJoin,
      ColumnPruning) :: Nil
}

SimplifyFilters

object SimplifyFilters extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // A literally-true predicate filters nothing: drop the Filter node.
    case Filter(Literal(true, BooleanType), child) =>
      child
    // A null or literally-false predicate filters everything:
    // replace the subtree with an empty relation over the child's schema.
    case Filter(Literal(null, _), child) =>
      LocalRelation(child.output)
    case Filter(Literal(false, BooleanType), child) =>
      LocalRelation(child.output)
  }
}
This rule prunes trivially decidable filters: a predicate that is literally true just returns the child, while a null or literally false predicate returns an empty LocalRelation built from child.output. Where do patterns like Literal(true, BooleanType) come from, though? Looking back at the Optimizer's batches, they are produced by ConstantFolding and BooleanSimplification, which run before SimplifyFilters within the same "ConstantFolding" batch, as sketched below.
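To make the mechanics concrete, here is a minimal, self-contained sketch in plain Scala (not Spark's real classes; ToyPlan, ToyFilter, ToyRel, ToyAnd, ToyLit are all invented stand-ins for Catalyst's LogicalPlan and Expression hierarchies) of how a predicate first collapses to a true literal and is then stripped by a SimplifyFilters-style pattern:

sealed trait ToyExpr
case class ToyLit(v: Boolean) extends ToyExpr
case class ToyAnd(l: ToyExpr, r: ToyExpr) extends ToyExpr

sealed trait ToyPlan {
  // Post-order rewrite, analogous to TreeNode.transform in Catalyst:
  // rewrite children first, then try the rule on the current node.
  def transform(rule: PartialFunction[ToyPlan, ToyPlan]): ToyPlan = {
    val rewritten = this match {
      case ToyFilter(c, child) => ToyFilter(c, child.transform(rule))
      case leaf                => leaf
    }
    rule.applyOrElse(rewritten, identity[ToyPlan])
  }
}
case class ToyRel(name: String) extends ToyPlan
case class ToyFilter(cond: ToyExpr, child: ToyPlan) extends ToyPlan

object Demo extends App {
  // Step 1, BooleanSimplification-style: And(true, x) => x
  def simplifyCond(e: ToyExpr): ToyExpr = e match {
    case ToyAnd(ToyLit(true), r) => simplifyCond(r)
    case ToyAnd(l, ToyLit(true)) => simplifyCond(l)
    case other                   => other
  }
  val boolSimp: PartialFunction[ToyPlan, ToyPlan] = {
    case ToyFilter(c, child) => ToyFilter(simplifyCond(c), child)
  }
  // Step 2, SimplifyFilters-style: Filter(true, child) => child
  val simplifyFilters: PartialFunction[ToyPlan, ToyPlan] = {
    case ToyFilter(ToyLit(true), child) => child
  }
  val plan = ToyFilter(ToyAnd(ToyLit(true), ToyLit(true)), ToyRel("t"))
  println(plan.transform(boolSimp).transform(simplifyFilters)) // ToyRel(t)
}

In real Catalyst both steps are Rule[LogicalPlan] objects, and the RuleExecutor batch shown at the top sequences them.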


Next, how is the SchemaRDD built in the first place? The debugger call stack at the breakpoint:

SQLContext.createSchemaRDD(RDD<A>, TypeTag<A>) line: 90
BaiJoin$.main(String[]) line: 26
BaiJoin.main(String[]) line: not available

Look at this frame: SQLContext.createSchemaRDD(RDD<A>, TypeTag<A>). The breakpoint was stopped on the new SchemaRDD statement:
  implicit def createSchemaRDD[A <: Product: TypeTag](rdd: RDD[A]) =
    new SchemaRDD(this, SparkLogicalPlan(ExistingRdd.fromProductRdd(rdd)))
At that point the Variables view showed this variable: evidence$1 TypeTags$TypeTagImpl<T> (id=107)
Its value is TypeTag[com.ailk.test.sql.tb], so A can effectively be taken to be com.ailk.test.sql.tb (a case class).
And rdd is:
  MappedRDD[2] at map at BaiJoin.scala:16
    MappedRDD[1] at textFile at BaiJoin.scala:16
      HadoopRDD[0] at textFile at BaiJoin.scala:16
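That lineage (HadoopRDD[0] and MappedRDD[1] from textFile, then MappedRDD[2] from a single map) suggests driver code roughly like the following. This is a hypothetical reconstruction: only the class name com.ailk.test.sql.tb and the textFile/map at BaiJoin.scala:16 come from the debugger output; the field names, delimiter, and input path are invented, and the API shown is the Spark 1.0-era SQLContext.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumed shape of the case class; only its name is known from the TypeTag.
case class tb(id: Int, name: String)

object BaiJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BaiJoin"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // puts the implicit createSchemaRDD into scope

    // textFile => HadoopRDD[0] + MappedRDD[1]; the map below => MappedRDD[2]
    val rdd = sc.textFile("hdfs:///path/to/input").map { line =>
      val a = line.split(",")  // hypothetical delimiter
      tb(a(0).toInt, a(1))
    }

    // Using rdd where a SchemaRDD is expected makes the compiler insert
    // createSchemaRDD(rdd), which is where the breakpoint above fires.
    rdd.registerAsTable("tb")
  }
}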
                
SparkLogicalPlan wraps ExistingRdd.fromProductRdd(rdd), which is:

def fromProductRdd[A <: Product : TypeTag](productRdd: RDD[A]) = {
  ExistingRdd(ScalaReflection.attributesFor[A], productToRowRdd(productRdd))
}
ScalaReflection.attributesFor[A] pulls every field of A out into a list: exactly the columns defined by com.ailk.test.sql.tb. Its result is therefore a Seq[Attribute]. The ExistingRdd node built from it is a leaf whose execute simply returns the RDD[Row]:
case class ExistingRdd(output: Seq[Attribute], rdd: RDD[Row]) extends LeafNode {
  override def execute() = rdd
}
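As a rough illustration of what attributesFor does under the hood, here is a minimal sketch using Scala runtime reflection (2.11+ API names; attributesForSketch is a hypothetical simplification that returns name/type pairs instead of real Attribute objects):

import scala.reflect.runtime.universe._

// Enumerate the primary-constructor fields of a case class: these become
// the columns of the relation.
def attributesForSketch[A: TypeTag]: Seq[(String, Type)] = {
  val ctor = typeOf[A].decl(termNames.CONSTRUCTOR).asMethod
  ctor.paramLists.head.map(p => p.name.toString -> p.typeSignature)
}

case class Person(name: String, age: Int)
println(attributesForSketch[Person]) // List((name,String), (age,Int))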
productToRowRdd converts the input RDD[A] into an RDD[Row]:
  def productToRowRdd[A <: Product](data: RDD[A]): RDD[Row] = {
    data.mapPartitions { iterator =>
      if (iterator.isEmpty) {
        Iterator.empty
      } else {
        val bufferedIterator = iterator.buffered
        // One mutable row per partition, sized from the first element's
        // arity; it is reused (overwritten) for every subsequent row.
        val mutableRow = new GenericMutableRow(bufferedIterator.head.productArity)

        bufferedIterator.map { r =>
          var i = 0
          while (i < mutableRow.length) {
            mutableRow(i) = r.productElement(i)
            i += 1
          }

          mutableRow
        }
      }
    }
  }
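One detail worth noting: a single GenericMutableRow is allocated per partition and overwritten for each element, which avoids per-row allocation but means downstream code must consume rows one at a time rather than retaining references. A plain-Scala sketch of the same pattern (no Spark needed; the Array[Any] stands in for the mutable row) shows the aliasing:

val data = Iterator(Seq(1, 2), Seq(3, 4))
val buf = new Array[Any](2)  // stands in for the reused mutable row
val rows = data.map { r =>
  var i = 0
  while (i < buf.length) { buf(i) = r(i); i += 1 }
  buf
}
rows.foreach(r => println(r.mkString(",")))  // 1,2 then 3,4: safe, consumed one at a time
// but rows.toList would have produced two references to the same buffer,
// both showing the last-written values (3,4)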

/////////////////////////////////////////////////////////////////////
Misc JVM notes: heap, JIT compiler, GC
dfs3
Memory allocation must be an atomic operation. Threading: TLAB (a thread-local allocation buffer per thread); within a space, allocation is free-list or bump-the-pointer.
Copying collection: s0 and s1 receive the objects that survive in eden.
Mark-sweep: leaves memory fragmentation behind.
Mark-compact: the memory copying is fairly heavy.

GC root selection: class, thread, stack local, JNI local, monitor, "held by JVM".
dfs3 marking from the roots.
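As a rough illustration of the marking step, a minimal sketch of depth-first marking from the roots (ToyObj and the graph shape are invented for illustration):

import scala.collection.mutable

class ToyObj(val id: Int) { var refs: List[ToyObj] = Nil }

// Mark phase: everything reachable from the roots by DFS is live;
// anything left unmarked is garbage for a mark-sweep collector.
def mark(roots: Seq[ToyObj]): Set[Int] = {
  val marked = mutable.Set[Int]()
  def dfs(o: ToyObj): Unit = if (marked.add(o.id)) o.refs.foreach(dfs)
  roots.foreach(dfs)
  marked.toSet
}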


Reposted from baishuo491.iteye.com/blog/2061964