Apache Drill Logical Plan Explained, Part 2

Original post; please credit the source when reposting.

Original article:
http://www.confusedcoders.com/bigdata/apache-drill/understanding-apache-drill-logical-plan

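Apache Drill produces two plans internally: a logical plan and a physical plan.
An input query can be written in SQL 2003, DrQL, or MongoQL. The query is first converted into a logical plan. Drill then applies its optimization rules to the logical plan and finally produces a physical plan, which is the plan that the execution engine actually runs.
The flow is: query (SQL 2003 / DrQL / MongoQL) -> logical plan -> optimizer -> physical plan -> execution engine.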

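The logical plan describes the data flow in a language-agnostic way, so it does not depend on the language in which the query was written. It is not concerned with optimization, which makes it more verbose than a traditional query statement. The benefit is flexibility: it leaves room for defining higher-level query languages on top of it. The logical plan is handed to the optimizer, which produces the physical plan. The physical plan is a DAG (directed acyclic graph) made up of data stream operators.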
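To make the DAG idea concrete, here is a minimal Python sketch that orders a toy operator graph so that every operator runs after its inputs. The operator names (scan_a, scan_b, join, project, screen) and the dependency map are hypothetical and are not taken from a real Drill physical plan. graphlib is Python's standard-library topological sorter (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical operator graph: each key maps to the operators it reads from.
dag = {
    "scan_a": set(),
    "scan_b": set(),
    "join": {"scan_a", "scan_b"},
    "project": {"join"},
    "screen": {"project"},
}


def execution_order(graph):
    """Return one valid order in which every operator follows its inputs."""
    return list(TopologicalSorter(graph).static_order())


def is_dag(graph):
    """Return True if the graph has no cycle, i.e. it really is a DAG."""
    try:
        execution_order(graph)
        return True
    except CycleError:
        return False


order = execution_order(dag)
print(order)
print(is_dag({"a": {"b"}, "b": {"a"}}))  # a two-node cycle is not a DAG
```

The exact order of the two scans may vary, but join always comes after both of them and screen always comes last.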

Sample Logical Plan
The blog's author defined a math function of their own (Contribute to Apache Drill: Implementing Math Functions, http://www.confusedcoders.com/bigdata/apache-drill/how-to-contribute-to-apache-drill-implementing-drill-math-functions) and gives a sample logical plan that uses it:
-----------------------------------------------------------------------------
{
  head: {
    type: "APACHE_DRILL_LOGICAL",
    version: "1",
    generator: {
      type: "manual",
      info: "na"
    }
  },

  storage: {
    console: {type: "console"},
    fs1: {type: "fs", root: "file:///"},
    cp: {type: "classpath"}
  },

  query: [
    {
      @id: 1,
      op: "scan",
      memo: "initial_scan",
      ref: "employees",
      storageengine: "cp",
      selection: {
        path: "/employees.json",
        type: "JSON"
      }
    },

    {
      op: "project",
      @id: 2,
      input: 1,
      projections: [
        {
          ref: "output.ceil",
          expr: "ceil(1.7)"
        },
        {
          ref: "output.floor",
          expr: "floor(1.7)"
        }
      ]
    },

    {
      input: 2,
      op: "store",
      memo: "output sink",
      storageengine: "console",
      target: {pipe: "STD_OUT"}
    }
  ]
}
-----------------------------------------------------------------------------

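Explanation:
This logical plan has three parts:
  Head
  Storage
  Query
The Head section is easy to understand: it records that the plan was created manually (generator type "manual").
The Storage section defines three storage engines: console, fs1, and cp. The query uses console as the engine for its output, which I take to mean that the query results are displayed on the console. The scan reads its data through cp.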
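The link between the storage section and the operators can be checked mechanically. The following Python sketch copies the storage section and the storageengine fields from the sample plan into plain dictionaries and reports engines that are referenced but not defined, and engines that are defined but never used. The helper name check_engines is hypothetical, and this is only an illustration, not how Drill itself validates a plan.

```python
# The storage section and the storageengine references from the sample plan.
storage = {
    "console": {"type": "console"},
    "fs1": {"type": "fs", "root": "file:///"},
    "cp": {"type": "classpath"},
}
operators = [
    {"@id": 1, "op": "scan", "storageengine": "cp"},
    {"@id": 2, "op": "project", "input": 1},
    {"input": 2, "op": "store", "storageengine": "console"},
]


def check_engines(storage, operators):
    """Return (undefined, unused) storage engine names, each sorted."""
    defined = set(storage)
    used = {op["storageengine"] for op in operators if "storageengine" in op}
    return sorted(used - defined), sorted(defined - used)


undefined, unused = check_engines(storage, operators)
print("undefined:", undefined)  # undefined: []
print("unused:", unused)        # unused: ['fs1']
```

For the sample plan every referenced engine is defined, and fs1 is declared without being used by any operator.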

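The Query section holds the logical plan of the query that Drill will execute. It is essentially a collection of operations on the data.
   Scan: the first operator reads the data from the file employees.json. Scan is one of Drill's query operators.
   Project: the project operator transforms the data. In this example its input is the output of the scan above, which is why you see input: 1; the 1 is the scan operator's @id. The other thing worth noting is the projections field, which is in turn a list of transformations to apply to the data. In each entry, ref names the output of the projection and expr gives the expression that is actually computed.
   Store: the last component is the store operator. It uses the console engine defined in the storage section as the storage engine to which the query output is dumped, so the results are displayed directly. Its input: 2 field says that the store operator's input is the output of the project operator.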
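The way the operators are chained through @id and input can be sketched in a few lines of Python. The query list below is the sample plan reduced to the fields used here. The helpers upstream_chain and evaluate are hypothetical names, and the FUNCTIONS table is a toy stand-in built on Python's math module, not Drill's implementation of ceil and floor.

```python
import math

# The query section of the sample plan, reduced to the fields used here.
query = [
    {"@id": 1, "op": "scan", "ref": "employees"},
    {"@id": 2, "op": "project", "input": 1,
     "projections": [
         {"ref": "output.ceil", "expr": "ceil(1.7)"},
         {"ref": "output.floor", "expr": "floor(1.7)"},
     ]},
    {"op": "store", "input": 2},
]


def upstream_chain(query, sink_op="store"):
    """Follow `input` references from the sink back to the source operator."""
    by_id = {op["@id"]: op for op in query if "@id" in op}
    current = next(op for op in query if op["op"] == sink_op)
    chain = [current["op"]]
    while "input" in current:
        current = by_id[current["input"]]
        chain.append(current["op"])
    return list(reversed(chain))


# Toy stand-ins for the two functions (assumption: not Drill's own code).
FUNCTIONS = {"ceil": math.ceil, "floor": math.floor}


def evaluate(expr):
    """Evaluate a `name(number)` expression such as "ceil(1.7)"."""
    name, _, rest = expr.partition("(")
    return FUNCTIONS[name.strip()](float(rest.rstrip(")")))


print(upstream_chain(query))  # ['scan', 'project', 'store']
for projection in query[1]["projections"]:
    print(projection["ref"], "=", evaluate(projection["expr"]))
```

Python's math.ceil and math.floor return integers (2 and 1 here); the type and formatting of the values Drill prints may differ.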
