Apache Doris Profile&Explain详解

一、简述

Apache Doris中运行EXPLAIN + SQL就可以得到SQL对应的Query Plan,再结合Apche Doris的Profile可以了解Doris是如何处理SQL语句,用于分析查询语句或是结构的性能瓶颈,从而帮助选择更好的索引和写出更优化的查询语句。

二、Plan分析

2.1 sql准备

tpcds query96.sql为例

explain
-- explain graph 生成对应执行计划图表
select  count(*)
from store_sales
    ,household_demographics
    ,time_dim
    , store
where ss_sold_time_sk = time_dim.t_time_sk
    and ss_hdemo_sk = household_demographics.hd_demo_sk
    and ss_store_sk = s_store_sk
    and time_dim.t_hour = 8
    and time_dim.t_minute >= 30
    and household_demographics.hd_dep_count = 5
    and store.s_store_name = 'ese'
order by count(*) limit 100;

2.2 explain结果分析

Query Plan可以分为逻辑执行计划(Logical Query Plan)和物理执行计划(Physical Query Plan),当前讲述的Query Plan默认指逻辑执行计划;tpcds query96.sql对应的Query Plan展示如下。

-- graph 
                                         ┌───────────────┐
                                        │[8: ResultSink]│
                                        │[Fragment: 4]  │
                                        │RESULT SINK    │
                                        └───────────────┘
                                                │
                                                │
                                         ┌─────────────┐
                                         │[8: TOP-N]   │
                                         │[Fragment: 4]│
                                         └─────────────┘
                                                │
                                                │
                               ┌────────────────────────────────┐
                               │[13: AGGREGATE (merge finalize)]│
                               │[Fragment: 4]                   │
                               └────────────────────────────────┘
                                                │
                                                │
                                        ┌──────────────┐
                                        │[12: EXCHANGE]│
                                        │[Fragment: 4] │
                                        └──────────────┘
                                                │
                                                │
                                     ┌────────────────────┐
                                     │[12: DataStreamSink]│
                                     │[Fragment: 0]       │
                                     │STREAM DATA SINK    │
                                     │  EXCHANGE ID: 12   │
                                     │  UNPARTITIONED     │
                                     └────────────────────┘
                                                │
                                                │
                               ┌─────────────────────────────────┐
                               │[7: AGGREGATE (update serialize)]│
                               │[Fragment: 0]                    │
                               └─────────────────────────────────┘
                                                │
                                                │
                                ┌───────────────────────────────┐
                                │[6: HASH JOIN]                 │
                                │[Fragment: 0]                  │
                                │join op: INNER JOIN (BROADCAST)│
                                └───────────────────────────────┘
                                    ┌───────────┴─────────────────────────────────────┐
                                    │                                                 │
                    ┌───────────────────────────────┐                         ┌──────────────┐
                    │[4: HASH JOIN]                 │                         │[11: EXCHANGE]│
                    │[Fragment: 0]                  │                         │[Fragment: 0] │
                    │join op: INNER JOIN (BROADCAST)│                         └──────────────┘
                    └───────────────────────────────┘                                 │
                    ┌───────────────┴─────────────────────┐                           │
                    │                                     │                ┌────────────────────┐
    ┌───────────────────────────────┐             ┌──────────────┐         │[11: DataStreamSink]│
    │[2: HASH JOIN]                 │             │[10: EXCHANGE]│         │[Fragment: 3]       │
    │[Fragment: 0]                  │             │[Fragment: 0] │         │STREAM DATA SINK    │
    │join op: INNER JOIN (BROADCAST)│             └──────────────┘         │  EXCHANGE ID: 11   │
    └───────────────────────────────┘                     │                │  UNPARTITIONED     │
          ┌─────────┴──────────┐                          │                └────────────────────┘
          │                    │               ┌────────────────────┐                ┌┘
┌──────────────────┐    ┌─────────────┐        │[10: DataStreamSink]│                │
│[0: OlapScanNode] │    │[9: EXCHANGE]│        │[Fragment: 2]       │       ┌─────────────────┐
│[Fragment: 0]     │    │[Fragment: 0]│        │STREAM DATA SINK    │       │[5: OlapScanNode]│
│TABLE: store_sales│    └─────────────┘        │  EXCHANGE ID: 10   │       │[Fragment: 3]    │
└──────────────────┘           │               │  UNPARTITIONED     │       │TABLE: store     │
                               │               └────────────────────┘       └─────────────────┘
                     ┌───────────────────┐                │
                     │[9: DataStreamSink]│                │
                     │[Fragment: 1]      │ ┌─────────────────────────────┐
                     │STREAM DATA SINK   │ │[3: OlapScanNode]            │
                     │  EXCHANGE ID: 09  │ │[Fragment: 2]                │
                     │  UNPARTITIONED    │ │TABLE: household_demographics│
                     └───────────────────┘ └─────────────────────────────┘
                               │
                               │
                      ┌─────────────────┐
                      │[1: OlapScanNode]│
                      │[Fragment: 1]    │
                      │TABLE: time_dim  │
                      └─────────────────┘

-- 非graph 
PLAN FRAGMENT 0
 OUTPUT EXPRS:<slot 11> <slot 10> count(*)
  PARTITION: UNPARTITIONED

  RESULT SINK

  8:TOP-N
  |  order by: <slot 11> <slot 10> count(*) ASC
  |  offset: 0
  |  limit: 100
  |  
  13:AGGREGATE (merge finalize)
  |  output: count(<slot 10> count(*))
  |  group by: 
  |  cardinality=-1
  |  
  12:EXCHANGE

PLAN FRAGMENT 1
 OUTPUT EXPRS:
  PARTITION: HASH_PARTITIONED: `default_cluster:tpcds`.`store_sales`.`ss_item_sk`, `default_cluster:tpcds`.`store_sales`.`ss_ticket_number`

  STREAM DATA SINK
    EXCHANGE ID: 12
    UNPARTITIONED

  7:AGGREGATE (update serialize)
  |  output: count(*)
  |  group by: 
  |  cardinality=1
  |  
  6:HASH JOIN
  |  join op: INNER JOIN (BROADCAST)
  |  hash predicates:
  |  colocate: false, reason: Tables are not in the same group
  |  equal join conjunct: `ss_store_sk` = `s_store_sk`
  |  runtime filters: RF000[in] <- `s_store_sk`
  |  cardinality=2880403
  |  
  |----11:EXCHANGE
  |    
  4:HASH JOIN
  |  join op: INNER JOIN (BROADCAST)
  |  hash predicates:
  |  colocate: false, reason: Tables are not in the same group
  |  equal join conjunct: `ss_hdemo_sk` = `household_demographics`.`hd_demo_sk`
  |  runtime filters: RF001[in] <- `household_demographics`.`hd_demo_sk`
  |  cardinality=2880403
  |  
  |----10:EXCHANGE
  |    
  2:HASH JOIN
  |  join op: INNER JOIN (BROADCAST)
  |  hash predicates:
  |  colocate: false, reason: Tables are not in the same group
  |  equal join conjunct: `ss_sold_time_sk` = `time_dim`.`t_time_sk`
  |  runtime filters: RF002[in] <- `time_dim`.`t_time_sk`
  |  cardinality=2880403
  |  
  |----9:EXCHANGE
  |    
  0:OlapScanNode
     TABLE: store_sales
     PREAGGREGATION: OFF. Reason: conjunct on `ss_sold_time_sk` which is StorageEngine value column
     PREDICATES: `default_cluster:tpcds.store_sales`.`__DORIS_DELETE_SIGN__` = 0
     runtime filters: RF000[in] -> `ss_store_sk`, RF001[in] -> `ss_hdemo_sk`, RF002[in] -> `ss_sold_time_sk`
     partitions=1/1
     rollup: store_sales
     tabletRatio=3/3
     tabletList=20968,20972,20976
     cardinality=2880403
     avgRowSize=67.95811
     numNodes=3

PLAN FRAGMENT 2
 OUTPUT EXPRS:
  PARTITION: HASH_PARTITIONED: `default_cluster:tpcds`.`store`.`s_store_sk`

  STREAM DATA SINK
    EXCHANGE ID: 11
    UNPARTITIONED

  5:OlapScanNode
     TABLE: store
     PREAGGREGATION: OFF. Reason: null
     PREDICATES: `store`.`s_store_name` = 'ese', `default_cluster:tpcds.store`.`__DORIS_DELETE_SIGN__` = 0
     partitions=1/1
     rollup: store
     tabletRatio=3/3
     tabletList=20773,20777,20781
     cardinality=23
     avgRowSize=1798.8695
     numNodes=3

PLAN FRAGMENT 3
 OUTPUT EXPRS:
  PARTITION: HASH_PARTITIONED: `default_cluster:tpcds`.`household_demographics`.`hd_demo_sk`

  STREAM DATA SINK
    EXCHANGE ID: 10
    UNPARTITIONED

  3:OlapScanNode
     TABLE: household_demographics
     PREAGGREGATION: OFF. Reason: null
     PREDICATES: `household_demographics`.`hd_dep_count` = 5, `default_cluster:tpcds.household_demographics`.`__DORIS_DELETE_SIGN__` = 0
     partitions=1/1
     rollup: household_demographics
     tabletRatio=3/3
     tabletList=20848,20852,20856
     cardinality=14399
     avgRowSize=2.8781166
     numNodes=3

PLAN FRAGMENT 4
 OUTPUT EXPRS:
  PARTITION: HASH_PARTITIONED: `default_cluster:tpcds`.`time_dim`.`t_time_sk`

  STREAM DATA SINK
    EXCHANGE ID: 09
    UNPARTITIONED

  1:OlapScanNode
     TABLE: time_dim
     PREAGGREGATION: OFF. Reason: null
     PREDICATES: `time_dim`.`t_hour` = 8, `time_dim`.`t_minute` >= 30, `default_cluster:tpcds.time_dim`.`__DORIS_DELETE_SIGN__` = 0
     partitions=1/1
     rollup: time_dim
     tabletRatio=3/3
     tabletList=20713,20717,20721
     cardinality=172799
     avgRowSize=11.671202
     numNodes=3

2.2.1 常见属性说明

Colocate Join 适合几张表按照相同字段分桶,并高频根据相同字段 Join 的场景,比如电商的不少应用都按照商家 Id 分桶,并高频按照商家 Id 进行 Join。

在这里插入图片描述

2.2.2 plan分析

在这里插入图片描述

  • Query96的Query Plan分为五个Plan Fragment,编号从0~4

  • 分析Query Plan可以采用自底向上的方式进行,逐个进行分析

  • 最底部的Plan Fragment为Fragment 4分析

    • 主要负责扫描time_dim表,并提前执行相关查询条件,即谓词下推
    • 对于聚合表(Aggregate Key),doris会根据不同查询选择是否开启PREAGGREGATION,上图中time_dim的预聚合为关闭状态,关闭状态之下会读取time_dim的全部维度列,当表中维度列多的时候,这个可能会成为影响性能的一个关键因素
    • 如果time_dim表有选择Range Partition进行数据划分,Query Plan中的partitions会表征查询命中几个分区,无关分区被自动过滤会有效减少扫描数据量
    • 如果有物化视图,doris会根据查询去自动选择物化视图,如果没有物化视图,那么查询自动命中base table,也就是上图中展示的rollup: time_dim,可参考doris测试物化视图
    • 当time_dim数据扫描完成之后,Fragment 4的执行过程也就随之结束,此时它将扫描得到的数据传递给你其它Fragment,EXCHANGE ID : 09表示数据传递给了标号为9的接收节点,可通过graph查看
  • 对于Query96的Query Plan而言,Fragment 2, 3, 4功能类似,只是负责扫描的表不同;具体到查询中的Order/Aggregation/Join算子,都在Fragment 1中进行,着重分析Fragment 1

    • Fragment 1集成了三个Join算子的执行,采用默认的BROADCAST方式进行执行,也就是小表向大表广播的方式进行,如果两个Join的表都是大表,建议采用SHUFFLE的方式进行
    • 目前doris只支持HASH JOIN,也就是采用哈希算法进行Join
    • 其中有一个colocate字段,这个用来表述两张Join表采用同样的分区/分桶方式,如此执行Join的过程中可以直接在本地执行,不用进行数据的移动
    • Join执行完成之后,就是执行上成的Aggregation, Order by和TOP-N的算子

三、Doris-Profile简述

可通过8030页面的QueryProfile模块查看任务执行详情,以下为query96.sql实际执行的QueryProfile部分内容,各指标名详情可参考:Apache Doris查询分析

Query:
    Summary:
          -  Query  ID:  7dd4ba245012441c-b0aadbed39f80f20
          -  Start  Time:  2022-04-15  15:52:22
          -  End  Time:  2022-04-15  15:52:22
          -  Total:  611ms
          -  Query  Type:  Query
          -  Query  State:  EOF
          -  Doris  Version:  0.15.0-rc04
          -  User:  root
          -  Default  Db:  default_cluster:tpcds
          -  Sql  Statement:  /*  ApplicationName=DBeaver  Enterprise  7.0.0  -  SQLEditor  <20220321常用命令-doris.sql>  */  select    count(*)
from  store_sales
        ,household_demographics
        ,time_dim
        ,  store
where  ss_sold_time_sk  =  time_dim.t_time_sk
        and  ss_hdemo_sk  =  household_demographics.hd_demo_sk
        and  ss_store_sk  =  s_store_sk
        and  time_dim.t_hour  =  8
        and  time_dim.t_minute  >=  30
        and  household_demographics.hd_dep_count  =  5
        and  store.s_store_name  =  'ese'
order  by  count(*)  limit  100
          -  Is  Cached:  No
        Execution  Summary:
              -  Analysis  Time:  636.648us
              -  Plan  Time:  19.230ms
              -  Schedule  Time:  125.121ms
              -  Wait  and  Fetch  Result  Time:  466.30ms
    Execution  Profile  7dd4ba245012441c-b0aadbed39f80f20:(Active:  611.44ms,  %  non-child:  100.00%)
        Fragment  0:
            Instance  7dd4ba245012441c-b0aadbed39f80f2d  (host=TNetworkAddress(hostname:10.192.119.70,  port:9060)):(Active:  586.950ms,  %  non-child:  0.00%)
                  -  FragmentCpuTime:  756.962us
                  -  MemoryLimit:  2.00  GB
                  -  PeakMemoryUsage:  48.01  KB
                  -  PeakReservation:  0.00  
                  -  PeakUsedReservation:  0.00  
                  -  RowsProduced:  1
                BlockMgr:
                      -  BlockWritesOutstanding:  0
                      -  BlocksCreated:  0
                      -  BlocksRecycled:  0
                      -  BufferedPins:  0
                      -  BytesWritten:  0.00  
                      -  MaxBlockSize:  8.00  MB
                      -  TotalBufferWaitTime:  0ns
                      -  TotalEncryptionTime:  0ns
                      -  TotalIntegrityCheckTime:  0ns
                      -  TotalReadBlockTime:  0ns
                DataBufferSender  (dst_fragment_instance_id=7dd4ba245012441c-b0aadbed39f80f2d):
                      -  AppendBatchTime:  124.481us
                          -  ResultSendTime:  119.257us
                          -  TupleConvertTime:  4.217us
                      -  NumSentRows:  1
                SORT_NODE  (id=8):(Active:  587.36ms,  %  non-child:  0.01%)
                      -  PeakMemoryUsage:  16.00  KB
                      -  RowsReturned:  1
                      -  RowsReturnedRate:  1
                    AGGREGATION_NODE  (id=13):(Active:  586.958ms,  %  non-child:  0.10%)
                          -  Probe  Method:  HashTable  Linear  Probing
                          -  BuildTime:  10.533us
                          -  GetResultsTime:  0ns
                          -  HTResize:  0
                          -  HTResizeTime:  0ns
                          -  HashBuckets:  0
                          -  HashCollisions:  0
                          -  HashFailedProbe:  0
                          -  HashFilledBuckets:  0
                          -  HashProbe:  0
                          -  HashTravelLength:  0
                          -  LargestPartitionPercent:  0
                          -  MaxPartitionLevel:  0
                          -  NumRepartitions:  0
                          -  PartitionsCreated:  0
                          -  PeakMemoryUsage:  28.00  KB
                          -  RowsProcessed:  0
                          -  RowsRepartitioned:  0
                          -  RowsReturned:  1
                          -  RowsReturnedRate:  1
                          -  SpilledPartitions:  0
                        EXCHANGE_NODE  (id=12):(Active:  586.364ms,  %  non-child:  95.96%)
                              -  BytesReceived:  32.00  B
                              -  ConvertRowBatchTime:  7.320us
                              -  DataArrivalWaitTime:  586.282ms
                              -  DeserializeRowBatchTimer:  22.637us
                              -  FirstBatchArrivalWaitTime:  349.530ms
                              -  PeakMemoryUsage:  12.01  KB
                              -  RowsReturned:  3
                              -  RowsReturnedRate:  5
                              -  SendersBlockedTotalTimer(*):  0ns
        Fragment  1:
            Instance  7dd4ba245012441c-b0aadbed39f80f23  (host=TNetworkAddress(hostname:10.192.119.68,  port:9060)):(Active:  472.511ms,  %  non-child:  0.10%)
                  -  FragmentCpuTime:  5.714ms
                  -  MemoryLimit:  2.00  GB
                  -  PeakMemoryUsage:  610.00  KB
                  -  PeakReservation:  0.00  
                  -  PeakUsedReservation:  0.00  
                  -  RowsProduced:  1
                BlockMgr:
                      -  BlockWritesOutstanding:  0
                      -  BlocksCreated:  0
                      -  BlocksRecycled:  0
                      -  BufferedPins:  0
                      -  BytesWritten:  0.00  
                      -  MaxBlockSize:  8.00  MB
                      -  TotalBufferWaitTime:  0ns
                      -  TotalEncryptionTime:  0ns
                      -  TotalIntegrityCheckTime:  0ns
                      -  TotalReadBlockTime:  0ns
                DataStreamSender  (dst_id=12,  dst_fragments=[7dd4ba245012441c-b0aadbed39f80f2d]):(Active:  186.357us,  %  non-child:  0.03%)
                      -  BytesSent:  16.00  B
                      -  IgnoreRows:  0
                      -  LocalBytesSent:  0.00  
                      -  OverallThroughput:  83.84375  KB/sec
                      -  PeakMemoryUsage:  16.00  KB
                      -  SerializeBatchTime:  7.0us
                      -  UncompressedRowBatchSize:  16.00  B
                AGGREGATION_NODE  (id=7):(Active:  471.713ms,  %  non-child:  0.14%)
                      -  Probe  Method:  HashTable  Linear  Probing
                      -  BuildTime:  45.223us
                      -  GetResultsTime:  0ns
                      -  HTResize:  0
                      -  HTResizeTime:  0ns
                      -  HashBuckets:  0
                      -  HashCollisions:  0
                      -  HashFailedProbe:  0
                      -  HashFilledBuckets:  0
                      -  HashProbe:  0
                      -  HashTravelLength:  0
                      -  LargestPartitionPercent:  0
                      -  MaxPartitionLevel:  0
                      -  NumRepartitions:  0
                      -  PartitionsCreated:  0
                      -  PeakMemoryUsage:  280.00  KB
                      -  RowsProcessed:  0
                      -  RowsRepartitioned:  0
                      -  RowsReturned:  1
                      -  RowsReturnedRate:  2
                      -  SpilledPartitions:  0
                    HASH_JOIN_NODE  (id=6):(Active:  470.881ms,  %  non-child:  0.08%)
                          -  ExecOption:  Hash  Table  Built  Asynchronously
                          -  BuildBuckets:  1.024K  (1024)
                          -  BuildRows:  1
                          -  BuildTime:  1.129ms
                          -  HashTableMaxList:  1
                          -  HashTableMinList:  1
                          -  LoadFactor:  4562146422526312400.00
                          -  PeakMemoryUsage:  308.00  KB
                          -  ProbeRows:  341
                          -  ProbeTime:  34.697us
                          -  PushDownComputeTime:  156.171us
                          -  PushDownTime:  4.423us
                          -  RowsReturned:  341
                          -  RowsReturnedRate:  724
  • Active:表示该节点(包含其所有子节点)的执行时间
  • BuildTime:扫描右表并构建hash表的时间
  • ProbeTime:获取左表并搜索hashtable进行匹配并输出的时间

猜你喜欢

转载自blog.csdn.net/ith321/article/details/132006439