A Brief Overview of Hive Execution Plans

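An execution plan generally has two parts:
  stage dependencies: the dependencies between stages
  stage plans: the execution plan of each stage
A stage is not necessarily a MapReduce (MR) job; it can also be a Fetch Operator or a Move Operator.
The plan of an MR stage has two parts:
  Map Operator Tree: the execution plan on the map side
  Reduce Operator Tree: the execution plan on the reduce side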
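To make the stage dependencies part concrete, here is a minimal Python sketch that reads dependency lines and derives an order in which the stages could run. It assumes only the two line shapes that appear in the example later in this note ("... is a root stage" and "... depends on stages: ..."); the function names are hypothetical, and other line shapes that Hive may print are not handled.

```python
import re

def parse_stage_dependencies(lines):
    """Map each stage name to the list of stages it depends on."""
    deps = {}
    for line in lines:
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = []
            continue
        child = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if child:
            deps[child.group(1)] = [s.strip() for s in child.group(2).split(",")]
    return deps

def execution_order(deps):
    """Return stages so that every stage comes after its dependencies.

    Assumes the dependencies contain no cycle.
    """
    order, done = [], set()

    def visit(stage):
        if stage in done:
            return
        for parent in deps.get(stage, []):
            visit(parent)
        done.add(stage)
        order.append(stage)

    for stage in deps:
        visit(stage)
    return order

sample = [
    "Stage-1 is a root stage",
    "Stage-2 depends on stages: Stage-1",
    "Stage-0 depends on stages: Stage-2",
]
deps = parse_stage_dependencies(sample)
print(execution_order(deps))  # ['Stage-1', 'Stage-2', 'Stage-0']
```

The stage numbers are only labels: Stage-0 runs last here because it depends on Stage-2, not because of its number.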
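Some common operators:
  TableScan: reads data; common attribute: alias
  Select Operator: selects (projects) columns
  Group By Operator: grouping and aggregation; common attributes: aggregations, mode. When there is no keys attribute, all rows form a single group.
  Reduce Output Operator: sends output to the reducers; common attribute: sort order
  Fetch Operator: the client fetches the data; common attribute: limit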
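Common attribute values and their meanings:
aggregations: used in Group By Operator
  count(): counts rows
mode: used in Group By Operator
  hash: pre-aggregation on the map side
  mergepartial: aggregation on the reduce side, merging the partial results
  final: the last aggregation step, producing the final result
sort order: used in Reduce Output Operator
  (empty): no sorting
  +: ascending
  -: descending
  ++: two key columns, both ascending
  +-: two key columns, the first ascending and the second descending
  and so on, with one character per key column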
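The sort order notation can be read one character at a time. As a small illustration, assuming only the `+` and `-` characters described above (the function name is hypothetical and this is not part of Hive):

```python
def decode_sort_order(sort_order):
    """Translate a 'sort order' string into one direction per key column."""
    directions = {"+": "ascending", "-": "descending"}
    # Any other character raises KeyError, which flags unexpected input.
    return [directions[ch] for ch in sort_order]

for value in ["", "+", "-", "++", "+-"]:
    print(repr(value), decode_sort_order(value))
```

An empty string yields an empty list, which matches the "no sorting" case.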

  EXPLAIN
  FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

Stage dependencies and stage plans:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src   # table scanned as the map input
            Reduce Output Operator  # details of the shuffle write
              key expressions:
                    expr: key
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: rand()
                    type: double
              tag: -1
              value expressions:
                    expr: substr(value, 4)
                    type: string
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(UDFToDouble(VALUE.0))
          keys:
                expr: KEY.0
                type: string
          mode: partial1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
            sort order: +
            Map-reduce partition columns:
                  expr: 0
                  type: string
            tag: -1
            value expressions:
                  expr: 1
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: double
            Select Operator
              expressions:
                    expr: UDFToInteger(0)
                    type: int
                    expr: 1
                    type: double
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                name: dest_g1
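
Reading the plan above: Stage-1 partitions its map output by rand() and aggregates with mode partial1, then Stage-2 repartitions by the key column and aggregates with mode final. The following Python sketch imitates that two-step sum on a few made-up rows. The sample data, helper names, and reducer count are assumptions for illustration only, not Hive's implementation.

```python
import random
from collections import defaultdict

def partial_sum(rows):
    """Sum values per key within one reducer's input (the partial1 step)."""
    sums = defaultdict(float)
    for key, value in rows:
        sums[key] += float(value)  # mirrors UDFToDouble(VALUE.0)
    return list(sums.items())

def two_stage_group_sum(rows, num_reducers=3, seed=0):
    rng = random.Random(seed)
    # Stage-1: spread rows over reducers at random, as with expr: rand().
    buckets = [[] for _ in range(num_reducers)]
    for key, value in rows:
        # Hive's substr is 1-based, so substr(value, 4) is value[3:] in Python.
        buckets[rng.randrange(num_reducers)].append((key, value[3:]))
    partials = [pair for bucket in buckets for pair in partial_sum(bucket)]
    # Stage-2: regroup the partial sums by key and merge them (mode: final).
    totals = defaultdict(float)
    for key, subtotal in partials:
        totals[key] += subtotal
    return dict(totals)

# Hypothetical rows in (key, value) form.
rows = [("a", "val1"), ("a", "val2"), ("b", "val10"), ("a", "val3")]
print(two_stage_group_sum(rows))
```

Whatever the random split, key "a" sums to 6.0 and key "b" to 10.0. The same key may land in several Stage-1 buckets, which is why a second grouping by key is still needed in Stage-2.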
