关于Hive执行计划简述
一般执行计划有两个部分:
stage dependencies 各个stage之间的依赖性
stage plan 各个stage的执行计划
一个stage并不一定是一个MR,有可能是Fetch Operator,也有可能是Move Operator。
一个MR的执行计划分为两个部分:
Map Operator Tree MAP端的执行计划
Reduce Operator Tree Reduce端的执行计划
一些常见的Operator:
TableScan 读取数据,常见的属性 alias
Select Operator 选取操作
Group By Operator 分组聚合, 常见的属性 aggregations、mode , 当没有keys属性时只有一个分组。
Reduce Output Operator 输出结果给Reduce , 常见的属性 sort order
Fetch Operator 客户端获取数据 , 常见属性 limit
常见的属性的取值及含义:
aggregations 用在Group By Operator中
count()计数
mode 用在Group By Operator中
hash:map端的预聚合
mergepartial:在reduce端的聚合
final
sort order 用于Reduce Output Operator中
+ 正序排序
不排序
++按两列正序排序,如果有两列
+- 正反排序,如果有两列
-反向排序
如此类推
EXPLAINFROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;
stage依赖图:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
src #map输入扫描的表
Reduce Output Operator #shuffle write详情
key expressions:
expr: key
type: string
sort order: +
Map-reduce partition columns:
expr: rand()
type: double
tag: -1
value expressions:
expr: substr(value, 4)
type: string
Reduce Operator Tree:
Group By Operator
aggregations:
expr: sum(UDFToDouble(VALUE.0))
keys:
expr: KEY.0
type: string
mode: partial1
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
name: binary_table
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
/tmp/hive-zshao/67494501/106593589.10001
Reduce Output Operator
key expressions:
expr: 0
type: string
sort order: +
Map-reduce partition columns:
expr: 0
type: string
tag: -1
value expressions:
expr: 1
type: double
Reduce Operator Tree:
Group By Operator
aggregations:
expr: sum(VALUE.0)
keys:
expr: KEY.0
type: string
mode: final
Select Operator
expressions:
expr: 0
type: string
expr: 1
type: double
Select Operator
expressions:
expr: UDFToInteger(0)
type: int
expr: 1
type: double
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
name: dest_g1
Stage: Stage-0
Move Operator
tables:
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
name: dest_g1
``
