TEZ UI issue
tez.history.logging.service.class
Change org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService to
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService (otherwise some items fail to display on the Tez UI pages). See the tez-site.xml snippet below.
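As a tez-site.xml snippet (the property name and value are the ones given above; tez-site.xml is where this setting normally lives):
<property>
  <name>tez.history.logging.service.class</name>
  <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>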
Parameter tuning
hive.execution.engine=tez;
Switches the execution engine to Tez.
hive.auto.convert.join=true;
Enables map join.
hive.auto.convert.join.noconditionaltask=true;
Merges multiple map joins over small datasets into a single one.
hive.mapjoin.smalltable.filesize=25000000;
Trigger condition for the above: file-size threshold (in bytes) below which a table is considered small enough for map join.
hive.auto.convert.join.noconditionaltask.size=60000000;
Set to about 30% of hive.tez.container.size; also bounded by the Hadoop maximum Java heap size and affected by whether the files are compressed (for ORC files, divide by 10; lower this value if map join runs out of memory). The map-join settings are collected into one sketch below.
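A minimal Hive session sketch of the map-join settings above, using the values from this section:

SET hive.auto.convert.join=true;                             -- enable map join
SET hive.auto.convert.join.noconditionaltask=true;           -- merge multiple small map joins
SET hive.mapjoin.smalltable.filesize=25000000;               -- ~25 MB small-table trigger
SET hive.auto.convert.join.noconditionaltask.size=60000000;  -- ~60 MB; keep near 30% of hive.tez.container.size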
hive.tez.container.size=4096
tez.am.resource.memory.mb=8192
Must not be set too small, or the AM container gets stuck in continuous full GC and eventually aborts with an error.
tez.runtime.io.sort.mb=512
40% of hive.tez.container.size
tez.runtime.unordered.output.buffer.size-mb=400
10% of hive.tez.container.size
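The memory settings above as one Hive session sketch; the values and percentage guidelines are the ones given in this section:

SET hive.tez.container.size=4096;                     -- MB per Tez task container
SET tez.am.resource.memory.mb=8192;                   -- AM memory; too small causes continuous full GC
SET tez.runtime.io.sort.mb=512;                       -- guideline: ~40% of hive.tez.container.size
SET tez.runtime.unordered.output.buffer.size-mb=400;  -- guideline: ~10% of hive.tez.container.size

As a sanity check, with a 4096 MB container the 40%/10% guidelines would allow up to roughly 1638 MB and 410 MB respectively, so the 512/400 values above stay within those caps.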
tez.grouping.min-size
Lower bound on the input split size; controls the number of map tasks.
tez.grouping.max-size
Upper bound on the input split size; controls the number of map tasks. See the illustrative settings below.
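Illustrative only (the byte values below are hypothetical, not from this section): smaller bounds produce more, smaller map tasks; larger bounds produce fewer, larger ones:

SET tez.grouping.min-size=16777216;    -- 16 MB lower bound on split size
SET tez.grouping.max-size=1073741824;  -- 1 GB upper bound on split size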
tez.session.am.dag.submit.timeout.secs=10
Timeout for the AM to wait for a DAG submission. In dacp, the tasks of a single workflow use different sessions, so this value should be kept small to avoid wasting resources.
hive.tez.auto.reducer.parallelism=true;
Enables Tez auto reducer parallelism: Tez samples vertex output sizes at runtime and adjusts the reducer count as needed.
hive.exec.reducers.bytes.per.reducer
256000000; can be lowered as appropriate (a smaller value yields more reducers).
hive.tez.min.partition.factor=0.05;
Lets auto reducer parallelism scale the reducer count down to as little as this fraction of the estimate, which is itself capped by hive.exec.reducers.max (1099); can be lowered as appropriate.
hive.tez.max.partition.factor=2.0;
Hive/Tez estimates the number of reducers as:
Max(1, Min(hive.exec.reducers.max [1099], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) × hive.tez.max.partition.factor [2]; can be raised as appropriate.
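A worked instance of the formula above, assuming a hypothetical reducer-stage estimate of 100 GB (the 100 GB figure is an assumption; the other values are from this section):

ReducerStage estimate / hive.exec.reducers.bytes.per.reducer = 100,000,000,000 / 256,000,000 ≈ 391
Max(1, Min(1099, 391)) = 391
391 × hive.tez.max.partition.factor (2.0) = 782 reducers launched up front;
auto reducer parallelism can then trim this at runtime, down to the hive.tez.min.partition.factor floor.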
hive.prewarm.enabled
hive.prewarm.numcontainers
Number of prewarmed containers launched per AM at startup; e.g. with 3, opening a session launches 4 containers (the AM plus 3 prewarmed). See the sketch below.
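As a session sketch (values from above):

SET hive.prewarm.enabled=true;
SET hive.prewarm.numcontainers=3;  -- opening a session then launches 4 containers (AM + 3 prewarmed)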
tez.shuffle-vertex-manager.min-src-fraction=0.25;
tez.shuffle-vertex-manager.max-src-fraction=0.75;
This means the decision is made somewhere between 25% and 75% of the mappers finishing, provided at least 1 GB of data has been output (i.e. if the first 25% of mappers have not yet emitted 1 GB, Tez waits until at least 1 GB has been sent).
tez.am.container.idle.release-timeout-min.millis
The minimum amount of time to hold on to an idle container before it becomes eligible for release.
tez.am.container.idle.release-timeout-max.millis
Int value. The maximum amount of time to hold on to a container if no task can be assigned to it immediately.
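A session sketch with hypothetical millisecond values (not from this section); these normally belong in tez-site.xml or must be set before the Tez session is created:

SET tez.am.container.idle.release-timeout-min.millis=10000;  -- hold an idle container at least 10 s
SET tez.am.container.idle.release-timeout-max.millis=20000;  -- release after at most 20 s if no task fits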