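hive.optimize.skewjoin: Hive processes skewed keys and normal keys in separate MR jobs, then combines the results with UNION ALL.
For the skewed keys, Hive uses a map join. Setting hive.optimize.skewjoin=true only takes effect for inner joins; it has no effect on left join, right join, or full join.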
Key code in GenMRSkewJoinProcessor.processSkewJoin():
if (!GenMRSkewJoinProcessor.skewJoinEnabled(parseCtx.getConf(), joinOp)) {
  // Skew join handling is skipped entirely when the check fails
  return;
}
Code of GenMRSkewJoinProcessor.skewJoinEnabled():
public static boolean skewJoinEnabled(HiveConf conf, JoinOperator joinOp) {
  // The hive.optimize.skewjoin switch must be on
  if (conf != null && !conf.getBoolVar(HiveConf.ConfVars.HIVESKEWJOIN)) {
    return false;
  }
  // Outer joins are not supported
  if (!joinOp.getConf().isNoOuterJoin()) {
    return false;
  }
  // Tags must appear in order 0, 1, 2, ...
  byte pos = 0;
  for (Byte tag : joinOp.getConf().getTagOrder()) {
    if (tag != pos) {
      return false;
    }
    pos++;
  }
  return true;
}
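To make the three checks easier to try out without a Hive build, here is a minimal sketch that mirrors the logic of skewJoinEnabled() using plain Java types. The class SkewJoinCheckSketch and its parameters (skewJoinFlag, noOuterJoin, tagOrder) are hypothetical stand-ins for HiveConf and JoinOperator; they are not part of Hive.

```java
import java.util.List;

// Hypothetical stand-in for GenMRSkewJoinProcessor.skewJoinEnabled().
// The real method reads these values from HiveConf and JoinOperator.
final class SkewJoinCheckSketch {

    static boolean skewJoinEnabled(boolean skewJoinFlag,
                                   boolean noOuterJoin,
                                   List<Byte> tagOrder) {
        // Check 1: hive.optimize.skewjoin must be true
        if (!skewJoinFlag) {
            return false;
        }
        // Check 2: any outer join disables the optimization
        if (!noOuterJoin) {
            return false;
        }
        // Check 3: tags must be exactly 0, 1, 2, ... in that order
        byte pos = 0;
        for (Byte tag : tagOrder) {
            if (tag != pos) {
                return false;
            }
            pos++;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Byte> inOrder = List.of((byte) 0, (byte) 1);
        List<Byte> outOfOrder = List.of((byte) 1, (byte) 0);

        // Inner join, flag on, tags in order
        System.out.println(skewJoinEnabled(true, true, inOrder));     // true
        // Left/right/full join: noOuterJoin is false
        System.out.println(skewJoinEnabled(true, false, inOrder));    // false
        // Flag off
        System.out.println(skewJoinEnabled(false, true, inOrder));    // false
        // Tags out of order
        System.out.println(skewJoinEnabled(true, true, outOfOrder));  // false
    }
}
```

The second call shows why the setting has no effect on left, right, or full joins: the check returns false before the tag order is even examined.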
