13. Combiner Local Aggregation - Hadoop

Relationship between the combiner and the reducer
Steps for writing a combiner

Relationship between the combiner and the reducer

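Both the combiner and the reducer extend the Reducer class.
The combiner runs on each node that runs a map task and aggregates only that map task's output.
The reducer receives the output of all map tasks (for the keys in its partition).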

Steps for writing a combiner

1. Extend the Reducer class and override the reduce method.
2. Register the combiner in the driver class.

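import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Local aggregation
 *      Aggregates each map task's output locally, on the map side, before it is
 *      sent to the reducer, which reduces network I/O.
 *      Example with three machines A, B, and C:
 *      Machine A: hello 1 hello 1 world 1  hello 1  ---> hello 3 world 1
 *      Machine B: world 1 world 1 world 1  hello 1  ---> hello 1 world 3
 *      Machine C: hello 1 world 1 hello 1 world 1   ---> hello 2 world 2
 */
public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Same logic as the reducer: sum the counts for each key
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable v : values) {
            count += v.get();
        }
        // Emit the partial count for this key
        context.write(key, new IntWritable(count));
    }
}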
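The A/B/C example in the comment above can be checked without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class `LocalAggregationDemo` and its `combine` and `reduce` helpers are hypothetical names introduced only for illustration, and the word lists are the three machines from the comment. It compares how many records would cross the network with and without local aggregation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map-side local aggregation (no Hadoop required).
// All names here are hypothetical and exist only for this illustration.
class LocalAggregationDemo {

    // Sums the counts of one map task's words, as a combiner would on that node.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> local = new TreeMap<>();
        for (String w : words) {
            local.merge(w, 1, Integer::sum);
        }
        return local;
    }

    // Merges the per-node partial counts, as the reducer would after the shuffle.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> machines = List.of(
            List.of("hello", "hello", "world", "hello"),   // machine A
            List.of("world", "world", "world", "hello"),   // machine B
            List.of("hello", "world", "hello", "world"));  // machine C

        int recordsWithoutCombiner = 0;
        int recordsWithCombiner = 0;
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> words : machines) {
            Map<String, Integer> local = combine(words);
            partials.add(local);
            recordsWithoutCombiner += words.size(); // one (word, 1) record per word
            recordsWithCombiner += local.size();    // one record per distinct word
        }

        System.out.println("records shuffled without combiner: " + recordsWithoutCombiner); // 12
        System.out.println("records shuffled with combiner: " + recordsWithCombiner);       // 6
        System.out.println("final counts: " + reduce(partials)); // {hello=6, world=6}
    }
}
```

The final counts are the same either way; only the number of intermediate records changes. In a real job the saving depends on how often keys repeat within a single map task's output.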
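Note: register the combiner in the driver class
 // Set the combiner. No separate combiner class is needed here, because its logic is identical to the reducer's, so the reducer can be reused directly.
 // Local aggregation must not change the final result. Summing is safe, but computing an average, for example, is not suitable.
 //job.setCombinerClass(WordCountCombiner.class);
 job.setCombinerClass(WordcountReducer.class);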
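To see why an average is not suitable for this kind of reuse, here is a small plain-Java sketch (again not Hadoop code). The class `AverageCombinerPitfall`, the `SumCount` type, and the sample values are all hypothetical and chosen only for illustration. It shows that averaging the per-node averages gives a different answer from the true average, and it sketches one commonly used workaround that is not part of the original notes: pass along a partial sum and count, and divide only at the end.

```java
import java.util.List;

// Plain-Java illustration of why a mean cannot be combined by reusing reducer logic.
// All names and sample values are hypothetical.
class AverageCombinerPitfall {

    // Arithmetic mean of a non-empty list.
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // A partial result that can be merged safely: running sum plus element count.
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        SumCount plus(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double mean() {
            return sum / count;
        }
    }

    // What a combiner could emit for one node instead of a local mean.
    static SumCount partial(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return new SumCount(sum, values.size());
    }

    public static void main(String[] args) {
        List<Double> nodeA = List.of(1.0, 2.0, 3.0);
        List<Double> nodeB = List.of(10.0);

        double trueMean = mean(List.of(1.0, 2.0, 3.0, 10.0));
        double meanOfMeans = mean(List.of(mean(nodeA), mean(nodeB)));
        double viaSumCount = partial(nodeA).plus(partial(nodeB)).mean();

        System.out.println("true mean: " + trueMean);            // 4.0
        System.out.println("mean of node means: " + meanOfMeans); // 6.0
        System.out.println("mean via sum/count: " + viaSumCount); // 4.0
    }
}
```

Summation works as a combiner because adding partial sums gives the same total regardless of how the records are grouped; a mean does not have that property unless the counts travel with the sums.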