ChainMapper

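ChainMapper combines several Mapper classes into a single map task. Each mapper's output records are passed directly to the next mapper in the chain, so the output key/value types of one mapper must match the input key/value types of the next.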

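  // Requires org.apache.hadoop.mapreduce.lib.chain.ChainMapper (new MapReduce API).
  // addMapper is a static method, so no ChainMapper instance is created.
  // Arguments: job, mapper class, input key/value classes, output key/value classes, mapper configuration.
  ChainMapper.addMapper(job, readWordsMapper.class, LongWritable.class, Text.class, Text.class, NullWritable.class, conf);
  ChainMapper.addMapper(job, addPreMapper.class, Text.class, NullWritable.class, Text.class, NullWritable.class, conf);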
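
To make the data flow concrete, here is a minimal, Hadoop-free sketch of what chaining means: every record emitted by the first step goes straight into the second step within the same task. The class ChainSketch, the word-splitting behavior given to readWords, and the "pre_" prefix added by addPre are assumptions made for illustration only; the notes above do not show what readWordsMapper and addPreMapper actually do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hadoop-free illustration of mapper chaining; all names here are hypothetical.
final class ChainSketch {

    // Stand-in for the first mapper: one input line in, one record per word out.
    static List<String> readWords(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Stand-in for the second mapper: adds an assumed "pre_" prefix to each word.
    static List<String> addPre(String word) {
        return Arrays.asList("pre_" + word);
    }

    // Feeds every record emitted by 'first' directly into 'second'.
    static List<String> runChain(List<String> input,
                                 Function<String, List<String>> first,
                                 Function<String, List<String>> second) {
        List<String> out = new ArrayList<>();
        for (String record : input) {
            for (String intermediate : first.apply(record)) {
                out.addAll(second.apply(intermediate));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hadoop");
        System.out.println(runChain(lines, ChainSketch::readWords, ChainSketch::addPre));
    }
}
```

The second step accepts exactly the record type the first step emits, which mirrors the type-matching requirement between consecutive addMapper calls.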

JobControl

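Processing logic of even moderate complexity often needs several MapReduce programs run one after another. The MapReduce framework's JobControl can be used to chain multiple jobs together.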

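  1. Drive the jobs from a shell script and use each job's exit status to decide whether the next step should run.
  2. Declare the dependencies between the jobs with JobControl: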
  // Requires ControlledJob and JobControl from org.apache.hadoop.mapreduce.lib.jobcontrol.
  // Wrap each Job in a ControlledJob so that JobControl can track its state.
  ControlledJob cJob1 = new ControlledJob(job1.getConfiguration());
  ControlledJob cJob2 = new ControlledJob(job2.getConfiguration());
  ControlledJob cJob3 = new ControlledJob(job3.getConfiguration());
  cJob1.setJob(job1);
  cJob2.setJob(job2);
  cJob3.setJob(job3);

  // Declare the dependencies: job2 depends on job1, and job3 depends on job2.
  cJob2.addDependingJob(cJob1);
  cJob3.addDependingJob(cJob2);

  // Create the JobControl; the constructor argument is the group name.
  JobControl jobControl = new JobControl("RecommendationJob");
  jobControl.addJob(cJob1);
  jobControl.addJob(cJob2);
  jobControl.addJob(cJob3);

  // JobControl is a Runnable: start a new thread to run the jobs that were added to it.
  Thread jobControlThread = new Thread(jobControl);
  jobControlThread.start();

  // Poll until every job has finished, whether it succeeded or failed.
  while (!jobControl.allFinished()) {
    Thread.sleep(500);
  }
  jobControl.stop();
  // Return a non-zero status if any job failed, instead of always returning 0.
  return jobControl.getFailedJobList().isEmpty() ? 0 : 1;
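
The dependency rule behind both approaches can be sketched without Hadoop: a job runs only after every job it depends on has succeeded, and a job whose dependency failed is never run. The code below is a simplified, single-threaded model under that assumption, not JobControl's actual implementation; DependencySketch, Task, addDependingTask, and runAll are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Simplified, single-threaded model of dependency-gated job execution.
// All names are hypothetical; this is not how JobControl is implemented.
final class DependencySketch {

    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    static final class Task {
        final String name;
        final BooleanSupplier body;                    // returns true on success
        final List<Task> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        Task(String name, BooleanSupplier body) {
            this.name = name;
            this.body = body;
        }

        // Declares that this task must wait for 'other' to succeed.
        void addDependingTask(Task other) {
            dependsOn.add(other);
        }
    }

    // Sweeps the task list until no task can make further progress,
    // then returns each task's final state keyed by task name.
    static Map<String, State> runAll(List<Task> tasks) {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Task t : tasks) {
                if (t.state != State.WAITING) {
                    continue;
                }
                boolean anyFailed = false;
                boolean allSucceeded = true;
                for (Task d : t.dependsOn) {
                    if (d.state == State.FAILED || d.state == State.DEPENDENT_FAILED) {
                        anyFailed = true;
                    }
                    if (d.state != State.SUCCESS) {
                        allSucceeded = false;
                    }
                }
                if (anyFailed) {
                    t.state = State.DEPENDENT_FAILED;  // never run this task
                    progressed = true;
                } else if (allSucceeded) {
                    t.state = t.body.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    progressed = true;
                }
            }
        }
        Map<String, State> result = new LinkedHashMap<>();
        for (Task t : tasks) {
            result.put(t.name, t.state);
        }
        return result;
    }

    public static void main(String[] args) {
        Task job1 = new Task("job1", () -> true);
        Task job2 = new Task("job2", () -> false);     // simulated failure
        Task job3 = new Task("job3", () -> true);
        job2.addDependingTask(job1);
        job3.addDependingTask(job2);
        System.out.println(runAll(Arrays.asList(job1, job2, job3)));
    }
}
```

In this model a failure in job2 keeps job3 from running at all, which is the same decision a shell script makes when it checks an exit status before starting the next step.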