ChainMapper
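ChainMapper lets you combine several Mapper classes into a single map task. Each mapper's output key/value types must match the input types of the next mapper in the chain.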
// addMapper is a static method, so ChainMapper is never instantiated directly.
// Arguments: job, mapper class, input key/value classes, output key/value classes, mapper configuration.
ChainMapper.addMapper(job, readWordsMapper.class, LongWritable.class, Text.class, Text.class, NullWritable.class, conf);
ChainMapper.addMapper(job, addPreMapper.class, Text.class, NullWritable.class, Text.class, NullWritable.class, conf);
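To make the chaining idea concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. It only models the data flow: every record passes through the first mapper and its output goes straight into the second mapper inside the same loop, with nothing written out in between. The class ChainSketch, the method runChain, and the behavior of the two stand-in mappers (splitting a line into words, adding a "pre_" prefix) are assumptions made for illustration; the original notes do not show what readWordsMapper and addPreMapper actually do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a chained map task; not part of the Hadoop API.
final class ChainSketch {

    // Runs each record through `first`, then feeds every intermediate value
    // directly into `second`, all within one pass over the input.
    static <A, B, C> List<C> runChain(List<A> input,
                                      Function<A, List<B>> first,
                                      Function<B, List<C>> second) {
        List<C> out = new ArrayList<>();
        for (A record : input) {
            for (B mid : first.apply(record)) {
                out.addAll(second.apply(mid));
            }
        }
        return out;
    }

    // Assumed behavior for a readWordsMapper-like step: split a line into words.
    static List<String> readWords(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Assumed behavior for an addPreMapper-like step: prepend a fixed prefix.
    static List<String> addPrefix(String word) {
        return List.of("pre_" + word);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "chain mapper");
        // prints [pre_hello, pre_world, pre_chain, pre_mapper]
        System.out.println(runChain(lines, ChainSketch::readWords, ChainSketch::addPrefix));
    }
}
```

The generic signature mirrors the type constraint in the real chain: the output type of the first step (B) has to be the input type of the second, just as Text/NullWritable is both the output of readWordsMapper and the input of addPreMapper above.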
JobControl
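A moderately complex processing pipeline usually needs several MapReduce programs run one after another. The framework's JobControl class can be used to chain multiple jobs together.
- One alternative is a shell script that checks each job's exit status to decide whether to run the next step.
- With JobControl, you declare the dependencies between the jobs and let it run them in order.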
ControlledJob cJob1 = new ControlledJob(job1.getConfiguration());
ControlledJob cJob2 = new ControlledJob(job2.getConfiguration());
ControlledJob cJob3 = new ControlledJob(job3.getConfiguration());
cJob1.setJob(job1);
cJob2.setJob(job2);
cJob3.setJob(job3);
// Declare dependencies: job2 depends on job1, and job3 depends on job2
cJob2.addDependingJob(cJob1);
cJob3.addDependingJob(cJob2);
// Create the JobControl; the constructor argument is the group name
JobControl jobControl = new JobControl("RecommendationJob");
jobControl.addJob(cJob1);
jobControl.addJob(cJob2);
jobControl.addJob(cJob3);
// JobControl implements Runnable: start it on a new thread so it can submit the added jobs as their dependencies complete
Thread jobControlThread = new Thread(jobControl);
jobControlThread.start();
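// Poll until every job has finished, whether it succeeded or failed
while (!jobControl.allFinished()) {
    Thread.sleep(500);
}
jobControl.stop();
// Return a non-zero exit code if any job failed, rather than always returning 0
return jobControl.getFailedJobList().isEmpty() ? 0 : 1;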
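The dependency gating that the driver above relies on can be sketched in plain Java without Hadoop. This is a simplified, single-threaded model, not the Hadoop implementation: the class DependencySketch, its Node type, and the runAll method are hypothetical names. The state names are borrowed from ControlledJob.State, on the assumption that a job whose dependency fails ends up as DEPENDENT_FAILED and is never run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Simplified model of running jobs in dependency order; not Hadoop code.
final class DependencySketch {

    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    static final class Node {
        final String name;
        final BooleanSupplier work;                  // returns true on success
        final List<Node> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        Node(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }
    }

    // Scans the waiting nodes repeatedly until a full pass makes no progress.
    static Map<String, State> runAll(List<Node> nodes) {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Node n : nodes) {
                if (n.state != State.WAITING) {
                    continue;
                }
                boolean anyFailed = false;
                boolean allSucceeded = true;
                for (Node d : n.dependsOn) {
                    if (d.state == State.FAILED || d.state == State.DEPENDENT_FAILED) {
                        anyFailed = true;
                    }
                    if (d.state != State.SUCCESS) {
                        allSucceeded = false;
                    }
                }
                if (anyFailed) {
                    n.state = State.DEPENDENT_FAILED;   // skipped, never executed
                    progressed = true;
                } else if (allSucceeded) {
                    n.state = n.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    progressed = true;
                }
            }
        }
        Map<String, State> result = new LinkedHashMap<>();
        for (Node n : nodes) {
            result.put(n.name, n.state);
        }
        return result;
    }

    public static void main(String[] args) {
        Node job1 = new Node("job1", () -> true);
        Node job2 = new Node("job2", () -> false);      // simulate a failing job
        Node job3 = new Node("job3", () -> true);
        job2.dependsOn.add(job1);
        job3.dependsOn.add(job2);
        // prints {job1=SUCCESS, job2=FAILED, job3=DEPENDENT_FAILED}
        System.out.println(runAll(List.of(job1, job2, job3)));
    }
}
```

In this model all three jobs end up "finished" even though only job1 succeeded, which is why a driver that polls allFinished() should also inspect which jobs failed before reporting success.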