ChainMapper
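ChainMapper lets you combine several Mapper classes into a single map task: each mapper's output records become the next mapper's input.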
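// addMapper is a static method, so no ChainMapper instance is needed.
// Stage 1: (LongWritable, Text) -> (Text, NullWritable)
ChainMapper.addMapper(job, readWordsMapper.class, LongWritable.class, Text.class, Text.class, NullWritable.class, conf);
// Stage 2: its input types must match stage 1's output types, (Text, NullWritable) -> (Text, NullWritable)
ChainMapper.addMapper(job, addPreMapper.class, Text.class, NullWritable.class, Text.class, NullWritable.class, conf);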
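The data flow that a mapper chain sets up can be modeled in plain Java without a Hadoop dependency. The following is only a sketch: `readWords` and `addPrefix` are hypothetical stand-ins for `readWordsMapper` and `addPreMapper` (their real logic is not shown in these notes), and the `pre_` prefix is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ChainSketch {
    // Hypothetical stand-in for readWordsMapper: one input line -> zero or more words.
    static List<String> readWords(String line) {
        List<String> out = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Hypothetical stand-in for addPreMapper: one word -> one prefixed word.
    static List<String> addPrefix(String word) {
        return List.of("pre_" + word);
    }

    // Feed every record emitted by one stage into the next stage, in order.
    static List<String> runChain(List<String> records,
                                 List<Function<String, List<String>>> stages) {
        List<String> current = records;
        for (Function<String, List<String>> stage : stages) {
            List<String> next = new ArrayList<>();
            for (String record : current) {
                next.addAll(stage.apply(record));
            }
            current = next;
        }
        return current;
    }

    // Runs the two-stage chain on a small fixed input.
    static List<String> demo() {
        Function<String, List<String>> first = ChainSketch::readWords;
        Function<String, List<String>> second = ChainSketch::addPrefix;
        return runChain(List.of("hello world", "foo"), List.of(first, second));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [pre_hello, pre_world, pre_foo]
    }
}
```

The sketch models only the record flow between stages. It ignores keys, serialization, and how Hadoop actually schedules the mappers inside the task.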
JobControl
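Anything beyond simple processing logic usually needs several MapReduce programs run in sequence. Chaining multiple jobs can be done with the MapReduce framework's JobControl.
- We can use a shell script that checks each job's exit status to decide whether to run the next step.
- We can declare the dependencies between multiple jobs.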
ControlledJob cJob1 = new ControlledJob(job1.getConfiguration());
ControlledJob cJob2 = new ControlledJob(job2.getConfiguration());
ControlledJob cJob3 = new ControlledJob(job3.getConfiguration());
cJob1.setJob(job1);
cJob2.setJob(job2);
cJob3.setJob(job3);

// Set the job dependencies: job2 depends on job1, and job3 depends on job2.
cJob2.addDependingJob(cJob1);
cJob3.addDependingJob(cJob2);

// Create the JobControl, passing in a group name.
JobControl jobControl = new JobControl("RecommendationJob");
jobControl.addJob(cJob1);
jobControl.addJob(cJob2);
jobControl.addJob(cJob3);

// Run the jobs added to the JobControl on a new thread.
Thread jobControlThread = new Thread(jobControl);
jobControlThread.start();

// Poll until all jobs have finished, then stop the JobControl thread.
while (!jobControl.allFinished()) {
    Thread.sleep(500);
}
jobControl.stop();
return 0;
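The dependency rule described above (a job starts only after every job it depends on has succeeded) can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Hadoop's implementation: `Node`, `runAll`, and `demo` are hypothetical names, jobs run sequentially rather than on a cluster, and the state names are chosen to resemble those of `ControlledJob`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class DependencySketch {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    // Hypothetical stand-in for a controlled job; work returns true on success.
    static final class Node {
        final String name;
        final BooleanSupplier work;
        final List<Node> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        Node(String name, BooleanSupplier work) {
            this.name = name;
            this.work = work;
        }

        void addDependingJob(Node other) {
            dependsOn.add(other);
        }
    }

    // Scan the jobs repeatedly until a full pass changes nothing.
    static Map<String, State> runAll(List<Node> jobs) {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Node job : jobs) {
                if (job.state != State.WAITING) {
                    continue;
                }
                boolean anyFailed = false;
                boolean allSucceeded = true;
                for (Node dep : job.dependsOn) {
                    if (dep.state == State.FAILED || dep.state == State.DEPENDENT_FAILED) {
                        anyFailed = true;
                    }
                    if (dep.state != State.SUCCESS) {
                        allSucceeded = false;
                    }
                }
                if (anyFailed) {
                    job.state = State.DEPENDENT_FAILED; // never run: an upstream job failed
                    progressed = true;
                } else if (allSucceeded) {
                    job.state = job.work.getAsBoolean() ? State.SUCCESS : State.FAILED;
                    progressed = true;
                }
            }
        }
        Map<String, State> result = new LinkedHashMap<>();
        for (Node job : jobs) {
            result.put(job.name, job.state);
        }
        return result;
    }

    // job1 -> job2 -> job3, with job2 simulated as failing.
    static Map<String, State> demo(List<String> executed) {
        Node job1 = new Node("job1", () -> { executed.add("job1"); return true; });
        Node job2 = new Node("job2", () -> { executed.add("job2"); return false; });
        Node job3 = new Node("job3", () -> { executed.add("job3"); return true; });
        job2.addDependingJob(job1);
        job3.addDependingJob(job2);
        // Registration order is reversed on purpose; dependencies decide the run order.
        return runAll(List.of(job3, job2, job1));
    }

    public static void main(String[] args) {
        List<String> executed = new ArrayList<>();
        System.out.println(demo(executed)); // prints {job3=DEPENDENT_FAILED, job2=FAILED, job1=SUCCESS}
        System.out.println(executed);       // prints [job1, job2]
    }
}
```

In the model, job3 never runs because job2 failed. With the real JobControl it is likewise worth checking for failed jobs once the polling loop exits, for example via `jobControl.getFailedJobList()`, since "all finished" does not by itself mean "all succeeded".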
