Preparation

Copy airlinedata to the virtual machine

hdfs dfs -mkdir -p /user/hdfs/airlinedata
hdfs dfs -mkdir -p /user/hdfs/sampledata
hdfs dfs -mkdir -p /user/hdfs/masterdata
cd /home/hadoop/proHadoop/
hdfs dfs -copyFromLocal airlinedata/ /user/hdfs/airlinedata/
hdfs dfs -copyFromLocal sampledata/ /user/hdfs/sampledata/
hdfs dfs -copyFromLocal masterdata/* /user/hdfs/masterdata/

hdfs dfs -ls /user/hdfs/airlinedata
hdfs dfs -ls /user/hdfs/sampledata

hdfs dfs -ls /user/hdfs/masterdata

TestSelect

1. Start Eclipse

Switch from the root user to the hadoop user:
[root@test ~]# su - hadoop
As the hadoop user, run:
[hadoop@test ~]$ eclipse
If you get the following error:

  [hadoop@test ~]$ eclipse
  Eclipse: Cannot open display:
  org.eclipse.m2e.logback.configuration: The org.eclipse.m2e.logback.configuration bundle was activated before the state location was initialized. Will retry after the state location is initialized.
  Eclipse: Cannot open display:
  Eclipse:
  An error has occurred. See the log file
  /opt/eclipse/configuration/1588190319670.log.

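I did not find a fix for this error. Instead, I opened /opt/eclipse and launched Eclipse by clicking the application there.
After a while Eclipse started working on its own; the exact cause is unknown. ( ̄□ ̄||)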

2. Create the TestSelect project

3. Create the packages org.apress.prohadoop.c5 and org.apress.prohadoop.utils

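4. Copy in the required input files and create the output folder

Input file location: /TestSelect/devairlinedataset/input/txt

Output folder: /TestSelect/output/c5
Delete the existing select folder inside it, because Hadoop refuses to start a job whose output directory already exists.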
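Because the local run writes to the local file system, the leftover select folder can be deleted by hand in Eclipse or from Java. A minimal sketch using only java.nio; the class LocalOutputCleaner is hypothetical and not part of the book's code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: removes a local output folder before a local-mode run. */
class LocalOutputCleaner {
    /** Deletes dir and everything under it. Returns false if dir did not exist. */
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false; // nothing to delete
        }
        // Reverse lexicographic order visits children before their parent directories.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Usage: LocalOutputCleaner.deleteRecursively(Paths.get("/home/hadoop/eclipse-workspace/TestSelect/output/c5/select")). On the cluster, the equivalent is hdfs dfs -rm -r on the HDFS output path.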

5. Create the SelectClauseMRJob and AirlineDataUtils classes

SelectClauseMRJob contains:

package org.apress.prohadoop.c5;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apress.prohadoop.utils.AirlineDataUtils;


public class SelectClauseMRJob extends Configured implements Tool {
    public static class SelectClauseMapper extends
            Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Skip the CSV header line; emit the selected columns joined by commas.
            if (!AirlineDataUtils.isHeader(value)) {
                StringBuilder output = AirlineDataUtils.mergeStringArray(
                        AirlineDataUtils.getSelectResultsPerRow(value), ",");
                context.write(NullWritable.get(), new Text(output.toString()));
            }
        }
    }
    public int run(String[] allArgs) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SelectClauseMRJob.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(SelectClauseMapper.class);
        // Map-only job: no reducers, so each map task writes its own part-m-NNNNN file.
        job.setNumReduceTasks(0);
        String[] args = new GenericOptionsParser(getConf(), allArgs)
                .getRemainingArgs();
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit code 0 on success, 1 on failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }
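    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (e.g. -D key=value) into the Configuration,
        // then calls run() with the remaining arguments; propagate its exit code.
        int exitCode = ToolRunner.run(new Configuration(), new SelectClauseMRJob(), args);
        System.exit(exitCode);
    }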
}
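The notes only show SelectClauseMRJob, but the mapper relies on three AirlineDataUtils methods: isHeader, getSelectResultsPerRow and mergeStringArray. As a rough, hypothetical sketch of what they do, here is a String-based version. The real class takes org.apache.hadoop.io.Text, and the selected columns (Year, Month, DayofMonth, Origin, Dest at positions 0, 1, 2, 16, 17 of the airline on-time CSV) are an assumption:

```java
/**
 * Hypothetical, String-based stand-in for org.apress.prohadoop.utils.AirlineDataUtils.
 * The selected column positions are an assumption, not the book's code.
 */
class AirlineDataUtilsSketch {
    // Assumed positions: Year, Month, DayofMonth, Origin, Dest.
    private static final int[] SELECTED_COLUMNS = {0, 1, 2, 16, 17};

    /** Treats the line as the header if its first field is "Year" (quotes ignored). */
    static boolean isHeader(String line) {
        String first = line.split(",", -1)[0].replace("\"", "").trim();
        return first.equals("Year");
    }

    /** Returns only the selected columns of one CSV row. */
    static String[] getSelectResultsPerRow(String line) {
        String[] fields = line.split(",", -1);
        String[] selected = new String[SELECTED_COLUMNS.length];
        for (int i = 0; i < SELECTED_COLUMNS.length; i++) {
            int col = SELECTED_COLUMNS[i];
            // Missing trailing columns become empty strings instead of throwing.
            selected[i] = col < fields.length ? fields[col] : "";
        }
        return selected;
    }

    /** Joins the values with the separator; returns a StringBuilder like the mapper expects. */
    static StringBuilder mergeStringArray(String[] values, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(values[i]);
        }
        return sb;
    }
}
```

In the real mapper the same steps run on value (a Text), and the joined string is written with a NullWritable key.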

6. Run locally

Enter the program arguments (input path first, then output path):
/home/hadoop/eclipse-workspace/TestSelect/devairlinedataset/input/txt/ /home/hadoop/eclipse-workspace/TestSelect/output/c5/select/
The run finished successfully.
Check the results: refresh the output folder in Eclipse.

7. Run on the cluster

Package the Java project as a jar.

Choose where to save the jar and give it a name:
Location: /home/hadoop
Jar name: TestSelect.jar
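If the main class did not make it into the jar, hadoop jar fails with a ClassNotFoundException. Running jar tf TestSelect.jar lists the contents; the same check from Java, as a small hypothetical helper (not from the book) built on java.util.jar.JarFile:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks that a packaged jar contains a given class. */
class JarClassCheck {
    /** e.g. containsClass("/home/hadoop/TestSelect.jar", "org.apress.prohadoop.c5.SelectClauseMRJob") */
    static boolean containsClass(String jarPath, String className) {
        // A class name maps to an entry such as org/apress/prohadoop/c5/SelectClauseMRJob.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```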

Run the jar

hdfs dfs -ls /user/hdfs

cd /home/hadoop
hadoop jar TestSelect.jar org.apress.prohadoop.c5.SelectClauseMRJob /user/hdfs/sampledata /user/hdfs/output/c5/select

This fails with:
Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s)
Cause: YARN had not been started. Start it, then rerun the job:
start-yarn.sh
hadoop jar TestSelect.jar org.apress.prohadoop.c5.SelectClauseMRJob /user/hdfs/sampledata /user/hdfs/output/c5/select
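The error means the client keeps trying to reach a YARN service on the port shown (8031) and nothing is listening there. A hypothetical probe (not part of Hadoop) for checking a port before submitting; the host and port would simply be the values from the error message:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether anything accepts TCP connections on host:port. */
class PortProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true; // connection accepted
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }
}
```

Usage: PortProbe.isListening("localhost", 8031, 2000). Alternatively, after start-yarn.sh, jps should list the ResourceManager and NodeManager processes.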

Results:
hdfs dfs -ls /user/hdfs/output/c5/select
hdfs dfs -cat /user/hdfs/output/c5/select/part-m-00000
hdfs dfs -cat /user/hdfs/output/c5/select/part-m-00001
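Because setNumReduceTasks(0) makes this a map-only job, each map task writes its own output file, named part-m-00000, part-m-00001 and so on; two files means two map tasks ran (typically one per input split). Jobs with reducers produce part-r-NNNNN files instead, and since the key is NullWritable, TextOutputFormat writes only the value on each line. A tiny illustration of the naming pattern (hypothetical class):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrates Hadoop's output file naming: part-m-NNNNN for map tasks, part-r-NNNNN for reduce tasks. */
class PartFileNames {
    static List<String> outputFiles(int numTasks, boolean mapOnly) {
        List<String> names = new ArrayList<>();
        String type = mapOnly ? "m" : "r";
        for (int i = 0; i < numTasks; i++) {
            // Task index padded to five digits.
            names.add(String.format("part-%s-%05d", type, i));
        }
        return names;
    }
}
```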

TestWhere

Starting from the TestSelect project, copy WhereClauseMRJob.java into it.
Package it as a jar the same way as for TestSelect.
Jar name: TestWhere.jar
Saved under /home/hadoop
Run:
hadoop jar TestWhere.jar org.apress.prohadoop.c5.WhereClauseMRJob -D map.where.delay=10 /user/hdfs/sampledata /user/hdfs/output/c5/where
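Here -D map.where.delay=10 is a Hadoop generic option: ToolRunner (via GenericOptionsParser) stores map.where.delay=10 in the job's Configuration, and only the two HDFS paths reach run() as arguments. The mapper could then read the value with something like context.getConfiguration().getInt("map.where.delay", 0); how WhereClauseMRJob uses it is not shown in these notes. The sketch below is a simplified, hypothetical imitation of that split in plain Java, not Hadoop's implementation. It stops looking for options at the first non-option argument; to my understanding GenericOptionsParser behaves the same way, so keep -D before the paths:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified splitter for "-D key=value" options versus remaining arguments. */
class GenericArgsSketch {
    final Map<String, String> properties = new HashMap<>();
    final List<String> remaining = new ArrayList<>();

    GenericArgsSketch(String[] args) {
        boolean optionsDone = false;
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!optionsDone && arg.equals("-D") && i + 1 < args.length) {
                addProperty(args[++i]);        // "-D key=value" form
            } else if (!optionsDone && arg.startsWith("-D") && arg.length() > 2) {
                addProperty(arg.substring(2)); // "-Dkey=value" form
            } else {
                optionsDone = true;            // first non-option: everything after is a plain argument
                remaining.add(arg);
            }
        }
    }

    private void addProperty(String pair) {
        int eq = pair.indexOf('=');
        if (eq > 0) {
            properties.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        // Pairs without '=' are ignored in this sketch.
    }
}
```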

TestSUM

hadoop jar TestSUM.jar org.apress.prohadoop.c5.AggregationMRJob /user/hdfs/sampledata /user/hdfs/output/c5/aggregation
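AggregationMRJob is the SUM/GROUP BY example; what exactly it totals is not shown in these notes. As a plain-Java illustration of the pattern only (the month key and the class name are assumptions), the map step extracts a key and emits 1, and the shuffle plus reduce step sums the values per key:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of SUM ... GROUP BY month, collapsed into one loop. */
class SumByMonthSketch {
    static Map<Integer, Long> countFlightsPerMonth(List<String> csvLines) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",", -1);
            if (fields[0].replace("\"", "").trim().equals("Year")) {
                continue; // skip the header, like isHeader()
            }
            int month = Integer.parseInt(fields[1].trim()); // map: key = month, value = 1
            totals.merge(month, 1L, Long::sum);             // reduce: sum the values per key
        }
        return totals;
    }
}
```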

TestSUMWithCombiner

hadoop jar TestSUMWithCombiner.jar org.apress.prohadoop.c5.AggregationWithCombinerMRJob /user/hdfs/sampledata /user/hdfs/output/c5/combineraggregation
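The combiner version should give the same totals as TestSUM while shuffling fewer records, because a combiner pre-aggregates each map task's output before it is sent to the reducers. Hadoop may run a combiner zero, one or several times, so it must not change the result; summing satisfies this because addition is associative and commutative. A plain-Java illustration (hypothetical class, not the book's code):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration: per-split partial sums, then a final sum, equal one big sum. */
class CombinerSketch {
    /** Combiner: counts each key within one map task's output. */
    static Map<String, Long> combine(List<String> mapOutputKeys) {
        Map<String, Long> partial = new TreeMap<>();
        for (String key : mapOutputKeys) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    /** Reducer: sums the partial counts coming from all map tasks. */
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> totals.merge(key, count, Long::sum));
        }
        return totals;
    }
}
```

With two splits of 3 and 4 map output records, the combiner shrinks the shuffle from 7 records to one record per distinct key per split.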

TestSplitByMonth

hadoop jar TestSplitByMonth.jar org.apress.prohadoop.c5.SplitByMonthMRJob /user/hdfs/sampledata /user/hdfs/output/c5/partitioner
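SplitByMonthMRJob writes to an output folder named partitioner, so it presumably uses a custom Partitioner to send each month's records to their own reducer and therefore their own part-r-NNNNN file; the book's exact rule is not shown in these notes. The sketch contrasts Hadoop's default HashPartitioner formula with a hypothetical month-based rule. In a real job, either would live in Partitioner.getPartition(key, value, numPartitions):

```java
/** Hypothetical comparison of two ways to assign a key to one of numReduceTasks reducers. */
class MonthPartitionSketch {
    /** The formula used by Hadoop's default HashPartitioner. */
    static int hashPartition(Object key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Hypothetical month rule: January goes to reducer 0, ..., December to reducer 11 (with 12 reducers). */
    static int monthPartition(int month, int numReduceTasks) {
        return (month - 1) % numReduceTasks;
    }
}
```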