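Based on Hadoop 3.3.0
Important: every configuration change described below is made on the master node first and must then be synchronized to all other nodes in the cluster.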
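
A minimal sketch of that synchronization step, assuming passwordless SSH from the master, the same Hadoop path on every node, and rsync installed everywhere. The function name sync_conf, the DRY_RUN switch, and the worker hostnames are hypothetical; the article does not say which tool was actually used.

```shell
# Hypothetical helper: push the Hadoop config directory to each worker node.
# Usage: sync_conf <conf_dir> <host> [<host> ...]
sync_conf() {
  local conf_dir="$1"
  shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: only print the command that would be executed.
      echo "rsync -az ${conf_dir}/ ${host}:${conf_dir}/"
    else
      # Copy the directory contents to the same path on the remote host.
      rsync -az "${conf_dir}/" "${host}:${conf_dir}/"
    fi
  done
}
```

For example, `DRY_RUN=1 sync_conf /opt/hadoop/hadoop-3.3.0/etc/hadoop worker1 worker2` prints the two rsync commands without running them.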

1. Introduction to Hadoop's built-in test tools

Go to {HADOOP_HOME}/share/hadoop/mapreduce, which contains the following files:

[root@master mapreduce]# ls
hadoop-mapreduce-client-app-3.3.0.jar              hadoop-mapreduce-client-nativetask-3.3.0.jar
hadoop-mapreduce-client-common-3.3.0.jar           hadoop-mapreduce-client-shuffle-3.3.0.jar
hadoop-mapreduce-client-core-3.3.0.jar             hadoop-mapreduce-client-uploader-3.3.0.jar
hadoop-mapreduce-client-hs-3.3.0.jar               hadoop-mapreduce-examples-3.3.0.jar
hadoop-mapreduce-client-hs-plugins-3.3.0.jar       jdiff
hadoop-mapreduce-client-jobclient-3.3.0.jar        lib-examples
hadoop-mapreduce-client-jobclient-3.3.0-tests.jar  sources

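Among these, hadoop-mapreduce-client-jobclient-3.3.0-tests.jar is the test tool that ships with Hadoop. Running it with an unrecognized program name such as -help lists the available test programs: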

[root@master mapreduce]#  hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar -help
Unknown program '-help' chosen.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  gsleep: A sleep job whose mappers create 1MB buffer for every record.
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timeline service performance.

Of these, TestDFSIO benchmarks HDFS I/O performance. Running it without arguments prints its usage:

[root@master mapreduce]#  hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO
2021-01-12 14:58:30,010 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-storagePolicy storagePolicyName] [-erasureCodePolicy erasureCodePolicyName]

2. Write test

Reference: https://help.aliyun.com/document_detail/134127.html

hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO -write -nrFiles 20 -size 50MB

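The run produces output like the following. Note that this log reports 4000 MB in total, which is 200 MB per file across 20 files, so it was presumably captured from a run with a larger -size than the 50MB shown in the command above: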

2021-01-13 11:34:34,916 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
2021-01-13 11:34:34,916 INFO fs.TestDFSIO:             Date & time: Wed Jan 13 11:34:34 CST 2021
2021-01-13 11:34:34,916 INFO fs.TestDFSIO:         Number of files: 20
2021-01-13 11:34:34,916 INFO fs.TestDFSIO:  Total MBytes processed: 4000
2021-01-13 11:34:34,916 INFO fs.TestDFSIO:       Throughput mb/sec: 4.45
2021-01-13 11:34:34,916 INFO fs.TestDFSIO:  Average IO rate mb/sec: 6.58
2021-01-13 11:34:34,916 INFO fs.TestDFSIO:   IO rate std deviation: 5.09
2021-01-13 11:34:34,917 INFO fs.TestDFSIO:      Test exec time sec: 123.87
2021-01-13 11:34:34,917 INFO fs.TestDFSIO:
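
Throughput and Average IO rate differ in the output above because they are aggregated differently: as far as the TestDFSIO source computes them, throughput divides the total data volume by the summed I/O time of all tasks, while the average IO rate is the mean of the per-file rates. The sketch below reproduces that arithmetic with awk on made-up per-file samples (megabytes and seconds per line). The function name dfsio_stats and the sample values are illustrative and are not taken from the run above.

```shell
# Hypothetical helper: read "<megabytes> <seconds>" pairs, one per file, from stdin
# and print the three aggregate figures the way TestDFSIO derives them.
dfsio_stats() {
  awk '
    {
      size += $1            # total megabytes processed
      time += $2            # summed I/O time of all tasks, in seconds
      r = $1 / $2           # rate of this single file, in MB/s
      rate += r
      sq += r * r
      n++
    }
    END {
      avg = rate / n                      # mean of the per-file rates
      var = sq / n - avg * avg            # variance of the per-file rates
      if (var < 0) var = -var             # guard against rounding below zero
      printf "throughput=%.2f avg_io_rate=%.2f stddev=%.2f\n", size / time, avg, sqrt(var)
    }'
}
```

For the two samples `50 10` and `50 5`, `printf '50 10\n50 5\n' | dfsio_stats` prints `throughput=6.67 avg_io_rate=7.50 stddev=2.50`: one slow file pulls the throughput below the average rate, which is the same pattern as 4.45 versus 6.58 in the log.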

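If Hadoop was not fully configured during installation, the following problems may occur:

2.1 junit not found

Running the test command fails with the following error: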

Caused by: java.lang.ClassNotFoundException: junit.framework.TestCase
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The junit jar is missing from the classpath. In this article the fix was to place the junit jar in {HADOOP_HOME}/share/hadoop/common/:

[root@master mapreduce]# cd /opt/hadoop/hadoop-3.3.0/share/hadoop/common/
[root@master common]# ls
hadoop-common-3.3.0.jar        hadoop-kms-3.3.0.jar  hadoop-registry-3.3.0.jar  junit-4.13.jar  sources
hadoop-common-3.3.0-tests.jar  hadoop-nfs-3.3.0.jar  jdiff                      lib             webapps
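
The copy step itself might look like the sketch below, assuming junit-4.13.jar has already been downloaded to the master node. The function name install_jar is hypothetical, and the same copy has to be repeated on (or synchronized to) the other nodes.

```shell
# Hypothetical helper: copy a jar into {HADOOP_HOME}/share/hadoop/common.
# Usage: install_jar <path-to-jar> <hadoop_home>
install_jar() {
  local jar="$1"
  local hadoop_home="$2"
  local dest="${hadoop_home}/share/hadoop/common"
  # Refuse to continue if the jar or the target directory is missing.
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$dest" ] || { echo "missing directory: $dest" >&2; return 1; }
  cp "$jar" "$dest/"
}
```

With the layout used in this article, the call would be `install_jar junit-4.13.jar /opt/hadoop/hadoop-3.3.0`.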

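2.2 Main class mapreduce.v2.app.MRAppMaster cannot be loaded

Error message (the last line is the JVM's localized form of "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster"):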

[2021-01-12 14:44:48.713]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster

The classpath must be configured in yarn-site.xml.

2.2.1 Get the Hadoop classpath

[root@master mapreduce]# hadoop classpath
/opt/hadoop/hadoop-3.3.0/etc/hadoop:/opt/hadoop/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/yarn/*

2.2.2 Edit yarn-site.xml

Go to {HADOOP_HOME}/etc/hadoop/ and edit yarn-site.xml.
Add the following property; the content of <value> is the output of the command above.

<property>
    <name>yarn.application.classpath</name>
    <value>output of the `hadoop classpath` command</value>
</property>
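
Because the classpath string is long, copying it by hand is error-prone. One possible shortcut is to let the shell wrap it into the property element. The function name classpath_property is hypothetical, and the printed fragment still has to be pasted inside the <configuration> element of yarn-site.xml manually.

```shell
# Hypothetical helper: print a yarn.application.classpath property
# whose value is the string passed as the first argument.
classpath_property() {
  printf '<property>\n'
  printf '    <name>yarn.application.classpath</name>\n'
  printf '    <value>%s</value>\n' "$1"
  printf '</property>\n'
}
```

On a node where the hadoop command is on the PATH, `classpath_property "$(hadoop classpath)"` prints the fragment with the real classpath filled in.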

2.2.3 Restart the YARN service

Remember to synchronize the modified configuration file to the other nodes.

2.3 The auxService:mapreduce_shuffle does not exist

2021-03-08 11:05:51,095 INFO mapreduce.Job: Task Id : attempt_1615172676650_0001_m_000001_0, Status : FAILED
Container launch failed for container_1615172676650_0001_01_000003 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateExceptionImpl(SerializedExceptionPBImpl.java:171)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:182)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:163)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2021-03-08 11:05:51,120 INFO mapreduce.Job: Task Id : attempt_1615172676650_0001_m_000000_0, Status : FAILED
Container launch failed for container_1615172676650_0001_01_000002 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateExceptionImpl(SerializedExceptionPBImpl.java:171)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:182)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:163)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Solution: add the following configuration to yarn-site.xml:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
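
Since this setting must be present on every node, a quick check of each copy of yarn-site.xml can save a failed job. The sketch below is a rough text match, not an XML parser: it assumes the <value> line directly follows the <name> line, as in the snippet above, and that grep supports the -A option. The function name has_shuffle_service is hypothetical.

```shell
# Hypothetical helper: succeed if the given yarn-site.xml declares
# mapreduce_shuffle on the line after the aux-services property name.
has_shuffle_service() {
  grep -A1 '<name>yarn.nodemanager.aux-services</name>' "$1" | grep -q 'mapreduce_shuffle'
}
```

For example, `has_shuffle_service /opt/hadoop/hadoop-3.3.0/etc/hadoop/yarn-site.xml && echo configured` prints `configured` only when the property is found.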

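3. Read test

Run the write test before the read test, because the read test reads the files that the write test created.
Command: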

hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO -read -nrFiles 20 -size 50MB

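Result. Note that this log reports a single 300 MB file, so it was presumably captured from a run with different -nrFiles and -size values than the command above: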

2021-01-13 11:52:29,836 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
2021-01-13 11:52:29,836 INFO fs.TestDFSIO:             Date & time: Wed Jan 13 11:52:29 CST 2021
2021-01-13 11:52:29,836 INFO fs.TestDFSIO:         Number of files: 1
2021-01-13 11:52:29,836 INFO fs.TestDFSIO:  Total MBytes processed: 300
2021-01-13 11:52:29,837 INFO fs.TestDFSIO:       Throughput mb/sec: 702.58
2021-01-13 11:52:29,837 INFO fs.TestDFSIO:  Average IO rate mb/sec: 702.58
2021-01-13 11:52:29,837 INFO fs.TestDFSIO:   IO rate std deviation: 0.06
2021-01-13 11:52:29,837 INFO fs.TestDFSIO:      Test exec time sec: 20.78
2021-01-13 11:52:29,837 INFO fs.TestDFSIO:
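
To compare the write and read runs, the throughput figure can be pulled out of the log text. A small sketch, assuming the lines keep the format shown above; the function name throughput_of is hypothetical.

```shell
# Hypothetical helper: print the number at the end of every
# "Throughput mb/sec:" line read from stdin.
throughput_of() {
  awk '/Throughput mb\/sec:/ { print $NF }'
}
```

Unless -resFile is given, TestDFSIO should also append each summary to a local file named TestDFSIO_results.log in the working directory, so `throughput_of < TestDFSIO_results.log` would list the throughput of every recorded run.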

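4. Clean up the test data

The -clean option deletes the benchmark data that TestDFSIO wrote to HDFS (by default under /benchmarks/TestDFSIO):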

hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO -clean