Based on Hadoop 3.3.0
Note: for every configuration change described below, after modifying the file on the master node you must sync it to all other nodes in the cluster.
1. Hadoop's built-in test tool
Go to {HADOOP_HOME}/share/hadoop/mapreduce and you will see the following files:
[root@master mapreduce]# ls
hadoop-mapreduce-client-app-3.3.0.jar              hadoop-mapreduce-client-nativetask-3.3.0.jar
hadoop-mapreduce-client-common-3.3.0.jar           hadoop-mapreduce-client-shuffle-3.3.0.jar
hadoop-mapreduce-client-core-3.3.0.jar             hadoop-mapreduce-client-uploader-3.3.0.jar
hadoop-mapreduce-client-hs-3.3.0.jar               hadoop-mapreduce-examples-3.3.0.jar
hadoop-mapreduce-client-hs-plugins-3.3.0.jar       jdiff
hadoop-mapreduce-client-jobclient-3.3.0.jar        lib-examples
hadoop-mapreduce-client-jobclient-3.3.0-tests.jar  sources
Among them, hadoop-mapreduce-client-jobclient-3.3.0-tests.jar is the benchmark and test tool that ships with Hadoop.
[root@master mapreduce]# hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar -help
Unknown program '-help' chosen.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
gsleep: A sleep job whose mappers create 1MB buffer for every record.
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode w/ MR.
nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
timelineperformance: A job that launches mappers to test timeline service performance.
Among these, TestDFSIO performs distributed I/O performance tests:
[root@master mapreduce]# hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO
2021-01-12 14:58:30,010 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-storagePolicy storagePolicyName] [-erasureCodePolicy erasureCodePolicyName]
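As the usage string shows, -size accepts a B/KB/MB/GB/TB suffix. A small sketch of how those suffixes scale, assuming binary (1024-based) multipliers, which is how Hadoop usually parses sizes; the helper name size_to_bytes is mine, not part of the tool:

```shell
# Hypothetical helper illustrating the -size suffixes; assumes 1024-based units.
size_to_bytes() {
  local n=${1%[KMGT]B}   # strip a KB/MB/GB/TB suffix
  n=${n%B}               # strip a bare B suffix
  case $1 in
    *KB) echo $((n * 1024)) ;;
    *MB) echo $((n * 1024 * 1024)) ;;
    *GB) echo $((n * 1024 * 1024 * 1024)) ;;
    *TB) echo $((n * 1024 * 1024 * 1024 * 1024)) ;;
    *B)  echo "$n" ;;
  esac
}
size_to_bytes 50MB   # prints 52428800
```

For example, -size 50MB as used below corresponds to 52,428,800 bytes per file.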
2. Write test
Reference: https://help.aliyun.com/document_detail/134127.html
hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO -write -nrFiles 20 -size 50MB
The output is as follows:
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: Date & time: Wed Jan 13 11:34:34 CST 2021
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: Number of files: 20
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: Total MBytes processed: 4000
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: Throughput mb/sec: 4.45
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: Average IO rate mb/sec: 6.58
2021-01-13 11:34:34,916 INFO fs.TestDFSIO: IO rate std deviation: 5.09
2021-01-13 11:34:34,917 INFO fs.TestDFSIO: Test exec time sec: 123.87
2021-01-13 11:34:34,917 INFO fs.TestDFSIO:
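A note on interpreting these metrics (my reading of TestDFSIO's report, so treat the formulas as an assumption): "Throughput mb/sec" divides the total MB by the sum of all map tasks' I/O times, while "Average IO rate" is the plain mean of the per-file rates; the two diverge when task times are uneven, which the sizable standard deviation above suggests. A back-of-envelope check using the numbers reported above:

```shell
# With 4000 MB total at a reported aggregate throughput of 4.45 MB/s,
# the summed per-task I/O time works out to roughly 899 seconds.
total_mb=4000
throughput=4.45
awk -v m="$total_mb" -v t="$throughput" 'BEGIN { printf "%.0f\n", m / t }'   # prints 899
```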
2.1 junit class not found
Running the test command may fail with the following error:
Caused by: java.lang.ClassNotFoundException: junit.framework.TestCase
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
The junit dependency jar is missing from the classpath. For this article, the junit jar (junit-4.13.jar, available from Maven Central) was placed under {HADOOP_HOME}/share/hadoop/common/; remember to sync it to the other nodes as well:
[root@master mapreduce]# cd /opt/hadoop/hadoop-3.3.0/share/hadoop/common/
[root@master common]# ls
hadoop-common-3.3.0.jar        hadoop-kms-3.3.0.jar  hadoop-registry-3.3.0.jar  junit-4.13.jar  sources
hadoop-common-3.3.0-tests.jar  hadoop-nfs-3.3.0.jar  jdiff                      lib             webapps
2.2 Could not find or load main class mapreduce.v2.app.MRAppMaster
Error message:
[2021-01-12 14:44:48.713]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
2.2.1 Get Hadoop's classpath
[root@master mapreduce]# hadoop classpath
/opt/hadoop/hadoop-3.3.0/etc/hadoop:/opt/hadoop/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-3.3.0/share/hadoop/yarn/*
2.2.2 Edit yarn-site.xml
Go to {HADOOP_HOME}/etc/hadoop/ and edit the yarn-site.xml file,
adding the following property:
<property>
  <name>yarn.application.classpath</name>
  <value>the output of the `hadoop classpath` command</value>
</property>
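If you would rather not paste the value by hand, the property block can be generated from the command output. A minimal sketch; the placeholder classpath below stands in for the real cp_value="$(hadoop classpath)", which must run on a cluster node:

```shell
# Sketch: wrap the classpath string in the yarn-site.xml property.
# On a real node, use: cp_value="$(hadoop classpath)"
cp_value='/opt/hadoop/hadoop-3.3.0/etc/hadoop:...'   # placeholder, not the full value
printf '<property>\n  <name>yarn.application.classpath</name>\n  <value>%s</value>\n</property>\n' "$cp_value"
```

Paste the printed block into yarn-site.xml on every node.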
2.2.3 Restart the YARN services
Restart YARN (for example, run stop-yarn.sh and then start-yarn.sh from {HADOOP_HOME}/sbin). Note that the modified configuration file must be synced to the other nodes first.
2.3 The auxService:mapreduce_shuffle does not exist
2021-03-08 11:05:51,095 INFO mapreduce.Job: Task Id : attempt_1615172676650_0001_m_000001_0, Status : FAILED
Container launch failed for container_1615172676650_0001_01_000003 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateExceptionImpl(SerializedExceptionPBImpl.java:171)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:182)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:163)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-03-08 11:05:51,120 INFO mapreduce.Job: Task Id : attempt_1615172676650_0001_m_000000_0, Status : FAILED
Container launch failed for container_1615172676650_0001_01_000002 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
	... (stack trace identical to the one above)
Fix: add the following properties to yarn-site.xml, sync the file to all nodes, and restart the NodeManagers:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
3. Read test
Run the write test first; the read test reads the files that the write test produced.
Command:
hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO -read -nrFiles 20 -size 50MB
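As a quick arithmetic sanity check on the parameters, 20 files of 50 MB each should process 1000 MB in total. (The sample output below reports 1 file and 300 MB, so it was evidently captured from a different, smaller run.)

```shell
# Expected total data volume for -nrFiles 20 -size 50MB:
nr_files=20
size_mb=50
echo $((nr_files * size_mb))   # prints 1000
```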
Result:
2021-01-13 11:52:29,836 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
2021-01-13 11:52:29,836 INFO fs.TestDFSIO: Date & time: Wed Jan 13 11:52:29 CST 2021
2021-01-13 11:52:29,836 INFO fs.TestDFSIO: Number of files: 1
2021-01-13 11:52:29,836 INFO fs.TestDFSIO: Total MBytes processed: 300
2021-01-13 11:52:29,837 INFO fs.TestDFSIO: Throughput mb/sec: 702.58
2021-01-13 11:52:29,837 INFO fs.TestDFSIO: Average IO rate mb/sec: 702.58
2021-01-13 11:52:29,837 INFO fs.TestDFSIO: IO rate std deviation: 0.06
2021-01-13 11:52:29,837 INFO fs.TestDFSIO: Test exec time sec: 20.78
2021-01-13 11:52:29,837 INFO fs.TestDFSIO:
4. Clean up the test data
The -clean option deletes the generated benchmark files from HDFS (by default under /benchmarks/TestDFSIO):
hadoop jar hadoop-mapreduce-client-jobclient-3.3.0-tests.jar TestDFSIO -clean
