1.压测命令
测试写性能
hadoop jar /export/server/hadoop2.7/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
测试读性能
hadoop jar /export/server/hadoop2.7/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
删除测试生成数据
hadoop jar /export/server/hadoop2.7/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar TestDFSIO -clean
2.Hadoop参数调优
HDFS参数调优hdfs-site.xml
dfs.namenode.handler.count=20 * log2(Cluster Size),比如集群规模为8台时,此参数设置为60
The number of Namenode RPC server threads that listen to requests from clients. If dfs.namenode.servicerpc-address is not configured then Namenode RPC server threads listen to requests from all nodes.
NameNode有一个工作线程池,用来处理不同DataNode的并发心跳以及客户端并发的元数据操作。对于大集群或者有大 量客户端的集群来说,通常需要增大参数dfs.namenode.handler.count的默认值10。设置该值的一般原则是将其设 置为集群大小的自然对数乘以20,即20logN,N为集群大小。
YARN参数调优yarn-site.xml:
yarn.nodemanager.resource.memory-mb表示该节点上YARN可使用的物理内存总量,默认是8192(MB)、
yarn.scheduler.maximum-allocation-mb 个任务可申请的最多物理内存量,默认是8192(MB)