Abstract: This article documents in detail the process of deploying a hadoop-2.6.0-cdh5.7.0 HA cluster in production; it can be used for learning and as a reference for production deployments.

@[toc]

1. Environment Requirements and Deployment Plan

1.1 Hardware Environment

Three Alibaba Cloud hosts, each with 2 vcores and 4 GB of memory.

1.2 Software Environment

| Component | Version |
| --- | --- |
| Hadoop | hadoop-2.6.0-cdh5.7.0 |
| ZooKeeper | zookeeper-3.4.6 |
| JDK | jdk-8u45-linux-x64 |

1.3 Process Deployment Plan

| Host | ZK | NN | ZKFC | JN | DN | RM(ZKFC) | NM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| hadoop001 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| hadoop002 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| hadoop003 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |

Note: 1 means the corresponding process runs on that host; 0 means it does not.

2. Hadoop HA Architecture Analysis

2.1 HDFS HA Architecture Explained

See: https://blog.csdn.net/qq_32641659/article/details/88964464

2.2 YARN HA Architecture Explained

See: https://blog.csdn.net/qq_32641659/article/details/88965006

3. HA Deployment Process

3.1 Upload the Installation Packages

Baidu Netdisk link for the installation packages:

  1. Link: https://pan.baidu.com/s/1NfOv2ODV9ktKXM8zfaofzQ
  2. Extraction code: mgwr

Create the hadoop user and upload the packages:

  1. ##### Run the following commands on all three machines ########
  2. useradd hadoop
  3. su - hadoop
  4. mkdir app soft lib source data
  5. exit
  6. yum install -y lrzsz # install the lrzsz package
  7. su - hadoop
  8. cd ~/soft/
  9. rz # upload the packages; upload to hadoop001 first (in my tests xftp transfers faster than rz)
  10. # scp the packages to the other two machines; note that the internal IPs are used
  11. scp -r ~/soft/* root@172.19.121.241:/home/hadoop/soft
  12. scp -r ~/soft/* root@172.19.121.242:/home/hadoop/soft
  13. [hadoop@hadoop001 soft]$ ll
  14. total 490792
  15. -rw-r--r-- 1 root root 311585484 Apr 3 15:52 hadoop-2.6.0-cdh5.7.0.tar.gz
  16. -rw-r--r-- 1 root root 173271626 Apr 3 15:49 jdk-8u45-linux-x64.gz
  17. -rw-r--r-- 1 root root 17699306 Apr 3 15:50 zookeeper-3.4.6.tar.gz

3.2 Disable the Firewall

  1. ## Run the following commands on all three machines
  2. # Flush the firewall rules
  3. [root@hadoop001 ~]# iptables -F
  4. [root@hadoop001 ~]# iptables -L
  5. # Permanently disable the firewall
  6. [root@hadoop001 ~]# service iptables stop
  7. [root@hadoop001 ~]# chkconfig iptables off
  8. [root@hadoop001 ~]# service iptables status
  9. iptables: Firewall is not running.
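
The commands above assume CentOS 6, where the firewall runs as the iptables service. If your hosts run CentOS 7 or later, firewalld replaces it; a rough equivalent is sketched below (an assumption about the OS, not part of the original setup):

  1. # CentOS 7 sketch: stop and disable firewalld on all three machines
  2. systemctl stop firewalld
  3. systemctl disable firewalld
  4. systemctl status firewalld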

3.3 Configure the hosts File

Configure the same hosts file on all three machines, as follows (only hadoop001 is shown):

  1. # Pitfall 1: never modify the first two lines (the localhost entries), or you will run into problems later
  2. [root@hadoop001 ~]# cat /etc/hosts
  3. 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
  4. ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
  5. 172.19.121.243 hadoop001 hadoop001
  6. 172.19.121.241 hadoop002 hadoop002
  7. 172.19.121.242 hadoop003 hadoop003
  8. [root@hadoop001 ~]# ping hadoop001
  9. [root@hadoop001 ~]# ping hadoop002
  10. [root@hadoop001 ~]# ping hadoop003

3.4 Configure Passwordless SSH

Generate a key pair on each of the three machines:

  1. [root@hadoop001 ~]# su - hadoop
  2. [hadoop@hadoop001 ~]$ rm -rf ./.ssh
  3. [hadoop@hadoop001 ~]$ ssh-keygen # press Enter four times to accept the defaults
  4. [hadoop@hadoop001 ~]$ cd ~/.ssh
  5. [hadoop@hadoop001 .ssh]$ ll
  6. total 8
  7. -rw------- 1 hadoop hadoop 1675 Apr 3 16:26 id_rsa
  8. -rw-r--r-- 1 hadoop hadoop 398 Apr 3 16:26 id_rsa.pub

Combine the public keys (note which machine each command is run on):

  1. [hadoop@hadoop001 .ssh]$ cat id_rsa.pub >>authorized_keys
  2. [hadoop@hadoop002 .ssh]$ scp -r ~/.ssh/id_rsa.pub root@172.19.121.243:/home/hadoop/.ssh/id_rsa2
  3. [hadoop@hadoop003 .ssh]$ scp -r ~/.ssh/id_rsa.pub root@172.19.121.243:/home/hadoop/.ssh/id_rsa3
  4. [hadoop@hadoop001 .ssh]$ ll
  5. total 20
  6. -rw-rw-r-- 1 hadoop hadoop 398 Apr 3 16:37 authorized_keys
  7. -rw------- 1 hadoop hadoop 1675 Apr 3 16:37 id_rsa
  8. -rw-r--r-- 1 root root 398 Apr 3 16:38 id_rsa2
  9. -rw-r--r-- 1 root root 398 Apr 3 16:38 id_rsa3
  10. -rw-r--r-- 1 hadoop hadoop 398 Apr 3 16:37 id_rsa.pub
  11. [hadoop@hadoop001 .ssh]$ cat ./id_rsa2 >> authorized_keys
  12. [hadoop@hadoop001 .ssh]$ cat ./id_rsa3 >> authorized_keys
  13. [hadoop@hadoop001 .ssh]$ cat authorized_keys
  14. ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwZuESml5aeRFyAmZPhzh0WG3waHqGChV4SHWBkjHrkcisLpqpXXotEn0Ap1yWuPYCUKNLIgyLD8tSubnLyj5nNdOXPYnzSyTw0NVIKzKkhLqrYMnpTrckodGjwkhSlaZbIRngBHGB7cUOW8AaWeA79UzEydr1/8Q/arizt82R/K8+t0SAIsk1MUu7+oUGJAzPXpNU76pq69ARb/hJUs0xRMMjOFetqrp8dh8pHoBjgcgUX+fyc5FB/dqJlaCXNJDmNtWclOo8flprB27qj4+1jfCs78wU6AAfewQqo4jJ/2NoD527Vu/SDGysQdlsKpSYBygLB1+/oR46sH1iUJTew== hadoop@hadoop001
  15. ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAykZ7nWRo+dmiMuaTALybK1S7XI/pgZgbpTmQAw3IIC1CwFWVZIRuF8eSCL4wgj16pKbKcfczN/9aYhOq0zsUgaa8LlzI6D2DKU1hzak43dCFcnNM/lBkF3QrkE0m9jfM6wmVozdflvRiM+GygEhydfbWSpJcMmPCmV+scRUFjRuH0AuWlwm7sRBxXbK3w4PpWfMF0ie4ZEbviO4PK+E3BxL4xT93N3fELF0s1ayK0mHOfDGBEkFBRp5vIVU//puFU0pW/2/db/laiA8xO1kHLPaFRwVl/I17yNkGUJjF0goeavtVMkxwckd5FsqFIdVecPZ5ReyObbasjbQlvL4uFQ== hadoop@hadoop002
  16. ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAw5v6nMHGJmzVHgC1gg/3QbP8qT2ljoBYcS9WaMdSNUjG/WVfvcRWSA1KlACwjG+8RmlHZkR4OTVAIBlMPMDObhjXK6J4hKGicINNsfB+E0etPczDneFCxZHwf9UQ/7J8g/KoAdmE+ROUWKzdw+q2QOcY5Yhbn7FSzF28CK826HPi5L6WXQlBolvlI4x6hn7vscwqpI7cu2YFLkp2bk5lEoatXShSxHi2MTxoyqrtuSpYZhybuExfDjDOPOXX0zpP/Gj7cUHTRuJrUtqiq+G71L+BhmD5cIsTwguBEXrWF+lsXOXTx2TyBXtc7kbvArE6XKee2sjshE52Kn7ko6ZhtQ== hadoop@hadoop003
  17. [hadoop@hadoop001 .ssh]$ rm -rf id_rsa2 id_rsa3
  18. [hadoop@hadoop001 .ssh]$ scp -r ~/.ssh/authorized_keys root@172.19.121.241:/home/hadoop/.ssh/
  19. [hadoop@hadoop001 .ssh]$ scp -r ~/.ssh/authorized_keys root@172.19.121.242:/home/hadoop/.ssh/
  20. # Important: if authorized_keys is owned by a non-root user, its permissions must be set to 600
  21. [hadoop@hadoop001 ~]$ chmod 600 ./.ssh/authorized_keys
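
As an alternative to merging the public keys by hand, the ssh-copy-id tool that ships with OpenSSH achieves the same result; a minimal sketch, assuming the hadoop user's password is known on every host:

  1. # Sketch: run as the hadoop user on each machine, once per target host
  2. ssh-copy-id hadoop@hadoop001
  3. ssh-copy-id hadoop@hadoop002
  4. ssh-copy-id hadoop@hadoop003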

Test passwordless SSH between every pair of machines; the first SSH to a host asks for confirmation (answer yes):

  1. # Rule: ssh to each remote machine and run the date command; if no password is asked for, passwordless SSH is configured correctly
  2. [hadoop@hadoop001 ~]$ ssh hadoop001 date
  3. [hadoop@hadoop001 ~]$ ssh hadoop002 date
  4. [hadoop@hadoop001 ~]$ ssh hadoop003 date
  5. [hadoop@hadoop002 ~]$ ssh hadoop001 date
  6. [hadoop@hadoop002 ~]$ ssh hadoop002 date
  7. [hadoop@hadoop002 ~]$ ssh hadoop003 date
  8. [hadoop@hadoop003 ~]$ ssh hadoop001 date
  9. [hadoop@hadoop003 ~]$ ssh hadoop002 date
  10. [hadoop@hadoop003 ~]$ ssh hadoop003 date
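
To avoid typing the nine combinations by hand, a small loop run on each machine covers them all (a convenience sketch, not from the original procedure; run it after the first-time confirmations have been accepted):

  1. # Every call should print a date without prompting for a password
  2. for h in hadoop001 hadoop002 hadoop003; do ssh $h date; done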

3.5 Deploy the JDK

Run the following commands on all three machines:

  1. # Pitfall 1: the directory must be /usr/java/, the default JDK directory assumed by CDH; using any other directory will cause problems later.
  2. [root@hadoop003 ~]# mkdir /usr/java/
  3. [root@hadoop001 ~]# tar -zxvf /home/hadoop/soft/jdk-8u45-linux-x64.gz -C /usr/java/
  4. # Pitfall 2: the ownership must be changed; the extracted JDK has odd ownership, which can later lead to class-not-found errors
  5. [root@hadoop001 ~]# chown -R root:root /usr/java

Configure the JDK environment variables:

  1. [root@hadoop001 ~]# vim /etc/profile # append the following two lines
  2. export JAVA_HOME=/usr/java/jdk1.8.0_45
  3. export PATH=$JAVA_HOME/bin:$PATH
  4. [root@hadoop001 ~]# source /etc/profile # reload the environment file
  5. [root@hadoop001 ~]# java -version
  6. java version "1.8.0_45"
  7. Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
  8. Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
  9. [root@hadoop001 ~]# which java
  10. /usr/java/jdk1.8.0_45/bin/java

3.6 Deploy the ZooKeeper Cluster

Extract the ZooKeeper package:

  1. [root@hadoop001 ~]$ su - hadoop
  2. [hadoop@hadoop001 ~]$ tar -zxvf ~/soft/zookeeper-3.4.6.tar.gz -C ~/app/
  3. [hadoop@hadoop001 ~]$ ln -s ~/app/zookeeper-3.4.6 ~/app/zookeeper

Add the environment variables:

  1. # Edit the hadoop user's environment file and add the following
  2. [hadoop@hadoop001 bin]$ vim ~/.bash_profile
  3. export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
  4. export PATH=$ZOOKEEPER_HOME/bin:$PATH
  5. [hadoop@hadoop001 bin]$ source ~/.bash_profile
  6. [hadoop@hadoop001 bin]$ which zkServer.sh
  7. ~/app/zookeeper/bin/zkServer.sh

Modify the ZooKeeper configuration:

  1. [hadoop@hadoop001 conf]$ mkdir -p ~/data/zkdata/data # -p because ~/data/zkdata does not exist yet
  2. [hadoop@hadoop001 app]$ cd ~/app/zookeeper/conf/
  3. [hadoop@hadoop001 conf]$ cp zoo_sample.cfg zoo.cfg
  4. # Add or modify the following settings
  5. [hadoop@hadoop001 conf]$ vim zoo.cfg
  6. dataDir=/home/hadoop/data/zkdata/data
  7. server.1=hadoop001:2888:3888
  8. server.2=hadoop002:2888:3888
  9. server.3=hadoop003:2888:3888
  10. # Create the myid file in the data directory and write the id 1 into it
  11. [hadoop@hadoop001 conf]$ cd ~/data/zkdata/data/
  12. [hadoop@hadoop001 data]$ echo 1 >myid
  13. # Copy the configuration file to hadoop002 and hadoop003
  14. [hadoop@hadoop001 data]$ scp ~/app/zookeeper/conf/zoo.cfg hadoop002:~/app/zookeeper/conf/
  15. [hadoop@hadoop001 data]$ scp ~/app/zookeeper/conf/zoo.cfg hadoop003:~/app/zookeeper/conf/
  16. [hadoop@hadoop001 data]$ scp ~/data/zkdata/data/myid hadoop002:~/data/zkdata/data/
  17. [hadoop@hadoop001 data]$ scp ~/data/zkdata/data/myid hadoop003:~/data/zkdata/data/
  18. # Change the myid files on hadoop002 and hadoop003 so that they contain the following ids
  19. [hadoop@hadoop002 ~]$ cat ~/data/zkdata/data/myid
  20. 2
  21. [hadoop@hadoop003 ~]$ cat ~/data/zkdata/data/myid
  22. 3
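
The id change on hadoop002 and hadoop003 can also be done remotely from hadoop001 in one step; a convenience sketch, assuming passwordless SSH is already working:

  1. # Overwrite the copied myid files with the correct ids
  2. [hadoop@hadoop001 ~]$ ssh hadoop002 'echo 2 > ~/data/zkdata/data/myid'
  3. [hadoop@hadoop001 ~]$ ssh hadoop003 'echo 3 > ~/data/zkdata/data/myid'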

Start the ZooKeeper cluster; run the following commands on all three machines:

  1. [hadoop@hadoop001 data]$ cd ~/app/zookeeper/bin
  2. [hadoop@hadoop001 bin]$ ./zkServer.sh start

Check the ZooKeeper cluster status:

  1. # Check the state of this ZooKeeper node
  2. [hadoop@hadoop003 bin]$ ./zkServer.sh status
  3. # Check whether the QuorumPeerMain process is running
  4. [hadoop@hadoop002 bin]$ jps -l
  5. 3026 org.apache.zookeeper.server.quorum.QuorumPeerMain

If the cluster status is abnormal, the error message and its resolution are as follows:

  1. # Error message
  2. [hadoop@hadoop003 bin]$ ./zkServer.sh status
  3. JMX enabled by default
  4. Using config: /home/hadoop/app/zookeeper/bin/../conf/status
  5. grep: /home/hadoop/app/zookeeper/bin/../conf/status: No such file or directory
  6. mkdir: cannot create directory `': No such file or directory
  7. Starting zookeeper ... ./zkServer.sh: line 113: /zookeeper_server.pid: Permission denied
  8. FAILED TO WRITE PID
  9. ### Check the log for the detailed error
  10. # Locate the log file; its name was found by searching the startup script
  11. [hadoop@hadoop001 bin]$ find /home/hadoop -name "zookeeper.out"
  12. /home/hadoop/app/zookeeper-3.4.6/bin/zookeeper.out
  13. [hadoop@hadoop001 bin]$ vim /home/hadoop/app/zookeeper-3.4.6/bin/zookeeper.out
  14. 2019-04-03 22:23:55,976 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /home/hadoop/app/zookeeper/bin/../conf/status
  15. 2019-04-03 22:23:55,979 [myid:] - ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally
  16. org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /home/hadoop/app/zookeeper/bin/../conf/status
  17. at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:123)
  18. at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
  19. at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
  20. Caused by: java.lang.IllegalArgumentException: /home/hadoop/app/zookeeper/bin/../conf/status file is missing
  21. at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:107)
  22. ... 2 more
  23. Invalid config, exiting abnormally
  24. # A quick look at the log shows that the configuration file being read is /home/hadoop/app/zookeeper/bin/../conf/status, which is odd (possibly because the environment variables were not configured at first). After restarting the cluster everything is normal: hadoop002 is the leader and the other nodes are followers.
  25. [hadoop@hadoop002 bin]$ ./zkServer.sh status
  26. JMX enabled by default
  27. Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
  28. Mode: leader

3.7 Deploy the Hadoop HA Cluster

Extract the package and add the environment variables; run on all three machines:

  1. [hadoop@hadoop001 bin]$ tar -zxvf ~/soft/hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app
  2. [hadoop@hadoop001 bin]$ ln -s ~/app/hadoop-2.6.0-cdh5.7.0 ~/app/hadoop
  3. [hadoop@hadoop001 bin]$ vim ~/.bash_profile
  4. [hadoop@hadoop001 bin]$ cat ~/.bash_profile # add or modify so that it contains the following
  5. PATH=$PATH:$HOME/bin
  6. export PATH
  7. export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
  8. export HADOOP_HOME=/home/hadoop/app/hadoop
  9. export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$PATH
  10. [hadoop@hadoop001 bin]$ source ~/.bash_profile # reload the environment
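
A quick sanity check that the Hadoop and ZooKeeper homes resolve correctly (a convenience check, not part of the original procedure):

  1. [hadoop@hadoop001 ~]$ which hadoop # should print /home/hadoop/app/hadoop/bin/hadoop
  2. [hadoop@hadoop001 ~]$ hadoop version # should report 2.6.0-cdh5.7.0
  3. [hadoop@hadoop001 ~]$ which zkServer.sh # should print /home/hadoop/app/zookeeper/bin/zkServer.sh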

Create the data directories; run on all three machines:

  1. [hadoop@hadoop001 ~]$ mkdir -p ~/app/hadoop-2.6.0-cdh5.7.0/tmp # temporary directory, set by hadoop.tmp.dir in core-site.xml
  2. [hadoop@hadoop003 ~]$ mkdir -p /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name # namenode data (fsimage) directory, set by dfs.namenode.name.dir in hdfs-site.xml
  3. [hadoop@hadoop003 ~]$ mkdir -p /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data # datanode data directory, set in hdfs-site.xml
  4. [hadoop@hadoop003 ~]$ mkdir -p /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn # journalnode data directory, set in hdfs-site.xml

Modify the five configuration files: delete the old ones, upload the new ones to hadoop003, then copy them to the other two machines:

  1. [hadoop@hadoop003 hadoop]$ cd ~/app/hadoop/etc/hadoop
  2. [hadoop@hadoop003 hadoop]$ rm -rf core-site.xml hdfs-site.xml yarn-site.xml slaves # remove the existing configuration files
  3. [hadoop@hadoop003 hadoop]$ rz
  4. [hadoop@hadoop003 hadoop]$ scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves hadoop001:/home/hadoop/app/hadoop/etc/hadoop
  5. [hadoop@hadoop003 hadoop]$ scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves hadoop002:/home/hadoop/app/hadoop/etc/hadoop
  6. [hadoop@hadoop003 hadoop]$ cat slaves # pitfall: these three host lines and the prompt line below must be genuinely separate lines, not joined together
  7. hadoop001
  8. hadoop002
  9. hadoop003
  10. [hadoop@hadoop003 hadoop]$

Baidu Netdisk link for the five configuration files:

  1. Link: https://pan.baidu.com/s/1lQCWc62nccn61gHEztSbyg
  2. Extraction code: 2rgm

core-site.xml is configured as follows:

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  3. <configuration>
  4. <!-- fs.defaultFS specifies the NameNode URI; YARN also relies on it -->
  5. <property>
  6. <name>fs.defaultFS</name>
  7. <value>hdfs://ruozeclusterg6</value>
  8. </property>
  9. <!-- ============================== Trash mechanism ======================================= -->
  10. <property>
  11. <!-- How often the checkpointer running on the NameNode creates a checkpoint from the Current directory; default 0 means it follows fs.trash.interval -->
  12. <name>fs.trash.checkpoint.interval</name>
  13. <value>0</value>
  14. </property>
  15. <property>
  16. <!-- Number of minutes after which checkpoints under .Trash are deleted; the server-side setting takes priority over the client-side one; default 0 means nothing is deleted -->
  17. <name>fs.trash.interval</name>
  18. <value>1440</value>
  19. </property>
  20. <!-- Hadoop temporary directory; hadoop.tmp.dir is a base setting that many other paths depend on. If the namenode and datanode locations are not configured in hdfs-site.xml, they default to subdirectories of this path -->
  21. <property>
  22. <name>hadoop.tmp.dir</name>
  23. <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
  24. </property>
  25. <!-- ZooKeeper quorum addresses -->
  26. <property>
  27. <name>ha.zookeeper.quorum</name>
  28. <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  29. </property>
  30. <!-- ZooKeeper session timeout, in milliseconds -->
  31. <property>
  32. <name>ha.zookeeper.session-timeout.ms</name>
  33. <value>2000</value>
  34. </property>
  35. <!-- Let the hadoop user and group act as proxy for all users and groups in the cluster; note that this must be the user that starts the processes -->
  36. <property>
  37. <name>hadoop.proxyuser.hadoop.hosts</name>
  38. <value>*</value>
  39. </property>
  40. <property>
  41. <name>hadoop.proxyuser.hadoop.groups</name>
  42. <value>*</value>
  43. </property>
  44. <!-- Supported compression codecs; if this Hadoop build does not support compression, leave this block commented out -->
  45. <!--<property>
  46. <name>io.compression.codecs</name>
  47. <value>org.apache.hadoop.io.compress.GzipCodec,
  48. org.apache.hadoop.io.compress.DefaultCodec,
  49. org.apache.hadoop.io.compress.BZip2Codec,
  50. org.apache.hadoop.io.compress.SnappyCodec
  51. </value>
  52. </property>-->
  53. </configuration>

hdfs-site.xml is configured as follows:

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  3. <configuration>
  4. <!-- HDFS superuser group; must be the user that starts the processes -->
  5. <property>
  6. <name>dfs.permissions.superusergroup</name>
  7. <value>hadoop</value>
  8. </property>
  9. <!-- Enable WebHDFS -->
  10. <property>
  11. <name>dfs.webhdfs.enabled</name>
  12. <value>true</value>
  13. </property>
  14. <property>
  15. <name>dfs.namenode.name.dir</name>
  16. <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name</value>
  17. <description>Local directory where the namenode stores the name table (fsimage); change as needed</description>
  18. </property>
  19. <property>
  20. <name>dfs.namenode.edits.dir</name>
  21. <value>${dfs.namenode.name.dir}</value>
  22. <description>Local directory where the namenode stores transaction files (edits); change as needed</description>
  23. </property>
  24. <property>
  25. <name>dfs.datanode.data.dir</name>
  26. <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data</value>
  27. <description>Local directory where the datanode stores blocks; change as needed</description>
  28. </property>
  29. <property>
  30. <name>dfs.replication</name>
  31. <value>3</value>
  32. </property>
  33. <!-- Block size 256 MB (default 128 MB) -->
  34. <property>
  35. <name>dfs.blocksize</name>
  36. <value>268435456</value>
  37. </property>
  38. <!--======================================================================= -->
  39. <!-- HDFS high-availability settings -->
  40. <!-- The HDFS nameservice is ruozeclusterg6; it must match the value in core-site.xml -->
  41. <property>
  42. <name>dfs.nameservices</name>
  43. <value>ruozeclusterg6</value>
  44. </property>
  45. <property>
  46. <!-- NameNode IDs; this version supports at most two NameNodes -->
  47. <name>dfs.ha.namenodes.ruozeclusterg6</name>
  48. <value>nn1,nn2</value>
  49. </property>
  50. <!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID], the RPC address -->
  51. <property>
  52. <name>dfs.namenode.rpc-address.ruozeclusterg6.nn1</name>
  53. <value>hadoop001:8020</value>
  54. </property>
  55. <property>
  56. <name>dfs.namenode.rpc-address.ruozeclusterg6.nn2</name>
  57. <value>hadoop002:8020</value>
  58. </property>
  59. <!-- HDFS HA: dfs.namenode.http-address.[nameservice ID], the HTTP address -->
  60. <property>
  61. <name>dfs.namenode.http-address.ruozeclusterg6.nn1</name>
  62. <value>hadoop001:50070</value>
  63. </property>
  64. <property>
  65. <name>dfs.namenode.http-address.ruozeclusterg6.nn2</name>
  66. <value>hadoop002:50070</value>
  67. </property>
  68. <!-- ================== NameNode editlog synchronization ============================================ -->
  69. <!-- Ensures the edit log can be recovered -->
  70. <property>
  71. <name>dfs.journalnode.http-address</name>
  72. <value>0.0.0.0:8480</value>
  73. </property>
  74. <property>
  75. <name>dfs.journalnode.rpc-address</name>
  76. <value>0.0.0.0:8485</value>
  77. </property>
  78. <property>
  79. <!-- JournalNode server addresses; the QuorumJournalManager stores the editlog on them -->
  80. <!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port is the same as dfs.journalnode.rpc-address -->
  81. <name>dfs.namenode.shared.edits.dir</name>
  82. <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/ruozeclusterg6</value>
  83. </property>
  84. <property>
  85. <!-- Directory where the JournalNode stores its data -->
  86. <name>dfs.journalnode.edits.dir</name>
  87. <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn</value>
  88. </property>
  89. <!-- ================== DataNode editlog synchronization ============================================ -->
  90. <property>
  91. <!-- Strategy DataNodes and clients use to identify and select the active NameNode -->
  92. <!-- Implementation class for automatic failover -->
  93. <name>dfs.client.failover.proxy.provider.ruozeclusterg6</name>
  94. <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  95. </property>
  96. <!-- ================== NameNode fencing =============================================== -->
  97. <!-- After a failover, prevents the stopped NameNode from starting to serve again, which would result in two active services -->
  98. <property>
  99. <name>dfs.ha.fencing.methods</name>
  100. <value>sshfence</value>
  101. </property>
  102. <property>
  103. <name>dfs.ha.fencing.ssh.private-key-files</name>
  104. <value>/home/hadoop/.ssh/id_rsa</value>
  105. </property>
  106. <property>
  107. <!-- Milliseconds after which fencing is considered to have failed -->
  108. <name>dfs.ha.fencing.ssh.connect-timeout</name>
  109. <value>30000</value>
  110. </property>
  111. <!-- ================== NameNode auto failover based on ZKFC and ZooKeeper ====================== -->
  112. <!-- Enable ZooKeeper-based automatic failover -->
  113. <property>
  114. <name>dfs.ha.automatic-failover.enabled</name>
  115. <value>true</value>
  116. </property>
  117. <!-- List of hosts whose datanodes are allowed to connect to the namenode -->
  118. <property>
  119. <name>dfs.hosts</name>
  120. <value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves</value>
  121. </property>
  122. </configuration>

mapred-site.xml is configured as follows:

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  3. <configuration>
  4. <!-- MapReduce application framework -->
  5. <property>
  6. <name>mapreduce.framework.name</name>
  7. <value>yarn</value>
  8. </property>
  9. <!-- JobHistory Server ============================================================== -->
  10. <!-- MapReduce JobHistory Server address; default port 10020 -->
  11. <property>
  12. <name>mapreduce.jobhistory.address</name>
  13. <value>hadoop001:10020</value>
  14. </property>
  15. <!-- MapReduce JobHistory Server web UI address; default port 19888 -->
  16. <property>
  17. <name>mapreduce.jobhistory.webapp.address</name>
  18. <value>hadoop001:19888</value>
  19. </property>
  20. <!-- Compress map output with snappy; if this Hadoop build is not compiled with compression support, leave this block commented out -->
  21. <!-- <property>
  22. <name>mapreduce.map.output.compress</name>
  23. <value>true</value>
  24. </property>
  25. <property>
  26. <name>mapreduce.map.output.compress.codec</name>
  27. <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  28. </property>-->
  29. </configuration>

yarn-site.xml is configured as follows:

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  3. <configuration>
  4. <!-- NodeManager settings ================================================= -->
  5. <property>
  6. <name>yarn.nodemanager.aux-services</name>
  7. <value>mapreduce_shuffle</value>
  8. </property>
  9. <property>
  10. <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  11. <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  12. </property>
  13. <property>
  14. <name>yarn.nodemanager.localizer.address</name>
  15. <value>0.0.0.0:23344</value>
  16. <description>Address where the localizer IPC is.</description>
  17. </property>
  18. <property>
  19. <name>yarn.nodemanager.webapp.address</name>
  20. <value>0.0.0.0:23999</value>
  21. <description>NM Webapp address.</description>
  22. </property>
  23. <!-- HA settings =============================================================== -->
  24. <!-- Resource Manager Configs -->
  25. <property>
  26. <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  27. <value>2000</value>
  28. </property>
  29. <property>
  30. <name>yarn.resourcemanager.ha.enabled</name>
  31. <value>true</value>
  32. </property>
  33. <property>
  34. <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  35. <value>true</value>
  36. </property>
  37. <!-- Enable embedded automatic failover; in an HA setup it works together with the ZKRMStateStore to handle fencing -->
  38. <property>
  39. <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
  40. <value>true</value>
  41. </property>
  42. <!-- Cluster id, so that HA election targets the right cluster -->
  43. <property>
  44. <name>yarn.resourcemanager.cluster-id</name>
  45. <value>yarn-cluster</value>
  46. </property>
  47. <property>
  48. <name>yarn.resourcemanager.ha.rm-ids</name>
  49. <value>rm1,rm2</value>
  50. </property>
  51. <!-- The RM id can be specified separately on each ResourceManager node here (optional):
  52. <property>
  53. <name>yarn.resourcemanager.ha.id</name>
  54. <value>rm2</value>
  55. </property>
  56. -->
  57. <property>
  58. <name>yarn.resourcemanager.scheduler.class</name>
  59. <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  60. </property>
  61. <property>
  62. <name>yarn.resourcemanager.recovery.enabled</name>
  63. <value>true</value>
  64. </property>
  65. <property>
  66. <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
  67. <value>5000</value>
  68. </property>
  69. <!-- ZKRMStateStore settings -->
  70. <property>
  71. <name>yarn.resourcemanager.store.class</name>
  72. <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  73. </property>
  74. <property>
  75. <name>yarn.resourcemanager.zk-address</name>
  76. <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  77. </property>
  78. <property>
  79. <name>yarn.resourcemanager.zk.state-store.address</name>
  80. <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  81. </property>
  82. <!-- RPC address clients use to access the RM (applications manager interface) -->
  83. <property>
  84. <name>yarn.resourcemanager.address.rm1</name>
  85. <value>hadoop001:23140</value>
  86. </property>
  87. <property>
  88. <name>yarn.resourcemanager.address.rm2</name>
  89. <value>hadoop002:23140</value>
  90. </property>
  91. <!-- RPC address ApplicationMasters use to access the RM (scheduler interface) -->
  92. <property>
  93. <name>yarn.resourcemanager.scheduler.address.rm1</name>
  94. <value>hadoop001:23130</value>
  95. </property>
  96. <property>
  97. <name>yarn.resourcemanager.scheduler.address.rm2</name>
  98. <value>hadoop002:23130</value>
  99. </property>
  100. <!-- RM admin interface -->
  101. <property>
  102. <name>yarn.resourcemanager.admin.address.rm1</name>
  103. <value>hadoop001:23141</value>
  104. </property>
  105. <property>
  106. <name>yarn.resourcemanager.admin.address.rm2</name>
  107. <value>hadoop002:23141</value>
  108. </property>
  109. <!-- RPC port NodeManagers use to access the RM -->
  110. <property>
  111. <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  112. <value>hadoop001:23125</value>
  113. </property>
  114. <property>
  115. <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  116. <value>hadoop002:23125</value>
  117. </property>
  118. <!-- RM web application addresses -->
  119. <property>
  120. <name>yarn.resourcemanager.webapp.address.rm1</name>
  121. <value>hadoop001:8088</value>
  122. </property>
  123. <property>
  124. <name>yarn.resourcemanager.webapp.address.rm2</name>
  125. <value>hadoop002:8088</value>
  126. </property>
  127. <property>
  128. <name>yarn.resourcemanager.webapp.https.address.rm1</name>
  129. <value>hadoop001:23189</value>
  130. </property>
  131. <property>
  132. <name>yarn.resourcemanager.webapp.https.address.rm2</name>
  133. <value>hadoop002:23189</value>
  134. </property>
  135. <property>
  136. <name>yarn.log-aggregation-enable</name>
  137. <value>true</value>
  138. </property>
  139. <property>
  140. <name>yarn.log.server.url</name>
  141. <value>http://hadoop001:19888/jobhistory/logs</value>
  142. </property>
  143. <property>
  144. <name>yarn.nodemanager.resource.memory-mb</name>
  145. <value>2048</value>
  146. </property>
  147. <property>
  148. <name>yarn.scheduler.minimum-allocation-mb</name>
  149. <value>1024</value>
  150. <description>Minimum memory a single task can request; default 1024 MB</description>
  151. </property>
  152. <property>
  153. <name>yarn.scheduler.maximum-allocation-mb</name>
  154. <value>2048</value>
  155. <description>Maximum memory a single task can request; default 8192 MB</description>
  156. </property>
  157. <property>
  158. <name>yarn.nodemanager.resource.cpu-vcores</name>
  159. <value>2</value>
  160. </property>
  161. </configuration>

The slaves file is as follows:

  1. hadoop001
  2. hadoop002
  3. hadoop003

Set the absolute path of the JDK in hadoop-env.sh (pitfall); do this on all three machines:

  1. [hadoop@hadoop001 hadoop]$ cat hadoop-env.sh |grep JAVA # as shown below, the absolute JDK path has been set
  2. # The only required environment variable is JAVA_HOME. All others are
  3. # set JAVA_HOME in this file, so that it is correctly defined on
  4. export JAVA_HOME=/usr/java/jdk1.8.0_45
  5. #HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"
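
Before starting anything, the merged configuration can be sanity-checked with hdfs getconf (a convenience check, not part of the original procedure; it only reads the configuration files and does not need a running cluster):

  1. [hadoop@hadoop001 ~]$ hdfs getconf -confKey dfs.nameservices # should print ruozeclusterg6
  2. [hadoop@hadoop001 ~]$ hdfs getconf -namenodes # should list hadoop001 hadoop002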

Start the HA cluster:

  1. # Make sure the ZooKeeper cluster is running
  2. [hadoop@hadoop003 hadoop]$ zkServer.sh status
  3. JMX enabled by default
  4. Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
  5. Mode: leader
  6. # Start the JournalNode daemon; run on all three machines
  7. [hadoop@hadoop002 sbin]$ cd ~/app/hadoop/bin # remove all the Windows scripts
  8. [hadoop@hadoop002 sbin]$ rm -rf *.cmd
  9. [hadoop@hadoop002 sbin]$ cd ~/app/hadoop/sbin
  10. [hadoop@hadoop002 sbin]$ rm -rf *.cmd
  11. [hadoop@hadoop002 sbin]$ ./hadoop-daemon.sh start journalnode
  12. [hadoop@hadoop003 sbin]$ jps
  13. 1868 JournalNode
  14. 1725 QuorumPeerMain
  15. 1919 Jps
  16. # Format the namenode; running the format on hadoop001 only is enough. Success is indicated by a 'successfully formatted' message in the log, as shown below
  17. [hadoop@hadoop001 sbin]$ hadoop namenode -format
  18. ......
  19. : Storage directory /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name has been successfully formatted.
  20. 19/04/06 19:50:08 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
  21. 19/04/06 19:50:08 INFO util.ExitUtil: Exiting with status 0
  22. 19/04/06 19:50:08 INFO namenode.NameNode: SHUTDOWN_MSG:
  23. /************************************************************
  24. SHUTDOWN_MSG: Shutting down NameNode at hadoop001/172.19.121.243
  25. ************************************************************/
  26. [hadoop@hadoop001 sbin]$ scp -r ~/app/hadoop/data/ hadoop002:/home/hadoop/app/hadoop/ # copy the namenode data to hadoop002
  27. # Format ZKFC; run on hadoop001 only. On success the znode /hadoop-ha/ruozeclusterg6 is created in ZooKeeper, as shown below:
  28. [hadoop@hadoop001 sbin]$ hdfs zkfc -formatZK
  29. ....
  30. 19/04/06 20:03:02 INFO ha.ActiveStandbyElector: Session connected.
  31. 19/04/06 20:03:02 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ruozeclusterg6 in ZK.
  32. # Start HDFS; running it on hadoop001 only is enough
  33. [hadoop@hadoop001 sbin]$ start-dfs.sh # if the error below appears and jps shows that the datanodes did not start, the slaves file is corrupted; delete it and write a new one.
  34. ·····
  35. : Name or service not knownstname hadoop003
  36. : Name or service not knownstname hadoop001
  37. : Name or service not knownstname hadoop002
  38. [hadoop@hadoop002 current]$ rm -rf ~/app/hadoop/etc/hadoop/slaves
  39. [hadoop@hadoop002 current]$ vim ~/app/hadoop/etc/hadoop/slaves # add the DN node entries
  40. hadoop001
  41. hadoop002
  42. hadoop003
  43. ·····
  44. # Restart HDFS; this starts the NN, DN, JN and ZKFC daemons. HDFS is stopped with stop-dfs.sh
  45. [hadoop@hadoop001 sbin]$ start-dfs.sh
  46. 19/04/06 20:51:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  47. Starting namenodes on [hadoop001 hadoop002]
  48. hadoop001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop001.out
  49. hadoop002: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop002.out
  50. hadoop002: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop002.out
  51. hadoop003: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop003.out
  52. hadoop001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop001.out
  53. Starting journal nodes [hadoop001 hadoop002 hadoop003]
  54. hadoop001: starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-journalnode-hadoop001.out
  55. hadoop003: starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-journalnode-hadoop003.out
  56. hadoop002: starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-journalnode-hadoop002.out
  57. 19/04/06 20:52:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  58. Starting ZK Failover Controllers on NN hosts [hadoop001 hadoop002]
  59. hadoop002: starting zkfc, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-zkfc-hadoop002.out
  60. hadoop001: starting zkfc, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-zkfc-hadoop001.out
  61. [hadoop@hadoop001 sbin]$ jps
  62. 5504 NameNode
  63. 5797 JournalNode
  64. 5606 DataNode
  65. 6054 Jps
  66. 1625 QuorumPeerMain
  67. 5983 DFSZKFailoverController
  68. # Start YARN; run start-yarn.sh on hadoop001. As the log shows, only one RM is started here;
  69. # the other RM has to be started manually on hadoop002
  70. [hadoop@hadoop001 sbin]$ start-yarn.sh
  71. starting yarn daemons
  72. starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hadoop001.out
  73. hadoop001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hadoop001.out
  74. hadoop002: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hadoop002.out
  75. hadoop003: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hadoop003.out
  76. [hadoop@hadoop002 current]$ yarn-daemon.sh start resourcemanager
  77. starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hadoop002.out
  78. # Start the JobHistory service; running it on hadoop001 is enough
  79. [hadoop@hadoop001 sbin]$ mr-jobhistory-daemon.sh start historyserver
  80. starting historyserver, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/mapred-hadoop-historyserver-hadoop001.out
  81. [hadoop@hadoop001 sbin]$ jps
  82. 5504 NameNode
  83. 6211 NodeManager
  84. 6116 ResourceManager
  85. 5797 JournalNode
  86. 5606 DataNode
  87. 1625 QuorumPeerMain
  88. 7037 JobHistoryServer
  89. 7118 Jps
  90. 5983 DFSZKFailoverController
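
Once everything is up, the HA state of the NameNodes and ResourceManagers can also be checked from the command line (a convenience check; nn1/nn2 and rm1/rm2 are the ids defined in hdfs-site.xml and yarn-site.xml above):

  1. [hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn1 # prints active or standby
  2. [hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn2
  3. [hadoop@hadoop001 ~]$ yarn rmadmin -getServiceState rm1
  4. [hadoop@hadoop001 ~]$ yarn rmadmin -getServiceState rm2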

3.8 Verify the Cluster Deployment

Operate on HDFS files through the namespace (nameservice):

  1. [hadoop@hadoop002 current]$ hdfs dfs -ls hdfs://ruozeclusterg6/
  2. [hadoop@hadoop002 current]$ hdfs dfs -put ~/app/hadoop/README.txt hdfs://ruozeclusterg6/
  3. 19/04/06 21:08:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  4. [hadoop@hadoop002 current]$ hdfs dfs -ls hdfs://ruozeclusterg6/
  5. 19/04/06 21:08:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  6. Found 1 items
  7. -rw-r--r-- 3 hadoop hadoop 1366 2019-04-06 21:08 hdfs://ruozeclusterg6/README.txt

Web UI access

  • Configure the Alibaba Cloud security group rules to allow all ports in both the inbound and outbound directions. (screenshots omitted)
  • Configure the Windows hosts file. (screenshots omitted)
  • Open the HDFS page of hadoop001; which NameNode is active is decided by ZooKeeper. (screenshot omitted)
  • Open the HDFS page of hadoop002; which NameNode is standby is decided by ZooKeeper. (screenshot omitted)
  • Open the active YARN page on hadoop001. (screenshot omitted)
  • Open the standby YARN page on hadoop002. Visiting hadoop002:8088 directly is redirected to hadoop001, so use the ip:8088/cluster/cluster address instead. (screenshot omitted)
  • Open the JobHistory page. It was started on hadoop001, so the address is hadoop001; the port can be found with netstat. (screenshot omitted)

Run a test MapReduce job; its progress can then be seen on the YARN and JobHistory web UIs:

  1. [hadoop@hadoop001 sbin]$ find ~/app/hadoop/* -name '*example*.jar'
  2. [hadoop@hadoop001 sbin]$ hadoop jar /home/hadoop/app/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 5 10
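
To verify that failover really works, a common check (a sketch, not from the original article) is to kill the active NameNode and watch the standby take over; this assumes nn1 on hadoop001 is currently active and that sshfence can reach hadoop001:

  1. # On hadoop001: find the NameNode pid and kill it
  2. [hadoop@hadoop001 ~]$ jps | grep NameNode
  3. [hadoop@hadoop001 ~]$ kill -9 <NameNode pid>
  4. # On hadoop002: after a few seconds nn2 should report active
  5. [hadoop@hadoop002 ~]$ hdfs haadmin -getServiceState nn2
  6. # Bring the killed NameNode back; it should rejoin as standby
  7. [hadoop@hadoop001 ~]$ hadoop-daemon.sh start namenode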

4. Tearing Down the Hadoop HA Cluster

Stop the Hadoop daemons:

  1. stop-all.sh
  2. mr-jobhistory-daemon.sh stop historyserver
  3. # After running the stop scripts, check whether any Hadoop-related processes remain; if so, kill -9 them
  4. [hadoop@hadoop001 sbin]$ ps -ef | grep hadoop

Delete all Hadoop-related data from ZooKeeper:

  1. [hadoop@hadoop001 sbin]$ zkCli.sh # open the ZooKeeper client and delete all Hadoop znodes
  2. [zk: localhost:2181(CONNECTED) 0] ls /
  3. [zookeeper, hadoop-ha]
  4. [zk: localhost:2181(CONNECTED) 1] rmr /hadoop-ha
  5. [zk: localhost:2181(CONNECTED) 1] quit
  6. Quitting...

Clear the data directories:

  1. rm -rf ~/app/hadoop/data/*

Extension 1: In production, if both NameNodes end up in standby state (so HA cannot elect an active node), ZooKeeper is usually hung; check the ZooKeeper status.

Extension 2: In production, if a machine's host key changes, do not blindly clear the entire known_hosts file; just find the entries belonging to that machine and delete them. Clearing the whole file disrupts other applications that log in normally (when known_hosts has no entry for a machine, the first login has to answer yes, which writes a new entry), and you will take the blame for it.
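
For reference, OpenSSH ships a command that removes just the entries for one host, which is safer than editing known_hosts by hand (a convenience sketch; hadoop002 stands in for whichever machine changed):

  1. # Remove only the known_hosts entries for hadoop002, run as the affected user
  2. ssh-keygen -R hadoop002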

Extension 3: In production, when something goes wrong, first check the error message, then the configuration, and finally analyse the runtime logs. If the failure happens while starting or stopping, the script can be debugged with sh -x XXX.sh. In the trace output, lines with no leading + are the script's own output, a single + marks the result of executing the current line, and ++ marks the result of a sub-expression of that line.

Extension 4: The hadoop checknative command shows which compression formats the Hadoop build supports; false means unsupported. The CDH tarball of Hadoop does not come with native compression support, so in production Hadoop has to be recompiled to enable it. The map stage usually uses snappy (fastest compression, so output is produced quickly, but with the lowest compression ratio), while the reduce stage usually uses gzip or bzip2 (highest compression ratio and smallest disk footprint, but the longest compression and decompression time).

Extension 5: The whole Hadoop cluster can be started and stopped with start-all.sh and stop-all.sh.

Extension 6: cat * | grep xxx searches the contents of all files in the current directory.
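
For example, to find which configuration file mentions the nameservice (a hypothetical illustration of the tip above):

  1. [hadoop@hadoop001 ~]$ cd ~/app/hadoop/etc/hadoop
  2. [hadoop@hadoop001 hadoop]$ cat * | grep ruozeclusterg6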