Hadoop HBase Cluster Setup

1. Environment Preparation

Note: this cluster is built on CentOS 7.5 with Hadoop 3.1.1.

1.1 Cluster Layout

The cluster consists of three machines:

    Hostname    IP          Roles
    hadoop01    10.0.0.10   NameNode, DataNode, NodeManager
    hadoop02    10.0.0.11   SecondaryNameNode, ResourceManager, DataNode, NodeManager
    hadoop03    10.0.0.12   DataNode, NodeManager

1.2 Machine Configuration

    [clsn@hadoop01 /home/clsn]
    $ cat /etc/redhat-release
    CentOS Linux release 7.5.1804 (Core)
    [clsn@hadoop01 /home/clsn]
    $ uname -r
    3.10.0-862.el7.x86_64
    [clsn@hadoop01 /home/clsn]
    $ sestatus
    SELinux status: disabled
    [clsn@hadoop01 /home/clsn]
    $ systemctl status firewalld.service
    firewalld.service - firewalld - dynamic firewall daemon
    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
    Active: inactive (dead)
    Docs: man:firewalld(1)
    [clsn@hadoop01 /home/clsn]
    $ id clsn
    uid=1000(clsn) gid=1000(clsn) groups=1000(clsn)
    [clsn@hadoop01 /home/clsn]
    $ cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.0.0.10 hadoop01
    10.0.0.11 hadoop02
    10.0.0.12 hadoop03

Note: all processes in this cluster are started by the clsn user.

1.3 Passwordless SSH

    ssh-keygen
    ssh-copy-id -i ~/.ssh/id_rsa.pub 127.0.0.1
    scp -rp ~/.ssh hadoop02:/home/clsn
    scp -rp ~/.ssh hadoop03:/home/clsn
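
A minimal check that the trust is in place (assuming the /etc/hosts entries from 1.2): each command should print the remote hostname without a password prompt.

    for i in hadoop01 hadoop02 hadoop03
    do
        ssh $i hostname
    done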

1.4 Install the JDK

Perform these steps on all three machines:

    tar xf jdk-8u191-linux-x64.tar.gz -C /usr/local/
    ln -s /usr/local/jdk1.8.0_191 /usr/local/jdk
    sed -i.ori '$a export JAVA_HOME=/usr/local/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile
    . /etc/profile
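
To confirm the JDK is active in the current shell:

    java -version    # expect: java version "1.8.0_191"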

2. Install Hadoop

2.1 Download the binary package

    wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz

2.2 Install

    tar xf hadoop-3.1.1.tar.gz -C /usr/local/
    ln -s /usr/local/hadoop-3.1.1 /usr/local/hadoop
    sudo chown -R clsn:clsn /usr/local/hadoop-3.1.1/
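
Optionally, put the Hadoop binaries on the PATH as well. The rest of this guide uses absolute paths, so this step is purely a convenience; a sketch mirroring the JDK profile edit from 1.4:

    sed -i '$a export HADOOP_HOME=/usr/local/hadoop\nexport PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' /etc/profile
    . /etc/profile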

3. Configure Hadoop

All configuration files live under /usr/local/hadoop/etc/hadoop.

3.1 hadoop-env.sh

Add a line sourcing /etc/profile at the top of hadoop-env.sh so the Hadoop scripts inherit JAVA_HOME:

    [clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
    $ head hadoop-env.sh
    . /etc/profile
    #
    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at

3.2 core-site.xml

    [clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
    $ cat core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <!-- RPC address of the NameNode -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop01:9000</value>
        </property>
        <!-- Base directory for files Hadoop generates at runtime -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/tmp</value>
        </property>
    </configuration>

3.3 hdfs-site.xml

    [clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
    $ cat hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <!-- NameNode HTTP (web UI) address -->
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hadoop01:50070</value>
        </property>
        <!-- SecondaryNameNode HTTP address -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop02:50090</value>
        </property>
        <!-- NameNode metadata directory -->
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/data/name</value>
        </property>
        <!-- HDFS replication factor -->
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <!-- DataNode block storage directory -->
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/data/datanode</value>
        </property>
        <!-- Disable HDFS permission checking -->
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    </configuration>

3.4 mapred-site.xml

    [clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
    $ cat mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <!-- Run MapReduce jobs on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>
                /usr/local/hadoop/etc/hadoop,
                /usr/local/hadoop/share/hadoop/common/*,
                /usr/local/hadoop/share/hadoop/common/lib/*,
                /usr/local/hadoop/share/hadoop/hdfs/*,
                /usr/local/hadoop/share/hadoop/hdfs/lib/*,
                /usr/local/hadoop/share/hadoop/mapreduce/*,
                /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
                /usr/local/hadoop/share/hadoop/yarn/*,
                /usr/local/hadoop/share/hadoop/yarn/lib/*
            </value>
        </property>
    </configuration>

3.5 yarn-site.xml

    [clsn@hadoop01 /usr/local/hadoop/etc/hadoop]
    $ cat yarn-site.xml
    <?xml version="1.0"?>
    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop02</value>
        </property>
        <property>
            <description>The http address of the RM web application.</description>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>${yarn.resourcemanager.hostname}:8088</value>
        </property>
        <property>
            <description>The address of the applications manager interface in the RM.</description>
            <name>yarn.resourcemanager.address</name>
            <value>${yarn.resourcemanager.hostname}:8032</value>
        </property>
        <property>
            <description>The address of the scheduler interface.</description>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>${yarn.resourcemanager.hostname}:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>${yarn.resourcemanager.hostname}:8031</value>
        </property>
        <property>
            <description>The address of the RM admin interface.</description>
            <name>yarn.resourcemanager.admin.address</name>
            <value>${yarn.resourcemanager.hostname}:8033</value>
        </property>
    </configuration>

3.6 masters & slaves

  1. echo 'hadoop02' >> /usr/local/hadoop/etc/hadoop/masters
  2. echo 'hadoop03
  3. hadoop01' >> /usr/local/hadoop/etc/hadoop/slaves

3.7 Start-up script changes

The start-up scripts all live under /usr/local/hadoop/sbin.
(1) Add the following to start-dfs.sh and stop-dfs.sh:

    HDFS_DATANODE_USER=clsn
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=clsn
    HDFS_SECONDARYNAMENODE_USER=clsn

(2) Add the following to start-yarn.sh and stop-yarn.sh:

    YARN_RESOURCEMANAGER_USER=clsn
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=clsn

4. Pre-start Preparation

4.1 Create the data directories

    mkdir -p /data/tmp
    mkdir -p /data/name
    mkdir -p /data/datanode
    chown -R clsn:clsn /data

Create these directories on every machine in the cluster; alternatively, copy the tree from hadoop01:

    for i in hadoop02 hadoop03
    do
        sudo scp -rp /data $i:/
    done

4.2 Copy the Hadoop installation to the other machines

    for i in hadoop02 hadoop03
    do
        sudo scp -rp /usr/local/hadoop-3.1.1 $i:/usr/local/
    done

4.3 Start the Hadoop cluster

(1) Before the first start, format the NameNode:

    /usr/local/hadoop/bin/hdfs namenode -format

(2) Start the cluster:

    cd /usr/local/hadoop/sbin
    ./start-all.sh

5. Verify the Cluster

(1) Use jps to check that the daemons on each node match the plan ("cluster" here is a pssh hosts file listing the three machines):

    [clsn@hadoop01 /home/clsn]
    $ pssh -ih cluster "`which jps`"
    [1] 11:30:31 [SUCCESS] hadoop03
    7947 DataNode
    8875 Jps
    8383 NodeManager
    [2] 11:30:31 [SUCCESS] hadoop01
    20193 DataNode
    20665 NodeManager
    21017 NameNode
    22206 Jps
    [3] 11:30:31 [SUCCESS] hadoop02
    8896 DataNode
    9427 NodeManager
    10883 Jps
    9304 ResourceManager
    10367 SecondaryNameNode
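
As a further sanity check, a small HDFS read/write round trip (a minimal smoke test; the file name and path are arbitrary):

    echo smoke > /tmp/smoke.txt
    /usr/local/hadoop/bin/hdfs dfs -mkdir -p /user/clsn
    /usr/local/hadoop/bin/hdfs dfs -put /tmp/smoke.txt /user/clsn/
    /usr/local/hadoop/bin/hdfs dfs -cat /user/clsn/smoke.txt    # expect: smoke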

(2) Browse to http://hadoop02:8088/cluster/nodes.
This is the ResourceManager web UI; the cluster's three Active Nodes should be listed there.
[Figure 2: ResourceManager web UI showing the three active nodes]
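
To exercise YARN end to end, submit the example job that ships with Hadoop; it should appear on the ResourceManager page above (the jar path assumes the stock 3.1.1 binary layout):

    /usr/local/hadoop/bin/hadoop jar \
        /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 10
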
(3) Browse to http://hadoop01:50070/dfshealth.html#tab-datanode.
This is the NameNode web UI.
[Figure 3: NameNode web UI, Datanodes tab]

6. HBase Setup

6.1 Deploy the HBase package

    cd /opt/
    wget http://mirrors.tuna.tsinghua.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz
    tar xf hbase-1.4.9-bin.tar.gz -C /usr/local/
    ln -s /usr/local/hbase-1.4.9 /usr/local/hbase

6.2 Edit the configuration files

6.2.1 hbase-env.sh

    # Add this line at the top so the HBase scripts inherit JAVA_HOME
    . /etc/profile
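
This setup relies on the ZooKeeper ensemble that HBase manages itself, so it can be worth making that explicit in hbase-env.sh as well (true is already the default, which makes this line optional):

    export HBASE_MANAGES_ZK=true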

6.2.2 hbase-site.xml

    [clsn@hadoop01 /usr/local/hbase/conf]
    $ cat hbase-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>hbase.rootdir</name>
            <!-- HBase data directory on HDFS; host and port must match fs.defaultFS in core-site.xml -->
            <value>hdfs://hadoop01:9000/hbase/hbase_db</value>
        </property>
        <property>
            <name>hbase.cluster.distributed</name>
            <!-- run in fully distributed mode -->
            <value>true</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <!-- nodes that run ZooKeeper; use an odd number (see the note below) -->
            <value>hadoop01,hadoop02,hadoop03</value>
        </property>
        <property>
            <!-- ZooKeeper data and log directory; must already exist -->
            <name>hbase.zookeeper.property.dataDir</name>
            <value>/data/hbase/zookeeper</value>
        </property>
        <property>
            <!-- HBase Master web UI port -->
            <name>hbase.master.info.port</name>
            <value>16610</value>
        </property>
    </configuration>

Note on ZooKeeper sizing:

ZooKeeper has the following property: the ensemble remains available as long as more than half of its members are working.

So with 2 ZooKeeper nodes, losing 1 leaves only 1 of 2 alive, which is not more than half, so a 2-node ensemble tolerates 0 failures. Likewise, with 3 nodes, losing 1 leaves 2 of 3 working, which is still a majority, so a 3-node ensemble tolerates 1 failure.

A few more cases: 2->0, 3->1, 4->1, 5->2, 6->2. The pattern: ensembles of 2n and 2n-1 nodes both tolerate n-1 failures, so the extra even-numbered node buys nothing; there is no reason to deploy it.
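
That tolerance, floor((n-1)/2) for an n-node ensemble, can be reproduced with a one-liner:

    for n in 2 3 4 5 6; do echo "$n -> $(( (n - 1) / 2 ))"; done
    # prints: 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2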

6.2.3 regionservers

    [clsn@hadoop01 /usr/local/hbase/conf]
    $ cat regionservers
    hadoop01
    hadoop02
    hadoop03

6.2.4 Distribute HBase to the other nodes

    for i in hadoop02 hadoop03
    do
        sudo scp -rp /usr/local/hbase-1.4.9 $i:/usr/local/
    done

6.3 Start the HBase cluster

6.3.1 Start HBase

    [clsn@hadoop01 /usr/local/hbase/bin]
    $ sudo ./start-hbase.sh
    hadoop03: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop03.out
    hadoop02: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop02.out
    hadoop01: running zookeeper, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-zookeeper-hadoop01.out
    running master, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-master-hadoop01.out
    hadoop02: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop02.out
    hadoop03: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop03.out
    hadoop01: running regionserver, logging to /usr/local/hbase-1.4.9/bin/../logs/hbase-root-regionserver-hadoop01.out
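
A quick check that every daemon came up where expected (HMaster on hadoop01; HRegionServer and HQuorumPeer on all three nodes):

    for i in hadoop01 hadoop02 hadoop03
    do
        ssh $i "/usr/local/jdk/bin/jps | grep -E 'HMaster|HRegionServer|HQuorumPeer'"
    done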

Browse to http://hadoop01:16610/master-status to check the HBase status.
[Figure 5: HBase Master status page]

6.3.2 Start the HBase shell

    [clsn@hadoop01 /usr/local/hbase/bin]
    $ ./hbase shell    # start the HBase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hbase-1.4.9/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec 5 11:54:10 PST 2018
    hbase(main):001:0> create 'clsn','cf'    # create table 'clsn' with one column family 'cf'
    0 row(s) in 7.8790 seconds
    => Hbase::Table - clsn
    hbase(main):003:0> list    # list all tables
    TABLE
    clsn
    1 row(s) in 0.0860 seconds
    => ["clsn"]
    hbase(main):004:0> put 'clsn','1000000000','cf:name','clsn'    # put one cell: rowkey 1000000000, column cf:name
    0 row(s) in 0.3390 seconds
    hbase(main):005:0> put 'clsn','1000000000','cf:sex','male'     # same row, column cf:sex
    0 row(s) in 0.0300 seconds
    hbase(main):006:0> put 'clsn','1000000000','cf:age','24'       # same row, column cf:age
    0 row(s) in 0.0290 seconds
    hbase(main):007:0> count 'clsn'
    1 row(s) in 0.2100 seconds
    => 1
    hbase(main):008:0> get 'clsn','cf'    # 'cf' is treated as a rowkey here, so nothing is found
    COLUMN CELL
    0 row(s) in 0.1050 seconds
    hbase(main):009:0> get 'clsn','1000000000'    # fetch the row
    COLUMN CELL
    cf:age timestamp=1545710530665, value=24
    cf:name timestamp=1545710495871, value=clsn
    cf:sex timestamp=1545710509333, value=male
    1 row(s) in 0.0830 seconds
    hbase(main):010:0> list
    TABLE
    clsn
    1 row(s) in 0.0240 seconds
    => ["clsn"]
    hbase(main):011:0> drop clsn    # error: the table name must be quoted
    NameError: undefined local variable or method `clsn' for #<Object:0x6f731759>
    hbase(main):012:0> drop 'clsn'    # error: an enabled table cannot be dropped
    ERROR: Table clsn is enabled. Disable it first.
    Here is some help for this command:
    Drop the named table. Table must first be disabled:
    hbase> drop 't1'
    hbase> drop 'ns1:t1'
    hbase(main):013:0> list
    TABLE
    clsn
    1 row(s) in 0.0330 seconds
    => ["clsn"]
    hbase(main):015:0> disable 'clsn'
    0 row(s) in 2.4710 seconds
    hbase(main):016:0> list    # a disabled table is still listed until it is dropped
    TABLE
    clsn
    1 row(s) in 0.0210 seconds
    => ["clsn"]

7. References

https://hadoop.apache.org/releases.html

https://my.oschina.net/orrin/blog/1816023

https://www.yiibai.com/hadoop/

http://blog.fens.me/hadoop-family-roadmap/

http://www.cnblogs.com/Springmoon-venn/p/9054006.html

https://github.com/googlehosts/hosts

http://abloz.com/hbase/book.html