HBase

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system.

Data Model

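Namespace
A namespace is comparable to a database in a relational database system, and each namespace holds many tables. HBase ships with two namespaces, default and hbase: hbase holds HBase's built-in tables, and default is the namespace that user tables go into when none is specified.
Table
Similar to a table in a relational database. When defining a table in HBase you declare only its column families, not the individual columns.
Row
Each row consists of one RowKey and multiple columns. Rows are stored sorted by RowKey in lexicographic (byte) order, and lookups can only be made by RowKey.
Column
Each column is identified by a column family and a column qualifier,
for example info:name and info:age. Only the column families need to be specified when the table is created.
Timestamp
Identifies the different versions of a piece of data.
Cell
The unit uniquely identified by {rowkey, column family:column qualifier, timestamp}. The data in a cell is untyped and is stored as raw bytes.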
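Because rows are ordered by the bytes of the RowKey, numeric parts of a key sort as text rather than as numbers. The sketch below only imitates that ordering with sort(1) under LC_ALL=C; it does not talk to HBase, and make_rowkey is a hypothetical helper introduced for illustration.

```shell
# Byte-wise ordering puts row10 before row2, just as HBase orders RowKeys.
sorted_unpadded=$(printf '%s\n' row1 row2 row10 | LC_ALL=C sort | tr '\n' ' ')
echo "unpadded: $sorted_unpadded"

# Hypothetical helper: build a fixed-width key from a prefix and a number
# so that byte order and numeric order agree.
make_rowkey() {
  printf '%s%05d\n' "$1" "$2"
}

sorted_padded=$( { make_rowkey row 1; make_rowkey row 2; make_rowkey row 10; } \
  | LC_ALL=C sort | tr '\n' ' ')
echo "padded:   $sorted_padded"
```

Zero-padding is one common way to keep range scans in numeric order; the width of 5 digits here is an arbitrary choice.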

HBase System Architecture

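HBase storage mechanism
HBase is a column-oriented database. Rows in a table are kept sorted by row key, the table schema defines only column families, and data is stored as key-value pairs. A table has multiple column families, and a column family has multiple columns:
- A table is a collection of rows
- A row is a collection of column families
- A column family is a collection of columns
- A column is a collection of key-value pairs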
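To make the key-value layering concrete, here is a plain-text simulation (not HBase itself): each line stands for one cell written as "rowkey family:qualifier timestamp value". The function latest_versions is a hypothetical name; it keeps the newest timestamp per row and column, which mirrors what a default read returns.

```shell
# Keep only the newest version of every (rowkey, column) pair.
# Sort by rowkey, then column, then timestamp descending; awk keeps the
# first line it sees for each rowkey/column combination.
latest_versions() {
  LC_ALL=C sort -k1,1 -k2,2 -k3,3nr | awk '!seen[$1 FS $2]++'
}

# Made-up sample cells: info:name of row 1001 was written twice.
cells='1001 info:name 1 Tom
1001 info:name 3 Tommy
1001 info:age 2 18
1002 info:name 1 Ann'

printf '%s\n' "$cells" | latest_versions
```

The older value Tom is still a separate cell with its own timestamp; it is simply hidden behind the newer version in this view.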
HBase system architecture diagram

HBase - Figure 1

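HBase is a distributed storage system made up of HMaster and HRegionServer nodes.

Client: communicates with HMaster and HRegionServer through the HBase RPC mechanism. It talks to HMaster for administrative operations and to HRegionServer for data reads and writes.
HMaster: HBase can run multiple HMaster nodes; ZooKeeper's master election mechanism ensures that one active master is always running.
HMaster is mainly responsible for managing tables and regions:
1. Handles users' create, delete, modify, and query operations on tables (a data update is a put, which writes a new version of the data)
2. Manages load balancing across HRegionServers and adjusts the distribution of regions
3. Assigns the new regions after a region split
4. Migrates the regions of a failed HRegionServer after that server goes down
ZooKeeper: the ZooKeeper ensemble stores the address of the -ROOT- table (in HBase 0.96 and later, the location of hbase:meta instead) and the address of the HMaster. Each HRegionServer registers itself in ZooKeeper as an ephemeral node, so HMaster can track the health of every HRegionServer at any time.
HRegionServer: the core module of HBase. It handles users' I/O requests and reads and writes data in the HDFS file system.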
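As the diagram above shows, an HRegionServer manages many HRegion objects:
- Clients do not need the master to access data in HBase; the master only maintains the metadata of tables and regions
- Each HRegion corresponds to one region of a table, and an HRegion is made up of multiple HStores
- An HRegion has as many HStores as the table has column families. An HRegionServer hosts multiple HRegions and one HLog
HRegion: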

Common HBase Commands

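Enter the shell: hbase shell
Exit: exit
Show HBase status: status
Create a table: create 'table', 'cf1', 'cf2', 'cfN'
List all tables: list
Describe a table: describe 'table'
Check whether a table exists: exists 'table'
Check whether a table is enabled or disabled: is_enabled 'table' and is_disabled 'table'
Add a record: put 'table', 'rowkey', 'cf:column', 'value'
Get all data under a rowkey: get 'table', 'rowkey'
Scan all records: scan 'table'
Count the records in a table: count 'table'
Get one column of a column family: get 'table', 'rowkey', 'cf:column'
Delete a cell: delete 'table', 'rowkey', 'cf:column'
Delete an entire row: deleteall 'table', 'rowkey'
Drop a table (disable it first, then drop it): disable 'table' followed by drop 'table'
Truncate a table: truncate 'table'
Scan all data in one column of a table: scan 'table', {COLUMNS => 'cf:column'}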
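The commands above can also be collected in a file and passed to the shell in one go. The following is a minimal sketch: the table name student, the column family info, and the sample values are invented for illustration, and the script only calls hbase when the launcher is found on PATH.

```shell
# Write a short HBase shell session to a temporary command file.
cmd_file=$(mktemp)
cat > "$cmd_file" <<'EOF'
create 'student', 'info'
put 'student', '1001', 'info:name', 'Tom'
put 'student', '1001', 'info:age', '18'
get 'student', '1001'
scan 'student', {COLUMNS => 'info:name'}
count 'student'
disable 'student'
drop 'student'
exit
EOF

# Run the file only where HBase is installed; otherwise just report its path.
if command -v hbase >/dev/null 2>&1; then
  hbase shell "$cmd_file"
else
  echo "hbase not found; commands were written to $cmd_file"
fi
```

The trailing exit matters: without it the shell stays at the interactive prompt after the file has been processed.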

HBase Cluster Installation

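Prerequisites
- zookeeper-3.4.14.tar.gz package
- hbase-2.2.1-bin.tar.gz package
- hadoop-3.1.2.tar.gz package
- 3 virtual machines
Install Hadoop
See Hadoop.md for the steps to deploy a distributed Hadoop cluster.
Install ZooKeeper
See zookeeper.md, which provides the steps to deploy a distributed ZooKeeper cluster.
Install HBase
1. Upload the hbase-2.2.1-bin.tar.gz package to each of the virtual machines hadoop01, hadoop02, and hadoop03
2. Extract the HBase package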
```
tar -zxvf hbase-2.2.1-bin.tar.gz
```

3. Configure environment variables (on every node)

HBASE_HOME=/opt/hbase-2.2.1
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin
export PATH CLASSPATH JAVA_HOME HADOOP_HOME ZOOKEEPER_HOME HBASE_HOME
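After editing the profile, it can help to confirm that the HBase bin directory really ended up on PATH. A small sketch, assuming the install path /opt/hbase-2.2.1 used above; path_contains is a hypothetical helper, not part of HBase.

```shell
# Install location taken from the HBASE_HOME value above.
HBASE_HOME=/opt/hbase-2.2.1
PATH=$PATH:$HBASE_HOME/bin
export HBASE_HOME PATH

# Succeeds when the given directory is one of the PATH entries.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$HBASE_HOME/bin"; then
  echo "HBASE_HOME/bin is on PATH"
else
  echo "HBASE_HOME/bin is missing from PATH"
fi
```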

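4. Set up cluster time synchronization
```shell
yum -y install ntp ntpdate   # install the NTP daemon and the ntpdate time synchronization tool
sudo systemctl start ntpd    # start the time synchronization service
sudo systemctl enable ntpd   # start the time synchronization service at boot
```

Use hadoop01 as the time server; all other nodes synchronize their clocks with hadoop01.

Edit /etc/ntp.conf on hadoop01 and add:

```
server 127.0.0.1   # use this node itself as the time server
restrict 192.168.0.0
```

Edit /etc/ntp.conf on the other nodes and add:

```
server 192.168.127.128
```

Enable time synchronization on all nodes, then check the system time:

```shell
sudo timedatectl set-ntp yes
timedatectl   # show the system time
```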
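Time synchronization matters because the HBase configuration in this guide sets hbase.master.maxclockskew to 180000 ms as the allowed clock difference. The sketch below compares two epoch timestamps against such a limit; skew_ms and within_skew are hypothetical helpers, and the remote timestamp is a placeholder so the script runs on a single machine.

```shell
# Absolute difference between two epoch timestamps (seconds), in milliseconds.
skew_ms() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  echo $(( diff * 1000 ))
}

# Succeeds when the skew between $1 and $2 is at most $3 milliseconds.
within_skew() {
  [ "$(skew_ms "$1" "$2")" -le "$3" ]
}

local_ts=$(date +%s)
# On a real cluster the remote value could come from, for example:
#   remote_ts=$(ssh hadoop02 date +%s)
remote_ts=$local_ts   # placeholder value for a single-machine run

if within_skew "$local_ts" "$remote_ts" 180000; then
  echo "clock skew OK"
else
  echo "clock skew too large"
fi
```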


5. Modify the HBase configuration files
   - Edit the hbase-env.sh file
```shell
#!/usr/bin/env bash
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
# Set environment variables here.
# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)
# The java implementation to use.  Java 1.8+ required.
# export JAVA_HOME=/usr/java/jdk1.8.0/
export JAVA_HOME=/usr/java/jdk1.8.0_192-amd64
# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=
export HBASE_CLASSPATH=/opt/hadoop-3.1.2/etc/hadoop
# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G
# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://hbase.apache.org/book.html#performance
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC"
# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# Uncomment one of the below three options to enable java garbage collection logging for the client processes.
# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching. 
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"
# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"
# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8074"
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/opt/hbase-2.2.1/pids
# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of ZooKeeper or not.
# Set to false to use your own (external) ZooKeeper ensemble; set to true to use the ZooKeeper bundled with HBase
# export HBASE_MANAGES_ZK=true
export HBASE_MANAGES_ZK=false
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
# Tell HBase whether it should include Hadoop's lib when start up,
# the default value is false,means that includes Hadoop's lib.
# export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP="true"
```

Configure the hbase-site.xml file

<configuration>
    <!-- Host and port of the HBase master -->
    <property>
            <name>hbase.master</name> 
            <value>hadoop01:60000</value>
    </property>
    <!--<property>
        <name>hbase.master.info.port</name>
        <value>60010</value>
    </property>
    -->
    <!-- Maximum clock difference allowed between nodes, in milliseconds -->
    <property>
            <name>hbase.master.maxclockskew</name>
            <value>180000</value>
    </property>
    <!-- Shared HBase directory where HBase data is persisted -->
    <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop01:9000/hbase</value>
    </property>
    <!-- Whether to run in distributed mode; false means standalone -->
    <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop01,hadoop02,hadoop03</value>
    </property>
    <!-- Location of ZooKeeper's configuration data and snapshots -->
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
            <value>/home/hbase/tmp/zookeeper</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
</configuration>
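When checking a node, it is handy to read a single value back out of hbase-site.xml. This is a rough sketch rather than a real XML parser: get_property is a hypothetical helper that assumes every name and value element sits on its own line, as in the file above, and it does not skip properties that are commented out.

```shell
# Print the <value> that follows the <name> matching $2 in file $1.
get_property() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Small sample file with two of the properties from the configuration above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
    <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop01:9000/hbase</value>
    </property>
    <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop01,hadoop02,hadoop03</value>
    </property>
</configuration>
EOF

get_property "$sample" hbase.rootdir
get_property "$sample" hbase.zookeeper.quorum
```

On a cluster node the first argument would be /opt/hbase-2.2.1/conf/hbase-site.xml instead of the sample file.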

Configure the regionservers file

# This file lists the HBase slave nodes
hadoop02
hadoop03

Copy the two Hadoop configuration files core-site.xml and hdfs-site.xml into the HBase configuration directory

cp /opt/hadoop-3.1.2/etc/hadoop/core-site.xml /opt/hbase-2.2.1/conf
cp /opt/hadoop-3.1.2/etc/hadoop/hdfs-site.xml /opt/hbase-2.2.1/conf

Distribute /opt/hbase-2.2.1/conf/* from hadoop01 to the hadoop02 and hadoop03 nodes

scp /opt/hbase-2.2.1/conf/* hadoop02:/opt/hbase-2.2.1/conf/
scp /opt/hbase-2.2.1/conf/* hadoop03:/opt/hbase-2.2.1/conf/
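With more than a couple of nodes, the same copy can be written as a loop. A sketch under the paths used above; distribute_conf and the DRY_RUN switch are hypothetical names introduced here, and by default the function only prints the commands it would run.

```shell
# DRY_RUN=1 (the default here) prints the scp commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
CONF_DIR=/opt/hbase-2.2.1/conf

# Copy the configuration directory contents to every host given as an argument.
distribute_conf() {
  for host in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "scp $CONF_DIR/* $host:$CONF_DIR/"
    else
      scp "$CONF_DIR"/* "$host:$CONF_DIR/"
    fi
  done
}

distribute_conf hadoop02 hadoop03
```

Running it with DRY_RUN=0 performs the actual copy and assumes password-less SSH from hadoop01 to the other nodes.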

Start and stop HBase

start-hbase.sh
stop-hbase.sh

Check which HBase services are running

# On hadoop01, the master node, only HMaster is started; on hadoop02 and hadoop03, the slave nodes, only HRegionServer is started
jps
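The jps listing can be checked in a script instead of by eye. In this sketch has_daemon is a hypothetical helper that reads jps-style output ("pid name" per line) from standard input; the sample listing and its PIDs are made up so the example runs without a cluster.

```shell
# Succeeds when the listing on stdin contains a process with the given name.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Made-up listing in the format jps prints.
sample_master='2101 HMaster
1875 QuorumPeerMain
2450 Jps'

if printf '%s\n' "$sample_master" | has_daemon HMaster; then
  echo "HMaster is running"
else
  echo "HMaster is not running"
fi

# On a live node the listing comes from jps itself, for example:
#   jps | has_daemon HRegionServer
```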

Enter the HBase shell

hbase shell