HBase
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system.
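Data Model
Namespace
A namespace is comparable to a database in a relational database system; each namespace contains many tables. HBase ships with two namespaces, default and hbase: hbase holds HBase's built-in system tables, and default is the namespace that user tables go into when none is specified.
Table (Region)
A table is comparable to a table in a relational database. When defining an HBase table you declare only its column families, not the individual columns. A table is split horizontally by RowKey range into Regions, so a Region is a slice of a table rather than the table itself.
Row
Each row consists of one RowKey and multiple columns. Rows are stored sorted by the lexicographic order of their RowKeys, and data is retrieved by RowKey only.
Column
Each column is identified by a column family and a column qualifier, for example info:name and info:age. Only the column families need to be specified when the table is created.
Time Stamp
Identifies the different versions of a piece of data.
Cell
The unit uniquely determined by {rowkey, column family:column qualifier, timestamp}. Data in a cell is untyped and is stored as raw bytes.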
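The data model above can be pictured with a small in-memory sketch. This is only a toy model, not how HBase stores data internally; the class name `TinyTable` and its methods are hypothetical and exist purely for illustration. It shows three points from the text: only column families are declared up front, RowKeys are kept in lexicographic (byte) order, and a cell is addressed by rowkey, column, and timestamp.

```python
from bisect import insort
from typing import Dict, List, Optional, Tuple

# A cell coordinate without the timestamp: (rowkey, "family:qualifier").
CellKey = Tuple[bytes, str]


class TinyTable:
    """Toy in-memory model of the HBase data model (hypothetical, for illustration)."""

    def __init__(self, families: List[str]) -> None:
        # Only column families are declared; qualifiers are created on first write.
        self.families = set(families)
        self.rowkeys: List[bytes] = []                     # kept in byte-wise sorted order
        self.cells: Dict[CellKey, Dict[int, bytes]] = {}   # timestamp -> raw bytes

    def put(self, rowkey: bytes, column: str, value: bytes, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if rowkey not in self.rowkeys:
            insort(self.rowkeys, rowkey)
        # Writing the same cell again with a new timestamp adds a version.
        self.cells.setdefault((rowkey, column), {})[ts] = value

    def get(self, rowkey: bytes, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        versions = self.cells.get((rowkey, column))
        if not versions:
            return None
        if ts is None:
            ts = max(versions)          # default to the newest version
        return versions.get(ts)


table = TinyTable(["info"])
table.put(b"row10", "info:name", b"alice", ts=1)
table.put(b"row2", "info:name", b"bob", ts=1)
table.put(b"row10", "info:name", b"alicia", ts=2)

print(table.rowkeys)                           # [b'row10', b'row2']
print(table.get(b"row10", "info:name"))        # b'alicia'
print(table.get(b"row10", "info:name", ts=1))  # b'alice'
```

Note that `row10` sorts before `row2` because the comparison is byte by byte, which is why RowKey design (for example zero-padding numeric parts) matters in practice.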
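HBase System Architecture
HBase Storage Mechanism
HBase is a column-oriented database. Rows in a table are sorted by RowKey, the table schema defines only column families, and data is stored as key-value pairs. A table has multiple column families, and a column family has multiple columns:
- A table is a collection of rows
- A row is a collection of column families
- A column family is a collection of columns
- A column is a collection of key-value pairs

[Figure: HBase system architecture diagram]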

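HBase is a distributed storage system made up of HMaster and HRegionServer processes.
Client: uses the HBase RPC mechanism to communicate with the HMaster and the HRegionServers. It talks to the HMaster for administrative operations and to the HRegionServers for data reads and writes.
HMaster: several HMaster nodes can be started; ZooKeeper's master election mechanism ensures that there is always one active master running.
The HMaster is mainly responsible for managing Tables and Regions:
- Handles users' table-level create, delete, alter, and query operations. (Changing a data value is not a master operation: it is a put that writes a new version, and it is served by an HRegionServer.)
- Manages load balancing across HRegionServers and adjusts the distribution of Regions
- Assigns the new Regions after a Region split
- Migrates the Regions of a failed HRegionServer after it goes down
ZooKeeper: the ZooKeeper cluster stores the address of the -ROOT- table and the address of the HMaster. (Since HBase 0.96, which includes the 2.2.1 release installed below, the -ROOT- table no longer exists and ZooKeeper holds the location of the hbase:meta table instead.) Each HRegionServer registers itself in ZooKeeper as an ephemeral node, so the HMaster can detect the health of every HRegionServer at any time.
HRegionServer: the core module of HBase. It serves user I/O requests and reads and writes data in HDFS.
As the figure above shows, an HRegionServer manages many HRegion objects. Clients access data in HBase without involving the master; the master only maintains the metadata of tables and regions.
Each HRegion corresponds to one Region of a Table and is made up of multiple HStores.
An HRegion has as many Stores as its table has column families. One HRegionServer hosts multiple HRegions and one HLog (the write-ahead log).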
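To make the Region and Store relationship concrete, here is a minimal sketch, assuming each Region serves a contiguous RowKey range that starts at its start key. The names `Region`, `build_regions`, and `locate` are hypothetical, and the round-robin placement is a simplification chosen for the example, not the HMaster's actual balancing policy.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    start_key: bytes                 # inclusive lower bound of the rowkey range
    server: str                      # HRegionServer assumed to host this region
    stores: Dict[str, dict] = field(default_factory=dict)   # one Store per column family


def build_regions(split_keys: List[bytes], servers: List[str],
                  families: List[str]) -> List[Region]:
    """Create len(split_keys) + 1 regions, placed round-robin (an assumption)."""
    start_keys = [b""] + sorted(split_keys)
    return [
        Region(key, servers[i % len(servers)], {family: {} for family in families})
        for i, key in enumerate(start_keys)
    ]


def locate(regions: List[Region], rowkey: bytes) -> Region:
    """Return the region whose [start_key, next start_key) range contains rowkey."""
    starts = [region.start_key for region in regions]
    return regions[bisect_right(starts, rowkey) - 1]


regions = build_regions([b"m"], ["hadoop02", "hadoop03"], ["info", "stats"])
print(locate(regions, b"apple").server)   # hadoop02
print(locate(regions, b"zebra").server)   # hadoop03
print(sorted(regions[0].stores))          # ['info', 'stats']
```

Because the lookup needs only the sorted start keys, a client that has cached them can go straight to the right HRegionServer, which fits the point above that data access does not pass through the master.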
HRegion:
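Common HBase Commands
- Enter the shell: hbase shell
- Exit: exit
- Show the HBase status: status
- Create a table: create 'table_name', 'cf1', 'cf2', 'cfN'
- List all tables: list
- Describe a table: describe 'table_name'
- Check whether a table exists: exists 'table_name'
- Check whether a table is enabled or disabled: is_enabled 'table_name' / is_disabled 'table_name'
- Add a record: put 'table_name', 'rowkey', 'cf:column', 'value'
- Get all data under a rowkey: get 'table_name', 'rowkey'
- Scan all records: scan 'table_name'
- Count the records in a table: count 'table_name'
- Get one column of a column family: get 'table_name', 'rowkey', 'cf:column'
- Delete one column of a row: delete 'table_name', 'rowkey', 'cf:column'
- Delete an entire row: deleteall 'table_name', 'rowkey'
- Drop a table (disable it first, then drop it): disable 'table_name', then drop 'table_name'
- Empty a table: truncate 'table_name'
- Show all data in one column of a table: scan 'table_name', {COLUMNS => 'cf:column'}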
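The effect of the data commands in the list above can be sketched with a small in-memory model. This is an illustration only: `ShellModel` is a hypothetical class, values are plain strings for readability, and versions are left out. It is meant to show how delete differs from deleteall and what a column-restricted scan returns.

```python
from typing import Dict, Iterator, Optional, Tuple

Row = Dict[str, str]   # "family:qualifier" -> value


class ShellModel:
    """Toy model of a few hbase shell data commands on one table (hypothetical)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def put(self, rowkey: str, column: str, value: str) -> None:
        self.rows.setdefault(rowkey, {})[column] = value

    def get(self, rowkey: str, column: Optional[str] = None) -> Row:
        row = self.rows.get(rowkey, {})
        if column is None:
            return dict(row)
        return {c: v for c, v in row.items() if c == column}

    def delete(self, rowkey: str, column: str) -> None:
        # Removes a single column; the row disappears once it has no columns left.
        row = self.rows.get(rowkey, {})
        row.pop(column, None)
        if not row:
            self.rows.pop(rowkey, None)

    def deleteall(self, rowkey: str) -> None:
        # Removes the whole row regardless of how many columns it has.
        self.rows.pop(rowkey, None)

    def scan(self, column: Optional[str] = None) -> Iterator[Tuple[str, Row]]:
        # Rows come back in rowkey order; rows without the requested column are skipped.
        for rowkey in sorted(self.rows):
            row = self.get(rowkey, column)
            if row:
                yield rowkey, row

    def count(self) -> int:
        return len(self.rows)


t = ShellModel()
t.put("r1", "info:name", "alice")
t.put("r1", "info:age", "30")
t.put("r2", "info:name", "bob")
t.delete("r1", "info:age")
print(list(t.scan("info:name")))
t.deleteall("r2")
print(t.count())   # 1
```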
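HBase Cluster Installation
Prerequisites
- zookeeper-3.4.14.tar.gz package
- hbase-2.2.1-bin.tar.gz package
- Hadoop-3.1.2.tar.gz package
- 3 virtual machines
Install Hadoop
See Hadoop.md for the steps to deploy a distributed Hadoop cluster.
Install ZooKeeper
See zookeeper.md, which provides the steps to deploy a distributed ZooKeeper cluster.
Install HBase
Upload the hbase-2.2.1-bin.tar.gz package to each of the virtual machines hadoop01, hadoop02, and hadoop03.
Extract the HBase package

    tar -zxvf hbase-2.2.1-bin.tar.gz
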
- Configure environment variables (on all nodes)

      HBASE_HOME=/opt/hbase-2.2.1
      PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin
      export PATH CLASSPATH JAVA_HOME HADOOP_HOME ZOOKEEPER_HOME HBASE_HOME

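- Set up cluster time synchronization
```shell
yum -y install ntp ntpdate    # install the ntp and ntpdate time synchronization tools
sudo systemctl start ntpd     # start the time synchronization daemon
sudo systemctl enable ntpd    # start the daemon automatically at boot
```
Use hadoop01 as the time server; the other nodes synchronize their clocks to hadoop01.
On hadoop01, edit /etc/ntp.conf and add:
```shell
server 127.0.0.1    # use this host itself as the time source
restrict 192.168.0.0
```
On the other nodes, edit /etc/ntp.conf and add:
```shell
server 192.168.127.128
```
Then, on all nodes:
```shell
sudo timedatectl set-ntp yes    # enable time synchronization
timedatectl                     # show the system time
```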
5. Edit the HBase configuration files
- Edit hbase-env.sh
```shell
#!/usr/bin/env bash
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements. See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership. The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License. You may obtain a copy of the License at
# *
# * http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use. Java 1.8+ required.
# export JAVA_HOME=/usr/java/jdk1.8.0/
export JAVA_HOME=/usr/java/jdk1.8.0_192-amd64

# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=
export HBASE_CLASSPATH=/opt/hadoop-3.1.2/etc/hadoop

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G

# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://hbase.apache.org/book.html#performance
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching.

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored. $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8074"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/opt/hbase-2.2.1/pids

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of ZooKeeper or not.
# Set to false to use your own external ZooKeeper ensemble; set to true to use the ZooKeeper bundled with HBase.
# export HBASE_MANAGES_ZK=true
export HBASE_MANAGES_ZK=false

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

# Tell HBase whether it should include Hadoop's lib when start up,
# the default value is false,means that includes Hadoop's lib.
# export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP="true"
```
- Configure hbase-site.xml

      <configuration>
        <!-- Host and port of the HBase master -->
        <property>
          <name>hbase.master</name>
          <value>hadoop01:60000</value>
        </property>
        <!--
        <property>
          <name>hbase.master.info.port</name>
          <value>60010</value>
        </property>
        -->
        <!-- Maximum clock skew tolerated between nodes, in milliseconds -->
        <property>
          <name>hbase.master.maxclockskew</name>
          <value>180000</value>
        </property>
        <!-- Shared root directory where HBase persists its data -->
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://hadoop01:9000/hbase</value>
        </property>
        <!-- Whether to run in distributed mode; false means standalone -->
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <!-- ZooKeeper quorum hosts -->
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>hadoop01,hadoop02,hadoop03</value>
        </property>
        <!-- Directory where ZooKeeper stores its snapshots -->
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/home/hbase/tmp/zookeeper</value>
        </property>
        <property>
          <name>hbase.unsafe.stream.capability.enforce</name>
          <value>false</value>
        </property>
      </configuration>

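As a quick sanity check on a file like this, the properties can be read back with Python's standard XML parser. The sketch below is illustrative only: `SAMPLE_SITE_XML` is a shortened copy of the settings above, `load_properties` and `nodes_out_of_sync` are hypothetical helper names, and the clock readings passed to the skew check are made-up numbers, not measurements.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Shortened copy of the hbase-site.xml settings, embedded so the sketch is self-contained.
SAMPLE_SITE_XML = """
<configuration>
  <property><name>hbase.master.maxclockskew</name><value>180000</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>hadoop01,hadoop02,hadoop03</value></property>
</configuration>
"""


def load_properties(xml_text: str) -> Dict[str, str]:
    """Turn <property><name>/<value> pairs into a plain dict."""
    props: Dict[str, str] = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def nodes_out_of_sync(master_ms: int, node_ms: Dict[str, int], max_skew_ms: int) -> List[str]:
    """Names of nodes whose clock differs from the master's by more than max_skew_ms."""
    return sorted(name for name, ms in node_ms.items() if abs(ms - master_ms) > max_skew_ms)


props = load_properties(SAMPLE_SITE_XML)
max_skew = int(props["hbase.master.maxclockskew"])
quorum = props["hbase.zookeeper.quorum"].split(",")

# Hypothetical clock readings in milliseconds: hadoop03 is 200 s ahead of the master.
late = nodes_out_of_sync(1_000_000, {"hadoop02": 1_060_000, "hadoop03": 1_200_000}, max_skew)
print(quorum)   # ['hadoop01', 'hadoop02', 'hadoop03']
print(late)     # ['hadoop03']
```

The skew check shows why time synchronization is set up before HBase is started: a node whose clock is further off than the configured limit would fall outside what the master tolerates.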
- Configure the regionservers file, which lists the HBase slave (RegionServer) nodes

      hadoop02
      hadoop03

- Copy the two Hadoop configuration files core-site.xml and hdfs-site.xml into HBase's configuration directory

      cp /opt/hadoop-3.1.2/etc/hadoop/core-site.xml /opt/hbase-2.2.1/conf
      cp /opt/hadoop-3.1.2/etc/hadoop/hdfs-site.xml /opt/hbase-2.2.1/conf

- Distribute /opt/hbase-2.2.1/conf/* from hadoop01 to the hadoop02 and hadoop03 nodes

      scp /opt/hbase-2.2.1/conf/* hadoop02:/opt/hbase-2.2.1/conf/
      scp /opt/hbase-2.2.1/conf/* hadoop03:/opt/hbase-2.2.1/conf/

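With more than a couple of nodes, typing one scp line per host gets error-prone. A minimal sketch of generating those command lines from a host list follows; `scp_commands` is a hypothetical helper, and it only builds the strings rather than running them.

```python
import shlex
from typing import List


def scp_commands(conf_dir: str, hosts: List[str]) -> List[str]:
    """Build one scp command per host that copies conf_dir/* to the same path remotely."""
    base = conf_dir.rstrip("/")
    # The source glob is left unquoted on purpose so the local shell expands it.
    return [f"scp {base}/* {shlex.quote(f'{host}:{base}/')}" for host in hosts]


for command in scp_commands("/opt/hbase-2.2.1/conf", ["hadoop02", "hadoop03"]):
    print(command)
# scp /opt/hbase-2.2.1/conf/* hadoop02:/opt/hbase-2.2.1/conf/
# scp /opt/hbase-2.2.1/conf/* hadoop03:/opt/hbase-2.2.1/conf/
```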
- Start and stop HBase

      start-hbase.sh
      stop-hbase.sh

- Check which HBase services are running

      # On hadoop01, the master node, only HMaster is running; on the slave nodes hadoop02 and hadoop03 only HRegionServer is running
      jps

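The expected process layout can also be checked mechanically. The sketch below assumes the usual `jps` output of one `<pid> <name>` pair per line; the sample output and its PIDs are invented for illustration, and `hbase_daemons` and `missing_daemons` are hypothetical helper names.

```python
from typing import Dict, List, Set

# Expected HBase daemons per host, taken from the layout described above.
EXPECTED: Dict[str, Set[str]] = {
    "hadoop01": {"HMaster"},
    "hadoop02": {"HRegionServer"},
    "hadoop03": {"HRegionServer"},
}


def hbase_daemons(jps_output: str) -> Set[str]:
    """Extract HBase daemon names from jps output lines of the form '<pid> <name>'."""
    names: Set[str] = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in {"HMaster", "HRegionServer"}:
            names.add(parts[1])
    return names


def missing_daemons(host: str, jps_output: str) -> List[str]:
    """Daemons expected on host that do not appear in its jps output."""
    return sorted(EXPECTED[host] - hbase_daemons(jps_output))


# Hypothetical jps output; the PIDs are made up.
sample = "2301 NameNode\n2760 HMaster\n3012 Jps\n"
print(missing_daemons("hadoop01", sample))   # []
print(missing_daemons("hadoop02", sample))   # ['HRegionServer']
```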
- Enter the HBase shell

      hbase shell

