Basic shell commands

The commands must follow JRuby syntax.

Enter the HBase shell

  1. ./hbase shell

Help command

  1. help

Show the current user

  1. whoami

Check the cluster status (how many servers, and the master)

  1. status
  2. status master

Check the version

  1. version

Create a table

Help for the create command

  1. create

Specifying versions

Create a table with two column families; the second one specifies how many versions to keep

  1. create 't_user_info', {NAME => 'base_info'}, {NAME => 'extra_info',VERSIONS=>2}

You can also specify the number of regions and the split points; see the create help for details.

You can also specify a minimum number of versions (MIN_VERSIONS)

  1. create 'mytable', {NAME => 'colfam1',VERSIONS=>5,MIN_VERSIONS=>'1'}

If every stored version of a cell is older than the TTL, at least MIN_VERSIONS versions are still kept. This ensures that a query still returns results even when all of the data has passed its TTL.
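
A minimal sketch combining TTL (in seconds) and MIN_VERSIONS; the table and family names here are made up for illustration:

  1. create 't_ttl_demo', {NAME => 'cf', VERSIONS => 5, MIN_VERSIONS => 1, TTL => 86400}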

To avoid hotspotting, create the table pre-split into multiple regions

  1. create 'tb_splits', {NAME => 'cf', VERSIONS => 3}, {SPLITS => ['a','b','c']}
  2. -- This creates the table with several regions up front; each region's start key and end key are taken from the SPLITS list in order, the first region has no start key and the last region has no end key
  3. -- The split points must be meaningful for your row keys so the data spreads evenly, otherwise pre-splitting into multiple regions is pointless
  4. -- The regions can be spread across multiple region servers, so incoming data is distributed evenly over the regions and the load is balanced (an alternative pre-split syntax is sketched below)
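
Depending on your HBase version, the shell can also compute the split points for you from a split algorithm instead of an explicit key list; a sketch with a hypothetical table name:

  1. create 'tb_presplit', 'cf', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit'}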

Describe a table

  1. describe 't_user_info'

(In the output: enabled means the table is enabled; NAME is the column family name; DATA_BLOCK_ENCODING is the data block encoding; BLOOMFILTER is the Bloom filter type; REPLICATION_SCOPE controls whether the family is replicated to a peer cluster; COMPRESSION is the compression algorithm; MIN_VERSIONS is the minimum number of versions to keep; TTL is the time to live; KEEP_DELETED_CELLS controls whether deleted cells are retained; BLOCKSIZE is the block size; IN_MEMORY indicates whether the family is kept in memory; BLOCKCACHE indicates whether the block cache is used)
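
Most of these attributes can be changed later with alter; a small sketch (the TTL and IN_MEMORY values here are only examples, not taken from the original table):

  1. alter 't_user_info', NAME => 'base_info', TTL => 2592000, IN_MEMORY => 'true'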

Insert data

Help command

  1. put

Insert data: the row key is liu-20-001, base_info is the column family, name is the qualifier, and the value is liuyifei

  1. put 't_user_info', 'liu-20-001', 'base_info:name', 'liuyifei'
  2. put 't_user_info', 'liu-20-001', 'extra_info:boyfriends', 'jdxia'
  3. put 't_user_info', 'liu-20-001', 'extra_info:boyfriends', 'jdxia1'

A column only comes into existence in a column family once data has been put into that column; columns are created on write.
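
For example, putting a value into a qualifier that has never been used before (age here is just for illustration) creates that column on the fly, and it then shows up in a scan:

  1. put 't_user_info', 'liu-20-001', 'base_info:age', '30'
  2. scan 't_user_info', {COLUMNS => 'base_info'}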

Counters (incr)

The increment step can also be -1 or 0.

  1. # increment the hit counter; the step defaults to 1
  2. hbase(main):002:0> incr 'counters', '20150101', 'daily:hits', 1
  3. COUNTER VALUE = 1
  4. 0 row(s) in 0.3320 seconds
  5. # using put to modify a counter causes the error below, because the string '1' is written via Bytes.toBytes() and is not an 8-byte long
  6. hbase(main):020:0> put 'counters' ,'20150102','daily:hits','1'
  7. 0 row(s) in 0.0520 seconds
  8. hbase(main):021:0> incr 'counters', '20150102', 'daily:hits', 1
  9. ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: Field is not a long, it's 1 bytes wide
  10. at org.apache.hadoop.hbase.regionserver.HRegion.getLongValue(HRegion.java:7647)
  11. at org.apache.hadoop.hbase.regionserver.HRegion.applyIncrementsToColumnFamily(HRegion.java:7601)
  12. at org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(HRegion.java:7480)
  13. at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7440)
  14. at org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:551)
  15. at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2227)
  16. at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33646)
  17. at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2178)
  18. at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
  19. at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
  20. at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
  21. at java.lang.Thread.run(Thread.java:745)

Get the value of a counter

  1. # get a counter's value from the command line
  2. get_counter 'wc', 'apple01', 'cf:hits'
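
A sketch of incrementing with different step sizes and reading the value back, using the counters table from above:

  1. incr 'counters', '20150101', 'daily:hits'          # default step of 1
  2. incr 'counters', '20150101', 'daily:hits', 10      # step of 10
  3. incr 'counters', '20150101', 'daily:hits', -1      # decrement by 1
  4. get_counter 'counters', '20150101', 'daily:hits'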

Queries

List the tables in HBase

  1. list

View the inserted data with a full table scan

  1. scan 't_user_info'

Query by row key

  1. get 't_user_info' , 'liu-20-001'

Query by row key, column family and qualifier

  1. get 't_user_info' , 'liu-20-001', 'base_info:name'

Query by version

  1. get 't_user_info', 'liu-20-001',{COLUMN=>'extra_info:boyfriends',VERSIONS=>4}

filter

Create a table

  1. create 'test1', 'lf', 'sf'
  2. -- lf: column family of LONG values (binary value)
  3. -- sf: column family of STRING values

Load data

  1. put 'test1', 'user1|ts1', 'sf:c1', 'sku1'
  2. put 'test1', 'user1|ts2', 'sf:c1', 'sku188'
  3. put 'test1', 'user1|ts3', 'sf:s1', 'sku123'
  4. put 'test1', 'user2|ts4', 'sf:c1', 'sku2'
  5. put 'test1', 'user2|ts5', 'sf:c2', 'sku288'
  6. put 'test1', 'user2|ts6', 'sf:s1', 'sku222'

The row key is a user (userX) combined with a time (tsX).

The action performed is used as the column name and the product (skuXXX) is the value, e.g. c1: click from homepage; c2: click from ad; s1: search from homepage; b1: buy.

Query examples

Which cells have the value sku188

  1. scan 'test1', FILTER=>"ValueFilter(=,'binary:sku188')"
  2. ROW COLUMN+CELL
  3. user1|ts2 column=sf:c1, timestamp=1409122354918, value=sku188

Which cells have a value containing 88

scan 'test1', FILTER=>"ValueFilter(=,'substring:88')"

ROW                          COLUMN+CELL    
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288

Users who came in via an ad click (column c2) whose value contains 88

scan 'test1', FILTER=>"ColumnPrefixFilter('c2') AND ValueFilter(=,'substring:88')"

ROW                          COLUMN+CELL
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288

Users who came in via search (columns starting with s) whose value contains 123 or 222

scan 'test1', FILTER=>"ColumnPrefixFilter('s') AND ( ValueFilter(=,'substring:123') OR ValueFilter(=,'substring:222') )"

ROW                          COLUMN+CELL
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123
 user2|ts6                   column=sf:s1, timestamp=1409122355970, value=sku222

Row keys starting with user1

scan 'test1', FILTER => "PrefixFilter ('user1')"

ROW                          COLUMN+CELL
 user1|ts1                   column=sf:c1, timestamp=1409122354868, value=sku1
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123

FirstKeyOnlyFilter: a row key can have multiple columns, and the same column of the same row key can have multiple versions; this filter returns only the first version of the first column of each row.
KeyOnlyFilter: return only the keys, not the values.

scan 'test1', FILTER=>"FirstKeyOnlyFilter() AND ValueFilter(=,'binary:sku188') AND KeyOnlyFilter()"

ROW                          COLUMN+CELL
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=
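
A common use of these two filters together is a cheap key-only scan over every row, for example when you only need the row keys or want to count rows from a client; a sketch:

scan 'test1', {FILTER => "FirstKeyOnlyFilter() AND KeyOnlyFilter()"}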

Starting from user1|ts2, find all row keys that start with user1

scan 'test1', {STARTROW=>'user1|ts2', FILTER => "PrefixFilter ('user1')"}

ROW                          COLUMN+CELL
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123

Starting from user1|ts2, find all rows up to (but not including) the row key user2 (STOPROW is exclusive)

scan 'test1', {STARTROW=>'user1|ts2', STOPROW=>'user2'}

ROW                          COLUMN+CELL
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123

Query for row keys containing ts3

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter
scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts3'))}
ROW                          COLUMN+CELL
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123

Query for row keys containing ts

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter
scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts'))}

ROW                          COLUMN+CELL
 user1|ts1                   column=sf:c1, timestamp=1409122354868, value=sku1
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123
 user2|ts4                   column=sf:c1, timestamp=1409122354998, value=sku2
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288
 user2|ts6                   column=sf:s1, timestamp=1409122355970, value=sku222

Insert one test row

put 'test1', 'user2|err', 'sf:s1', 'sku999'

Query for row keys matching the regular expression below; the test row just added (user2|err) does not match the pattern, so it is not returned

import org.apache.hadoop.hbase.filter.RegexStringComparator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter
scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^user\d+\|ts\d+$'))}

ROW                          COLUMN+CELL
 user1|ts1                   column=sf:c1, timestamp=1409122354868, value=sku1
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123
 user2|ts4                   column=sf:c1, timestamp=1409122354998, value=sku2
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288
 user2|ts6                   column=sf:s1, timestamp=1409122355970, value=sku222

Insert test data

put 'test1', 'user1|ts9', 'sf:b1', 'sku1'

Columns whose name starts with b1 and whose value is sku1

scan 'test1', FILTER=>"ColumnPrefixFilter('b1') AND ValueFilter(=,'binary:sku1')"

ROW                          COLUMN+CELL                                                                       
 user1|ts9                   column=sf:b1, timestamp=1409124908668, value=sku1

Using SingleColumnValueFilter: the column sf:b1 with the value sku1

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 'test1', {COLUMNS => 'sf:b1', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('sf'), Bytes.toBytes('b1'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('sku1'))}

ROW                          COLUMN+CELL
 user1|ts9                   column=sf:b1, timestamp=1409124908668, value=sku1
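
Note that by default SingleColumnValueFilter also lets through rows that do not contain sf:b1 at all; if those rows should be dropped, you can build the filter object first and set setFilterIfMissing, roughly like this:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.util.Bytes
# build the filter, then tell it to skip rows that have no sf:b1 cell
f = SingleColumnValueFilter.new(Bytes.toBytes('sf'), Bytes.toBytes('b1'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('sku1'))
f.setFilterIfMissing(true)
scan 'test1', {FILTER => f}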

Using ZooKeeper (hbase zkcli)

hbase zkcli

ls /
[hbase, zookeeper]
[zk: hadoop000:2181(CONNECTED) 1] ls /hbase
[meta-region-server, backup-masters, table, draining, region-in-transition, running, table-lock, master, namespace, hbaseid, online-snapshot, replication, splitWAL, recovering-regions, rs]
[zk: hadoop000:2181(CONNECTED) 2] ls /hbase/table
[member, test1, hbase:meta, hbase:namespace]
[zk: hadoop000:2181(CONNECTED) 3] ls /hbase/table/test1
[]
[zk: hadoop000:2181(CONNECTED) 4] get /hbase/table/test1
?master:60000}l$??lPBUF
cZxid = 0x107
ctime = Wed Aug 27 14:52:21 HKT 2014
mZxid = 0x10b
mtime = Wed Aug 27 14:52:22 HKT 2014
pZxid = 0x107
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 31
numChildren = 0

List all current tables

list

Count the number of rows in a table

count 'hbase_book'
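
On a large table a plain count is slow; the shell's count command also accepts INTERVAL (how often to report progress) and CACHE (scanner caching) options, for example:

count 'hbase_book', INTERVAL => 100000, CACHE => 1000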

exists: check whether a table exists

exists 'hbase_book'

is_enabled / is_disabled

Check whether a table is enabled or disabled

is_enabled 'hbase_book'
is_disabled 'hbase_book'

Delete

disable 'user' (not required before alter in newer versions)
Delete a column family:

alter 'user', NAME => 'f1', METHOD => 'delete'   or   alter 'user', 'delete' => 'f1'

Add column family f1 and delete column family f2 at the same time

alter 'user', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}

Change the f1 column family of the user table to keep 5 versions

alter 'user', NAME => 'f1', VERSIONS => 5

Enable the table

enable 'user'

Delete, from the t_user_info table, the value of the name qualifier in the base_info column family for the row key liu-20-001

delete 't_user_info', 'liu-20-001','base_info:name'
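
To remove an entire row (all columns in all families) in one step, the shell also provides deleteall:

deleteall 't_user_info', 'liu-20-001'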

To drop a table, you must disable it first, and only then can you drop it

disable 't_user_info'
drop 't_user_info'

Alter the table structure

First disable the user table (not required in newer versions)

disable 'user'

Add two column families, f1 and f2

alter 'user', NAME => 'f1', VERSIONS => 2
alter 'user', NAME => 'f2'

Enable the table

enable 'user'

Drop the table

disable 'user'
drop 'user'

Exercises

Create a user table with two column families, info and data

create 'user', 'info', 'data'
create 'user', {NAME => 'info', VERSIONS => '3'}

Insert into the user table: row key rk0001, add the name qualifier in the info column family, with the value zhangsan

put 'user', 'rk0001', 'info:name', 'zhangsan'

Insert into the user table: row key rk0001, add the gender qualifier in the info column family, with the value female

put 'user', 'rk0001', 'info:gender', 'female'

Insert into the user table: row key rk0001, add the age qualifier in the info column family, with the value 20

put 'user', 'rk0001', 'info:age', 20

Insert into the user table: row key rk0001, add the pic qualifier in the data column family, with the value picture

put 'user', 'rk0001', 'data:pic', 'picture'

Get all data for row key rk0001 in the user table

get 'user', 'rk0001'

Get all data in the info column family for row key rk0001 in the user table

get 'user', 'rk0001', 'info'

Get the name and age qualifiers in the info column family for row key rk0001 in the user table

get 'user', 'rk0001', 'info:name', 'info:age'

Get the info and data column families for row key rk0001 in the user table

get 'user', 'rk0001', 'info', 'data'
get 'user', 'rk0001', {COLUMN => ['info', 'data']}
get 'user', 'rk0001', {COLUMN => ['info:name', 'data:pic']}

Get the most recent versions of the info column family for row key rk0001 in the user table (the examples below request the latest 2 and 5 versions)

get 'user', 'rk0001', {COLUMN => 'info', VERSIONS => 2}
get 'user', 'rk0001', {COLUMN => 'info:name', VERSIONS => 5}
get 'user', 'rk0001', {COLUMN => 'info:name', VERSIONS => 5, TIMERANGE => [1392368783980, 1392380169184]}

Get the cells whose value is zhangsan for row key rk0001 in the user table

get 'user', 'rk0001', FILTER=>"ValueFilter(=,'binary:zhangsan')"

scan 'user', FILTER=>"ValueFilter(=,'binary:zhangsan')"

Get the cells whose qualifier contains the letter a for row key rk0001 in the user table

get 'user', 'rk0001', {FILTER => "(QualifierFilter(=,'substring:a'))"}

put 'user', 'rk0002', 'info:name', 'fanbingbing'
put 'user', 'rk0002', 'info:gender', 'female'
put 'user', 'rk0002', 'info:nationality', '中国'
get 'user', 'rk0002', {FILTER => "ValueFilter(=, 'binary:中国')"}

Scan all data in the user table

scan 'user'

Scan the info column family of the user table

scan 'user', {COLUMNS => 'info'}
scan 'user', {COLUMNS => 'info', RAW => true, VERSIONS => 5}
scan 'user', {COLUMNS => 'info', RAW => true, VERSIONS => 3}

Scan the info and data column families of the user table

scan 'user', {COLUMNS => ['info', 'data']}
scan 'user', {COLUMNS => ['info:name', 'data:pic']}

Scan the info column family, qualifier name, of the user table

scan 'user', {COLUMNS => 'info:name'}

Scan the info column family, qualifier name, of the user table, returning the latest 5 versions

scan 'user', {COLUMNS => 'info:name', VERSIONS => 5}

Scan the info and data column families of the user table where the qualifier contains the letter a

scan 'user', {COLUMNS => ['info', 'data'], FILTER => "(QualifierFilter(=,'substring:a'))"}

Scan the info column family of the user table for row keys in the range [rk0001, rk0003)

scan 'user', {COLUMNS => 'info', STARTROW => 'rk0001', ENDROW => 'rk0003'}

Scan the user table for row keys starting with rk

scan 'user',{FILTER=>"PrefixFilter('rk')"}
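
In newer shell versions the same prefix scan can also be expressed with the ROWPREFIXFILTER scan option, which may be easier to read:

scan 'user', {ROWPREFIXFILTER => 'rk'}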

Scan the user table for data within a specified time range

scan 'user', {TIMERANGE => [1392368783980, 1392380169184]}

Delete data
Delete the data for row key rk0001, column info:name, in the user table

delete 'user', 'rk0001', 'info:name'

Delete the data for row key rk0001, column info:name, timestamp 1392383705316, in the user table

delete 'user', 'rk0001', 'info:name', 1392383705316

Empty the user table (truncate)

truncate 'user'
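
Note that truncate disables, drops and recreates the table, which also discards any pre-split region boundaries; if you want to keep the existing splits, newer versions provide truncate_preserve:

truncate_preserve 'user'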

Alter the table structure
First disable the user table (not required in newer versions)

disable 'user'

Add two column families, f1 and f2

alter 'user', NAME => 'f1'
alter 'user', NAME => 'f2'

Enable the table

enable 'user'

disable 'user' (not required in newer versions)

Delete a column family:

alter 'user', NAME => 'f1', METHOD => 'delete'   or   alter 'user', 'delete' => 'f1'

Add column family f1 and delete column family f2 at the same time

alter 'user', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}

Change the f1 column family of the user table to keep 5 versions

alter 'user', NAME => 'f1', VERSIONS => 5

Enable the table

enable 'user'

Drop the table

disable 'user'
drop 'user'


Additional get, scan, delete and alter examples (these use a person table):

get 'person', 'rk0001', {FILTER => "ValueFilter(=, 'binary:中国')"}
get 'person', 'rk0001', {FILTER => "(QualifierFilter(=,'substring:a'))"}
scan 'person', {COLUMNS => 'info:name'}
scan 'person', {COLUMNS => ['info', 'data'], FILTER => "(QualifierFilter(=,'substring:a'))"}
scan 'person', {COLUMNS => 'info', STARTROW => 'rk0001', ENDROW => 'rk0003'}

scan 'person', {COLUMNS => 'info', STARTROW => '20140201', ENDROW => '20140301'}
scan 'person', {COLUMNS => 'info:name', TIMERANGE => [1395978233636, 1395987769587]}
delete 'person', 'rk0001', 'info:name'

alter 'person', NAME => 'ffff'
alter 'person', NAME => 'info', VERSIONS => 10


get 'user', 'rk0002', {COLUMN => ['info:name', 'data:pic']}