The Apache HBase Shell is (J)Ruby‘s IRB with some HBase particular commands added. Anything you can do in IRB, you should be able to do in the HBase Shell.

To run the HBase shell, do as follows:

  1. $ ./bin/hbase shell

Type help and then <RETURN> to see a listing of shell commands and options. Browse at least the paragraphs at the end of the help output for the gist of how variables and command arguments are entered into the HBase shell; in particular note how table names, rows, and columns, etc., must be quoted.

See shell exercises for example basic shell operation.

Here is a nicely formatted listing of all shell commands by Rajeshbabu Chintaguntla.

14. Scripting with Ruby

For examples scripting Apache HBase, look in the HBase bin directory. Look at the files that end in *.rb. To run one of these files, do as follows:

  1. $ ./bin/hbase org.jruby.Main PATH_TO_SCRIPT

15. Running the Shell in Non-Interactive Mode

A new non-interactive mode has been added to the HBase Shell (HBASE-11658). Non-interactive mode captures the exit status (success or failure) of HBase Shell commands and passes that status back to the command interpreter. If you use the normal interactive mode, the HBase Shell will only ever return its own exit status, which will nearly always be 0 for success.

To invoke non-interactive mode, pass the -n or --non-interactive option to HBase Shell.

16. HBase Shell in OS Scripts

You can use the HBase shell from within operating system script interpreters like the Bash shell which is the default command interpreter for most Linux and UNIX distributions. The following guidelines use Bash syntax, but could be adjusted to work with C-style shells such as csh or tcsh, and could probably be modified to work with the Microsoft Windows script interpreter as well. Submissions are welcome.

Spawning HBase Shell commands in this way is slow, so keep that in mind when you are deciding when combining HBase operations with the operating system command line is appropriate.

Example 4. Passing Commands to the HBase Shell

You can pass commands to the HBase Shell in non-interactive mode (see hbase.shell.noninteractive) using the echo command and the | (pipe) operator. Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell. Some debug-level output has been truncated from the example below.

  1. $ echo "describe 'test1'" | ./hbase shell -n
  2. Version 0.98.3-hadoop2, rd5e65a9144e315bb0a964e7730871af32f5018d5, Sat May 31 19:56:09 PDT 2014
  3. describe 'test1'
  4. DESCRIPTION ENABLED
  5. 'test1', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NON true
  6. E', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
  7. VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIO
  8. NS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =>
  9. 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false'
  10. , BLOCKCACHE => 'true'}
  11. 1 row(s) in 3.2410 seconds

To suppress all output, echo it to /dev/null:

  1. $ echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1

Example 5. Checking the Result of a Scripted Command

Since scripts are not designed to be run interactively, you need a way to check whether your command failed or succeeded. The HBase shell uses the standard convention of returning a value of 0 for successful commands, and some non-zero value for failed commands. Bash stores a command’s return value in a special environment variable called $?. Because that variable is overwritten each time the shell runs any command, you should store the result in a different, script-defined variable.

This is a naive script that shows one way to store the return value and make a decision based upon it.

  1. #!/bin/bash
  2. echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1
  3. status=$?
  4. echo "The status was " $status
  5. if ($status == 0); then
  6. echo "The command succeeded"
  7. else
  8. echo "The command may have failed."
  9. fi
  10. return $status

16.1. Checking for Success or Failure In Scripts

Getting an exit code of 0 means that the command you scripted definitely succeeded. However, getting a non-zero exit code does not necessarily mean the command failed. The command could have succeeded, but the client lost connectivity, or some other event obscured its success. This is because RPC commands are stateless. The only way to be sure of the status of an operation is to check. For instance, if your script creates a table, but returns a non-zero exit value, you should check whether the table was actually created before trying again to create it.

17. Read HBase Shell Commands from a Command File

You can enter HBase Shell commands into a text file, one command per line, and pass that file to the HBase Shell.

Example Command File

  1. create 'test', 'cf'
  2. list 'test'
  3. put 'test', 'row1', 'cf:a', 'value1'
  4. put 'test', 'row2', 'cf:b', 'value2'
  5. put 'test', 'row3', 'cf:c', 'value3'
  6. put 'test', 'row4', 'cf:d', 'value4'
  7. scan 'test'
  8. get 'test', 'row1'
  9. disable 'test'
  10. enable 'test'

Example 6. Directing HBase Shell to Execute the Commands

Pass the path to the command file as the only argument to the hbase shell command. Each command is executed and its output is shown. If you do not include the exit command in your script, you are returned to the HBase shell prompt. There is no way to programmatically check each individual command for success or failure. Also, though you see the output for each command, the commands themselves are not echoed to the screen so it can be difficult to line up the command with its output.

  1. $ ./hbase shell ./sample_commands.txt
  2. 0 row(s) in 3.4170 seconds
  3. TABLE
  4. test
  5. 1 row(s) in 0.0590 seconds
  6. 0 row(s) in 0.1540 seconds
  7. 0 row(s) in 0.0080 seconds
  8. 0 row(s) in 0.0060 seconds
  9. 0 row(s) in 0.0060 seconds
  10. ROW COLUMN+CELL
  11. row1 column=cf:a, timestamp=1407130286968, value=value1
  12. row2 column=cf:b, timestamp=1407130286997, value=value2
  13. row3 column=cf:c, timestamp=1407130287007, value=value3
  14. row4 column=cf:d, timestamp=1407130287015, value=value4
  15. 4 row(s) in 0.0420 seconds
  16. COLUMN CELL
  17. cf:a timestamp=1407130286968, value=value1
  18. 1 row(s) in 0.0110 seconds
  19. 0 row(s) in 1.5630 seconds
  20. 0 row(s) in 0.4360 seconds

18. Passing VM Options to the Shell

You can pass VM options to the HBase Shell using the HBASE_SHELL_OPTS environment variable. You can set this in your environment, for instance by editing ~/.bashrc, or set it as part of the command to launch HBase Shell. The following example sets several garbage-collection-related variables, just for the lifetime of the VM running the HBase Shell. The command should be run all on a single line, but is broken by the \ character, for readability.

  1. $ HBASE_SHELL_OPTS="-verbose:gc -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps \
  2. -XX:+PrintGCDetails -Xloggc:$HBASE_HOME/logs/gc-hbase.log" ./bin/hbase shell

19. Overriding configuration starting the HBase Shell

As of hbase-2.0.5/hbase-2.1.3/hbase-2.2.0/hbase-1.4.10/hbase-1.5.0, you can pass or override hbase configuration as specified in hbase-*.xml by passing your key/values prefixed with -D on the command-line as follows:

  1. $ ./bin/hbase shell -Dhbase.zookeeper.quorum=ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org -Draining=false
  2. ...
  3. hbase(main):001:0> @shell.hbase.configuration.get("hbase.zookeeper.quorum")
  4. => "ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org"
  5. hbase(main):002:0> @shell.hbase.configuration.get("raining")
  6. => "false"

20. Shell Tricks

20.1. Table variables

HBase 0.95 adds shell commands that provides jruby-style object-oriented references for tables. Previously all of the shell commands that act upon a table have a procedural style that always took the name of the table as an argument. HBase 0.95 introduces the ability to assign a table to a jruby variable. The table reference can be used to perform data read write operations such as puts, scans, and gets well as admin functionality such as disabling, dropping, describing tables.

For example, previously you would always specify a table name:

  1. hbase(main):000:0> create 't', 'f'
  2. 0 row(s) in 1.0970 seconds
  3. hbase(main):001:0> put 't', 'rold', 'f', 'v'
  4. 0 row(s) in 0.0080 seconds
  5. hbase(main):002:0> scan 't'
  6. ROW COLUMN+CELL
  7. rold column=f:, timestamp=1378473207660, value=v
  8. 1 row(s) in 0.0130 seconds
  9. hbase(main):003:0> describe 't'
  10. DESCRIPTION ENABLED
  11. 't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
  12. SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
  13. 147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
  14. ', BLOCKCACHE => 'true'}
  15. 1 row(s) in 1.4430 seconds
  16. hbase(main):004:0> disable 't'
  17. 0 row(s) in 14.8700 seconds
  18. hbase(main):005:0> drop 't'
  19. 0 row(s) in 23.1670 seconds
  20. hbase(main):006:0>

Now you can assign the table to a variable and use the results in jruby shell code.

  1. hbase(main):007 > t = create 't', 'f'
  2. 0 row(s) in 1.0970 seconds
  3. => Hbase::Table - t
  4. hbase(main):008 > t.put 'r', 'f', 'v'
  5. 0 row(s) in 0.0640 seconds
  6. hbase(main):009 > t.scan
  7. ROW COLUMN+CELL
  8. r column=f:, timestamp=1331865816290, value=v
  9. 1 row(s) in 0.0110 seconds
  10. hbase(main):010:0> t.describe
  11. DESCRIPTION ENABLED
  12. 't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
  13. SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
  14. 147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
  15. ', BLOCKCACHE => 'true'}
  16. 1 row(s) in 0.0210 seconds
  17. hbase(main):038:0> t.disable
  18. 0 row(s) in 6.2350 seconds
  19. hbase(main):039:0> t.drop
  20. 0 row(s) in 0.2340 seconds

If the table has already been created, you can assign a Table to a variable by using the get_table method:

  1. hbase(main):011 > create 't','f'
  2. 0 row(s) in 1.2500 seconds
  3. => Hbase::Table - t
  4. hbase(main):012:0> tab = get_table 't'
  5. 0 row(s) in 0.0010 seconds
  6. => Hbase::Table - t
  7. hbase(main):013:0> tab.put 'r1' ,'f', 'v'
  8. 0 row(s) in 0.0100 seconds
  9. hbase(main):014:0> tab.scan
  10. ROW COLUMN+CELL
  11. r1 column=f:, timestamp=1378473876949, value=v
  12. 1 row(s) in 0.0240 seconds
  13. hbase(main):015:0>

The list functionality has also been extended so that it returns a list of table names as strings. You can then use jruby to script table operations based on these names. The list_snapshots command also acts similarly.

  1. hbase(main):016 > tables = list('t.*')
  2. TABLE
  3. t
  4. 1 row(s) in 0.1040 seconds
  5. => #<#<Class:0x7677ce29>:0x21d377a4>
  6. hbase(main):017:0> tables.map { |t| disable t ; drop t}
  7. 0 row(s) in 2.2510 seconds
  8. => [nil]
  9. hbase(main):018:0>

20.2. irbrc

Create an .irbrc file for yourself in your home directory. Add customizations. A useful one is command history so commands are save across Shell invocations:

  1. $ more .irbrc
  2. require 'irb/ext/save-history'
  3. IRB.conf[:SAVE_HISTORY] = 100
  4. IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"

If you’d like to avoid printing the result of evaluting each expression to stderr, for example the array of tables returned from the “list” command:

  1. $ echo "IRB.conf[:ECHO] = false" >>~/.irbrc

See the ruby documentation of .irbrc to learn about other possible configurations.

20.3. LOG data to timestamp

To convert the date ‘08/08/16 20:56:29’ from an hbase log into a timestamp, do:

  1. hbase(main):021:0> import java.text.SimpleDateFormat
  2. hbase(main):022:0> import java.text.ParsePosition
  3. hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime() => 1218920189000

To go the other direction:

  1. hbase(main):021:0> import java.util.Date
  2. hbase(main):022:0> Date.new(1218920189000).toString() => "Sat Aug 16 20:56:29 UTC 2008"

To output in a format that is exactly like that of the HBase log format will take a little messing with SimpleDateFormat.

20.4. Query Shell Configuration

  1. hbase(main):001:0> @shell.hbase.configuration.get("hbase.rpc.timeout")
  2. => "60000"

To set a config in the shell:

  1. hbase(main):005:0> @shell.hbase.configuration.setInt("hbase.rpc.timeout", 61010)
  2. hbase(main):006:0> @shell.hbase.configuration.get("hbase.rpc.timeout")
  3. => "61010"

20.5. Pre-splitting tables with the HBase Shell

You can use a variety of options to pre-split tables when creating them via the HBase Shell create command.

The simplest approach is to specify an array of split points when creating the table. Note that when specifying string literals as split points, these will create split points based on the underlying byte representation of the string. So when specifying a split point of ‘10’, we are actually specifying the byte split point ‘\x31\30’.

The split points will define n+1 regions where n is the number of split points. The lowest region will contain all keys from the lowest possible key up to but not including the first split point key. The next region will contain keys from the first split point up to, but not including the next split point key. This will continue for all split points up to the last. The last region will be defined from the last split point up to the maximum possible key.

  1. hbase>create 't1','f',SPLITS => ['10','20','30']

In the above example, the table ‘t1’ will be created with column family ‘f’, pre-split to four regions. Note the first region will contain all keys from ‘\x00’ up to ‘\x30’ (as ‘\x31’ is the ASCII code for ‘1’).

You can pass the split points in a file using following variation. In this example, the splits are read from a file corresponding to the local path on the local filesystem. Each line in the file specifies a split point key.

  1. hbase>create 't14','f',SPLITS_FILE=>'splits.txt'

The other options are to automatically compute splits based on a desired number of regions and a splitting algorithm. HBase supplies algorithms for splitting the key range based on uniform splits or based on hexadecimal keys, but you can provide your own splitting algorithm to subdivide the key range.

  1. # create table with four regions based on random bytes keys
  2. hbase>create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }
  3. # create table with five regions based on hex keys
  4. hbase>create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }

As the HBase Shell is effectively a Ruby environment, you can use simple Ruby scripts to compute splits algorithmically.

  1. # generate splits for long (Ruby fixnum) key range from start to end key
  2. hbase(main):070:0> def gen_splits(start_key,end_key,num_regions)
  3. hbase(main):071:1> results=[]
  4. hbase(main):072:1> range=end_key-start_key
  5. hbase(main):073:1> incr=(range/num_regions).floor
  6. hbase(main):074:1> for i in 1 .. num_regions-1
  7. hbase(main):075:2> results.push([i*incr+start_key].pack("N"))
  8. hbase(main):076:2> end
  9. hbase(main):077:1> return results
  10. hbase(main):078:1> end
  11. hbase(main):079:0>
  12. hbase(main):080:0> splits=gen_splits(1,2000000,10)
  13. => ["\000\003\r@", "\000\006\032\177", "\000\t'\276", "\000\f4\375", "\000\017B<", "\000\022O{", "\000\025\\\272", "\000\030i\371", "\000\ew8"]
  14. hbase(main):081:0> create 'test_splits','f',SPLITS=>splits
  15. 0 row(s) in 0.2670 seconds
  16. => Hbase::Table - test_splits

Note that the HBase Shell command truncate effectively drops and recreates the table with default options which will discard any pre-splitting. If you need to truncate a pre-split table, you must drop and recreate the table explicitly to re-specify custom split options.

20.6. Debug

20.6.1. Shell debug switch

You can set a debug switch in the shell to see more output — e.g. more of the stack trace on exception — when you run a command:

  1. hbase> debug <RETURN>

20.6.2. DEBUG log level

To enable DEBUG level logging in the shell, launch it with the -d option.

  1. $ ./bin/hbase shell -d

20.7. Commands

20.7.1. count

Count command returns the number of rows in a table. It’s quite fast when configured with the right CACHE

  1. hbase> count '<tablename>', CACHE => 1000

The above count fetches 1000 rows at a time. Set CACHE lower if your rows are big. Default is to fetch one row at a time.