77. Overview

In-memory compaction (a.k.a. Accordion) is a new feature in hbase-2.0.0. It was first introduced on the Apache HBase Blog in the post Accordion: HBase Breathes with In-Memory Compaction. Quoting the blog:

Accordion reapplies the LSM principal [Log-Structured-Merge Tree, the design pattern upon which HBase is based] to MemStore, in order to eliminate redundancies and other overhead while the data is still in RAM. Doing so decreases the frequency of flushes to HDFS, thereby reducing the write amplification and the overall disk footprint. With less flushes, the write operations are stalled less frequently as the MemStore overflows, therefore the write performance is improved. Less data on disk also implies less pressure on the block cache, higher hit rates, and eventually better read response times. Finally, having less disk writes also means having less compaction happening in the background, i.e., less cycles are stolen from productive (read and write) work. All in all, the effect of in-memory compaction can be envisioned as a catalyst that enables the system move faster as a whole.

A developer view is available at Accordion: Developer View of In-Memory Compaction.

In-memory compaction works best when there is high data churn: overwrites and excess versions can be eliminated while the data is still in memory. If all writes are unique, it may reduce write throughput, because in-memory compaction costs CPU. We suggest you test and compare before deploying to production.

In this section we describe how to enable Accordion and the available configurations.

78. Enabling

To enable in-memory compaction, set the IN_MEMORY_COMPACTION attribute on each column family where you want the behavior. The IN_MEMORY_COMPACTION attribute can have one of four values.

  • NONE: No in-memory compaction.

  • BASIC: The basic policy enables in-memory flushing and keeps a pipeline of flushes until the pipeline maximum threshold is reached, at which point it flushes to disk. It performs no in-memory compaction, but it can help throughput because data is moved from the profligate native ConcurrentSkipListMap data type to more compact (and efficient) data types.

  • EAGER: This is the BASIC policy plus in-memory compaction of flushes (much like the on-disk compactions done to hfiles); on compaction we apply the on-disk rules, eliminating versions, duplicates, TTL'd cells, etc.

  • ADAPTIVE: Adaptive compaction adapts to the workload. It applies either index compaction or data compaction based on the ratio of duplicate cells in the data. Experimental.

To enable BASIC on the info column family in the table radish, disable the table, add the attribute to the info column family, and then re-enable the table:

  hbase(main):002:0> disable 'radish'
  Took 0.5570 seconds
  hbase(main):003:0> alter 'radish', {NAME => 'info', IN_MEMORY_COMPACTION => 'BASIC'}
  Updating all regions with the new schema...
  All regions updated.
  Done.
  Took 1.2413 seconds
  hbase(main):004:0> describe 'radish'
  Table radish is DISABLED
  radish
  COLUMN FAMILIES DESCRIPTION
  {NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'BASIC'}}
  1 row(s)
  Took 0.0239 seconds
  hbase(main):005:0> enable 'radish'
  Took 0.7537 seconds

Note how the IN_MEMORY_COMPACTION attribute shows as part of the METADATA map.

There is also a global configuration, hbase.hregion.compacting.memstore.type, which you can set in your hbase-site.xml file. Use it to set the default applied when a new table is created. On creation of a column family Store, we look first to the column family configuration for the IN_MEMORY_COMPACTION setting and, if none is found, we then use the hbase.hregion.compacting.memstore.type value; the default is BASIC.
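The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the precedence (column family attribute first, then the global property, then the default stated in this section); the function name effective_policy and the use of plain dictionaries are assumptions made for the example, not HBase APIs.

```python
# Hypothetical helper illustrating the precedence described in the text.
# Plain dicts stand in for the column family attributes and hbase-site.xml.
GLOBAL_KEY = "hbase.hregion.compacting.memstore.type"


def effective_policy(column_family_attrs, site_config, default="BASIC"):
    """Return the in-memory compaction policy a new Store would pick up."""
    # 1. A per-column-family IN_MEMORY_COMPACTION attribute wins if present.
    policy = column_family_attrs.get("IN_MEMORY_COMPACTION")
    if policy is None:
        # 2. Otherwise fall back to the global property, then to the default.
        policy = site_config.get(GLOBAL_KEY, default)
    return policy.upper()


if __name__ == "__main__":
    site = {GLOBAL_KEY: "NONE"}
    print(effective_policy({"IN_MEMORY_COMPACTION": "EAGER"}, site))
    print(effective_policy({}, site))
    print(effective_policy({}, {}))
```

The point of the sketch is that a global setting never overrides an attribute already set on a column family.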

By default, new hbase system tables will have BASIC in-memory compaction set. To specify otherwise, set hbase.hregion.compacting.memstore.type to NONE before the tables are created. Note that setting this value after the system tables have been created has no retroactive effect; you will have to alter your tables to set the in-memory attribute to NONE.
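As a minimal sketch, the global default could be switched to NONE with an hbase-site.xml entry like the one below. The property name and value come from the text above; the property element goes inside the file's existing configuration element.

```xml
<!-- hbase-site.xml: use NONE as the default in-memory compaction policy -->
<!-- for column families that do not set IN_MEMORY_COMPACTION themselves. -->
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value>NONE</value>
</property>
```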

The point at which an in-memory flush happens is calculated by dividing the configured region flush size (set in the table descriptor or read from hbase.hregion.memstore.flush.size) by the number of column families and then multiplying by hbase.memstore.inmemoryflush.threshold.factor. The default factor is 0.014.
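To make the arithmetic concrete, here is a small Python sketch of the calculation as described above. The function name is hypothetical, and the 128 MB region flush size is an assumed example value; substitute whatever your table descriptor or hbase.hregion.memstore.flush.size specifies.

```python
# Hypothetical helper reproducing the calculation described in the text:
# (region flush size / number of column families) * threshold factor.
def in_memory_flush_threshold(region_flush_size_bytes, num_column_families,
                              factor=0.014):
    """Approximate size in bytes at which an in-memory flush is triggered."""
    if num_column_families < 1:
        raise ValueError("a table has at least one column family")
    return region_flush_size_bytes / num_column_families * factor


if __name__ == "__main__":
    flush_size = 128 * 1024 * 1024  # assumed example: 128 MB region flush size
    for families in (1, 2, 4):
        threshold = in_memory_flush_threshold(flush_size, families)
        print(f"{families} column families: {threshold / 1024:.1f} KB")
```

Under these assumptions a single-family table flushes in memory at a little under 2 MB, and the threshold shrinks proportionally as column families are added.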

The number of flushes carried by the pipeline is monitored so as to fit within the bounds of memstore sizing, but you can also set a maximum on the total number of flushes by setting hbase.hregion.compacting.pipeline.segments.limit. The default is 2.
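For reference, the two tuning properties named in this section could appear in hbase-site.xml as follows. The values shown are simply the defaults stated in the text, so this fragment changes nothing until you edit them; treat it as a starting point for experiments rather than a recommendation.

```xml
<!-- hbase-site.xml: in-memory flush tuning, shown with the defaults -->
<!-- given in the text. Place inside the existing configuration element. -->
<property>
  <name>hbase.memstore.inmemoryflush.threshold.factor</name>
  <value>0.014</value>
</property>
<property>
  <name>hbase.hregion.compacting.pipeline.segments.limit</name>
  <value>2</value>
</property>
```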

When a column family Store is created, it logs which memstore type is in effect. As of this writing there is the old-school DefaultMemStore, which fills a ConcurrentSkipListMap and then flushes to disk, and the new CompactingMemStore, the implementation that provides the in-memory compaction facility. Here is a log line from a RegionServer that shows a column family Store named family configured to use a CompactingMemStore:

  2018-03-30 11:02:24,466 INFO [Time-limited test] regionserver.HStore(325): Store=family, memstore type=CompactingMemStore, storagePolicy=HOT, verifyBulkLoads=false, parallelPutCountPrintThreshold=10

Enable TRACE-level logging on the CompactingMemStore class (org.apache.hadoop.hbase.regionserver.CompactingMemStore) to see detail on its operation.