This chapter contains information and guidelines for building and releasing HBase code and documentation. Familiarity with these guidelines will make it easier for HBase committers to use your contributions.

164. Getting Involved

Apache HBase gets better only when people contribute! If you are looking to contribute to Apache HBase, look for issues in JIRA tagged with the label beginner (status Open, In Progress, or Reopened). These are issues HBase contributors have deemed worthy but not of immediate priority, and a good way to ramp up on HBase internals. See What label is used for issues that are good on ramps for new contributors? from the dev mailing list for background.

Before you get started submitting code to HBase, please refer to developing.

As Apache HBase is an Apache Software Foundation project, see asf for more information about how the ASF functions.

164.1. Mailing Lists

Sign up for the dev-list and the user-list. See the mailing lists page. Posing questions - and helping to answer other people’s questions - is encouraged! There are varying levels of experience on both lists so patience and politeness are encouraged (and please stay on topic.)

164.2. Slack

The Apache HBase project has its own Slack channel for real-time questions and discussion. Mail dev@hbase.apache.org to request an invite.

164.3. Internet Relay Chat (IRC)

(NOTE: Our IRC channel seems to have been deprecated in favor of the above Slack channel)

For real-time questions and discussions, use the #hbase IRC channel on the FreeNode IRC network. FreeNode offers a web-based client, but most people prefer a native client, and several clients are available for each operating system.

164.4. Jira

Check for existing issues in Jira. Whether it's a new feature request, an enhancement, or a bug, file a ticket.

We track multiple types of work in JIRA:

  • Bug: Something is broken in HBase itself.

  • Test: A test is needed, or a test is broken.

  • New feature: You have an idea for new functionality. It’s often best to bring these up on the mailing lists first, and then write up a design specification that you add to the feature request JIRA.

  • Improvement: A feature exists, but could be tweaked or augmented. It’s often best to bring these up on the mailing lists first and have a discussion, then summarize or link to the discussion if others seem interested in the improvement.

  • Wish: This is like a new feature, but for something you may not have the background to flesh out yourself.

Bugs and tests have the highest priority and should be actionable.

164.4.1. Guidelines for reporting effective issues

  • Search for duplicates: Your issue may have already been reported. Have a look, realizing that someone else might have worded the summary differently.
    Also search the mailing lists, which may have information about your problem and how to work around it. Don’t file an issue for something that has already been discussed and resolved on a mailing list, unless you strongly disagree with the resolution and are willing to help take the issue forward.

    • Discuss in public: Use the mailing lists to discuss what you’ve discovered and see if there is something you’ve missed. Avoid using back channels, so that you benefit from the experience and expertise of the project as a whole.

    • Don’t file on behalf of others: You might not have all the context, and you don’t have as much motivation to see it through as the person who is actually experiencing the bug. It’s more helpful in the long term to encourage others to file their own issues. Point them to this material and offer to help out the first time or two.

    • Write a good summary: A good summary includes information about the problem, the impact on the user or developer, and the area of the code.

      • Good: Address new license dependencies from hadoop3-alpha4

      • Room for improvement: Canary is broken
        If you write a bad title, someone else will rewrite it for you. This is time they could have spent working on the issue instead.

    • Give context in the description: It can be good to think of this in multiple parts:

      • What happens or doesn’t happen?

      • How does it impact you?

      • How can someone else reproduce it?

      • What would “fixed” look like?
        You don’t need to know the answers for all of these, but give as much information as you can. If you can provide technical information, such as a Git commit SHA that you think might have caused the issue or a build failure on builds.apache.org where you think the issue first showed up, share that info.

    • Fill in all relevant fields: These fields help us filter, categorize, and find things.

    • One bug, one issue, one patch: To help with back-porting, don’t split issues or fixes among multiple bugs.

    • Add value if you can: Filing issues is great, even if you don’t know how to fix them. But providing as much information as possible, being willing to triage and answer questions, and being willing to test potential fixes is even better! We want to fix your issue as quickly as you want it to be fixed.

    • Don’t be upset if we don’t fix it: Time and resources are finite. In some cases, we may not be able to (or might choose not to) fix an issue, especially if it is an edge case or there is a workaround. Even if it doesn’t get fixed, the JIRA is a public record of it, and will help others out if they run into a similar issue in the future.

164.4.2. Working on an issue

To check for existing issues which you can tackle as a beginner, search for issues in JIRA tagged with the label beginner (status Open, In Progress, or Reopened).

JIRA Priorities

  • Blocker: Should only be used if the issue WILL cause data loss or cluster instability reliably.

  • Critical: The issue described can cause data loss or cluster instability in some cases.

  • Major: Important but not tragic issues, like updates to the client API that will add a lot of much-needed functionality or significant bugs that need to be fixed but that don’t cause data loss.

  • Minor: Useful enhancements and annoying but not damaging bugs.

  • Trivial: Useful enhancements but generally cosmetic.

Example 41. Code Blocks in Jira Comments

A commonly used macro in Jira is {code}. Everything inside the tags is preformatted, as in this example.

  {code}
  code snippet
  {code}

165. Apache HBase Repositories

Apache HBase consists of multiple repositories, which are hosted on Apache GitBox.

166. IDEs

166.1. Eclipse

166.1.1. Code Formatting

Under the dev-support/ folder, you will find hbase_eclipse_formatter.xml. We encourage you to have this formatter in place in Eclipse when editing HBase code.

Go to Preferences→Java→Code Style→Formatter→Import to load the xml file. Go to Preferences→Java→Editor→Save Actions, and make sure ‘Format source code’ and ‘Format edited lines’ are selected.

In addition to the automatic formatting, make sure you follow the style guidelines explained in common.patch.feedback.

166.1.2. Eclipse Git Plugin

If you cloned the project via git, download and install the Git plugin (EGit). Attach to your local git repo (via the Git Repositories window) and you’ll be able to see file revision history, generate patches, etc.

166.1.3. HBase Project Setup in Eclipse using m2eclipse

The easiest way is to use the m2eclipse plugin for Eclipse. Eclipse Indigo or newer includes m2eclipse, or you can download it from http://www.eclipse.org/m2e/. It provides Maven integration for Eclipse, and even lets you use the direct Maven commands from within Eclipse to compile and test your project.

To import the project, click File→Import→Existing Maven Projects and select the HBase root directory. m2eclipse locates all the hbase modules for you.

If you install m2eclipse and import HBase in your workspace, do the following to fix your Eclipse Build Path.

  1. Remove the target folder.

  2. Add the target/generated-jamon and target/generated-sources/java folders.

  3. Remove from your Build Path the exclusions on the src/main/resources and src/test/resources directories, to avoid error messages in the console such as the following:

     Failed to execute goal
     org.apache.maven.plugins:maven-antrun-plugin:1.6:run (default) on project hbase:
     'An Ant BuildException has occurred: Replace: source file .../target/classes/hbase-default.xml
     doesn't exist


This will also reduce the Eclipse build cycles and make your life easier when developing.

166.1.4. HBase Project Setup in Eclipse Using the Command Line

Instead of using m2eclipse, you can generate the Eclipse files from the command line.

  1. First, run the following command, which builds HBase. You only need to do this once.

     mvn clean install -DskipTests

  2. Close Eclipse, and execute the following command from the terminal, in your local HBase project directory, to generate new .project and .classpath files.

     mvn eclipse:eclipse

  3. Reopen Eclipse and import the .project file in the HBase directory to a workspace.

166.1.5. Maven Classpath Variable

The $M2_REPO classpath variable needs to be set up for the project, pointing to your local Maven repository, which is usually ~/.m2/repository.

If this classpath variable is not configured, you will see compile errors in Eclipse like this:

  Description Resource Path Location Type
  The project cannot be built until build path errors are resolved hbase Unknown Java Problem
  Unbound classpath variable: 'M2_REPO/asm/asm/3.1/asm-3.1.jar' in project 'hbase' hbase Build path Build Path Problem
  Unbound classpath variable: 'M2_REPO/com/google/guava/guava/r09/guava-r09.jar' in project 'hbase' hbase Build path Build Path Problem
  Unbound classpath variable: 'M2_REPO/com/google/protobuf/protobuf-java/2.3.0/protobuf-java-2.3.0.jar' in project 'hbase' hbase Build path Build Path Problem
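If you prefer the command line, the maven-eclipse-plugin can configure the workspace variable for you; a minimal sketch, assuming your Eclipse workspace lives at /path/to/workspace:

  mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace

Alternatively, set M2_REPO by hand under Preferences→Java→Build Path→Classpath Variables.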

166.1.6. Eclipse Known Issues

Eclipse will currently complain about Bytes.java. It is not possible to turn these errors off.

  Description Resource Path Location Type
  Access restriction: The method arrayBaseOffset(Class) from the type Unsafe is not accessible due to restriction on required library /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Classes/classes.jar Bytes.java /hbase/src/main/java/org/apache/hadoop/hbase/util line 1061 Java Problem
  Access restriction: The method arrayIndexScale(Class) from the type Unsafe is not accessible due to restriction on required library /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Classes/classes.jar Bytes.java /hbase/src/main/java/org/apache/hadoop/hbase/util line 1064 Java Problem
  Access restriction: The method getLong(Object, long) from the type Unsafe is not accessible due to restriction on required library /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Classes/classes.jar Bytes.java /hbase/src/main/java/org/apache/hadoop/hbase/util line 1111 Java Problem

166.1.7. Eclipse - More Information

For additional information on setting up Eclipse for HBase development on Windows, see Michael Morello’s blog on the topic.

166.2. IntelliJ IDEA

You can set up IntelliJ IDEA for similar functionality as Eclipse. Follow these steps.

  1. Select File→Import Project, and choose the HBase root directory.

  2. You do not need to select a profile. Be sure Maven project required is selected, and click Next.

  3. Select the location for the JDK.

Using the HBase Formatter in IntelliJ IDEA

Using the Eclipse Code Formatter plugin for IntelliJ IDEA, you can import the HBase code formatter described in eclipse.code.formatting.

166.3. Other IDEs

It would be useful to mirror the eclipse set-up instructions for other IDEs. If you would like to assist, please have a look at HBASE-11704.

167. Building Apache HBase

167.1. Basic Compile

HBase is compiled using Maven. You must use at least Maven 3.0.4. To check your Maven version, run the command mvn -version.

JDK Version Requirement

Starting with HBase 1.0 you must use Java 7 or later to build from source code. See java for more complete information about supported JDK versions.

167.1.1. Maven Build Commands

All commands are executed from the local HBase project directory.

Package

The simplest command to compile HBase from its java source code is to use the package target, which builds JARs with the compiled files.

  mvn package -DskipTests

Or, to clean up before compiling:

  mvn clean package -DskipTests

With Eclipse set up as explained above in eclipse, you can also use the Build command in Eclipse. To create the full installable HBase package takes a little bit more work, so read on.

Compile

The compile target does not create the JARs with the compiled files.

  mvn compile
  mvn clean compile

Install

To install the JARs in your ~/.m2/ directory, use the install target.

  mvn install
  mvn clean install
  mvn clean install -DskipTests

167.1.2. Running all or individual Unit Tests

See the hbase.unittests.cmds section in hbase.unittests.

167.1.3. Building against various hadoop versions.

HBase supports building against Apache Hadoop versions: 2.y and 3.y (early release artifacts). By default we build against Hadoop 2.x.

To build against a specific release from the Hadoop 2.y line, set e.g. -Dhadoop-two.version=2.6.3.

  mvn -Dhadoop-two.version=2.6.3 ...

To change the major release line of Hadoop we build against, add a hadoop.profile property when you invoke mvn:

  mvn -Dhadoop.profile=3.0 ...

The above will build against whatever explicit hadoop 3.y version we have in our pom.xml as our ‘3.0’ version. Tests may not all pass so you may need to pass -DskipTests unless you are inclined to fix the failing tests.

To pick a particular Hadoop 3.y release, you'd also set the hadoop-three.version property, e.g. -Dhadoop-three.version=3.0.0.
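Putting the profile and version properties together, an invocation against a specific Hadoop 3.y release might look like the following (the version number is illustrative):

  mvn -Dhadoop.profile=3.0 -Dhadoop-three.version=3.0.0 -DskipTests clean install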

167.1.4. Build Protobuf

You may need to change the protobuf definitions that reside in the hbase-protocol module or other modules.

Previous to hbase-2.0.0, protobuf definition files were sprinkled across all hbase modules but now all to do with protobuf must reside in the hbase-protocol module; we are trying to contain our protobuf use so we can freely change versions without upsetting any downstream project use of protobuf.

The protobuf files are located in hbase-protocol/src/main/protobuf. For the change to be effective, you will need to regenerate the classes.

  mvn package -pl hbase-protocol -am

Similarly, protobuf definitions for internal use are located in the hbase-protocol-shaded module.

  mvn package -pl hbase-protocol-shaded -am

Typically, protobuf code generation is done using the native protoc binary. In our build we use a maven plugin for convenience; however, the plugin may not be able to retrieve appropriate binaries for all platforms. If you find yourself on a platform where protoc fails, you will have to compile protoc from source, and run it independent of our maven build. You can disable the inline code generation by specifying -Dprotoc.skip in your maven arguments, allowing your build to proceed further.

If you need to manually generate your protobuf files, you should not use clean in subsequent maven calls, as that will delete the newly generated files.
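A minimal sketch of that manual workflow: once you have generated the classes with your locally built protoc, continue building with inline generation disabled, and without clean, so the generated files survive:

  mvn package -pl hbase-protocol -am -Dprotoc.skip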

Read the hbase-protocol/README.txt for more details.

167.1.5. Build Thrift

You may need to change the thrift definitions that reside in the hbase-thrift module or other modules.

The thrift files are located in hbase-thrift/src/main/resources. For the change to be effective, you will need to regenerate the classes. You can use maven profile compile-thrift to do this.

  mvn compile -Pcompile-thrift

You may also want to define thrift.path for the thrift binary, using the following command:

  mvn compile -Pcompile-thrift -Dthrift.path=/opt/local/bin/thrift

167.1.6. Build a Tarball

You can build a tarball without going through the release process described in releasing, by running the following command:

  mvn -DskipTests clean install && mvn -DskipTests package assembly:single

The distribution tarball is built in hbase-assembly/target/hbase-<version>-bin.tar.gz.

You can install or deploy the tarball by having the assembly:single goal before install or deploy in the maven command:

  mvn -DskipTests package assembly:single install
  mvn -DskipTests package assembly:single deploy

167.1.7. Build Gotchas

Maven Site failure

If you see Unable to find resource 'VM_global_library.vm', ignore it. It’s not an error. It is officially ugly though.

168. Releasing Apache HBase

Building against HBase 1.x

HBase 1.x requires Java 7 to build. See java for Java requirements per HBase release.

Example 42. Example ~/.m2/settings.xml File

Publishing to maven requires you to sign the artifacts you want to upload. For the build to sign them for you, you need a properly configured settings.xml in your local repository under .m2, such as the following.

  <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                        http://maven.apache.org/xsd/settings-1.0.0.xsd">
    <servers>
      <!-- To publish a snapshot of some part of Maven -->
      <server>
        <id>apache.snapshots.https</id>
        <username>YOUR_APACHE_ID</username>
        <password>YOUR_APACHE_PASSWORD</password>
      </server>
      <!-- To publish a website using Maven -->
      <!-- To stage a release of some part of Maven -->
      <server>
        <id>apache.releases.https</id>
        <username>YOUR_APACHE_ID</username>
        <password>YOUR_APACHE_PASSWORD</password>
      </server>
    </servers>
    <profiles>
      <profile>
        <id>apache-release</id>
        <properties>
          <gpg.keyname>YOUR_KEYNAME</gpg.keyname>
          <!--Keyname is something like this ... 00A5F21E... do gpg --list-keys to find it-->
          <gpg.passphrase>YOUR_KEY_PASSWORD</gpg.passphrase>
        </properties>
      </profile>
    </profiles>
  </settings>

168.1. Making a Release Candidate

Only committers may make releases of hbase artifacts.

Before You Begin

Make sure your environment is properly set up. Maven and Git are the main tooling used in the below. You’ll need a properly configured settings.xml file in your local ~/.m2 maven repository with logins for apache repos (See Example ~/.m2/settings.xml File). You will also need to have a published signing key. Browse the Hadoop How To Release wiki page on how to release. It is a model for most of the instructions below. It often has more detail on particular steps, for example, on adding your code signing key to the project KEYS file up in Apache or on how to update JIRA in preparation for release.

Before you make a release candidate, do a practice run by deploying a SNAPSHOT. Check to be sure recent builds have been passing for the branch from where you are going to take your release. You should also have tried recent branch tips out on a cluster under load, perhaps by running the hbase-it integration test suite for a few hours to ‘burn in’ the near-candidate bits.

Specifying the Heap Space for Maven

You may run into OutOfMemoryErrors building, particularly building the site and documentation. Up the heap for Maven by setting the MAVEN_OPTS variable. You can prefix the variable to the Maven command, as in the following example:

  MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=256m" mvn package

You could also set this in an environment variable or alias in your shell.
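For example, to set it once for every Maven invocation in your shell:

  # e.g. in ~/.bashrc or ~/.bash_profile
  export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=256m"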

The script dev-support/make_rc.sh automates many of the below steps. It will check out a tag, clean the checkout, build src and bin tarballs, and deploy the built jars to repository.apache.org. It does NOT do the modification of the CHANGES.txt for the release, the checking of the produced artifacts to ensure they are ‘good’ — e.g. extracting the produced tarballs, verifying that they look right, then starting HBase and checking that everything is running correctly — or the signing and pushing of the tarballs to people.apache.org. Take a look. Modify/improve as you see fit.

Procedure: Release Procedure

  1. Update the CHANGES.txt file and the POM files.
    Update CHANGES.txt with the changes since the last release. Make sure the URL to the JIRA points to the proper location which lists fixes for this release. Adjust the version in all the POM files appropriately. If you are making a release candidate, you must remove the -SNAPSHOT label from all versions in all pom.xml files. If you are running this recipe to publish a snapshot, you must keep the -SNAPSHOT suffix on the hbase version. The Versions Maven Plugin can be of use here. To set a version in all the many poms of the hbase multi-module project, use a command like the following:

     $ mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.1.0-SNAPSHOT


Make sure all versions in poms are changed! One quick way to double-check is sketched below. Check in the CHANGES.txt and any maven version changes.
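A minimal sketch of such a check, for a release candidate where every -SNAPSHOT suffix must be gone (the command should print nothing):

  $ git grep -l -e '-SNAPSHOT' -- '*pom.xml'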

  2. Update the documentation.
    Update the documentation under src/main/asciidoc. This usually involves copying the latest from the master branch and making version-particular adjustments to suit this release candidate version.

  3. Clean the checkout dir.

     $ mvn clean
     $ git clean -f -x -d

  4. Run Apache Rat to check that licenses are good.

     $ mvn apache-rat:check


If the above fails, check the rat log.

  $ grep 'Rat check' patchprocess/mvn_apache_rat.log
  5. Create a release tag. Presuming you have run basic tests and the rat check passes and all is looking good, now is the time to tag the release candidate (you can always remove the tag if you need to redo). To tag, do what follows, substituting in the version appropriate to your build. All tags should be signed tags; i.e. pass the -s option (see Signing Your Work for how to set up your git environment for signing).

     $ git tag -s 2.0.0-alpha4-RC0 -m "Tagging the 2.0.0-alpha4 first Release Candidate (Candidates start at zero)"

Or, if you are making a release, tags should have a rel/ prefix to ensure they are preserved in the Apache repo as in:

  $ git tag -s rel/2.0.0-alpha4 -m "Tagging the 2.0.0-alpha4 Release"

Push the (specific) tag (only) so others have access.

     $ git push origin 2.0.0-alpha4-RC0
  • For how to delete tags, see How to Delete a Tag. It covers deleting tags that have not yet been pushed to the remote Apache repo, as well as deleting tags already pushed to Apache; a minimal sketch follows.
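  # delete a local tag that has not been pushed
  $ git tag -d 2.0.0-alpha4-RC0
  # additionally remove a tag that was already pushed to the remote
  $ git push origin :refs/tags/2.0.0-alpha4-RC0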

  6. Build the source tarball.
    Now, build the source tarball. Let's presume we are building the source tarball for the tag 2.0.0-alpha4-RC0 into /tmp/hbase-2.0.0-alpha4-RC0/ (this step requires that the mvn and git clean steps described above have just been done).

     $ git archive --format=tar.gz --output="/tmp/hbase-2.0.0-alpha4-RC0/hbase-2.0.0-alpha4-src.tar.gz" --prefix="hbase-2.0.0-alpha4/" $git_tag

Above we generate the hbase-2.0.0-alpha4-src.tar.gz tarball into the /tmp/hbase-2.0.0-alpha4-RC0 build output directory (We don’t want the RC0 in the name or prefix. These bits are currently a release candidate but if the VOTE passes, they will become the release so we do not taint the artifact names with RCX).

  7. Build the binary tarball.
    Next, build the binary tarball. Add the -Prelease profile when building. It runs the license apache-rat check among other rules that help ensure all is wholesome. Do it in two steps.

First install into the local repository

  $ mvn clean install -DskipTests -Prelease

Next, generate documentation and assemble the tarball. Be warned, this next step can take a good while, a couple of hours generating site documentation.

  $ mvn install -DskipTests site assembly:single -Prelease
  • Otherwise, the build complains that hbase modules are not in the maven repository when you try to do it all in one step, especially on a fresh repository. It seems that you need the install goal in both steps.

  • Extract the generated tarball — you’ll find it under hbase-assembly/target and check it out. Look at the documentation, see if it runs, etc. If good, copy the tarball beside the source tarball in the build output directory.

  8. Deploy to the Maven Repository.
    Next, deploy HBase to the Apache Maven repository. Add the apache-release profile when running the mvn deploy command. This profile comes from the Apache parent pom referenced by our pom files. It does signing of your artifacts published to Maven, as long as the settings.xml is configured correctly, as described in Example ~/.m2/settings.xml File. This step depends on the local repository having been populated by the just-previous bin tarball build.

     $ mvn deploy -DskipTests -Papache-release -Prelease


This command copies all artifacts up to a temporary staging Apache mvn repository in an ‘open’ state. More work needs to be done on these maven artifacts to make them generally available.
We do not release HBase tarball to the Apache Maven repository. To avoid deploying the tarball, do not include the assembly:single goal in your mvn deploy command. Check the deployed artifacts as described in the next section.

make_rc.sh

If you run the dev-support/make_rc.sh script, this is as far as it takes you. To finish the release, take up the script from here on out.

  9. Make the Release Candidate available.
    The artifacts are in the maven repository in the staging area in the ‘open’ state. While in this ‘open’ state you can check out what you’ve published to make sure all is good. To do this, log in to Apache’s Nexus at repository.apache.org using your Apache ID. Find your artifacts in the staging repository. Click on ‘Staging Repositories’ and look for a new one ending in “hbase” with a status of ‘Open’, select it. Use the tree view to expand the list of repository contents and inspect if the artifacts you expect are present. Check the POMs. As long as the staging repo is open you can re-upload if something is missing or built incorrectly.
    If something is seriously wrong and you would like to back out the upload, you can use the ‘Drop’ button to drop and delete the staging repository. Sometimes the upload fails in the middle. This is another reason you might have to ‘Drop’ the upload from the staging repository.
    If it checks out, close the repo using the ‘Close’ button. The repository must be closed before a public URL to it becomes available. It may take a few minutes for the repository to close. Once complete you’ll see a public URL to the repository in the Nexus UI. You may also receive an email with the URL. Provide the URL to the temporary staging repository in the email that announces the release candidate. (Folks will need to add this repo URL to their local poms or to their local settings.xml file to pull the published release candidate artifacts.)
    When the release vote concludes successfully, return here and click the ‘Release’ button to release the artifacts to central. The release process will automatically drop and delete the staging repository.

    hbase-downstreamer

See the hbase-downstreamer test for a simple example of a project that is downstream of HBase and depends on it. Check it out and run its simple test to make sure maven artifacts are properly deployed to the maven repository. Be sure to edit the pom to point to the proper staging repository. Make sure you are pulling from the repository when tests run and that you are not getting from your local repository, by either passing the -U flag or deleting your local repo content, and checking that maven is pulling from the remote staging repository.

See Publishing Maven Artifacts for some pointers on this maven staging process.

  • If the HBase version ends in -SNAPSHOT, the artifacts go elsewhere. They are put into the Apache snapshots repository directly and are immediately available. If you are making a SNAPSHOT release, this is what you want to happen.

  • At this stage, you have two tarballs in your ‘build output directory’ and a set of artifacts in a staging area of the maven repository, in the ‘closed’ state. Next, sign and fingerprint the tarballs, and then ‘stage’ your release candidate build output directory via svnpubsub by committing your directory to the dev distribution directory (see the comments on HBASE-10554 Please delete old releases from mirroring system, but in essence it is an svn checkout of dev/hbase — releases are at release/hbase). In the version directory, run the following commands:

  $ for i in *.tar.gz; do echo $i; gpg --print-md MD5 $i > $i.md5 ; done
  $ for i in *.tar.gz; do echo $i; gpg --print-md SHA512 $i > $i.sha ; done
  $ for i in *.tar.gz; do echo $i; gpg --armor --output $i.asc --detach-sig $i ; done
  $ cd ..
  # Presuming our 'build output directory' is named 0.96.0RC0, copy it to the svn checkout of the dist dev dir
  # in this case named hbase.dist.dev.svn
  $ cd /Users/stack/checkouts/hbase.dist.dev.svn
  $ svn info
  Path: .
  Working Copy Root Path: /Users/stack/checkouts/hbase.dist.dev.svn
  URL: https://dist.apache.org/repos/dist/dev/hbase
  Repository Root: https://dist.apache.org/repos/dist
  Repository UUID: 0d268c88-bc11-4956-87df-91683dc98e59
  Revision: 15087
  Node Kind: directory
  Schedule: normal
  Last Changed Author: ndimiduk
  Last Changed Rev: 15045
  Last Changed Date: 2016-08-28 11:13:36 -0700 (Sun, 28 Aug 2016)
  $ mv 0.96.0RC0 /Users/stack/checkouts/hbase.dist.dev.svn
  $ svn add 0.96.0RC0
  $ svn commit ...

Announce the release candidate on the mailing list and call a vote.

168.2. Publishing a SNAPSHOT to maven

Make sure your settings.xml is set up properly (see Example ~/.m2/settings.xml File). Make sure the hbase version includes -SNAPSHOT as a suffix. Following is an example of publishing SNAPSHOTS of a release that had an hbase version of 0.96.0 in its poms.

  $ mvn clean install -DskipTests javadoc:aggregate site assembly:single -Prelease
  $ mvn -DskipTests deploy -Papache-release

The make_rc.sh script mentioned above (see maven.release) can help you publish SNAPSHOTS. Make sure your hbase.version has a -SNAPSHOT suffix before running the script. It will put a snapshot up into the apache snapshot repository for you.

169. Voting on Release Candidates

Everyone is encouraged to try and vote on HBase release candidates. Only the votes of PMC members are binding. PMC members, please read this WIP doc on policy voting for a release candidate, Release Policy. "Before casting +1 binding votes, individuals are required to download the signed source code package onto their own hardware, compile it as provided, and test the resulting executable on their own platform, along with also validating cryptographic signatures and verifying that the package meets the requirements of the ASF policy on releases." Regarding the latter, run mvn apache-rat:check to verify all files are suitably licensed. See HBase, mail # dev - On recent discussion clarifying ASF release policy for how we arrived at this process.
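A sketch of those basic verification steps on a downloaded candidate, using the file naming convention from earlier in this chapter:

  $ gpg --verify hbase-2.0.0-alpha4-src.tar.gz.asc hbase-2.0.0-alpha4-src.tar.gz
  $ gpg --print-md SHA512 hbase-2.0.0-alpha4-src.tar.gz    # compare with the published .sha
  $ tar xzf hbase-2.0.0-alpha4-src.tar.gz && cd hbase-2.0.0-alpha4
  $ mvn apache-rat:check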

170. Announcing Releases

Once an RC has passed successfully and the needed artifacts have been staged for distribution, you'll need to let everyone know about our shiny new release. It's not a requirement, but to make things easier for release managers we have a template you can start with. Be sure you replace version and other markers with the relevant version numbers. You should manually verify all links before sending.

  The HBase team is happy to announce the immediate availability of HBase _version_.

  Apache HBase is an open-source, distributed, versioned, non-relational database.
  Apache HBase gives you low latency random access to billions of rows with
  millions of columns atop non-specialized hardware. To learn more about HBase,
  see https://hbase.apache.org/.

  HBase _version_ is the _nth_ minor release in the HBase _major_.x line, which aims to
  improve the stability and reliability of HBase. This release includes roughly
  XXX resolved issues not covered by previous _major_.x releases.

  Notable new features include:
  - List text descriptions of features that fit on one line
  - Including if JDK or Hadoop support versions changes
  - If the "stable" pointer changes, call that out
  - For those with obvious JIRA IDs, include them (HBASE-YYYYY)

  The full list of issues can be found in the included CHANGES.md and RELEASENOTES.md,
  or via our issue tracker:

      https://s.apache.org/hbase-_version_-jira

  To download please follow the links and instructions on our website:

      https://hbase.apache.org/downloads.html

  Questions, comments, and problems are always welcome at: dev@hbase.apache.org.

  Thanks to all who contributed and made this release possible.

  Cheers,
  The HBase Dev Team

You should send this message to the following lists: dev@hbase.apache.org, user@hbase.apache.org, announce@apache.org. If you'd like a spot check before sending, feel free to ask via jira or the dev list.

171. Generating the HBase Reference Guide

The manual is marked up using Asciidoc. We then use the Asciidoctor maven plugin to transform the markup to html. This plugin is run when you specify the site goal as in when you run mvn site. See appendix contributing to documentation for more information on building the documentation.

172. Updating hbase.apache.org

172.1. Contributing to hbase.apache.org

See appendix contributing to documentation for more information on contributing to the documentation or website.

172.2. Publishing hbase.apache.org

See Publishing the HBase Website and Documentation for instructions on publishing the website and documentation.

173. Tests

Developers, at a minimum, should familiarize themselves with the unit test detail; unit tests in HBase have a character not usually seen in other projects.

This information is about unit tests for HBase itself. For developing unit tests for your HBase applications, see unit.tests.

173.1. Apache HBase Modules

As of 0.96, Apache HBase is split into multiple modules. This creates “interesting” rules for how and where tests are written. If you are writing code for hbase-server, see hbase.unittests for how to write your tests. These tests can spin up a minicluster and will need to be categorized. For any other module, for example hbase-common, the tests must be strict unit tests and just test the class under test - no use of the HBaseTestingUtility or minicluster is allowed (or even possible given the dependency tree).

173.1.1. Testing the HBase Shell

The HBase shell and its tests are predominantly written in jruby.

In order to make these tests run as a part of the standard build, there are a few JUnit test classes that take care of loading the jruby implemented tests and running them. The tests were split into separate classes to accommodate class level timeouts (see Unit Tests for specifics). You can run all of these tests from the top level with:

  mvn clean test -Dtest=Test*Shell

If you have previously done a mvn install, then you can instruct maven to run only the tests in the hbase-shell module with:

  mvn clean test -pl hbase-shell

Alternatively, you may limit the shell tests that run using the system variable shell.test. This value should specify the ruby literal equivalent of a particular test case by name. For example, the tests that cover the shell commands for altering tables are contained in the test case AdminAlterTableTest and you can run them with:

  mvn clean test -pl hbase-shell -Dshell.test=/AdminAlterTableTest/

You may also use a Ruby Regular Expression literal (in the /pattern/ style) to select a set of test cases. You can run all of the HBase admin related tests, including both the normal administration and the security administration, with the command:

  mvn clean test -pl hbase-shell -Dshell.test=/.*Admin.*Test/

In the event of a test failure, you can see details by examining the XML version of the surefire report results:

  vim hbase-shell/target/surefire-reports/TEST-org.apache.hadoop.hbase.client.TestShell.xml

173.1.2. Running Tests in other Modules

If the module you are developing in has no other dependencies on other HBase modules, then you can cd into that module and just run:

  mvn test

which will just run the tests IN THAT MODULE. If there are other dependencies on other modules, then you will have to run the command from the ROOT HBASE DIRECTORY. This will run the tests in the other modules, unless you specify to skip the tests in that module. For instance, to skip the tests in the hbase-server module, you would run:

  mvn clean test -PskipServerTests

from the top level directory to run all the tests in modules other than hbase-server. Note that you can specify to skip tests in multiple modules as well as just for a single module. For example, to skip the tests in hbase-server and hbase-common, you would run:

  mvn clean test -PskipServerTests -PskipCommonTests

Also, keep in mind that if you are running tests in the hbase-server module you will need to apply the maven profiles discussed in hbase.unittests.cmds to get the tests to run properly.

173.2. Unit Tests

Apache HBase unit tests must carry a Category annotation and as of hbase-2.0.0, must be stamped with the HBase ClassRule. Here is an example of what a Test Class looks like with a Category and ClassRule included:

  ...
  @Category(SmallTests.class)
  public class TestHRegionInfo {
    @ClassRule
    public static final HBaseClassTestRule CLASS_RULE =
        HBaseClassTestRule.forClass(TestHRegionInfo.class);

    @Test
    public void testCreateHRegionInfoName() throws Exception {
      // ...
    }
  }

Here the Test Class is TestHRegionInfo. The CLASS_RULE has the same form in every test class; only the .class you pass is that of the local test. For example, in the TestTimeout Test Class, you'd pass TestTimeout.class to the CLASS_RULE instead of the TestHRegionInfo.class we have above. The CLASS_RULE is where we'll enforce timeouts (currently set at a hard limit of thirteen! minutes for all tests — 780 seconds) and other cross-unit test facilities. The test is in the SmallTests Category.

Categories can be arbitrary and provided as a list, but each test MUST carry one from the following list of sizings: small, medium, large, and integration. The test sizing is designated using the JUnit categories SmallTests, MediumTests, LargeTests, IntegrationTests. JUnit Categories are denoted using java annotations (a special unit test looks for the presence of the @Category annotation in all unit tests and will fail if it finds a test suite missing a sizing marking).

The first three categories, small, medium, and large, are for test cases which run when you type $ mvn test. In other words, these three categorizations are for HBase unit tests. The integration category is not for unit tests, but for integration tests. These are normally run when you invoke $ mvn verify. Integration tests are described in integration.tests.

Keep reading to figure which annotation of the set small, medium, and large to put on your new HBase test case.

Categorizing Tests

Small Tests

Small test cases are executed in a shared JVM and each test suite/test class should run in 15 seconds or less; i.e. a junit test fixture, a java object made up of test methods, should finish in under 15 seconds, no matter how many or how few test methods it has. These test cases should not use a minicluster.

Medium Tests

Medium test cases are executed in a separate JVM, and individual test suites (test classes, or in junit parlance, test fixtures) should run in 50 seconds or less. These test cases can use a mini cluster.

Large Tests

Large test cases are everything else. They are typically large-scale tests, regression tests for specific bugs, timeout tests, or performance tests. No large test suite can take longer than ten minutes. It will be killed as timed out. Cast your test as an Integration Test if it needs to run longer.

Integration Tests

Integration tests are system level tests. See integration.tests for more info. If you invoke $ mvn test on integration tests, there is no timeout for the test.

173.3. Running tests

173.3.1. Default: small and medium category tests

Running mvn test will execute all small tests in a single JVM (no fork) and then medium tests in a separate JVM for each test instance. Medium tests are NOT executed if there is an error in a small test. Large tests are NOT executed.

173.3.2. Running all tests

Running mvn test -P runAllTests will execute small tests in a single JVM then medium and large tests in a separate JVM for each test. Medium and large tests are NOT executed if there is an error in a small test.

173.3.3. Running a single test or all tests in a package

To run an individual test, e.g. MyTest, run mvn test -Dtest=MyTest. You can also pass multiple, individual tests as a comma-delimited list:

  mvn test -Dtest=MyTest1,MyTest2,MyTest3

You can also pass a package, which will run all tests under the package:

  mvn test '-Dtest=org.apache.hadoop.hbase.client.*'

When -Dtest is specified, the localTests profile will be used. Each junit test is executed in a separate JVM (a fork per test class). There is no parallelization when tests are running in this mode. You will see a new message at the end of the report: "[INFO] Tests are skipped". It's harmless. However, you need to make sure the sum of Tests run: in the Results: section of test reports matches the number of tests you specified, because no error is reported when a non-existent test case is specified; one way to eyeball the totals is sketched below.
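A sketch of one way to tally the totals, assuming surefire wrote its text reports under each module's target/surefire-reports directory:

  $ grep -h "Tests run:" */target/surefire-reports/*.txt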

173.3.4. Other test invocation permutations

Running mvn test -P runSmallTests will execute “small” tests only, using a single JVM.

Running mvn test -P runMediumTests will execute “medium” tests only, launching a new JVM for each test-class.

Running mvn test -P runLargeTests will execute “large” tests only, launching a new JVM for each test-class.

For convenience, you can run mvn test -P runDevTests to execute both small and medium tests, using a single JVM.

173.3.5. Running tests faster

By default, $ mvn test -P runAllTests runs 5 tests in parallel. This can be increased on a developer's machine. Allowing that you can have 2 tests in parallel per core, and that you need about 2GB of memory per test (at the extreme), if you have an 8-core, 24GB box you could run 16 tests in parallel, but the available memory limits that to 12 (24/2). To run all tests with 12 tests in parallel, do this: mvn test -P runAllTests -Dsurefire.secondPartForkCount=12. If using a version earlier than 2.0, do: mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12. To increase the speed, you can also use a ramdisk. You will need 2GB of memory to run all tests. You will also need to delete the files between two test runs. The typical way to configure a ramdisk on Linux is:

  $ sudo mkdir /ram2G
  $ sudo mount -t tmpfs -o size=2048M tmpfs /ram2G

You can then use it to run all HBase tests on 2.0 with the command:

  mvn test -P runAllTests -Dsurefire.secondPartForkCount=12 \
    -Dtest.build.data.basedirectory=/ram2G

On earlier versions, use:

  mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12 \
    -Dtest.build.data.basedirectory=/ram2G

173.3.6. hbasetests.sh

It’s also possible to use the script hbasetests.sh. This script runs the medium and large tests in parallel with two maven instances, and provides a single report. This script does not use the hbase version of surefire so no parallelization is being done other than the two maven instances the script sets up. It must be executed from the directory which contains the pom.xml.

For example running ./dev-support/hbasetests.sh will execute small and medium tests. Running ./dev-support/hbasetests.sh runAllTests will execute all tests. Running ./dev-support/hbasetests.sh replayFailed will rerun the failed tests a second time, in a separate jvm and without parallelisation.
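The three invocations side by side:

  $ ./dev-support/hbasetests.sh               # small and medium tests
  $ ./dev-support/hbasetests.sh runAllTests   # all tests
  $ ./dev-support/hbasetests.sh replayFailed  # rerun any failed tests in a separate jvm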

173.3.7. Test Timeouts

The HBase unit test sizing Categorization timeouts are not strictly enforced.

Any test that runs longer than ten minutes will be timed out and killed.

As of hbase-2.0.0, we have purged all per-test-method timeouts: i.e.

  ...
  @Test(timeout=30000)
  public void testCreateHRegionInfoName() throws Exception {
    // ...
  }

They are discouraged and don't make much sense given that we base our timing on how long the whole test fixture/class/suite takes, and that the time a test method takes varies wildly depending on context (loaded Apache Infrastructure versus a developer machine with nothing else running on it).

173.3.8. Test Resource Checker

A custom Maven SureFire plugin listener checks a number of resources before and after each HBase unit test runs and logs its findings at the end of the test output files which can be found in target/surefire-reports per Maven module (Tests write test reports named for the test class into this directory. Check the *-out.txt files). The resources counted are the number of threads, the number of file descriptors, etc. If the number has increased, it adds a LEAK? comment in the logs. As you can have an HBase instance running in the background, some threads can be deleted/created without any specific action in the test. However, if the test does not work as expected, or if the test should not impact these resources, it’s worth checking these log lines …hbase.ResourceChecker(157): before… and …hbase.ResourceChecker(157): after…. For example:

  2012-09-26 09:22:15,315 INFO [pool-1-thread-1]
  hbase.ResourceChecker(157): after:
  regionserver.TestColumnSeeking#testReseeking Thread=65 (was 65),
  OpenFileDescriptor=107 (was 107), MaxFileDescriptor=10240 (was 10240),
  ConnectionCount=1 (was 1)

173.4. Writing Tests

173.4.1. General rules

  • As much as possible, tests should be written as category small tests.

  • All tests must be written to support parallel execution on the same machine, hence they should not use shared resources such as fixed ports or fixed file names.

  • Tests should not over-log. Logging more than 100 lines/second makes the logs complex to read and consumes I/O that is then not available to the other tests.

  • Tests can be written with HBaseTestingUtility. This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster.

173.4.2. Categories and execution time

  • All tests must be categorized; if not, they could be skipped.

  • All tests should be written to be as fast as possible.

  • See hbase.unittests for test case categories and corresponding timeouts. This should ensure a good parallelization for people using it, and ease the analysis when the test fails.

173.4.3. Sleeps in tests

Whenever possible, tests should not use Thread.sleep, but rather wait for the real event they need. This is faster and clearer for the reader. Tests should not do a Thread.sleep without testing an ending condition, so that a reader understands what the test is waiting for. Moreover, the test will then work whatever the machine performance is. Sleeps should be minimal, to be as fast as possible. Waiting for a variable should be done in a 40ms sleep loop. Waiting for a socket operation should be done in a 200ms sleep loop.

173.4.4. Tests using a cluster

Tests using an HRegion do not have to start a cluster: a region can use the local file system. Starting/stopping a cluster costs around 10 seconds, so clusters should not be started per test method but per test class. A started cluster must be shut down using HBaseTestingUtility#shutdownMiniCluster, which cleans the directories. As much as possible, tests should use the default settings for the cluster. When they don't, they should document it. This will make it possible to share the cluster later.

173.4.5. Tests Skeleton Code

Here is a test skeleton code with Categorization and a Category-based timeout rule to copy and paste and use as basis for test contribution.

  /**
   * Describe what this testcase tests. Talk about resources initialized in @BeforeClass (before
   * any test is run) and before each test is run, etc.
   */
  // Specify the category as explained in <<hbase.unittests,hbase.unittests>>.
  @Category(SmallTests.class)
  public class TestExample {
    // Replace the TestExample.class in the below with the name of your test fixture class.
    private static final Log LOG = LogFactory.getLog(TestExample.class);

    // Handy test rule that allows you subsequently get the name of the current method. See
    // down in 'testExampleFoo()' where we use it to log current test's name.
    @Rule public TestName testName = new TestName();

    // The below rule does two things. It decides the timeout based on the category
    // (small/medium/large) of the testcase. This @Rule requires that the full testcase runs
    // within this timeout irrespective of individual test methods' times. The second
    // feature is we'll dump in the log when the test is done a count of threads still
    // running. (Note the rule is an instance field, not static, since it calls this.getClass().)
    @Rule public TestRule timeout = CategoryBasedTimeout.builder().
        withTimeout(this.getClass()).withLookingForStuckThread(true).build();

    @Before
    public void setUp() throws Exception {
    }

    @After
    public void tearDown() throws Exception {
    }

    @Test
    public void testExampleFoo() {
      LOG.info("Running test " + testName.getMethodName());
    }
  }

173.5. Integration Tests

HBase integration/system tests are tests that are beyond HBase unit tests. They are generally long-lasting, sizeable (the test can be asked to 1M rows or 1B rows), targetable (they can take configuration that will point them at the ready-made cluster they are to run against; integration tests do not include cluster start/stop code), and verify success end-to-end: integration tests rely on public APIs only, and do not attempt to examine server internals to assert success or failure. Integration tests are what you would run when you need more elaborate proofing of a release candidate beyond what unit tests can do. They are not generally run on the Apache Continuous Integration build server; however, some sites opt to run integration tests as a part of their continuous testing on an actual cluster.

Integration tests currently live under the src/test directory in the hbase-it submodule and will match the regex: **/IntegrationTest*.java. All integration tests are also annotated with @Category(IntegrationTests.class).

Integration tests can be run in two modes: using a mini cluster, or against an actual distributed cluster. Maven failsafe is used to run the tests using the mini cluster. The IntegrationTestsDriver class is used for executing the tests against a distributed cluster. Integration tests SHOULD NOT assume that they are running against a mini cluster, and SHOULD NOT use private API's to access cluster state. To interact with the distributed or mini cluster uniformly, the IntegrationTestingUtility and HBaseCluster classes, and public client API's, can be used.

On a distributed cluster, integration tests that use ChaosMonkey or otherwise manipulate services through the cluster manager (e.g. restart regionservers) use SSH to do it. To run these, the test process should be able to run commands on the remote end, so ssh should be configured accordingly (for example, if HBase runs under the hbase user in your cluster, you can set up passwordless ssh for that user and run the test also under it). To facilitate that, the hbase.it.clustermanager.ssh.user, hbase.it.clustermanager.ssh.opts and hbase.it.clustermanager.ssh.cmd configuration settings can be used. “User” is the remote user that cluster manager should use to perform ssh commands. “Opts” contains additional options that are passed to SSH (for example, “-i /tmp/my-key”). Finally, if you have some custom environment setup, “cmd” is the override format for the entire tunnel (ssh) command. The default string is /usr/bin/ssh %1$s %2$s%3$s%4$s "%5$s" and is a good starting point. This is a standard Java format string with 5 arguments that is used to execute the remote command. Argument 1 (%1$s) is the SSH options set via the opts setting or via environment variable, 2 is the SSH user name, 3 is “@” if username is set or “” otherwise, 4 is the target host name, and 5 is the logical command to execute (which may include single quotes, so don't use them). For example, if you run the tests under a non-hbase user and want to ssh as that user and change to hbase on the remote machine, you can use:

  /usr/bin/ssh %1$s %2$s%3$s%4$s "su hbase - -c \"%5$s\""

That way, to kill an RS (for example), integration tests may run:

  /usr/bin/ssh some-hostname "su hbase - -c \"ps aux | ... | kill ...\""

The command is logged in the test logs, so you can verify it is correct for your environment.

To disable the running of Integration Tests, pass the following profile on the command line -PskipIntegrationTests. For example,

  $ mvn clean install test -Dtest=TestZooKeeper -PskipIntegrationTests

173.5.1. Running integration tests against mini cluster

HBase 0.92 added a verify maven target. Invoking it, for example by doing mvn verify, will run all the phases up to and including the verify phase via the maven failsafe plugin, running all the above mentioned HBase unit tests as well as tests that are in the HBase integration test group. After you have completed mvn install -DskipTests, you can run just the integration tests by invoking:

  cd hbase-it
  mvn verify

If you just want to run the integration tests from the top level, you need to run two commands. First: mvn failsafe:integration-test. This actually runs ALL the integration tests.

This command will always output BUILD SUCCESS even if there are test failures.

At this point, you could grep the output by hand looking for failed tests. However, maven will do this for us; just use: mvn failsafe:verify. This command basically looks at all the test results (so don't remove the ‘target’ directory) for test failures and reports the results. The two-step invocation is sketched below.
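The two-step invocation from the top level then looks like this:

  $ mvn failsafe:integration-test   # runs ALL integration tests; prints BUILD SUCCESS even on failures
  $ mvn failsafe:verify             # scans the results under target/ and fails on any test failure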

Running a subset of Integration tests

This is very similar to how you specify running a subset of unit tests (see above), but use the property it.test instead of test. To just run IntegrationTestClassXYZ.java, use: mvn failsafe:integration-test -Dit.test=IntegrationTestClassXYZ. The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java: mvn failsafe:integration-test -Dit.test=*ClassX*. This runs everything that is an integration test that matches *ClassX*. This means anything matching: “**/IntegrationTest*ClassX*”. You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups. This would look something like: mvn failsafe:integration-test -Dit.test=*ClassX*,*ClassY*

173.5.2. Running integration tests against distributed cluster

If you have an already-setup HBase cluster, you can launch the integration tests by invoking the class IntegrationTestsDriver. You may have to run test-compile first. The configuration will be picked up by the bin/hbase script.

  mvn test-compile

Then launch the tests with:

  bin/hbase [--config config_dir] org.apache.hadoop.hbase.IntegrationTestsDriver

Pass -h to get usage on this sweet tool. Running the IntegrationTestsDriver without any argument will launch tests found under hbase-it/src/test, having the @Category(IntegrationTests.class) annotation, and a name starting with IntegrationTests. See the usage, by passing -h, to see how to filter test classes. You can pass a regex which is checked against the full class name; so, part of a class name can be used. IntegrationTestsDriver uses Junit to run the tests. Currently there is no support for running integration tests against a distributed cluster using maven (see HBASE-6201).

The tests interact with the distributed cluster by using the methods in the DistributedHBaseCluster (implementing HBaseCluster) class, which in turn uses a pluggable ClusterManager. Concrete implementations provide actual functionality for carrying out deployment-specific and environment-dependent tasks (SSH, etc). The default ClusterManager is HBaseClusterManager, which uses SSH to remotely execute start/stop/kill/signal commands, and assumes some posix commands (ps, etc). It also assumes the user running the test has enough “power” to start/stop servers on the remote machines. By default, it picks up HBASE_SSH_OPTS, HBASE_HOME, and HBASE_CONF_DIR from the env, and uses bin/hbase-daemon.sh to carry out the actions. Currently tarball deployments, deployments which use hbase-daemons.sh, and Apache Ambari deployments are supported. /etc/init.d/ scripts are not supported for now, but support can be easily added. For other deployment options, a ClusterManager can be implemented and plugged in.
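A sketch of the environment HBaseClusterManager picks up before driving a distributed cluster (all paths here are illustrative):

  $ export HBASE_SSH_OPTS="-i /home/hbase/.ssh/id_rsa"
  $ export HBASE_HOME=/opt/hbase
  $ export HBASE_CONF_DIR=/etc/hbase/conf
  $ bin/hbase org.apache.hadoop.hbase.IntegrationTestsDriver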

173.5.3. Destructive integration / system tests (ChaosMonkey)

HBase 0.96 introduced a tool named ChaosMonkey, modeled after Netflix’s Chaos Monkey tool. ChaosMonkey simulates real-world faults in a running cluster by killing or disconnecting random servers, or by injecting other failures into the environment. You can use ChaosMonkey as a stand-alone tool to run a policy while other tests are running. In some environments, ChaosMonkey is always running, in order to constantly check that high availability and fault tolerance are working as expected.

ChaosMonkey defines Actions and Policies.

Actions

Actions are predefined sequences of events, such as the following:

  • Restart active master (sleep 5 sec)

  • Restart random regionserver (sleep 5 sec)

  • Restart random regionserver (sleep 60 sec)

  • Restart META regionserver (sleep 5 sec)

  • Restart ROOT regionserver (sleep 5 sec)

  • Batch restart of 50% of regionservers (sleep 5 sec)

  • Rolling restart of 100% of regionservers (sleep 5 sec)

Policies

A policy is a strategy for executing one or more actions. The default policy executes a random action every minute based on predefined action weights. A given policy will be executed until ChaosMonkey is interrupted.

Most ChaosMonkey actions are configured to have reasonable defaults, so you can run ChaosMonkey against an existing cluster without any additional configuration. The following example runs ChaosMonkey with the default configuration:

  1. $ bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
  2. 12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
  3. 12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
  4. 12/11/19 23:22:24 INFO util.ChaosMonkey: Performing action: Restart active master
  5. 12/11/19 23:22:24 INFO util.ChaosMonkey: Killing master:master.example.com,60000,1353367210440
  6. 12/11/19 23:22:24 INFO hbase.HBaseCluster: Aborting Master: master.example.com,60000,1353367210440
  7. 12/11/19 23:22:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:master.example.com
  8. 12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
  9. 12/11/19 23:22:25 INFO hbase.HBaseCluster: Waiting service:master to stop: master.example.com,60000,1353367210440
  10. 12/11/19 23:22:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:master.example.com
  11. 12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
  12. 12/11/19 23:22:25 INFO util.ChaosMonkey: Killed master server:master.example.com,60000,1353367210440
  13. 12/11/19 23:22:25 INFO util.ChaosMonkey: Sleeping for:5000
  14. 12/11/19 23:22:30 INFO util.ChaosMonkey: Starting master:master.example.com
  15. 12/11/19 23:22:30 INFO hbase.HBaseCluster: Starting Master on: master.example.com
  16. 12/11/19 23:22:30 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start master , hostname:master.example.com
  17. 12/11/19 23:22:31 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting master, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-master-master.example.com.out
  18. ....
  19. 12/11/19 23:22:33 INFO util.ChaosMonkey: Started master: master.example.com,60000,1353367210440
  20. 12/11/19 23:22:33 INFO util.ChaosMonkey: Sleeping for:51321
  21. 12/11/19 23:23:24 INFO util.ChaosMonkey: Performing action: Restart random region server
  22. 12/11/19 23:23:24 INFO util.ChaosMonkey: Killing region server:rs3.example.com,60020,1353367027826
  23. 12/11/19 23:23:24 INFO hbase.HBaseCluster: Aborting RS: rs3.example.com,60020,1353367027826
  24. 12/11/19 23:23:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:rs3.example.com
  25. 12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
  26. 12/11/19 23:23:25 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: rs3.example.com,60020,1353367027826
  27. 12/11/19 23:23:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:rs3.example.com
  28. 12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
  29. 12/11/19 23:23:25 INFO util.ChaosMonkey: Killed region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
  30. 12/11/19 23:23:25 INFO util.ChaosMonkey: Sleeping for:60000
  31. 12/11/19 23:24:25 INFO util.ChaosMonkey: Starting region server:rs3.example.com
  32. 12/11/19 23:24:25 INFO hbase.HBaseCluster: Starting RS on: rs3.example.com
  33. 12/11/19 23:24:25 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start regionserver , hostname:rs3.example.com
  34. 12/11/19 23:24:26 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-regionserver-rs3.example.com.out
  35. 12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6

The output indicates that ChaosMonkey started the default PeriodicRandomActionPolicy policy, which is configured with all the available actions. It chose to run RestartActiveMaster and RestartRandomRs actions.

173.5.4. Available Policies

HBase ships with several ChaosMonkey policies, available in the hbase/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/policies/ directory.

173.5.5. Configuring Individual ChaosMonkey Actions

ChaosMonkey integration tests can be configured per test run. Create a Java properties file in the HBase CLASSPATH and pass it to ChaosMonkey using the -monkeyProps configuration flag. Configurable properties, along with their default values if applicable, are listed in the org.apache.hadoop.hbase.chaos.factories.MonkeyConstants class. For properties that have defaults, you can override them by including them in your properties file.

The following example uses a properties file called monkey.properties.

  1. $ bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties

The above command will start the integration tests and chaos monkey. It will look for the properties file monkey.properties on the HBase CLASSPATH; e.g. inside the HBase conf dir.

Here is an example chaos monkey file:

Example ChaosMonkey Properties File

  1. sdm.action1.period=120000
  2. sdm.action2.period=40000
  3. move.regions.sleep.time=80000
  4. move.regions.max.time=1000000
  5. batch.restart.rs.ratio=0.4f

Periods/time are expressed in milliseconds.

HBase 1.0.2 and newer adds the ability to restart HBase’s underlying ZooKeeper quorum or HDFS nodes. To use these actions, you need to configure some new properties, which have no reasonable defaults because they are deployment-specific, in your ChaosMonkey properties file, which may be hbase-site.xml or a different properties file.

  1. <property>
  2. <name>hbase.it.clustermanager.hadoop.home</name>
  3. <value>$HADOOP_HOME</value>
  4. </property>
  5. <property>
  6. <name>hbase.it.clustermanager.zookeeper.home</name>
  7. <value>$ZOOKEEPER_HOME</value>
  8. </property>
  9. <property>
  10. <name>hbase.it.clustermanager.hbase.user</name>
  11. <value>hbase</value>
  12. </property>
  13. <property>
  14. <name>hbase.it.clustermanager.hadoop.hdfs.user</name>
  15. <value>hdfs</value>
  16. </property>
  17. <property>
  18. <name>hbase.it.clustermanager.zookeeper.user</name>
  19. <value>zookeeper</value>
  20. </property>

174. Developer Guidelines

174.1. Branches

We use Git for source code management, and the latest development happens on the master branch. There are branches for past major/minor/maintenance releases, and important features and bug fixes are often back-ported to them.

174.2. Code Standards

174.2.1. Interface Classifications

Interfaces are classified both by audience and by stability level. These labels appear at the head of a class. The conventions followed by HBase are inherited from its parent project, Hadoop.

The following interface classifications are commonly used:

InterfaceAudience

@InterfaceAudience.Public

APIs for users and HBase applications. These APIs will be deprecated through major versions of HBase.

@InterfaceAudience.Private

APIs for HBase internals developers. No guarantees on compatibility or availability in future versions. Private interfaces do not need an @InterfaceStability classification.

@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)

APIs for HBase coprocessor writers.

No @InterfaceAudience Classification

Packages without an @InterfaceAudience label are considered private. Mark your new packages if they are publicly accessible.

Excluding Non-Public Interfaces from API Documentation

Only interfaces classified @InterfaceAudience.Public should be included in API documentation (Javadoc). Committers must add new package excludes to the ExcludePackageNames section of the pom.xml for new packages which do not contain public classes.

@InterfaceStability

@InterfaceStability is important for packages marked @InterfaceAudience.Public.

@InterfaceStability.Stable

Public packages marked as stable cannot be changed without a deprecation path or a very good reason.

@InterfaceStability.Unstable

Public packages marked as unstable can be changed without a deprecation path.

@InterfaceStability.Evolving

Public packages marked as evolving may be changed, but it is discouraged.

No @InterfaceStability Label

Public classes with no @InterfaceStability label are discouraged, and should be considered implicitly unstable.
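As a minimal sketch of how these labels appear at the head of a class (the class name here is hypothetical, and the annotation package shown is the one used by HBase 1.x; it has moved between versions):

  1. import org.apache.hadoop.hbase.classification.InterfaceAudience;
  2. import org.apache.hadoop.hbase.classification.InterfaceStability;
  3. // A public API that may still change, though changes are discouraged.
  4. @InterfaceAudience.Public
  5. @InterfaceStability.Evolving
  6. public class ExampleClientUtility {
  7. }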

If you are unclear about how to mark packages, ask on the development list.

174.2.2. Code Formatting Conventions

Please adhere to the following guidelines so that your patches can be reviewed more quickly. These guidelines have been developed based upon common feedback on patches from new contributors.

See the Code Conventions for the Java Programming Language for more information on coding conventions in Java. See eclipse.code.formatting to set up Eclipse to check for some of these guidelines automatically.

Space Invaders

Do not use extra spaces around brackets. Use the second style, rather than the first.

  1. if ( foo.equals( bar ) ) { // don't do this
  1. if (foo.equals(bar)) {
  1. foo = barArray[ i ]; // don't do this
  1. foo = barArray[i];

Auto Generated Code

Auto-generated code in Eclipse often uses bad variable names such as arg0. Use more informative variable names. Use code like the second example here.

  1. public void readFields(DataInput arg0) throws IOException { // don't do this
  2. foo = arg0.readUTF(); // don't do this
  1. public void readFields(DataInput di) throws IOException {
  2. foo = di.readUTF();

Long Lines

Keep lines less than 100 characters. You can configure your IDE to do this automatically.

  1. Bar bar = foo.veryLongMethodWithManyArguments(argument1, argument2, argument3, argument4, argument5, argument6, argument7, argument8, argument9); // don't do this
  1. Bar bar = foo.veryLongMethodWithManyArguments(
  2. argument1, argument2, argument3, argument4, argument5, argument6, argument7, argument8, argument9);

Trailing Spaces

Be sure there is a line break after the end of your code, and avoid lines with nothing but whitespace. This makes diffs more meaningful. You can configure your IDE to help with this.

  1. Bar bar = foo.getBar(); <--- imagine there is an extra space(s) after the semicolon.

API Documentation (Javadoc)

Don’t forget Javadoc!

Javadoc warnings are checked during precommit. If the precommit tool gives you a ‘-1’, please fix the javadoc issue. Your patch won’t be committed if it adds such warnings.

Also, no @author tags - that’s a rule.

Findbugs

Findbugs is used to detect common bug patterns. It is checked during the precommit build. If errors are found, please fix them. You can run findbugs locally with mvn findbugs:findbugs, which will generate the findbugs files locally. Sometimes, you may have to write code smarter than findbugs. You can annotate your code to tell findbugs you know what you’re doing, by annotating your class with the following annotation:

  1. @edu.umd.cs.findbugs.annotations.SuppressWarnings(
  2. value="HE_EQUALS_USE_HASHCODE",
  3. justification="I know what I'm doing")

It is important to use the Apache-licensed version of the annotations. That generally means using annotations in the edu.umd.cs.findbugs.annotations package so that we can rely on the cleanroom reimplementation rather than annotations in the javax.annotations package.

Javadoc - Useless Defaults

Don’t just leave javadoc tags the way your IDE generates them, and don’t fill them with redundant information.

  1. /**
  2. * @param table <---- don't leave them empty!
  3. * @param region An HRegion object. <---- don't fill redundant information!
  4. * @return Foo Object foo just created. <---- Not useful information
  5. * @throws SomeException <---- Not useful. Function declarations already tell that!
  6. * @throws BarException when something went wrong <---- really?
  7. */
  8. public Foo createFoo(Bar bar);

Either add something descriptive to the tags, or just remove them. The preference is to add something descriptive and useful.
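For instance, a sketch of the same declaration with tags that carry real information (the wording is illustrative, not taken from any actual HBase class):

  1. /**
  2.  * Creates a Foo scoped to the given Bar.
  3.  *
  4.  * @param bar the Bar the new Foo will operate on; must not be null
  5.  * @return the newly created Foo
  6.  * @throws BarException if bar is not in a usable state
  7.  */
  8. public Foo createFoo(Bar bar) throws BarException;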

One Thing At A Time, Folks

If you submit a patch for one thing, don’t do auto-reformatting or unrelated reformatting of code on a completely different area of code.

Likewise, don’t add unrelated cleanup or refactorings outside the scope of your Jira.

Ambiguous Unit Tests

Make sure that you’re clear about what you are testing in your unit tests and why.

Implementing Writable

Applies pre-0.96 only

In 0.96, HBase moved to protocol buffers (protobufs). The below section on Writables applies to 0.94.x and previous, not to 0.96 and beyond.

Every class returned by RegionServers must implement the Writable interface. If you are creating a new class that needs to implement this interface, do not forget the default constructor.
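For example, a minimal sketch of such a class (the class and field names are hypothetical), including the easy-to-forget default constructor:

  1. import java.io.DataInput;
  2. import java.io.DataOutput;
  3. import java.io.IOException;
  4. import org.apache.hadoop.io.Writable;
  5. public class ExampleResult implements Writable {
  6.   private String value;
  7.   // Do not forget this: the framework instantiates the class
  8.   // reflectively before calling readFields().
  9.   public ExampleResult() {
  10.   }
  11.   @Override
  12.   public void write(DataOutput out) throws IOException {
  13.     out.writeUTF(value);
  14.   }
  15.   @Override
  16.   public void readFields(DataInput in) throws IOException {
  17.     value = in.readUTF();
  18.   }
  19. }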

174.2.3. Garbage-Collection Conserving Guidelines

The following guidelines were borrowed from http://engineering.linkedin.com/performance/linkedin-feed-faster-less-jvm-garbage. Keep them in mind to keep preventable garbage collection to a minimum; a short sketch after the list illustrates two of them. Have a look at the blog post for some great examples of how to refactor your code according to these guidelines.

  • Be careful with Iterators

  • Estimate the size of a collection when initializing

  • Defer expression evaluation

  • Compile the regex patterns in advance

  • Cache it if you can

  • String Interns are useful but dangerous
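For example, a small sketch (the class and method names are hypothetical) illustrating two of the guidelines, pre-compiled regexes and pre-sized collections:

  1. import java.util.ArrayList;
  2. import java.util.List;
  3. import java.util.regex.Pattern;
  4. public class GcConsciousExample {
  5.   // Compile the regex pattern in advance, not on every call.
  6.   private static final Pattern COLON = Pattern.compile(":");
  7.   public List<String> firstFields(List<String> lines) {
  8.     // Estimate the size of the collection when initializing, to
  9.     // avoid the garbage created by repeated resizing.
  10.     List<String> out = new ArrayList<>(lines.size());
  11.     for (String line : lines) {
  12.       out.add(COLON.split(line)[0]);
  13.     }
  14.     return out;
  15.   }
  16. }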

174.3. Invariants

We don’t have many but what we have we list below. All are subject to challenge of course but until then, please hold to the rules of the road.

174.3.1. No permanent state in ZooKeeper

ZooKeeper state should be transient (treat it like memory). If ZooKeeper state is deleted, HBase should be able to recover and essentially be in the same state.

  • Exceptions: There are currently a few exceptions that we need to fix around whether a table is enabled or disabled.

  • Replication data is currently stored only in ZooKeeper. Deleting ZooKeeper data related to replication may cause replication to be disabled. Do not delete the replication tree, /hbase/replication/.

    Replication may be disrupted and data loss may occur if you delete the replication tree (/hbase/replication/) from ZooKeeper. Follow progress on this issue at HBASE-10295.

174.4. Running In-Situ

If you are developing Apache HBase, frequently it is useful to test your changes against a more-real cluster than what you find in unit tests. In this case, HBase can be run directly from the source in local-mode. All you need to do is run:

  1. ${HBASE_HOME}/bin/start-hbase.sh

This will spin up a full local-cluster, just as if you had packaged up HBase and installed it on your machine.

Keep in mind that you will need to have installed HBase into your local maven repository for the in-situ cluster to work properly. That is, you will need to run:

  1. mvn clean install -DskipTests

to ensure that maven can find the correct classpath and dependencies. Generally, the above command is just a good thing to try running first, if maven is acting oddly.

174.5. Adding Metrics

After adding a new feature a developer might want to add metrics. HBase exposes metrics using the Hadoop Metrics 2 system, so adding a new metric involves exposing that metric to the hadoop system. Unfortunately the API of metrics2 changed from hadoop 1 to hadoop 2. In order to get around this, a set of interfaces and implementations have to be loaded at runtime. To get an in-depth look at the reasoning and structure of these classes, you can read the blog post located here. To add a metric to an existing MBean, follow the short guide below:

174.5.1. Add Metric name and Function to Hadoop Compat Interface.

Inside the source interface that corresponds to where the metrics are generated (e.g. MetricsMasterSource for things coming from HMaster), create new static strings for the metric name and description. Then add a new method that will be called to add a new reading.
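As a rough, hypothetical sketch (the source interface, constant names, and method name below are invented for illustration), the interface addition might look like:

  1. public interface MetricsExampleSource {
  2.   // New static strings for the metric name and description.
  3.   String EXAMPLE_OPS_NAME = "exampleOps";
  4.   String EXAMPLE_OPS_DESC = "Number of example operations performed";
  5.   // New method that callers invoke to add a new reading.
  6.   void incrementExampleOps();
  7. }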

174.5.2. Add the Implementation to Both Hadoop 1 and Hadoop 2 Compat modules.

Inside the implementation of the source (e.g. MetricsMasterSourceImpl in the above example), create a new histogram, counter, gauge, or stat in the init method. Then, in the method that was added to the interface, wire up the parameter passed in to the histogram.
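Continuing the hypothetical sketch above, the implementation side might look roughly like the following; a plain AtomicLong stands in here for the counter that a real source would create through the metrics registry in init():

  1. import java.util.concurrent.atomic.AtomicLong;
  2. public class MetricsExampleSourceImpl implements MetricsExampleSource {
  3.   private AtomicLong exampleOps;
  4.   public void init() {
  5.     // In a real source, create the counter/histogram/gauge here via
  6.     // the metrics registry, using EXAMPLE_OPS_NAME and EXAMPLE_OPS_DESC.
  7.     exampleOps = new AtomicLong();
  8.   }
  9.   @Override
  10.   public void incrementExampleOps() {
  11.     exampleOps.incrementAndGet();
  12.   }
  13. }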

Now add tests that make sure the data is correctly exported to the metrics 2 system. For this the MetricsAssertHelper is provided.

174.6. Git Best Practices

Avoid git merges.

Use git pull --rebase or git fetch followed by git rebase.

Do not use git push --force.

If the push does not work, fix the problem or ask for help.

Please contribute to this document if you think of other Git best practices.

174.6.1. rebase_all_git_branches.sh

The dev-support/rebase_all_git_branches.sh script is provided to help keep your Git repository clean. Use the -h parameter to get usage instructions. The script automatically refreshes your tracking branches, attempts an automatic rebase of each local branch against its remote branch, and gives you the option to delete any branch which represents a closed HBASE- JIRA. The script has one optional configuration option, the location of your Git directory. You can set a default by editing the script. Otherwise, you can pass the git directory manually by using the -d parameter, followed by an absolute or relative directory name, or even ‘.’ for the current working directory. The script checks the directory for a sub-directory called .git/ before proceeding.

174.7. Submitting Patches

If you are new to submitting patches to open source or new to submitting patches to Apache, start by reading the On Contributing Patches page from Apache Commons Project. It provides a nice overview that applies equally to the Apache HBase Project.

174.7.1. Create Patch

Make sure you review common.patch.feedback for code style. If your patch was generated incorrectly or your code does not adhere to the code formatting guidelines, you may be asked to redo some work.

Using submit-patch.py (recommended)

  1. $ dev-support/submit-patch.py -jid HBASE-xxxxx

Use this script to create patches, upload them to JIRA, and optionally create/update reviews on Review Board. The patch name is automatically formatted as (JIRA).(branch name).(patch number).patch to follow Yetus’ naming rules. Use the -h flag for detailed usage information. The most useful options are:

  • -b BRANCH, --branch BRANCH : Specify the base branch for generating the diff. If not specified, the tracking branch is used. If there is no tracking branch, an error will be thrown.

  • -jid JIRA_ID, --jira-id JIRA_ID : If used, deduces the next patch version from attachments in the JIRA and uploads the new patch. The script will ask for your JIRA username/password for authentication. If not set, the patch is named .patch.

By default, it’ll also create/update the Review Board review. To skip that action, use the -srb option. It uses ‘Issue Links’ in the JIRA to figure out whether a review request already exists. If no review request is present, it creates a new one and populates all required fields using the JIRA summary, patch description, etc. It also adds this review’s link to the JIRA.

Save authentication credentials (optional)

Since attaching patches on JIRA and creating/changing review requests on ReviewBoard require valid user authentication, the script will prompt you for your username and password. To avoid the hassle every time, set up ~/.apache-creds with your login details and encrypt it by following the steps in the footer of the script’s help message.

Python dependencies

To install required python dependencies, execute pip install -r dev-support/python-requirements.txt from the master branch.

Manually

  1. Use git rebase -i first, to combine (squash) smaller commits into a single larger one.

  2. Create patch using IDE or Git commands. git format-patch is preferred since it preserves patch author’s name and commit message. Also, it handles binary files by default, whereas git diff ignores them unless you use the --binary option.

  3. Patch name should be as follows to adhere to Yetus’ naming convention:
    (JIRA).(branch name).(patch number).patch
    For example, HBASE-11625.master.001.patch, HBASE-XXXXX.branch-1.2.0005.patch, etc.

  4. Attach the patch to the JIRA using More→Attach Files, then click on the Submit Patch button, which will trigger a Hudson job to check the patch for validity.

  5. If your patch is longer than a single screen, also create a review on Review Board and add the link to JIRA. See reviewboard.

Few general guidelines

  • Always patch against the master branch first, even if you want to patch in another branch. HBase committers always apply patches first to the master branch, and backport if necessary.

  • Submit one single patch for a fix. If necessary, squash local commits into a single commit first. See this Stack Overflow question for more information about squashing commits.

  • Please understand that not every patch may get committed, and that feedback will likely be provided on the patch.

  • If you need to revise your patch, leave the previous patch file(s) attached to the JIRA, and upload a new one with incremented patch number.
    Click on Cancel Patch and then on Submit Patch to trigger the presubmit run.

174.7.2. Unit Tests

Always add and/or update relevant unit tests when making changes. Make sure that new/changed unit tests pass locally before submitting the patch, because that is faster than waiting for the presubmit result, which runs the full test suite. This will save your own time and effort. Use Mockito to make mocks, which are very useful for testing failure scenarios by injecting appropriate failures.
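For example, a minimal sketch (the BlockFetcher interface is hypothetical) of injecting a failure with Mockito and JUnit 4:

  1. import static org.mockito.Mockito.mock;
  2. import static org.mockito.Mockito.when;
  3. import java.io.IOException;
  4. import org.junit.Test;
  5. public class ExampleFailureTest {
  6.   // Hypothetical collaborator, for illustration only.
  7.   interface BlockFetcher {
  8.     byte[] fetch(String blockName) throws IOException;
  9.   }
  10.   @Test(expected = IOException.class)
  11.   public void testFetchFailureIsPropagated() throws IOException {
  12.     BlockFetcher fetcher = mock(BlockFetcher.class);
  13.     // Inject the failure: the mock throws when fetch() is called.
  14.     when(fetcher.fetch("b1")).thenThrow(new IOException("injected"));
  15.     fetcher.fetch("b1"); // the code under test would make this call
  16.   }
  17. }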

If you are creating a new unit test class, notice how other unit test classes have classification/sizing annotations before the class name and static methods for setup/teardown of the testing environment. Be sure to include these annotations in any new unit test files. See hbase.tests for more information on tests.
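A skeletal sketch of what that looks like (the test class name is hypothetical, and the package holding SmallTests has varied across HBase versions):

  1. import org.apache.hadoop.hbase.testclassification.SmallTests;
  2. import org.junit.AfterClass;
  3. import org.junit.BeforeClass;
  4. import org.junit.experimental.categories.Category;
  5. // Classification/sizing annotation before the class name.
  6. @Category(SmallTests.class)
  7. public class TestExampleFeature {
  8.   @BeforeClass
  9.   public static void setUpBeforeClass() throws Exception {
  10.     // set up the testing environment once for the whole class
  11.   }
  12.   @AfterClass
  13.   public static void tearDownAfterClass() throws Exception {
  14.     // tear down the testing environment
  15.   }
  16. }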

174.7.3. Integration Tests

Significant new features should provide an integration test in addition to unit tests, suitable for exercising the new feature at different points in its configuration space.

174.7.4. ReviewBoard

Patches larger than one screen, or patches that will be tricky to review, should go through ReviewBoard.

Procedure: Use ReviewBoard

  1. Register for an account if you don’t already have one. It does not use the credentials from issues.apache.org. Log in.

  2. Click New Review Request.

  3. Choose the hbase-git repository. Click Choose File to select the diff and optionally a parent diff. Click Create Review Request.

  4. Fill in the fields as required. At the minimum, fill in the Summary and choose hbase as the Review Group. If you fill in the Bugs field, the review board links back to the relevant JIRA. The more fields you fill in, the better. Click Publish to make your review request public. An email will be sent to everyone in the hbase group, to review the patch.

  5. Back in your JIRA, click , and paste in the URL of your ReviewBoard request. This attaches the ReviewBoard to the JIRA, for easy access.

  6. To cancel the request, click .

For more information on how to use ReviewBoard, see the ReviewBoard documentation.

174.7.5. Guide for HBase Committers

Becoming a committer

Committers are responsible for reviewing and integrating code changes, testing and voting on release candidates, weighing in on design discussions, as well as other types of project contributions. The PMC votes to make a contributor a committer based on an assessment of their contributions to the project. It is expected that committers demonstrate a sustained history of high-quality contributions to the project and community involvement.

Contributions can be made in many ways. There is no single path to becoming a committer, nor any expected timeline. Submitting features, improvements, and bug fixes is the most common avenue, but other methods are both recognized and encouraged (and may be even more important to the health of HBase as a project and a community). A non-exhaustive list of potential contributions (in no particular order):

  • Update the documentation for new changes, best practices, recipes, and other improvements.

  • Keep the website up to date.

  • Perform testing and report the results. For instance, scale testing and testing non-standard configurations is always appreciated.

  • Maintain the shared Jenkins testing environment and other testing infrastructure.

  • Vote on release candidates after performing validation, even if non-binding. A non-binding vote is a vote by a non-committer.

  • Provide input for discussion threads on the mailing lists (which usually have [DISCUSS] in the subject line).

  • Answer questions on the user or developer mailing lists and on Slack.

  • Make sure the HBase community is a welcoming one and that we adhere to our Code of conduct. Alert the PMC if you have concerns.

  • Review other people’s work (both code and non-code) and provide public feedback.

  • Report bugs that are found, or file new feature requests.

  • Triage issues and keep JIRA organized. This includes closing stale issues, labeling new issues, updating metadata, and other tasks as needed.

  • Mentor new contributors of all sorts.

  • Give talks and write blogs about HBase. Add these to the News section of the website.

  • Provide UX feedback about HBase, the web UI, the CLI, APIs, and the website.

  • Write demo applications and scripts.

  • Help attract and retain a diverse community.

  • Interact with other projects in ways that benefit HBase and those other projects.

Not every individual is able to do all (or even any) of the items on this list. If you think of other ways to contribute, go for it (and add them to the list). A pleasant demeanor and willingness to contribute are all you need to make a positive impact on the HBase project. Invitations to become a committer are the result of steady interaction with the community over the long term, which builds trust and recognition.

New committers

New committers are encouraged to first read Apache’s generic committer documentation.

Review

HBase committers should, as often as possible, attempt to review patches submitted by others. Ideally every submitted patch will get reviewed by a committer within a few days. If a committer reviews a patch they have not authored, and believe it to be of sufficient quality, then they can commit the patch. Otherwise the patch should be cancelled with a clear explanation for why it was rejected.

The list of submitted patches is in the HBase Review Queue, which is ordered by time of last modification. Committers should scan the list from top to bottom, looking for patches that they feel qualified to review and possibly commit. If you see a patch you think someone else is better qualified to review, you can mention them by username in the JIRA.

For non-trivial changes, it is required that another committer review your patches before commit. Self-commits of non-trivial patches are not allowed. Use the Submit Patch button in JIRA, just like other contributors, and then wait for a +1 response from another committer before committing.

Reject

Patches which do not adhere to the guidelines in HowToContribute and to the code review checklist should be rejected. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for further review.

Commit

Committers commit patches to the Apache HBase GIT repository.

Before you commit!!!

Make sure your local configuration is correct, especially your identity and email. Examine the output of the $ git config --list command and be sure it is correct. See Set Up Git if you need pointers.

When you commit a patch:

  1. Include the Jira issue ID in the commit message along with a short description of the change. Try to add something more than just the Jira title so that someone looking at git log output doesn’t have to go to Jira to discern what the change is about. Be sure to get the issue ID right, because this causes Jira to link to the change in Git (use the issue’s “All” tab to see these automatic links).

  2. Commit the patch to a new branch based off master or the other intended branch. It’s a good idea to include the JIRA ID in the name of this branch. Check out the relevant target branch where you want to commit, and make sure your local branch has all remote changes, by doing a git pull --rebase or another similar command. Next, cherry-pick the change into each relevant branch (such as master), and push the changes to the remote branch using a command such as git push.

    If you do not have all remote changes, the push will fail. If the push fails for any reason, fix the problem or ask for help. Do not do a git push --force.


Before you can commit a patch, you need to determine how the patch was created. The instructions and preferences around the way to create patches have changed, and there will be a transition period.
Determine How a Patch Was Created

  • If the first few lines of the patch look like the headers of an email, with a From, Date, and Subject, it was created using git format-patch. This is the preferred way, because you can reuse the submitter’s commit message. If the commit message is not appropriate, you can still use the commit, then run git commit --amend and reword as appropriate.

  • If the first line of the patch looks similar to the following, it was created using git diff without --no-prefix. This is acceptable too. Notice the a and b in front of the file names. This is the indication that the patch was not created with --no-prefix.

    1. diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc
  • If the first line of the patch looks similar to the following (without the a and b), the patch was created with git diff --no-prefix and you need to add -p0 to the git apply command below.
    1. diff --git src/main/asciidoc/_chapters/developer.adoc src/main/asciidoc/_chapters/developer.adoc

Example of committing a patch that was created with git format-patch, committing it to master and then backporting it to branch-1:

  1. $ git checkout -b HBASE-XXXX
  2. $ git am ~/Downloads/HBASE-XXXX-v2.patch --signoff # If you are committing someone else's patch.
  3. $ git checkout master
  4. $ git pull --rebase
  5. $ git cherry-pick <sha-from-commit>
  6. # Resolve conflicts if necessary or ask the submitter to do it
  7. $ git pull --rebase # Better safe than sorry
  8. $ git push origin master
  9. # Backport to branch-1
  10. $ git checkout branch-1
  11. $ git pull --rebase
  12. $ git cherry-pick <sha-from-commit>
  13. # Resolve conflicts if necessary
  14. $ git pull --rebase # Better safe than sorry
  15. $ git push origin branch-1
  16. $ git branch -D HBASE-XXXX

Example of committing a patch that was created with git diff without --no-prefix (if the patch was created with --no-prefix, add -p0 to the git apply command):

  1. $ git apply ~/Downloads/HBASE-XXXX-v2.patch
  2. $ git commit -m "HBASE-XXXX Really Good Code Fix (Joe Schmo)" --author=<contributor> -a # This and next command is needed for patches created with 'git diff'
  3. $ git commit --amend --signoff
  4. $ git checkout master
  5. $ git pull --rebase
  6. $ git cherry-pick <sha-from-commit>
  7. # Resolve conflicts if necessary or ask the submitter to do it
  8. $ git pull --rebase # Better safe than sorry
  9. $ git push origin master
  10. # Backport to branch-1
  11. $ git checkout branch-1
  12. $ git pull --rebase
  13. $ git cherry-pick <sha-from-commit>
  14. # Resolve conflicts if necessary or ask the submitter to do it
  15. $ git pull --rebase # Better safe than sorry
  16. $ git push origin branch-1
  17. $ git branch -D HBASE-XXXX

  3. Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but only set a single fix version for each branch where the change was committed, the earliest release in that branch in which the change will appear.

Commit Message Format

The commit message should contain the JIRA ID and a description of what the patch does. The preferred commit message format is:

  1. <jira-id> <jira-title> (<contributor-name-if-not-commit-author>)

For example:

  1. HBASE-12345 Fix All The Things (jane@example.com)

If the contributor used git format-patch to generate the patch, their commit message is in their patch and you can use that, but be sure the JIRA ID is at the front of the commit message, even if the contributor left it out.

Add Amending-Author when a cherry-picked backport has conflicts

We’ve established the practice of committing to master and then cherry picking back to branches whenever possible, unless

  • it’s breaking compat: In which case, if it can go in minor releases, backport to branch-1 and branch-2.

  • it’s a new feature: No for maintenance releases. For minor releases, discuss and arrive at consensus.

When there is a minor conflict we can fix it up and just proceed with the commit. The resulting commit retains the original author. When the amending author is different from the original committer, add notice of this at the end of the commit message as: Amending-Author: Author <committer&apache>. See the discussion at [HBase, mail # dev - DISCUSSION Best practice when amending commits cherry picked from master to branch].

Close related GitHub PRs

As a project we work to ensure there’s a JIRA associated with each change, but we don’t mandate any particular tool be used for reviews. Due to implementation details of the ASF’s integration between hosted git repositories and GitHub, the PMC has no ability to directly close PRs on our GitHub repo. In the event that a contributor makes a Pull Request on GitHub, either because the contributor finds that easier than attaching a patch to JIRA or because a reviewer prefers that UI for examining changes, it’s important to make note of the PR in the commit that goes to the master branch so that PRs are kept up to date.

To read more about the details of what kinds of commit messages will work with the GitHub “close via keyword in commit” mechanism see the GitHub documentation for “Closing issues using keywords”. In summary, you should include a line with the phrase “closes #XXX”, where the XXX is the pull request id. The pull request id is usually given in the GitHub UI in grey at the end of the subject heading.

Committers are responsible for making sure commits do not break the build or tests

If a committer commits a patch, it is their responsibility to make sure it passes the test suite. It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests, but ultimately, a contributor cannot be expected to be aware of all the particular vagaries and interconnections that occur in a project like HBase. A committer should.

Patching Etiquette

In the thread HBase, mail # dev - ANNOUNCEMENT: Git Migration In Progress (WAS ⇒ Re: Git Migration), the following patch flow was agreed upon:

  1. Develop and commit the patch against master first.

  2. Try to cherry-pick the patch when backporting if possible.

  3. If this does not work, manually commit the patch to the branch.

Merge Commits

Avoid merge commits, as they create problems in the git history.

Committing Documentation

See appendix contributing to documentation.

174.7.6. Dialog

Committers should hang out in the #hbase room on irc.freenode.net for real-time discussions. However any substantive discussion (as with any off-list project-related discussion) should be re-iterated in Jira or on the developer list.

174.7.7. Do not edit JIRA comments

Misspellings and/or bad grammar are preferable to the disruption a JIRA comment edit causes. See the discussion at Re:(HBASE-451) Remove HTableDescriptor from HRegionInfo.

174.8. The hbase-thirdparty dependency and shading/relocation

A new project was created for the release of hbase-2.0.0, called hbase-thirdparty. This project exists only to provide the main hbase project with relocated — or shaded — versions of popular thirdparty libraries such as guava, netty, and protobuf. The mainline HBase project relies on the relocated versions of these libraries, gotten from hbase-thirdparty, rather than finding these classes in their usual locations. We do this so we can specify whatever version we wish. If we don’t relocate, we must harmonize our version to match the versions that hadoop, spark, and other projects use.

For developers, this means you need to be careful referring to classes from netty, guava, protobuf, gson, etc. (see the hbase-thirdparty pom.xml for what it provides). Devs must refer to the hbase-thirdparty provided classes. In practice, this is usually not an issue (though it can be a bit of a pain). You will have to hunt for the relocated version of your particular class. You’ll find it by prepending the general relocation prefix of org.apache.hbase.thirdparty.. For example if you are looking for com.google.protobuf.Message, the relocated version used by HBase internals can be found at org.apache.hbase.thirdparty.com.google.protobuf.Message.

For a few thirdparty libs, like protobuf (see the protobuf chapter in this book for the why), your IDE may give you both options — the com.google.protobuf. and the org.apache.hbase.thirdparty.com.google.protobuf. — because both classes are on your CLASSPATH. Unless you are doing the particular juggling required in Coprocessor Endpoint development (again see above cited protobuf chapter), you’ll want to use the shaded version, always.
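As a minimal sketch (the class and method names are hypothetical) showing the relocated import to prefer:

  1. // Use the relocated class provided by hbase-thirdparty ...
  2. import org.apache.hbase.thirdparty.com.google.protobuf.Message;
  3. // ... not the unrelocated upstream class:
  4. // import com.google.protobuf.Message;
  5. public class ShadedProtobufExample {
  6.   public String describe(Message message) {
  7.     // Prints a name under the org.apache.hbase.thirdparty. prefix.
  8.     return message.getClass().getName();
  9.   }
  10. }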

The hbase-thirdparty project has a groupid of org.apache.hbase.thirdparty. As of this writing, it provides three jars: one for netty with an artifactid of hbase-thirdparty-netty, one for protobuf at hbase-thirdparty-protobuf, and then a jar for all else — gson, guava — at hbase-thirdparty-miscellaneous.

The hbase-thirdparty artifacts are a product produced by the Apache HBase project under the aegis of the HBase Project Management Committee. Releases are done via the usual voting process on the hbase dev mailing list. If you find an issue in hbase-thirdparty, use the hbase JIRA and mailing lists to post notice.

174.9. Development of HBase-related Maven archetypes

The development of HBase-related Maven archetypes was begun with HBASE-14876. For an overview of the hbase-archetypes infrastructure and instructions for developing new HBase-related Maven archetypes, please see hbase/hbase-archetypes/README.md.