Ref: https://www.baeldung.com/java-bitset

1. Overview

In this tutorial, we’re going to see how we can use BitSets to represent a vector of bits.
First, we’ll start with the rationale behind not using the boolean[]. Then after getting familiar with the _BitSet _internals, we’ll take a closer look at its API.

2. Array of Bits

To store and manipulate arrays of bits, one might argue that we should use _boolean[] _as our data structure. At first glance, that might seem a reasonable suggestion.
However, each _boolean _member in a _boolean[] _usually consumes one byte instead of just one bit. So when we have tight memory requirements, or we’re just aiming for a reduced memory footprint, _boolean[] _are far from being ideal.
To make matters more concrete, let’s see how much space a _boolean[] _with 1024 elements consumes:

boolean[] bits = new boolean[1024]; System.out.println(ClassLayout.parseInstance(bits).toPrintable());
Ideally, we expect a 1024-bit memory footprint from this array. However, the Java Object Layout (JOL) reveals an entirely different reality:

  1. [Z object internals:
  2. OFFSET SIZE TYPE DESCRIPTION VALUE
  3. 0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
  4. 4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
  5. 8 4 (object header) 7b 12 07 00 (01111011 00010010 00000111 00000000) (463483)
  6. 12 4 (object header) 00 04 00 00 (00000000 00000100 00000000 00000000) (1024)
  7. 16 1024 boolean [Z. N/A
  8. Instance size: 1040 bytes

If we ignore the overhead of object header, the array elements are consuming 1024 bytes, instead of the expected 1024 bits. That’s 700% more memory than what we expected.
The addressability issues and word tearing are the main reasons why _boolean_s are more than just one single bit.
To solve this problem, we can use a combination of numeric data types (such as long) and bit-wise operations. That’s where the _BitSet _comes in.

3. How _BitSet _Works

As we mentioned earlier, to achieve the one bit per flag memory usage, the _BitSet _API uses a combination of basic numeric data types and bit-wise operations.

For the sake of simplicity, let’s suppose we’re going to represent eight flags with one byte. At first, we initialize all bits of this single byte with zero:
image.png
Now if we want to set the bit at position three to true, we should first left-shift the number 1 by three:
image.png
And then or _its result with the current _byte value:
image.png
The same process will happen if decide to set the bit at the index seven:
image.png
As shown above, we perform a left-shift by seven bits and combine the result with the previous byte value using the _or _operator.

3.1. Getting a Bit Index

To check if a particular bit index is set to _true _or not, we’ll use the _and _operator. For instance, here’s how we check if index three is set:

  1. Performing a left-shift by three bits on the value one
  2. Anding _the result with the current _byte value
  3. If the result is greater than zero, then we found a match, and that bit index is actually set. Otherwise, the requested index is clear or is equal to false

image.png
The above diagram shows the get operation steps for index three. If we inquire about a clear index, however, the result will be different:
image.png
Since the _and _result is equal to zero, index four is clear.

3.2. Growing the Storage

Currently, we can only store a vector of 8 bits. To go beyond this limitation, we just have to use an array of byte_s, instead of a single _byte, that’s it!
Now, every time we need to set, get, or clear a specific index, we should find the corresponding array element, first. For instance, let’s suppose we’re going to set index 14:
image.png
As shown in the above diagram, after finding the right array element, we did set the appropriate index.
Also, if we want to set an index beyond 15 here, the BitSet _will expand its internal array, first. Only after expanding the array and copying the elements will it set the requested bit. This is somewhat similar to how ArrayList _works internally.
So far, we used the _byte _data type for the sake of simplicity. The _BitSet _API, however, is using an array of _long _values internally.

4. The _BitSet _API

Now that we know enough about the theory, it’s time to see what the _BitSet _API looks like.
For starters, let’s compare the memory footprint of a _BitSet _instance with 1024 bits with the _boolean[] _we saw earlier:
BitSet bitSet = new BitSet(1024); System.out.println(GraphLayout.parseInstance(bitSet).toPrintable());
This will print both the shallow size of the _BitSet _instance and the size of its internal array:
java.util.BitSet@75412c2fd object externals: ADDRESS SIZE TYPE PATH VALUE 70f97d208 24 java.util.BitSet (object) 70f97d220 144 [J .words [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
As shown above, it uses a _long[] _with 16 elements (16 64 bits = 1024 bits) internally. Anyway, this instance is using 168 bytes in total, while the _boolean[] _were using 1024 bytes.
The more bits we have, the more the footprint difference increases. For example, to store 1024
1024 bits, the _boolean[] _consumes 1 MB, and the _BitSet _instance consumes around 130 KB.

4.1. Constructing _BitSet_s

The simplest way to create a _BitSet _instance is to use the no-arg constructor):
BitSet bitSet = new BitSet();
This will create a _BitSet _instance with a _long[] _of size one. Of course, it can automatically grow this array if needed.
It’s also possible to create a _BitSet _with an initial number of bits):

BitSet bitSet = new BitSet(100000);
Here, the internal array will have enough elements to hold 100,000 bits. This constructor comes in handy when we already have a reasonable estimate on the number of bits to store. In such use cases, it can prevent or decrease the unnecessary copying of array elements while growing it.
It’s even possible to create a _BitSet _from an existing _long[]
, byte[], LongBuffer, and ByteBuffer. For instance, here we’re creating a BitSet _instance from a given _long[]:
BitSet bitSet = BitSet.valueOf(new long[] { 42, 12 });
There are three more overloaded versions of the valueOf())_ _static factory method to support the other mentioned types.

4.2. Setting Bits

We can set the value of a particular index to true _using the set(index)) method:
BitSet bitSet = new BitSet(); bitSet.set(10); assertThat(bitSet.get(10)).isTrue();
As usual, the indices are zero-based. It’s even possible to set a range of bits to _true _using the set(fromInclusive, toExclusive))**
method:
bitSet.set(20, 30);
for (int** i = 20; i <= 29; i++) { assertThat(bitSet.get(i)).isTrue(); } assertThat(bitSet.get(30)).isFalse();
As is evident from the method signature, the beginning index is inclusive, and the ending one is exclusive.
When we say setting an index, we usually mean setting it to _true
. Despite this terminology, we can set a particular bit index to false _using the set(index, boolean)) _method:
bitSet.set(10, false); assertThat(bitSet.get(10)).isFalse();
This version also supports setting a range of values:

bitSet.set(20, 30, false); for (int i = 20; i <= 29; i++) { assertThat(bitSet.get(i)).isFalse(); }

4.3. Clearing Bits

Instead of setting a specific bit index to false, we can simply clear it using the clear(index)) _method:
bitSet.set(42); assertThat(bitSet.get(42)).isTrue(); bitSet.clear(42); assertThat(bitSet.get(42)).isFalse();
Moreover, we can also clear a range of bits with the clear(fromInclusive, toExclusive))
overloaded version:
bitSet.set(10, 20); for (int i = 10; i < 20; i++) { assertThat(bitSet.get(i)).isTrue(); } bitSet.clear(10, 20); for (int i = 10; i < 20; i++) { assertThat(bitSet.get(i)).isFalse(); }
Interestingly, if we call this method without passing any arguments, it’ll clear all the set bits:
bitSet.set(10, 20); bitSet.clear(); for (int i = 0; i < 100; i++) { assertThat(bitSet.get(i)).isFalse(); }
As shown above, after calling the clear())
_method, all bits are set to zero.

4.4. Getting Bits

So far, we used the get(index)) _method quite extensively. **When the requested bit index is set, then this method will return _true. Otherwise, it’ll return false:
bitSet.set(42); assertThat(bitSet.get(42)).isTrue(); assertThat(bitSet.get(43)).isFalse();
Similar to set _and _clear, we can get a range of bit indices using the get(fromInclusive, toExclusive))_ _method:
bitSet.set(10, 20); BitSet newBitSet = bitSet.get(10, 20);
for (int** i = 0; i < 10; i++) { assertThat(newBitSet.get(i)).isTrue(); }
As shown above, this method returns another BitSet in the [20, 30) range of the current one. That is, index 20 of the _bitSet _variable is equivalent to index zero of the _newBitSet _variable.

4.5. Flipping Bits

To negate the current bit index value, we can use the flip(index))_ _method. That is, it’ll turn _true _values to _false _and vice versa:

bitSet.set(42); bitSet.flip(42); assertThat(bitSet.get(42)).isFalse(); bitSet.flip(12); assertThat(bitSet.get(12)).isTrue();
Similarly, we can achieve the same thing for a range of values using the flip(fromInclusive, toExclusive))_ _method:
bitSet.flip(30, 40); for (int i = 30; i < 40; i++) { assertThat(bitSet.get(i)).isTrue(); }

4.6. Length

There are three length-like methods for a BitSet. The size())_ _method returns the number of bits the internal array can represent. For instance, since the no-arg constructor allocates a long[] _array with one element, then the _size() _will return 64 for it:
BitSet defaultBitSet = new BitSet(); assertThat(defaultBitSet.size()).isEqualTo(64);
With one 64-bit number, we can only represent 64 bits. Of course, this will change if we pass the number of bits explicitly:
BitSet bitSet = new BitSet(1024); assertThat(bitSet.size()).isEqualTo(1024);
Moreover, the cardinality())**
method represents the number of set bits in a _BitSet:
assertThat(bitSet.cardinality()).isEqualTo(0); bitSet.set(10, 30); assertThat(bitSet.cardinality()).isEqualTo(30 - 10);
At first, this method returns zero as all bits are false. After setting the [10, 30) range to true, then the _cardinality() _method call returns 20.
Also, the length()) _method returns the one index after the index of the last set bit:
assertThat(bitSet.length()).isEqualTo(30); bitSet.set(100); assertThat(bitSet.length()).isEqualTo(101);
At first, the last set index is 29, so this method returns 30. When we set the index 100 to true, then the _length() _method returns 101.
It’s also worth mentioning that this method will return zero if all bits are clear**.
Finally, the isEmpty())
method returns _false _when there is at least one set bit in the _BitSet. Otherwise, it’ll return true:

assertThat(bitSet.isEmpty()).isFalse(); bitSet.clear(); assertThat(bitSet.isEmpty()).isTrue();

4.7. Combining With Other _BitSet_s

The intersects(BitSet))_ _method takes another _BitSet _and returns _true _when two _BitSet_s have something in common. That is, they have at least one set bit in the same index:
BitSet first = new BitSet(); first.set(5, 10); BitSet second = new BitSet(); second.set(7, 15); assertThat(first.intersects(second)).isTrue();
The [7, 9] range is set in both BitSet_s, so this method returns _true.
It’s also possible to perform the logical _and _operation on two _BitSet_s:
first.and(second); assertThat(first.get(7)).isTrue(); assertThat(first.get(8)).isTrue(); assertThat(first.get(9)).isTrue(); assertThat(first.get(10)).isFalse();
This will perform a logical and _between the two _BitSet_s and modifies the _first _variable with the result. Similarly, we can perform a logical _xor _on two _BitSet_s, too:
first.clear(); first.set(5, 10); first.xor(second); for (int i = 5; i < 7; i++) { assertThat(first.get(i)).isTrue(); } for (int i = 10; i < 15; i++) { assertThat(first.get(i)).isTrue(); }
There are other methods such as the andNot(BitSet))or the or(BitSet))
, _which can perform other logical operations on two _BitSet_s.

4.8. Miscellaneous

As of Java 8, there is a stream()) _method to stream all set bits of a _BitSet. For instance:
BitSet bitSet = new BitSet(); bitSet.set(15, 25); bitSet.stream().forEach(System.out::println);
This will print all set bits to the console. Since this will return an IntStream, we can perform common numerical operations such as summation, average, counting, and so on. For instance, here we’re counting the number of set bits:
assertThat(bitSet.stream().count()).isEqualTo(10);
Also, the nextSetBit(fromIndex)) _method will return the next set bit index starting from the _fromIndex:

assertThat(bitSet.nextSetBit(13)).isEqualTo(15);
The fromIndex _itself is included in this calculation. When there isn’t any _true _bit left in the _BitSet, it’ll return -1:
assertThat(bitSet.nextSetBit(25)).isEqualTo(-1);
Similarly, the nextClearBit(fromIndex)) _returns the next clear index starting from the _fromIndex:
assertThat(bitSet.nextClearBit(23)).isEqualTo(25);
On the other hand, the previousClearBit(fromIndex)) _returns the index of the nearest clear index in the opposite direction:
assertThat(bitSet.previousClearBit(24)).isEqualTo(14);
Same is true forpreviousSetBit(fromIndex)):
assertThat(bitSet.previousSetBit(29)).isEqualTo(24); assertThat(bitSet.previousSetBit(14)).isEqualTo(-1);
Moreover, we can convert a _BitSet _to a _byte[] _or a _long[] _using the toByteArray())
or toLongArray()) _methods, respectively:
byte[] bytes = bitSet.toByteArray(); long[] longs = bitSet.toLongArray();

5. Conclusion

In this tutorial, we saw how we can use BitSet_s to represent a vector of bits.
At first, we got familiar with the rationale behind not using _boolean[]
to represent a vector of bits. Then we saw how a _BitSet _works internally and what its API looks like.
As usual, all the examples are available over on GitHub.