关于 HashSet

HashSet 实现了 Set 接口；

public class HashSet<E>
  extends AbstractSet<E>
  implements Set<E>, Cloneable, java.io.Serializable
{
  static final long serialVersionUID = -5024744406713321676L;
  private transient HashMap<E,Object> map;
  // Dummy value to associate with an Object in the backing Map
  private static final Object PRESENT = new Object();

不能有重复元素 / 对象，在前面 Set 接口使用时已经提到过。
可以存放 null 值，但因为元素不可重复，只能放一个 null；
HashSet 底层实际上是 HashMap，而 HashMap 的底层实际上是数组+链表+红黑树（JDK 7 是数组+链表、JDK 8 是数组+链表+红黑树、JDK X 是… ，此处以 JDK 8 为例说明）；
HashSet 不保证元素是有序的，取决于 hash 后，再确定索引的结果；

构造方法一览


    /**
     * Constructs a new, empty set; the backing HashMap instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }
    /**
     * Constructs a new set containing the elements in the specified
     * collection.  The <tt>HashMap</tt> is created with default load factor
     * (0.75) and an initial capacity sufficient to contain the elements in
     * the specified collection.
     *
     * @param c the collection whose elements are to be placed into this set
     * @throws NullPointerException if the specified collection is null
     */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }
    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hash map
     * @param      loadFactor        the load factor of the hash map
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero, or if the load factor is nonpositive
     */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }
    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and default load factor (0.75).
     *
     * @param      initialCapacity   the initial capacity of the hash table
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero
     */
    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }

扩容机制

添加一个元素时，先得计算这个元素的 hash 值，查存储数据表 table 中，这个索引位置是否已经存放了元素。如果没有，直接放入数据表 table 中，如果有，调用元素的 equals 方法比较，如果相同，就放弃添加，如果不相同，则添加到链表的末尾。在 Java 8 中，如果一条链表的元素个数大等于 TREEIFY_THRESHOLD（默认 8 ） - 1，并且 table 的大小 >= MIN_TREEIFY_CAPACITY（默认 64 ），就会将该索引位置的链表转换为红黑树。

例题

1、唯一性

如何理解 HashSet 不能有重复元素 / 对象？看一下下面的例题，思考一下。

public class Main {
    public static void main(String[] args) {
        Set set = new HashSet();
        // 添加重复元素测试
        System.out.println("添加重复元素测试");
        System.out.println(set.add("test string"));// true
        System.out.println(set.add("test string"));// false
        System.out.println();
        // 添加值相同的对象测试
        System.out.println("添加值相同的对象测试");
        System.out.println(set.add(new Dog("jerry", 3)));// true
        System.out.println(set.add(new Dog("tom", 4)));// true
        System.out.println(set.add(new Dog("tom", 4)));// true 思考一下，为什么？
        System.out.println();
        // 再加深一下，添加 String 对象测试
        System.out.println("添加 String 对象测试");
        System.out.println(set.add(new String("123")));// true
        System.out.println(set.add(new String("123")));// false
        System.out.println(set.add("123"));// false
        System.out.println();
    }
}
class Dog {
    String name;
    int age;
    public Dog(String name, int age) {
        this.name = name;
        this.age = age;
    }
    @Override
    public String toString() {
        return "Dog{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

2、无序性

HashSet 是不保证元素有序的，元素的顺序取决于 hash 后索引的结果。即 HashSet 无法保证在操作元素后，既有元素的顺序保持不变。

public class Main {
    public static void main(String[] args) {
        Set set = new HashSet();

        set.add(new Integer(5));
        set.add(6);
        set.add("qwe");
        System.out.println(set);// [5, 6, qwe]

        set.add("16546");
        System.out.println(set);// [5, 6, 16546, qwe]
    }
}

常用方法源码分析

1、HashSet add

https://www.yuque.com/jaded/fz8wr7/eb8nft

2、HashMap.rezise

当 HashSet 调用 add 方法添加元素时，实际上底层调用的是 HashMap 的 put 方法，所以，讲 HashSet 的扩容实际上就是在讲 HashMap 的扩容机制。
https://www.yuque.com/jaded/fz8wr7/bwl5vw

3、treeifyBin

https://www.yuque.com/jaded/fz8wr7/gr4tnv

问题

1、为什么说 HashSet 的底层是 HashMap？

先看一下 HashSet 的 map 属性：

private transient HashMap<E,Object> map;

再看一下 HashSet 的无参构造器：

public HashSet() {
    map = new HashMap<>();
}

可以看到 HashSet 对象被实例化时，底层创建了一个 HashMap 对象赋给了自身的 map 属性。正因为 HashSet 的各个操作（例如 add / remove / clear 等等）都是基于自身这个 map 属性扩展的，所以我们说 HashSet 的底层是 HashMap 。

PS：需要知道的是，JDK 7 的 HashMap 是基于数组+链表实现的、JDK 8 的 HashMap 是基于数组+链表/红黑树实现的、