Core fields

  1. The default array size: 16

In practice the initial capacity is usually specified explicitly, based on an estimate of how many key-value pairs the map will hold.

```java
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
```
  2. The default load factor: the array is resized once the number of entries reaches array size (16) * load factor (0.75f).

```java
/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
```
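As a small illustration of how these two defaults interact (the `(int) (expected / 0.75f) + 1` sizing rule here is a common convention, not something quoted from the JDK source), pre-sizing a map so the expected number of entries stays below capacity * load factor avoids resizes:

```java
import java.util.HashMap;
import java.util.Map;

public class Presize {
    // Capacity to request so that `expected` entries stay below
    // capacity * DEFAULT_LOAD_FACTOR (0.75f), avoiding a resize.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        // For ~100 expected entries, request a larger initial capacity
        // instead of relying on the default of 16.
        Map<String, Integer> map = new HashMap<>(capacityFor(100));
        System.out.println(capacityFor(100)); // 134
    }
}
```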
  3. The Node inner class

A Node represents one key-value pair. It stores the key's hash value, and its next pointer links to the following Node, forming a singly linked list.

```java
/**
 * Basic hash bin node, used for most entries. (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // ... constructor and Map.Entry methods omitted
}
```
  4. Node[] table

This array is the map's core data structure. Its elements are of type Node, so each slot can naturally hang a linked list off it.

```java
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;
```
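To make the "array of Nodes, each slot a chain" idea concrete, here is a minimal toy sketch — not the JDK implementation; the class and method names are made up for illustration — that hangs colliding entries off the same slot:

```java
public class ToyBuckets {
    // Minimal stand-in for HashMap.Node: hash, key, value, next pointer.
    static class Node {
        final int hash;
        final String key;
        int value;
        Node next;
        Node(int hash, String key, int value, Node next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    static final int CAPACITY = 16;           // power of two, like the default
    static final Node[] table = new Node[CAPACITY];

    // Insert at the head of the slot's chain (duplicate keys ignored for brevity).
    static void put(String key, int value) {
        int i = (CAPACITY - 1) & key.hashCode();
        table[i] = new Node(key.hashCode(), key, value, table[i]);
    }

    static int chainLength(int i) {
        int n = 0;
        for (Node e = table[i]; e != null; e = e.next) n++;
        return n;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same hashCode (2112), so they land in
        // the same slot and form a chain.
        put("Aa", 1);
        put("BB", 2);
        System.out.println(chainLength((CAPACITY - 1) & "Aa".hashCode()));
    }
}
```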
  5. size

size is the number of key-value pairs in the HashMap. Once it reaches capacity * load factor, the array is resized.

```java
/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
```
  6. threshold

This value is the result of capacity * load factor; when size reaches threshold, the table is resized.

```java
/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
```
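A quick sanity check (plain arithmetic, nothing JDK-internal): since threshold tracks capacity * load factor and the table length is always a power of two, each doubling of the table also doubles the resize point:

```java
public class Thresholds {
    // threshold = capacity * load factor, with the default 0.75f.
    static int threshold(int capacity) {
        return (int) (capacity * 0.75f);
    }

    public static void main(String[] args) {
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println(capacity + " -> " + threshold(capacity));
        }
        // prints 16 -> 12, 32 -> 24, 64 -> 48, 128 -> 96
    }
}
```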
  7. The treeify threshold

A bucket's linked list is converted to a red-black tree once it exceeds 8 nodes. Why 8? The source comment explains it.
When hashCode values are well distributed, tree bins are almost never needed: entries spread evenly across the bins, and hardly any chain ever reaches the threshold. Under a poorly written hashCode — and the JDK cannot stop users from supplying one — the distribution can become uneven. Ideally, though, under random hashCodes the number of nodes per bin follows a Poisson distribution, and as the table below shows, the probability of a chain reaching 8 elements is 0.00000006 — practically an impossible event — while anything longer occurs less than once in ten million.

```java
/*
 * Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins. In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used. Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 */
static final int TREEIFY_THRESHOLD = 8;
```
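The probability table in that comment can be reproduced directly from the formula it gives, exp(-0.5) * 0.5^k / k!, with the Poisson parameter 0.5 stated in the comment:

```java
public class PoissonBins {
    // Probability that a bin holds exactly k nodes, Poisson with lambda = 0.5,
    // i.e. exp(-0.5) * pow(0.5, k) / factorial(k) from the JDK comment.
    static double prob(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, prob(k));
        }
        // k = 8 comes out to about 0.00000006, matching the table above.
    }
}
```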

The optimized hash algorithm


```java
// JDK 1.7
final int hash(Object k) {
    int h = hashSeed; // 0 by default
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// JDK 1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
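Before walking through it bit by bit, the JDK 8 version is easy to try out standalone (re-implemented here, since HashMap.hash is package-private):

```java
public class SpreadHash {
    // Same computation as JDK 8's HashMap.hash: XOR the high 16 bits
    // of the hashCode into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null)); // 0: a null key always maps to slot 0
        int h = "hello".hashCode();
        System.out.println(Integer.toBinaryString(h));
        System.out.println(Integer.toBinaryString(hash("hello")));
        // The low 16 bits of the result mix in information from the high 16 bits.
    }
}
```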
  1. h = key.hashCode(); gets the key's hashCode, e.g.:

1111 1111 1111 1111 1111 1010 0111 1100

  2. h >>> 16 shifts the 32-bit value right by 16 bits, so the high 16 bits end up in the low half:

0000 0000 0000 0000 1111 1111 1111 1111

  3. (h = key.hashCode()) ^ (h >>> 16); XORs the two values:

1111 1111 1111 1111 1111 1010 0111 1100
^
0000 0000 0000 0000 1111 1111 1111 1111
1111 1111 1111 1111 0000 0101 1000 0011 (result)

Why XOR the hashCode with itself shifted right by 16 bits? To reduce hash collisions. When the hash is later used to locate a bucket index, another bitwise operation is applied, and for typical table sizes only the low bits of the hash take part in it. Without this step, only the low 16 bits would ever influence the index. XORing in the shifted value folds the features of both the high 16 bits and the low 16 bits into the low 16 bits, so the whole hash contributes, and the collision probability during addressing is lower. (This becomes visible in the put analysis below.)
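The effect on indexing can be sketched like this. The table size 16 and the two sample hash values are made-up illustrations; (n - 1) & hash is the indexing step that the put analysis covers:

```java
public class IndexDemo {
    // The JDK 8 spreading step: fold the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table size; a power of two, so (n - 1) is a low-bit mask

        // Two hashes that differ only in their high 16 bits.
        int h1 = 0x0000ABCD;
        int h2 = 0xFFFFABCD;

        // Without spreading, the mask only sees the (identical) low bits,
        // so both keys collide in the same slot:
        System.out.println(((n - 1) & h1) + " vs " + ((n - 1) & h2)); // 13 vs 13

        // With spreading, the high bits influence the index and the
        // collision disappears:
        System.out.println(((n - 1) & spread(h1)) + " vs " + ((n - 1) & spread(h2))); // 13 vs 2
    }
}
```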
通过这样的方式降低hash冲突。因为在后面使用hash值定位数据的index的时候,也会有一个位运算。而后面的位运算,一般都是用低16位做运算。如果不在这里把哈希值进行运算的话,就会导致后阿棉通过hash值找index的时候只有hash值的低16位参与了运算。上面的一些列操作就把高16位和低16位的特征都转移到低16位去了,就保留了全部的特征。这样再去做寻址操作时,hash碰撞的概率就比较低了。(下面put剖析时就能看到)