Core fields
- Default table size: 16
In practice you usually specify the initial capacity explicitly, based on an estimate of how many key-value pairs the map will hold.
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
Default load factor: once the number of entries reaches table size (16) × load factor (0.75f), the table is resized.
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
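How capacity and load factor interact can be sketched as follows. This is a minimal illustration, not JDK source; `tableSizeFor` here mirrors the power-of-two rounding HashMap applies to a requested capacity (the JDK 9+ variant), and the threshold is simply capacity × 0.75.

```java
public class CapacityDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Round up to the next power of two, as HashMap does for a requested capacity.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        int requested = 20;                     // caller's estimate of entry count
        int capacity = tableSizeFor(requested); // actual table length: 32
        int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR); // resize at 24 entries
        System.out.println(capacity + " " + threshold);         // prints 32 24
    }
}
```

Note that an estimate of 20 entries yields a table of 32 with a resize threshold of 24, which is why sizing the map up front avoids intermediate resizes.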
The Node inner class
A Node represents one key-value pair. It caches the key's hash, and its next pointer links to the following Node, forming a singly linked list.
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    ...
}
- Node[] table
This array is the map's core data structure. Each element is a Node, so every slot can naturally grow into a linked list.
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
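How the Node array and chaining fit together can be sketched with a deliberately simplified put/get. This is a toy model, not JDK code: it skips resizing, treeification, and hash spreading, and fixes the types to String/Integer for brevity.

```java
public class BucketDemo {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    static Node<String, Integer>[] table = (Node<String, Integer>[]) new Node[16];

    // Simplified put: index by hash & (length - 1); on collision, prepend to the chain.
    static void put(String key, Integer value) {
        int hash = key.hashCode();              // no spreading, for brevity
        int index = (table.length - 1) & hash;
        for (Node<String, Integer> n = table[index]; n != null; n = n.next) {
            if (n.hash == hash && n.key.equals(key)) { n.value = value; return; }
        }
        table[index] = new Node<>(hash, key, value, table[index]);
    }

    static Integer get(String key) {
        int index = (table.length - 1) & key.hashCode();
        for (Node<String, Integer> n = table[index]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

The point is the shape: each table slot is the head of a chain, and both put and get walk that chain comparing hashes and keys.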
- size
size is the number of key-value pairs currently in the map; once it reaches capacity × load factor, the table is resized.
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
- threshold
threshold is capacity * load factor; once size reaches threshold, the table is resized.
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
- The treeify threshold
A chain is converted to a red-black tree once it grows past 8 nodes. Why 8? The JDK's own comment explains it.
When hashCode values are well distributed, tree bins are almost never used: entries spread evenly across the bins, and hardly any chain reaches the threshold. With poor-quality hashCodes the distribution can degrade, and since the JDK cannot stop users from writing bad hash functions, uneven distributions do happen. Even so, under random hashCodes the number of nodes per bin ideally follows a Poisson distribution, and the probability of a chain reaching 8 elements is about 0.00000006, practically a non-event; anything longer occurs less than once in ten million (figures from the JDK's statistical analysis).
/*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*/
static final int TREEIFY_THRESHOLD = 8;
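The probabilities in that comment can be reproduced directly from the Poisson formula the JDK cites, exp(-0.5) * pow(0.5, k) / factorial(k):

```java
public class PoissonDemo {
    // P(a bin holds exactly k nodes) under a Poisson distribution with parameter 0.5
    static double poisson(int k) {
        double factorial = 1;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        // Reproduces the table in the JDK comment, k = 0..8
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(k));
        }
    }
}
```

Running it prints the same values as the comment, with k = 8 coming out at roughly 0.00000006.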
The optimized hash function
// jdk1.7
final int hash(Object k) {
    int h = hashSeed; // defaults to 0
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
// jdk1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
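The jdk1.8 version is short enough to copy into a runnable sketch. Using an Integer key makes the effect easy to see, because Integer's hashCode is the value itself, so a value with only high bits set visibly gets those bits folded into the low half:

```java
public class HashDemo {
    // Same spreading as java.util.HashMap.hash in JDK 8+.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = hash(0x12340000);                   // hashCode has only high bits set
        System.out.println(Integer.toHexString(h)); // prints 12341234
    }
}
```

The high 16 bits (0x1234) survive unchanged, and a copy of them now also appears in the low 16 bits.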
- h = key.hashCode(); get the key's hashCode
1111 1111 1111 1111 1111 1010 0111 1100
- h >>> 16; shift the 32-bit value right by 16, so the high 16 bits move into the low half and the high half fills with zeros
0000 0000 0000 0000 1111 1111 1111 1111
- (h = key.hashCode()) ^ (h >>> 16); XOR the two together
1111 1111 1111 1111 1111 1010 0111 1100
^
0000 0000 0000 0000 1111 1111 1111 1111
1111 1111 1111 1111 0000 0101 1000 0011 (result)
Why XOR the hashCode with itself shifted right by 16 bits?
To reduce hash collisions. When the hash is later used to locate a bucket index, another bit operation is applied, and with typical table sizes only the low bits of the hash take part in it. Without this step, only the low 16 bits of the hash would ever influence the index. The XOR above folds the high 16 bits into the low 16 bits, so the low half carries features of the entire hashCode. That keeps the collision probability lower when computing the index. (You will see this in the put analysis below.)
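The effect on indexing can be sketched with two hypothetical keys whose hashCodes differ only in the high 16 bits (the values 0x10000 and 0x20000 are chosen purely for illustration):

```java
public class IndexDemo {
    // The jdk1.8 spreading step, extracted for illustration.
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16;       // table length, a power of two
        int h1 = 0x10000; // these two differ only above the low 16 bits
        int h2 = 0x20000;

        // Without spreading, the mask (n - 1) discards the high bits: both land in bucket 0.
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // prints 0 0

        // With spreading, the high bits influence the index and the collision disappears.
        System.out.println(((n - 1) & spread(h1)) + " " + ((n - 1) & spread(h2))); // prints 1 2
    }
}
```

Without the XOR both keys collide in bucket 0; after spreading they land in buckets 1 and 2.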