Core fields
- Default array capacity: 16
In practice you usually specify the capacity explicitly: estimate roughly how many key-value pairs the map will hold and size it accordingly (see the sizing sketch after the load-factor snippet below).
```java
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
```
- Default load factor: 0.75f. If the number of entries in the table reaches capacity (16 by default) * load factor (0.75f), the table is resized.
```java
/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
```
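A minimal sketch of the sizing rule mentioned above: to hold an expected number of entries without ever resizing, request an initial capacity of at least expectedEntries / loadFactor. The helper name and the entry count below are made up for illustration; HashMap itself rounds any requested capacity up to the next power of two.

```java
import java.util.HashMap;
import java.util.Map;

public class SizingDemo {
    // Hypothetical helper: pick a capacity so that expectedEntries never
    // exceeds capacity * 0.75, which is the resize threshold.
    static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedEntries) {
        int initialCapacity = (int) Math.ceil(expectedEntries / 0.75);
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        // For ~1000 entries we request capacity 1334; HashMap rounds that up to
        // 2048, whose threshold (1536) is above 1000, so filling it never resizes.
        Map<String, Integer> map = newHashMapWithExpectedSize(1000);
        for (int i = 0; i < 1000; i++) {
            map.put("key-" + i, i);
        }
        System.out.println(map.size()); // 1000
    }
}
```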
The Node inner class
Each Node represents one key-value pair. It stores the key's hash value, and its next pointer links to the following Node, forming a singly linked list within a bucket.
```java
/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
```
- Node[] table
This array is the map's core data structure. Its elements are of type Node, so each slot can naturally hang a linked list of colliding entries; a simplified sketch of that chaining follows the snippet below.
```java
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;
```
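A simplified toy (not the JDK source) showing how a Node[] table with chaining behaves: the bucket index is derived from the hash, and colliding entries are linked through next. The class and method names are stand-ins, and unlike the real JDK 8 HashMap this toy prepends new nodes and never resizes or treeifies.

```java
import java.util.Objects;

// Toy chained hash table illustrating the Node[] structure; not the JDK source.
public class TinyChainedTable<K, V> {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // power of two, like HashMap

    public void put(K key, V value) {
        int hash = Objects.hashCode(key);
        int index = (table.length - 1) & hash;               // same indexing trick HashMap uses
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                n.value = value;                             // key already present: overwrite
                return;
            }
        }
        table[index] = new Node<>(hash, key, value, table[index]); // collision: chain onto the bucket
    }

    public V get(K key) {
        int hash = Objects.hashCode(key);
        for (Node<K, V> n = table[(table.length - 1) & hash]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }
}
```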
- size (the number of entries)
size is the number of key-value pairs in the HashMap. If this count reaches capacity * load factor, the table is resized.
```java
/**
 * The number of key-value mappings contained in this map.
 */
transient int size;
```
- threshold
This value is the result of capacity * load factor; once size reaches threshold, the table is resized. A short worked example follows the snippet below.
```java
/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
```
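A short worked example of the threshold arithmetic under the defaults (purely illustrative; the loop just prints capacity * 0.75 at each doubling):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int capacity = 16; // DEFAULT_INITIAL_CAPACITY
        // With capacity 16 the threshold is 12, so the 13th put triggers a resize
        // to capacity 32 (threshold 24), and so on.
        for (int i = 0; i < 4; i++) {
            System.out.println("capacity=" + capacity + ", threshold=" + (int) (capacity * loadFactor));
            capacity <<= 1; // each resize doubles the table
        }
    }
}
```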
- The treeify threshold (converting a bucket to a red-black tree)
Once a bucket's linked list grows beyond 8 nodes, it is converted to a red-black tree. Why 8? The JDK's own comment explains it.
When hashCodes are well dispersed, tree bins are almost never needed: entries spread evenly across the buckets, and a single bucket's list practically never reaches the threshold. With poorly dispersed hashCodes the distribution becomes skewed, and since the JDK cannot stop users from shipping bad hash implementations, uneven distributions do happen. Ideally, though, under random hashCodes the number of nodes per bin follows a Poisson distribution, and the probability of one bin's list reaching 8 elements is about 0.00000006; anything longer is less than one in ten million, so in practice it is essentially a non-event (these figures come from the statistics in the JDK comment below).
```java
/*
 * Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins. In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used. Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 */
static final int TREEIFY_THRESHOLD = 8;
```
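To sanity-check those probabilities, here is a small program that evaluates exp(-0.5) * 0.5^k / k! for k = 0..8, the same formula the JDK comment quotes:

```java
public class PoissonCheck {
    public static void main(String[] args) {
        // Poisson(lambda = 0.5): P(k) = e^(-0.5) * 0.5^k / k!
        double p = Math.exp(-0.5); // P(0)
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * 0.5 / (k + 1); // step from P(k) to P(k + 1)
        }
    }
}
```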
The optimized hash function

```java
// JDK 1.7
final int hash(Object k) {
    int h = hashSeed; // 0 by default
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// JDK 1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
- h = key.hashCode(); take the key's hashCode, for example:
1111 1111 1111 1111 1111 1010 0111 1100
- h >>> 16; an unsigned right shift that moves the high 16 bits of the 32-bit value into the low 16 bits, filling the top half with zeros:
0000 0000 0000 0000 1111 1111 1111 1111
- (h = key.hashCode()) ^ (h >>> 16); XOR the two values together:
1111 1111 1111 1111 1111 1010 0111 1100
^
0000 0000 0000 0000 1111 1111 1111 1111
1111 1111 1111 1111 0000 0101 1000 0011 (result)
Why XOR the hashCode with itself shifted right by 16 bits?
This lowers the chance of hash collisions. When the hash is later used to locate a bucket index, another bitwise operation is applied, and that indexing step effectively looks only at the low bits of the hash (with the default capacity of 16, just the lowest 4 bits). If the hash were not mixed here, only the low 16 bits would ever influence the index. XORing the high 16 bits into the low 16 bits folds the information from the whole 32-bit hash into the bits the indexing actually uses, preserving the features of both halves, so addressing spreads entries more evenly and collisions become less likely. (This becomes visible in the put analysis below.)
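A minimal sketch of the mixing plus indexing step, assuming the default table length of 16; the two hash values below are arbitrary and chosen so they differ only in their high bits:

```java
public class HashSpreadDemo {
    // Same mixing as JDK 1.8's HashMap.hash(): fold the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length (power of two), so (n - 1) = 0b1111 masks only the low 4 bits

        // Two hashes that differ only in their high bits:
        int h1 = 0b0000_0000_0000_0001_0000_0000_0000_0101;
        int h2 = 0b0000_0000_0000_0010_0000_0000_0000_0101;

        // Without mixing, both collide in bucket 5, because (n - 1) & h ignores the high bits:
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2));                 // 5 5

        // With the high bits folded in, they land in different buckets (4 and 7 here):
        System.out.println(((n - 1) & spread(h1)) + " " + ((n - 1) & spread(h2))); // 4 7
    }
}
```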
