Core fields
- Default table size: 16
In practice you usually specify the initial capacity explicitly, based on an estimate of how many key-value pairs the map will hold.
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
Default load factor: once the number of entries reaches table size (16) × load factor (0.75f), the table is resized.
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
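How capacity and load factor interact can be sketched as follows. This is a minimal illustration, not JDK source; `tableSizeFor` here mirrors the power-of-two rounding HashMap applies to a requested capacity (the JDK 9+ variant), and the threshold is simply capacity × 0.75.

```java
public class CapacityDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Round up to the next power of two, as HashMap does for a requested capacity.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        int requested = 20;                     // caller's estimate of entry count
        int capacity = tableSizeFor(requested); // actual table length: 32
        int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR); // resize at 24 entries
        System.out.println(capacity + " " + threshold);         // prints 32 24
    }
}
```

Note that an estimate of 20 entries yields a table of 32 with a resize threshold of 24, which is why sizing the map up front avoids intermediate resizes.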
The Node inner class
A Node represents one key-value pair. It caches the key's hash, and its next pointer links to the following Node, forming a singly linked list.
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    ...
}
- Node[] table
This array is the map's core data structure. Each element is a Node, so every slot can naturally grow into a linked list.
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
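How the Node array and chaining fit together can be sketched with a deliberately simplified put/get. This is a toy model, not JDK code: it skips resizing, treeification, and hash spreading, and fixes the types to String/Integer for brevity.

```java
public class BucketDemo {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    static Node<String, Integer>[] table = (Node<String, Integer>[]) new Node[16];

    // Simplified put: index by hash & (length - 1); on collision, prepend to the chain.
    static void put(String key, Integer value) {
        int hash = key.hashCode();              // no spreading, for brevity
        int index = (table.length - 1) & hash;
        for (Node<String, Integer> n = table[index]; n != null; n = n.next) {
            if (n.hash == hash && n.key.equals(key)) { n.value = value; return; }
        }
        table[index] = new Node<>(hash, key, value, table[index]);
    }

    static Integer get(String key) {
        int index = (table.length - 1) & key.hashCode();
        for (Node<String, Integer> n = table[index]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

The point is the shape: each table slot is the head of a chain, and both put and get walk that chain comparing hashes and keys.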
- size
size is the number of key-value pairs currently in the map; once it reaches capacity × load factor, the table is resized.
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
- threshold
threshold is capacity * load factor; once size reaches threshold, the table is resized.
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
- The treeify threshold
A chain is converted to a red-black tree once it grows past 8 nodes. Why 8? The JDK's own comment explains it.
When hashCode values are well distributed, tree bins are almost never used: entries spread evenly across the bins, and hardly any chain reaches the threshold. With poor-quality hashCodes the distribution can degrade, and since the JDK cannot stop users from writing bad hash functions, uneven distributions do happen. Even so, under random hashCodes the number of nodes per bin ideally follows a Poisson distribution, and the probability of a chain reaching 8 elements is about 0.00000006, practically a non-event; anything longer occurs less than once in ten million (figures from the JDK's statistical analysis).
/*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*/
static final int TREEIFY_THRESHOLD = 8;
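The probabilities in that comment can be reproduced directly from the Poisson formula the JDK cites, exp(-0.5) * pow(0.5, k) / factorial(k):

```java
public class PoissonDemo {
    // P(a bin holds exactly k nodes) under a Poisson distribution with parameter 0.5
    static double poisson(int k) {
        double factorial = 1;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        // Reproduces the table in the JDK comment, k = 0..8
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(k));
        }
    }
}
```

Running it prints the same values as the comment, with k = 8 coming out at roughly 0.00000006.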
The optimized hash function
// jdk1.7
final int hash(Object k) {
    int h = hashSeed; // defaults to 0
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
// jdk1.8
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
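The jdk1.8 version is short enough to copy into a runnable sketch. Using an Integer key makes the effect easy to see, because Integer's hashCode is the value itself, so a value with only high bits set visibly gets those bits folded into the low half:

```java
public class HashDemo {
    // Same spreading as java.util.HashMap.hash in JDK 8+.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = hash(0x12340000);                   // hashCode has only high bits set
        System.out.println(Integer.toHexString(h)); // prints 12341234
    }
}
```

The high 16 bits (0x1234) survive unchanged, and a copy of them now also appears in the low 16 bits.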
- h = key.hashCode(); get the key's hashCode
1111 1111 1111 1111 1111 1010 0111 1100
- h >>> 16; shift the 32-bit value right by 16, so the high 16 bits move into the low half and the high half fills with zeros
0000 0000 0000 0000 1111 1111 1111 1111
- (h = key.hashCode()) ^ (h >>> 16); XOR the two together
1111 1111 1111 1111 1111 1010 0111 1100
^
0000 0000 0000 0000 1111 1111 1111 1111
1111 1111 1111 1111 0000 0101 1000 0011 (result)
Why XOR the hashCode with itself shifted right by 16 bits?
To reduce hash collisions. When the hash is later used to locate a bucket index, another bit operation is applied, and with typical table sizes only the low bits of the hash take part in it. Without this step, only the low 16 bits of the hash would ever influence the index. The XOR above folds the high 16 bits into the low 16 bits, so the low half carries features of the entire hashCode. That keeps the collision probability lower when computing the index. (You will see this in the put analysis below.)
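The effect on indexing can be sketched with two hypothetical keys whose hashCodes differ only in the high 16 bits (the values 0x10000 and 0x20000 are chosen purely for illustration):

```java
public class IndexDemo {
    // The jdk1.8 spreading step, extracted for illustration.
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16;       // table length, a power of two
        int h1 = 0x10000; // these two differ only above the low 16 bits
        int h2 = 0x20000;

        // Without spreading, the mask (n - 1) discards the high bits: both land in bucket 0.
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // prints 0 0

        // With spreading, the high bits influence the index and the collision disappears.
        System.out.println(((n - 1) & spread(h1)) + " " + ((n - 1) & spread(h2))); // prints 1 2
    }
}
```

Without the XOR both keys collide in bucket 0; after spreading they land in buckets 1 and 2.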