Source Code Walkthrough - HashMap - Java Core Language Knowledge

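Preface
Inheritance hierarchy
Java 7 HashMap
Java 8 HashMap
The Node class
Initialization
tableSizeFor: table size
The hash method
The put method
resize: growing the array
The get method
The remove method
Serialization
Thread unsafety
Key immutability
References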

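Preface

HashMap is one of the most frequently used classes in everyday Java code, which makes its source code well worth studying. The implementation changed between Java 7 and Java 8, so both versions are introduced here.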

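Inheritance hierarchy

HashMap extends the abstract class AbstractMap and implements the Map interface (as well as Cloneable and Serializable).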
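The hierarchy can be checked directly at runtime. The following is only a small sketch; the class name HierarchyDemo and its helper method are made up for illustration.

```java
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

class HierarchyDemo {
    // Returns the direct superclass of HashMap.
    static Class<?> superclassOfHashMap() {
        return HashMap.class.getSuperclass();
    }

    public static void main(String[] args) {
        Object map = new HashMap<String, Integer>();
        System.out.println(superclassOfHashMap() == AbstractMap.class); // true
        System.out.println(map instanceof Map);                         // true
        System.out.println(map instanceof Cloneable);                   // true
        System.out.println(map instanceof Serializable);                // true
    }
}
```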

Java 7 HashMap

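The Java 7 source is relatively simple and does not support concurrent access. It uses an array plus linked lists: the map itself is an array, and each array slot (bucket) holds a singly linked list.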

Note: the figure is from https://www.javastack.cn/article/2018/hashmap-concurrenthashmap-details/#lg=1&slide=0 .

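Explanation: on insertion, the key is hashed to find the corresponding array slot. The entries in that slot's linked list are then compared one by one with equals. If an equal key is found, its value is replaced; otherwise a new entry is added to the list (in Java 7, at the head of the list).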

Java 8 HashMap

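The Java 8 source is considerably more streamlined, and the overall structure is much the same as in Java 7. The difference is that when a singly linked list grows longer than 8 nodes, it is converted to a red-black tree (provided the table has at least 64 slots; below that, the table is resized instead). The overall structure is array + linked list + red-black tree.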

Note: the figure is from https://www.javastack.cn/article/2018/hashmap-concurrenthashmap-details/#lg=1&slide=2 .

The Node class

Every element in a HashMap is a Node, which holds the hash, key, value, and next fields. If the bucket is a red-black tree, the nodes are TreeNode instances.

  static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
  }

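Initialization

The constructor can specify the initial capacity and the load factor; the load factor determines how full the table may get before it grows.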

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
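To see the argument checks above in action, here is a small sketch. ConstructorDemo and rejects are hypothetical names, and the capacity 134 is just an example sized for roughly 100 entries at a load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

class ConstructorDemo {
    // Returns true if the HashMap constructor rejects the given arguments.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, Integer>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 100 / 0.75 rounded up is 134, so about 100 entries fit without a resize.
        Map<String, Integer> map = new HashMap<>(134, 0.75f);
        map.put("a", 1);
        System.out.println(map);

        System.out.println(rejects(-1, 0.75f));     // negative capacity
        System.out.println(rejects(16, 0f));        // non-positive load factor
        System.out.println(rejects(16, Float.NaN)); // NaN load factor
    }
}
```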

tableSizeFor: table size

Purpose: compute the smallest power of two that is greater than or equal to cap.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

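Explanation:

First, 1 is subtracted from cap. The method computes a power of two, and without this step a cap that is already a power of two (2^n) would be doubled to 2 * 2^n.
The successive >>> shifts combined with |= copy the highest set bit into every lower bit position, so n becomes all ones below its highest bit, and n + 1 is then a power of two.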
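The rounding behaviour is easy to try out on a standalone copy of the method. In this sketch the class name TableSizeForDemo is made up, and MAXIMUM_CAPACITY is redefined locally with the same value HashMap uses (1 << 30).

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Standalone copy of the Java 8 logic shown above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 15, 16, 17, 1000};
        for (int cap : samples) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

A cap that is already a power of two (16) is returned unchanged, while 17 is rounded up to 32.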

The hash method

Takes the key and computes the hash from which the bucket position is derived.

  static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

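Explanation:
(h = key.hashCode()) ^ (h >>> 16) computes key.hashCode() and XORs the upper 16 bits into the lower 16 bits.
A hashCode can be a large value, but the bucket index is taken from the low bits only, so without this step only the low bits of hashCode would affect the position in tab. Mixing in the high bits spreads the keys more evenly.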
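The effect of the extra XOR can be seen with two hash codes that differ only in their high bits. This is a sketch with made-up names (HashSpreadDemo, spread, index); the table size 16 is just the default capacity used as an example.

```java
class HashSpreadDemo {
    // Same mixing step as HashMap.hash, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000;
        int h2 = 0x20000;
        // Without spreading, both land in bucket 0 because their low bits are all zero.
        System.out.println(index(h1, 16) + " " + index(h2, 16));
        // With spreading, the high bits influence the index: buckets 1 and 2.
        System.out.println(index(spread(h1), 16) + " " + index(spread(h2), 16));
    }
}
```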

The put method

put sets the value for a key. It locates the bucket, then checks each case in turn to decide how to insert.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
/**
* onlyIfAbsent  if true, do not overwrite an existing (non-null) value
* evict         if false, the table is in creation mode (LinkedHashMap uses this for further processing)
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If table is null (or empty), call resize() to allocate it.
    // On the first resize() the capacity defaults to DEFAULT_INITIAL_CAPACITY (16)
    // Default threshold: DEFAULT_LOAD_FACTOR (0.75) * DEFAULT_INITIAL_CAPACITY
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Locate the array index; if the slot p is null, create the first node there.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Check whether the first node p has the same key as the one being inserted; if so, e is that node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            // If the node is a red-black tree node, use the tree insertion method
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the linked list that starts at p
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Reached the end of the list: append a new node
                    p.next = newNode(hash, key, value, null);
                    // TREEIFY_THRESHOLD = 8
                    // If the list is now longer than 8, call treeifyBin to convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Stop when a node with the same key is found in the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // The key already exists: replace its value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Grow the table with resize() once size exceeds the threshold
    if (++size > threshold)
        resize();
    // Hook for extra work after a node is inserted
    afterNodeInsertion(evict);
    return null;
}
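From the caller's side, the two exits of putVal show up as the return value of put: the old value when the key already existed, null otherwise. The sketch below (PutDemo and returnValues are hypothetical names) also uses putIfAbsent, which in Java 8 calls putVal with onlyIfAbsent set to true.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutDemo {
    // Collects the return values of a few put calls plus the final value.
    static List<Integer> returnValues() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping: null
        Integer second = map.put("a", 2);         // replaces 1 and returns it
        Integer third = map.putIfAbsent("a", 3);  // key present: returns 2, value stays 2
        return Arrays.asList(first, second, third, map.get("a"));
    }

    public static void main(String[] args) {
        System.out.println(returnValues()); // [null, 1, 2, 2]
    }
}
```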

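Explanation:
The bucket is located with p = tab[i = (n - 1) & hash]. Conceptually this is a modulo operation; because n is always a power of two, hash % n can be replaced by (n - 1) & hash.
In binary, n - 1 is all ones in its low bits, so the AND simply keeps the low bits of hash.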
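The equivalence between the mask and the modulo can be checked with a short sketch. IndexMaskDemo and its methods are made-up names; Math.floorMod is used as the reference because, unlike %, it never returns a negative result for a positive n.

```java
class IndexMaskDemo {
    // Bucket index as computed by HashMap.
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    // Checks whether the mask agrees with a non-negative modulo for a range of hashes.
    static boolean agreesWithFloorMod(int n) {
        for (int hash = -1000; hash <= 1000; hash++) {
            if (maskIndex(hash, n) != Math.floorMod(hash, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(maskIndex(17, 16));      // 1
        System.out.println(maskIndex(-1, 16));      // 15, whereas -1 % 16 would be -1
        System.out.println(agreesWithFloorMod(16)); // true: 16 is a power of two
        System.out.println(agreesWithFloorMod(12)); // false: 12 is not
    }
}
```

The last line shows why the table length has to be a power of two for this trick to work.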

resize: growing the array

Grows the array. Each resize doubles the capacity and migrates the existing entries.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // The table already exists: grow it
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Set the new capacity newCap to double the old one
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Double the threshold as well
            newThr = oldThr << 1; // double threshold
    }
    // The table is not allocated yet but a threshold is set: first put after new HashMap(int initialCapacity)
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // First put after new HashMap()
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Allocate the new array
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // If this call only initializes the table, the work ends here and newTab is returned
    table = newTab;
    // An old table exists: migrate its entries
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // The bucket holds a single node: move it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Move the existing linked list into newTab
                    // On resize the original list is split into two lists, each placed at its new position
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Test the bit that decides between the low and the high list
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        // The low list stays at the original index in newTab
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        // The high list goes to index j + oldCap in newTab
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

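Explanation:
The list is split with the test (e.hash & oldCap) == 0 because the bucket index is the hash masked with cap - 1, so only the low bits of the hash matter. When the capacity doubles, the mask gains exactly one more bit, and that bit of the hash (the one selected by oldCap) decides whether the node stays at index j or moves to j + oldCap. See the references for details.

Suppose (in binary) the old capacity is n = 10000, so n - 1 = 1111, and key.hash = 10001. The key's position is then 0001, that is, 1. After the resize, n = 100000 and n - 1 = 11111.

The key's position is now 10001, that is, the old index plus the old capacity.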
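The same example can be reproduced in code. This sketch (ResizeSplitDemo and its methods are made-up names) computes the new index with the split rule and compares it against masking with the doubled capacity directly.

```java
class ResizeSplitDemo {
    // New bucket index after doubling, using the low/high split rule from resize().
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    // Verifies that the split rule matches masking with the doubled capacity.
    static boolean matchesDirectComputation(int oldCap) {
        int newCap = oldCap << 1;
        for (int hash = 0; hash < 4096; hash++) {
            if (newIndex(hash, oldCap) != (hash & (newCap - 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(newIndex(0b10001, 16));        // 17: moves from index 1 to 1 + 16
        System.out.println(newIndex(0b00001, 16));        // 1: stays at index 1
        System.out.println(matchesDirectComputation(16)); // true
    }
}
```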

The get method

get looks up the node for the key and returns its value, or null if there is no such node.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Locate first, the first node of the bucket
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check whether the first node matches
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // Otherwise search the tree or the linked list
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
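One consequence of the code above is that get returns null both when the key is absent and when the key is mapped to null. A small sketch of how containsKey tells the two apart (GetDemo and describe are hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    // Classifies a lookup as a real value, a stored null, or a missing key.
    static String describe(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return "value";
        }
        return map.containsKey(key) ? "null value" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "x");
        map.put("b", null); // HashMap permits null values
        System.out.println(describe(map, "a")); // value
        System.out.println(describe(map, "b")); // null value
        System.out.println(describe(map, "c")); // absent
    }
}
```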

The remove method

Removes the mapping for a key. The work is done by removeNode.

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        // p is the first node of the located bucket
        Node<K,V> node = null, e; K k; V v;
        // Check whether the first node p matches the key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // Otherwise search the bucket for the matching node
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // A node was found; if matchValue is set, also require the value to match
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
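The matchValue parameter corresponds to the two public removal methods of the Java 8 Map API: remove(key) removes unconditionally, and remove(key, value) removes only when the current value matches. A sketch with made-up names (RemoveDemo, results):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveDemo {
    // Collects the results of several removals and whether the map ends up empty.
    static List<Object> results() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Integer removed = map.remove("a");        // returns the old value 1
        Integer missing = map.remove("zzz");      // no such key: null
        boolean wrongValue = map.remove("b", 99); // value differs: nothing removed
        boolean rightValue = map.remove("b", 2);  // value matches: removed
        return Arrays.<Object>asList(removed, missing, wrongValue, rightValue, map.isEmpty());
    }

    public static void main(String[] args) {
        System.out.println(results()); // [1, null, false, true, true]
    }
}
```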

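Serialization

HashMap implements the Serializable interface, so it can be serialized. However, it does not rely on the default serialization alone; it defines its own writeObject/readObject methods to control the serialized form.

Note: for a class that implements Serializable, ObjectOutputStream and ObjectInputStream apply the default serialization mechanism. If the class defines its own private writeObject/readObject methods, those methods are used instead.

The table field that stores the entries is declared transient, so default serialization skips it.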

transient Node<K,V>[] table;

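Why does HashMap use its own serialization? Because the bucket an entry lands in is computed from the key's hashCode, and hashCode is not guaranteed to be the same across JVMs (or even across runs). If the table were serialized as is with the default mechanism, entries could end up in the wrong buckets after deserialization. HashMap therefore marks table, size, and modCount as transient.

writeObject: during serialization, the keys and values are read out and written one by one.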

// Because this method is private, a subclass can implement its own private readObject and writeObject without having to handle HashMap's part of the state.
private void writeObject(java.io.ObjectOutputStream s)
    throws IOException {
    int buckets = capacity();
    // Write out the threshold, loadfactor, and any hidden stuff
    s.defaultWriteObject();
    s.writeInt(buckets);
    s.writeInt(size);
    internalWriteEntries(s);
}
 // Called only from writeObject, to ensure compatible ordering.
void internalWriteEntries(java.io.ObjectOutputStream s) throws IOException {
    Node<K,V>[] tab;
    if (size > 0 && (tab = table) != null) {
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                s.writeObject(e.key);
                s.writeObject(e.value);
            }
        }
    }
}

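readObject: during deserialization, the entries are read back and re-inserted with putVal, so each bucket is recomputed.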

private void readObject(java.io.ObjectInputStream s)
    throws IOException, ClassNotFoundException {
    // Read in the threshold (ignored), loadfactor, and any hidden stuff
    s.defaultReadObject();
    // Reset table, size, and the other fields to their initial state
    reinitialize();
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new InvalidObjectException("Illegal load factor: " +
                                         loadFactor);
    s.readInt();                // Read and ignore number of buckets
    int mappings = s.readInt(); // Read number of mappings (size)
    if (mappings < 0)
        throw new InvalidObjectException("Illegal mappings count: " +
                                         mappings);
    else if (mappings > 0) { // (if zero, use defaults)
        // Size the table using given load factor only if within
        // range of 0.25...4.0
        float lf = Math.min(Math.max(0.25f, loadFactor), 4.0f);
        float fc = (float)mappings / lf + 1.0f;
        int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
                   DEFAULT_INITIAL_CAPACITY :
                   (fc >= MAXIMUM_CAPACITY) ?
                   MAXIMUM_CAPACITY :
                   tableSizeFor((int)fc));
        float ft = (float)cap * lf;
        threshold = ((cap < MAXIMUM_CAPACITY && ft < MAXIMUM_CAPACITY) ?
                     (int)ft : Integer.MAX_VALUE);
        // Check Map.Entry[].class since it's the nearest public type to
        // what we're actually creating.
        SharedSecrets.getJavaOISAccess().checkArray(s, Map.Entry[].class, cap);
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] tab = (Node<K,V>[])new Node[cap];
        table = tab;
        // Read the keys and values, and put the mappings in the HashMap
        for (int i = 0; i < mappings; i++) {
            @SuppressWarnings("unchecked")
                K key = (K) s.readObject();
            @SuppressWarnings("unchecked")
                V value = (V) s.readObject();
            putVal(hash(key), key, value, false, false);
        }
    }
}
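From the outside, all of this is invisible: a HashMap simply survives a round trip through the standard object streams. The sketch below is an illustration only; SerializationDemo and roundTrip are made-up names, and checked exceptions are wrapped to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class SerializationDemo {
    // Serializes the map to a byte array and reads it back as a new HashMap.
    @SuppressWarnings("unchecked")
    static <K, V> HashMap<K, V> roundTrip(HashMap<K, V> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map); // invokes HashMap's private writeObject
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<K, V>) in.readObject(); // invokes HashMap's private readObject
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> original = new HashMap<>();
        original.put("a", 1);
        original.put("b", 2);
        HashMap<String, Integer> copy = roundTrip(original);
        System.out.println(copy.equals(original)); // true
        System.out.println(copy == original);      // false: a distinct object
    }
}
```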

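Thread unsafety

HashMap resizes itself automatically, and resizing rewrites the linked lists. Under concurrent access from multiple threads, a key may fail to be found or updates may be lost, and concurrent modification of a list can corrupt it into a cycle that causes an infinite loop (a well-known problem with the Java 7 resize).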
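When a map must be shared between threads, one common alternative is ConcurrentHashMap from java.util.concurrent (Collections.synchronizedMap is another). The sketch below is not part of the HashMap source; ConcurrentCountDemo and countWith are made-up names, and the thread and iteration counts are arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Increments one counter from several threads and returns the final count.
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum); // atomic on ConcurrentHashMap
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        // With ConcurrentHashMap no increment is lost.
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 10_000)); // 40000
    }
}
```

Passing a plain HashMap to the same method is not safe: increments can be lost, so the result is not guaranteed.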

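Key immutability

Keys used in a HashMap should be kept unchanged as far as possible, because the bucket is located from the key's hashCode. If a mutable object is used as a key and it changes after insertion, its hashCode maps to a different position and the entry can no longer be found in the HashMap.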
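A minimal sketch of the problem, assuming a hypothetical mutable Key class whose hashCode is simply its id field:

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // A deliberately mutable key: hashCode and equals depend on a non-final field.
    static final class Key {
        int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Inserts a key, mutates it, then reports whether the map can still find it.
    static boolean foundAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "v");
        key.id = 2; // the node still carries the hash computed from id = 1
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false: the entry is still stored but unreachable
    }
}
```

The entry remains in the map (size is still 1), but neither the mutated key nor a fresh Key(1) finds it, which is why immutable types such as String or Integer are the usual choice for keys.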

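References

HashMap, ConcurrentHashMap Principles and Source Code, Explained in One Go! Highly recommended; distinguishes between the versions.
A Detailed Analysis of the HashMap Source Code (JDK 1.8). Highly recommended; includes an explanation of red-black trees.
The Mathematics Behind the Philosophy of HashMap
Why Does HashMap Implement Its Own writeObject and readObject Methods?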