哈希表

哈希表

我们通过将我们要查找的某种数据类型转化为一个索引index，然后通过索引去数组中查找，这时它的复杂度就是O(1)级别的。而将某个数据类型转化为索引的函数我们就称为是哈希函数，比如说将26个小写字母转化为索引，我们可以这么写

index = ch - 'a';

这样就建立起了一一对应的关系，但是并不是所有的对应关系都是一一对应的，因为数组的容量是有限的，而输入的范围可能是无穷的，所以很有可能不同的键对应着同一个索引，比如说键是字符串，因为字符串的组合方式是非常的多，可以看做是无穷的，我们不可能去开辟一个无穷的空间去与这些字符串一一对应，所以不同的字符串生成的索引很有可能会有冲突，我们称这种情况为哈希冲突。由于上面讲到的哈希冲突，所以我们要设计好哈希函数(hashCode())使得发生哈希冲突的可能性小，即使哈希函数产生的哈希值均匀的分布在数组中。

哈希函数的设计

哈希函数应该满足上面提到的：哈希函数产生的哈希值均匀的分布在数组中。数据的类型五花八门，对于特殊的领域有特殊领域的哈希函数的设计方式，甚至还有专门的论文，说这么多就是想说哈希函数的设计十分的复杂，在这里我们只提最简单的一种，哈希函数的设计应该满足

一致性
- 如果a == b，那么hashCode(a) == hashCode(b)
高效性
- 计算迅速
均匀性
- 输出尽可能均匀

哈希表 - 图1
由于Java中基本数据类型和字符串类型有默认的hashCode()计算，所以我们就用Java自带的hashCode计算基本数据类型和字符串的哈希值，而对于引用类型Java是根据地址计算的哈希值，所以可能会出现问题，需要我们自己自定义规则，比如对于一个Student类，我们规定学号以及姓名相同(不区分大小写)就是同一个学生，所以根据一致性原则，它们应该产生相同的哈希值，但是由于Java默认是根据地址产生哈希值，由于二者的地址是不同的，所以产生的哈希值有极大的概率是不同的，所以我们需要自己创建哈希函数。

链地址法

现在我们来演示往哈希表中添加元素的步骤
哈希表 - 图2

import java.util.TreeMap;
public class HashTable<K, V> {
    //数组中存储的是TreeMap这种查找表
    private TreeMap<K, V>[] hashTable;
    private int M;
    private int size;
    public HashTable(int M) {
        this.M = M;
        size = 0;
        hashTable = new TreeMap[M];
        for (int i = 0; i < hashTable.length; i++) {
            hashTable[i] = new TreeMap<>();
        }
    }
    public HashTable() {
        this(97);
    }
    public int getSize() {
        return size;
    }
    //得到在数组中的索引
    private int hash(K key) {
        //与0x7fffffff是为了消除负数
        return (key.hashCode() & 0x7fffffff) % M;
    }
    public void add(K key, V value) {
        TreeMap<K, V> map = hashTable[hash(key)];
        //先查看已经是否有这个键了
        if (map.containsKey(key)) {
            //有则更新
            map.put(key, value);
        } else {
            //没有则进行添加，并维护size
            map.put(key, value);
            size++;
        }
    }
    public V remove(K key, V value) {
        V ret = null;
        TreeMap<K, V> map = hashTable[hash(key)];
        //如果包含键则删除，没有返回null
        if (map.containsKey(key)) {
            ret = map.remove(key);
            size--;
        }
        return ret;
    }
    public void set(K key, V value) {
        TreeMap<K, V> map = hashTable[hash(key)];
        //没有该键抛出异常
        if (!map.containsKey(key)) {
            throw new IllegalArgumentException("键不存在");
        }
        map.put(key,value);
    }
    //直接得到相应的TreeMap，然后去查，TreeMap有检查步骤
    public V get(K key) {
        return hashTable[hash(key)].get(key);
    }
}

哈希表 - 图3

import java.util.TreeMap;
public class HashTable<K, V> {
    private static final int upperTol = 10;
    private static final int lowerTol = 2;
    private static final int initCapacity = 7;
    //数组中存储的是TreeMap这种查找表
    private TreeMap<K, V>[] hashTable;
    private int M;
    private int size;
    public HashTable(int M) {
        //只显示改变的内容
        //...
        hashTable = new TreeMap[initCapacity];
    }
    public void add(K key, V value) {
        //...
        if (size >= upperTol * M) {
            resize(2 * M);
        }
    }
    public V remove(K key, V value) {
        //...
        if (size < M * lowerTol && M / 2 >= initCapacity) {
            resize(M / 2);
        }
        return ret;
    }
    private void resize(int newM) {
        TreeMap<K,V>[] newHashTable = new TreeMap[newM];
        //后面要更新M，但是还需要旧M遍历数组
        int oldM = M;
        //由于后面要重新计算下标，所以这里要更新M
        M = newM;
        for (int i = 0; i < oldM; i++) {
            TreeMap<K, V> map = hashTable[i];
            for (K key: map.keySet()) {
                //重新计算下标并赋值
                newHashTable[hash(key)].put(key, map.get(key));
            }
        }
        hashTable = newHashTable;
    }
}

但是我们发现每次我们都扩容为2 * M，这时M就不是一个素数了，为了解决这一个问题，我们准备一个素数表，让M取素数表中的值，每次扩容M在素数表中的索引+1，缩容-1

import java.util.TreeMap;
public class HashTable<K, V> {
    //素数表
    private static final int[] capacity = {};
    private static final int upperTol = 10;
    private static final int lowerTol = 2;
    private  int capacityIndex = 0;
    //数组中存储的是TreeMap这种查找表
    private TreeMap<K, V>[] hashTable;
    private int M;
    private int size;
    public HashTable() {
        this.M = capacity[capacityIndex];
        size = 0;
        hashTable = new TreeMap[M];
        for (int i = 0; i < hashTable.length; i++) {
            hashTable[i] = new TreeMap<>();
        }
    }
    public void add(K key, V value) {
        //...
        if (size >= upperTol * M && capacityIndex + 1 < size) {
            capacityIndex++;
            resize(capacity[capacityIndex]);
        }
    }
    public V remove(K key, V value) {
        //...
        if (size < M * lowerTol && capacityIndex - 1 >= 0) {
            capacityIndex--;
            resize(capacity[capacityIndex]);
        }
        return ret;
    }
}

完整代码

import java.util.TreeMap;
public class HashTable<K, V> {
    private static final int[] capacity = {};
    private static final int upperTol = 10;
    private static final int lowerTol = 2;
    private  int capacityIndex = 0;
    //数组中存储的是TreeMap这种查找表
    private TreeMap<K, V>[] hashTable;
    private int M;
    private int size;
    public HashTable() {
        this.M = capacity[capacityIndex];
        size = 0;
        hashTable = new TreeMap[M];
        for (int i = 0; i < hashTable.length; i++) {
            hashTable[i] = new TreeMap<>();
        }
    }
    public int getSize() {
        return size;
    }
    //得到在数组中的索引
    private int hash(K key) {
        //与0x7fffffff是为了消除负数
        return (key.hashCode() & 0x7fffffff) % M;
    }
    public void add(K key, V value) {
        TreeMap<K, V> map = hashTable[hash(key)];
        //先查看已经是否有这个键了
        if (map.containsKey(key)) {
            //有则更新
            map.put(key, value);
        } else {
            //没有则进行添加，并维护size
            map.put(key, value);
            size++;
        }
        if (size >= upperTol * M && capacityIndex + 1 < capacity.length) {
            capacityIndex++;
            resize(capacity[capacityIndex]);
        }
    }
    public V remove(K key, V value) {
        V ret = null;
        TreeMap<K, V> map = hashTable[hash(key)];
        //如果包含键则删除，没有返回null
        if (map.containsKey(key)) {
            ret = map.remove(key);
            size--;
        }
        if (size < M * lowerTol && capacityIndex - 1 >= 0) {
            capacityIndex--;
            resize(capacity[capacityIndex]);
        }
        return ret;
    }
    public void set(K key, V value) {
        TreeMap<K, V> map = hashTable[hash(key)];
        //没有该键抛出异常
        if (!map.containsKey(key)) {
            throw new IllegalArgumentException("键不存在");
        }
        map.put(key,value);
    }
    //直接得到相应的TreeMap，然后去查，TreeMap有检查步骤
    public V get(K key) {
        return hashTable[hash(key)].get(key);
    }
    private void resize(int newM) {
        TreeMap<K,V>[] newHashTable = new TreeMap[newM];
        //后面要更新M，但是还需要旧M遍历数组
        int oldM = M;
        //由于后面要重新计算下标，所以这里要更新M
        M = newM;
        for (int i = 0; i < oldM; i++) {
            TreeMap<K, V> map = hashTable[i];
            for (K key: map.keySet()) {
                //重新计算下标并赋值
                newHashTable[hash(key)].put(key, map.get(key));
            }
        }
        hashTable = newHashTable;
    }
}