map : key value
- key: var-length, unordered
hash table: int value through array
- int: small range (hash code)
From key to hash code: hash table
- int: small range (hash code)
Hash Code generate
%20%3D%20compress(HashCode(key))#card=math&code=h%28key%29%20%3D%20compress%28HashCode%28key%29%29)
How to generate HashCode?
- : HashCode(“abc”) = HashCode(“acb”)
- #card=math&code=%5Csum_%7Bi%7D%20%28key%5Bi%5D%20%5Ccdot%2016%5Ei%29) byte shift
- cyclic shift
From simple to complicated.
Collision Handling
collision: %20%3D%20h(key_2)#card=math&code=h%28key_1%29%20%3D%20h%28key_2%29) but
- don’t care (tolerate some errors)
- throw away
- don’t insert
example: use hash-based map to realize set:
set =
key
%5D%20%3D%201#card=math&code=a%5Bh%28k_i%29%5D%20%3D%201): correct if no collision ,may not correct if collision
Bloom Filter
- fixed array per bucket (can still overflow)
- linked list per bucket(chaining, hope short chain)
- other container—-other container: secondary
- “reuse” empty space: open addressing
a[key] = value
- insert(key, value) to array location %5D#card=math&code=%5Bh_0%28key%29%5D)
- insert(key, value) to array location %5D#card=math&code=%5Bh_0%28key%29%5D)
- if fail, re-insert to #card=math&code=h_1%28key%29)
- if fail, re-insert to #card=math&code=h_1%28key%29)
- if fail, re-insert to #card=math&code=h_2%28key%29)
- if fail, re-insert to #card=math&code=h_2%28key%29)
- m. if fail, re-insert to #card=math&code=h_%7Bm-1%7D%28key%29)
- m+1. if still not found declare failure
How to decide h(key)
- linear probing
%20%25K%20%3D%20(h0%20%2Bi%20)%25%20K#card=math&code=h_i%20%3D%20%28h%7Bi-1%7D%2B1%29%20%25K%20%3D%20%28h_0%20%2Bi%20%29%25%20K)
- linear probing
- quadratic probing
%20%25%20K#card=math&code=h_i%20%3D%20%28h_0%2Bi%5E2%29%20%25%20K)
- quadratic probing
- double hashing
#card=math&code=h_i%20%3D%20%28h_0%20%2B%20i%20%5Ccdot%20%5Cwidetilde%7Bh%7D%28key%29)
- double hashing
Hash Table Build
Consider a hash table which has entries and keys
We want load factor
to be small
- large hash don’t work
- hash non-uniform large
When n(load factor) is getting larger, then we increase K
naive method
- set
- h(key) range{0,1,2,…,2K-1}
- full rebuild #card=math&code=w%2FO%28n%29) if each insert #card=math&code=%5Csimeq%20O%281%29)
shortcoming: step3 long waiting
partial rebuild method
Example:
origin hash table(mod 4, h(key) = (key[0] - ‘a’) % K):
We want insert “egg”
- Full build method:
We move “eat” to block 4 and “good” to block 6 - partial build method:
We only move “eat” to block 4 because it’s overflow
We use pointer to connect the other blocks
complexity: %2BO(%5Cfrac%7Bn%7D%7Bk%7D)#card=math&code=O%28K%29%2BO%28%5Cfrac%7Bn%7D%7Bk%7D%29)
This way, the hash table is effectively extendable.