Lesson 11 Hash Table - 《Data Structure and algorithms》

Hash Code generate
Collision Handling
- How to decide h(key)
Hash Table Build

map : key $Lesson 11 Hash Table - 图1$ value
- key: var-length, unordered
hash table: int $Lesson 11 Hash Table - 图2$ value through array
- int: small range (hash code)
  From key to hash code: hash table

Hash Code generate

$Lesson 11 Hash Table - 图3$ %20%3D%20compress(HashCode(key))#card=math&code=h%28key%29%20%3D%20compress%28HashCode%28key%29%29)

How to generate HashCode?

1. $Lesson 11 Hash Table - 图4$ : HashCode(“abc”) = HashCode(“acb”)
1. $Lesson 11 Hash Table - 图5$ #card=math&code=%5Csum_%7Bi%7D%20%28key%5Bi%5D%20%5Ccdot%2016%5Ei%29) $Lesson 11 Hash Table - 图6$ byte shift
1. cyclic shift

From simple to complicated.

Collision Handling

collision: $Lesson 11 Hash Table - 图7$ %20%3D%20h(key_2)#card=math&code=h%28key_1%29%20%3D%20h%28key_2%29) but $Lesson 11 Hash Table - 图8$

1. don’t care (tolerate some errors)
- throw away
- don’t insert

example: use hash-based map to realize set:
set = $Lesson 11 Hash Table - 图9$
key $Lesson 11 Hash Table - 图10$ $Lesson 11 Hash Table - 图11$

$Lesson 11 Hash Table - 图12$ %5D%20%3D%201#card=math&code=a%5Bh%28k_i%29%5D%20%3D%201): correct if no collision ,may not correct if collision
Bloom Filter

1. fixed array per bucket (can still overflow)
1. linked list per bucket(chaining, hope short chain)
1. other container—-other container: secondary
1. “reuse” empty space: open addressing

a[key] = value

1. insert(key, value) to array location $Lesson 11 Hash Table - 图13$ %5D#card=math&code=%5Bh_0%28key%29%5D)
1. if fail, re-insert to $Lesson 11 Hash Table - 图14$ #card=math&code=h_1%28key%29)
1. if fail, re-insert to $Lesson 11 Hash Table - 图15$ #card=math&code=h_2%28key%29)
m. if fail, re-insert to $Lesson 11 Hash Table - 图16$ #card=math&code=h_%7Bm-1%7D%28key%29)
m+1. if still not found declare failure

How to decide h(key)

1. linear probing
  $Lesson 11 Hash Table - 图17$ %20%25K%20%3D%20(h0%20%2Bi%20)%25%20K#card=math&code=h_i%20%3D%20%28h%7Bi-1%7D%2B1%29%20%25K%20%3D%20%28h_0%20%2Bi%20%29%25%20K)
1. quadratic probing
  $Lesson 11 Hash Table - 图18$ %20%25%20K#card=math&code=h_i%20%3D%20%28h_0%2Bi%5E2%29%20%25%20K)
1. double hashing
  $Lesson 11 Hash Table - 图19$ #card=math&code=h_i%20%3D%20%28h_0%20%2B%20i%20%5Ccdot%20%5Cwidetilde%7Bh%7D%28key%29)

Hash Table Build

Consider a hash table which has $Lesson 11 Hash Table - 图20$ entries and $Lesson 11 Hash Table - 图21$ keys
We want $Lesson 11 Hash Table - 图22$ load factor to be small

$Lesson 11 Hash Table - 图23$ large $Lesson 11 Hash Table - 图24$ hash don’t work
hash non-uniform $Lesson 11 Hash Table - 图25$ $Lesson 11 Hash Table - 图26$ large

When n(load factor) is getting larger, then we increase K

naive method

1. set $Lesson 11 Hash Table - 图27$
1. h(key) $Lesson 11 Hash Table - 图28$ range{0,1,2,…,2K-1}
1. full rebuild $Lesson 11 Hash Table - 图29$ #card=math&code=w%2FO%28n%29) if each insert $Lesson 11 Hash Table - 图30$ #card=math&code=%5Csimeq%20O%281%29)

shortcoming: step3 long waiting

partial rebuild method

Example:
origin hash table(mod 4, h(key) = (key[0] - ‘a’) % K):
Lesson 11 Hash Table - 图31
We want insert “egg”

Full build method:

We move “eat” to block 4 and “good” to block 6
partial build method:
We only move “eat” to block 4 because it’s overflow

We use pointer to connect the other blocks
complexity: $Lesson 11 Hash Table - 图34$ %2BO(%5Cfrac%7Bn%7D%7Bk%7D)#card=math&code=O%28K%29%2BO%28%5Cfrac%7Bn%7D%7Bk%7D%29)
This way, the hash table is effectively extendable.

Summarize

Lesson 11 Hash Table - 图35