剖析go-map - 《go》

拉链法
map的数据结构
map初始化
- 字面量
- 运行时
读写操作
- map扩容时机

拉链法

拉链法-每个桶中都应有0-1个元素，有时2-3个。计算哈希、定位桶和遍历链表三个过程是哈希表读写操作的主要开销。
装载因子：
装载因子 := 元素数量 / 桶数量
拉链法的哈希表装载因子不会超过1.哈希表装载因子过大会触发哈希的扩容。创建更多的桶来存储哈希中的元素，保证性能不会出现严重的下降。

map的数据结构

src/runtime/map.go#L115

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed
    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
    extra *mapextra // optional fields
}

count表示当前哈希表中的元素数量
B表示当前哈希表持有的buckets数量。但是因为哈希表中桶的数量都是2的倍数，所以该字段会存储对数，也就是len(buckest) == 2^B
hash0是哈希的种子，它能为哈希函数的结果引入随机性。政治在创建哈希表时确定，并在调用哈希函数时作为参数传入；
oldbuckets是哈希再扩容是用于保存之前的buckets的字段。它的大小是当前buckets的一半;

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and elem do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and elem do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap
    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}

哈希表 runtime/map.go-hmap 的桶是runtime/map.go-bmap.每一个bmap都能存储8个键值对，当哈希表中存储的数据过多。单个桶已经装满就会使用
extra.nextOverFlow中桶存储溢出的数据。上述两种不同的桶在内存中是连续存储的。分为正常桶和溢出桶。溢出桶使用c语言实现能够减少扩容的频率所以还在使用中。

桶的结构体 runtime/map.go-bmap 只有一个字段 tophash,tophash 存储了键的哈希的高8位

type bmap struct {
    tophash [bucketCnt]uint8
}

runtime.bmap中的其他字段在运行时通过计算内存地址的方式访问的。定义中不包含这些字段。通过编译期间的 cmd/compile/internal/gc/reflect.go-bmap函数重建结构

type bmap struct {
    topbits  [8]uint8
    keys     [8]keytype
    values   [8]valuetype
    pad      uintptr
    overflow uintptr
}

随着哈希表的数据的逐渐增多，会扩容哈希表或者使用额外的桶存储溢出的数据，不会让单个桶中的数据超过8个，不让单个桶的数据超过8个，溢出桶是临时的解决方案。创建过多的溢出桶最终也会导致哈希的扩容。

map初始化

字面量

字面量初始化会通过cmd/compile/internal/gc/sinit.go#L753-maplit初始化。

func maplit(n *Node, m *Node, init *Nodes) {
    // make the map var
    a := nod(OMAKE, nil, nil)
    a.Esc = n.Esc
    a.List.Set2(typenod(n.Type), nodintconst(int64(n.List.Len())))
    litas(m, a, init)
    entries := n.List.Slice()
    // The order pass already removed any dynamic (runtime-computed) entries.
    // All remaining entries are static. Double-check that.
    for _, r := range entries {
        if !isStaticCompositeLiteral(r.Left) || !isStaticCompositeLiteral(r.Right) {
            Fatalf("maplit: entry is not a literal: %v", r)
        }
    }
    if len(entries) > 25 {
    }
    ...
}

当哈希表中的元素少于或者等于25个时，会将所有的键值对一次性加入到哈希表中。
这种初始化方式与数组和切片的几乎相同。得到集合类型的初始化在go语言中有相同的处理逻辑。
超过25个元素数量时，编译器会创建两个数组分别存储键和值。键值对然后循环构建成哈希。

字面量初始化过程通过make关键字创建新的哈希并通过原始的[]语法向哈希追加元素。

运行时

创建的哈希被分配到栈上并且容量小于BUCKETSIZE=8时，直接快速初始化。
其他再通过make创建，编译器然后在类型检查期间转换成 src/runtime/map.go#L303-makemap。字面量最后调用的也是这个函数。

func makemap(t *maptype, hint int, h *hmap) *hmap {
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }
    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    h.hash0 = fastrand()
    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B
    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }
    return h
}

函数分为四部分：

计算哈希占用的内存是否溢出或者超出能分配的最大值。
调用runtime.fasterand获取一个随机的哈希种子。
根据传入的hint计算出需要的最小需要的桶数量
使用runtime.makeBucketArray 创建用于保存桶的数组

读写操作

写操作：增、删、改
读操作

读操作
直接通过key值进行访问的会被转换成OINDEXMAP操作。 哈希表扩容并不是原子过程, src/runtime/map.go-mapassign 扩容条件：

装载因子已经超过6.5
哈希使用了太多溢出桶

还需要判断是否已经处于扩容状态。避免二次扩容造成混乱。

map扩容时机

装载因子超过阈值。源码定义的阈值是6.5。
overflow 的bucket数量过多:当 B 小于15时，也就是bucket总数2^B小于15时，如果overflow的bucket数量超过2^B；当B>=15，也就是bucket总数2^B大于等于2^15时如果overflow的bucket数量超过2^15。

go的扩容是渐进式扩容。

map搬迁流程
growWork()->mapassign/mapdelete (插入、修改、删除key的时候) 才会搬迁

问题点： map 在迁移过程中老的bucket 发生了扩容，怎么把老的bucket中的迁移到新的buckets