• Go常见类型的底层原理
  • 字符串、切片、map
  • 接口、结构体

    什么变量的大小是0字节

    基本类型的字节数

  • int大小跟随系统字长

  • 指针的大小也是系统字长 ```go package main

import ( “fmt” “unsafe” )

type K struct { }

func main() { a := K{} b := int(0) //8 c := K{} fmt.Println(unsafe.Sizeof(a)) //a和c的地址一模一样 fmt.Printf(“%p\n”, &a) //0x110a5c0 fmt.Printf(“%p\n”, &b) fmt.Printf(“%p\n”, &c) //0x110a5c0 }

  1. `runtime/malloc.go`
  2. ```go
  3. // base address for all 0-byte allocations
  4. var zerobase uintptr

空结构体

  • 空结构体的地址均相同(不被包含在其他结构体时)
  • 空结构体主要是为了节约内存

    • 结合map
    • 结合channel

      总结

  • Go中部分数据的长度与系统字长有关

  • 空结构体不占用空间
  • 空结构体与map结合可以实现hashset
  • 空结构体与channel结合可以当作纯信号(不携带任何信息的信号量)

    数组、字符串、切片底层是一样的吗?

    字符串

    1. type stringStruct struct {
    2. str unsafe.Pointer
    3. len int
    4. }
  • 字符串本质是个结构体

  • Data指针指向底层Byte数组
  • Len表示Byte数组的长度(字节数)

    1. type StringHeader struct {
    2. Data uintptr
    3. Len int
    4. }

    字符编码问题

  • 所有的字符均使用Unicode字符集

  • 使用UTF-8编码

    Unicode

  • 一种统一的字符集

  • 囊括了159种文字的144679个字符
  • 14万个字符至少需要3个字节表示
  • 英文字母均排在前128个

    UTF-8

  • Unicode的一种变长格式

  • 128个US-ASCII字符只需一个字节编码
  • 西方常用字符需要两个字节
  • 其他字符需要3个字节,极少需要4个字节

image.png

  1. // Copyright 2016 The Go Authors. All rights reserved.
  2. // Use of this source code is governed by a BSD-style
  3. // license that can be found in the LICENSE file.
  4. package runtime
  5. // Numbers fundamental to the encoding.
  6. const (
  7. runeError = '\uFFFD' // the "error" Rune or "Unicode replacement character"
  8. runeSelf = 0x80 // characters below runeSelf are represented as themselves in a single byte.
  9. maxRune = '\U0010FFFF' // Maximum valid Unicode code point.
  10. )
  11. // Code points in the surrogate range are not valid for UTF-8.
  12. const (
  13. surrogateMin = 0xD800
  14. surrogateMax = 0xDFFF
  15. )
  16. const (
  17. t1 = 0x00 // 0000 0000
  18. tx = 0x80 // 1000 0000
  19. t2 = 0xC0 // 1100 0000
  20. t3 = 0xE0 // 1110 0000
  21. t4 = 0xF0 // 1111 0000
  22. t5 = 0xF8 // 1111 1000
  23. maskx = 0x3F // 0011 1111
  24. mask2 = 0x1F // 0001 1111
  25. mask3 = 0x0F // 0000 1111
  26. mask4 = 0x07 // 0000 0111
  27. rune1Max = 1<<7 - 1
  28. rune2Max = 1<<11 - 1
  29. rune3Max = 1<<16 - 1
  30. // The default lowest and highest continuation byte.
  31. locb = 0x80 // 1000 0000
  32. hicb = 0xBF // 1011 1111
  33. )
  34. // countrunes returns the number of runes in s.
  35. func countrunes(s string) int {
  36. n := 0
  37. for range s {
  38. n++
  39. }
  40. return n
  41. }
  42. // decoderune returns the non-ASCII rune at the start of
  43. // s[k:] and the index after the rune in s.
  44. //
  45. // decoderune assumes that caller has checked that
  46. // the to be decoded rune is a non-ASCII rune.
  47. //
  48. // If the string appears to be incomplete or decoding problems
  49. // are encountered (runeerror, k + 1) is returned to ensure
  50. // progress when decoderune is used to iterate over a string.
  51. func decoderune(s string, k int) (r rune, pos int) {
  52. pos = k
  53. if k >= len(s) {
  54. return runeError, k + 1
  55. }
  56. s = s[k:]
  57. switch {
  58. case t2 <= s[0] && s[0] < t3:
  59. // 0080-07FF two byte sequence
  60. if len(s) > 1 && (locb <= s[1] && s[1] <= hicb) {
  61. r = rune(s[0]&mask2)<<6 | rune(s[1]&maskx)
  62. pos += 2
  63. if rune1Max < r {
  64. return
  65. }
  66. }
  67. case t3 <= s[0] && s[0] < t4:
  68. // 0800-FFFF three byte sequence
  69. if len(s) > 2 && (locb <= s[1] && s[1] <= hicb) && (locb <= s[2] && s[2] <= hicb) {
  70. r = rune(s[0]&mask3)<<12 | rune(s[1]&maskx)<<6 | rune(s[2]&maskx)
  71. pos += 3
  72. if rune2Max < r && !(surrogateMin <= r && r <= surrogateMax) {
  73. return
  74. }
  75. }
  76. case t4 <= s[0] && s[0] < t5:
  77. // 10000-1FFFFF four byte sequence
  78. if len(s) > 3 && (locb <= s[1] && s[1] <= hicb) && (locb <= s[2] && s[2] <= hicb) && (locb <= s[3] && s[3] <= hicb) {
  79. r = rune(s[0]&mask4)<<18 | rune(s[1]&maskx)<<12 | rune(s[2]&maskx)<<6 | rune(s[3]&maskx)
  80. pos += 4
  81. if rune3Max < r && r <= maxRune {
  82. return
  83. }
  84. }
  85. }
  86. return runeError, k + 1
  87. }
  88. // encoderune writes into p (which must be large enough) the UTF-8 encoding of the rune.
  89. // It returns the number of bytes written.
  90. func encoderune(p []byte, r rune) int {
  91. // Negative values are erroneous. Making it unsigned addresses the problem.
  92. switch i := uint32(r); {
  93. case i <= rune1Max:
  94. p[0] = byte(r)
  95. return 1
  96. case i <= rune2Max:
  97. _ = p[1] // eliminate bounds checks
  98. p[0] = t2 | byte(r>>6)
  99. p[1] = tx | byte(r)&maskx
  100. return 2
  101. case i > maxRune, surrogateMin <= i && i <= surrogateMax:
  102. r = runeError
  103. fallthrough
  104. case i <= rune3Max:
  105. _ = p[2] // eliminate bounds checks
  106. p[0] = t3 | byte(r>>12)
  107. p[1] = tx | byte(r>>6)&maskx
  108. p[2] = tx | byte(r)&maskx
  109. return 3
  110. default:
  111. _ = p[3] // eliminate bounds checks
  112. p[0] = t4 | byte(r>>18)
  113. p[1] = tx | byte(r>>12)&maskx
  114. p[2] = tx | byte(r>>6)&maskx
  115. p[3] = tx | byte(r)&maskx
  116. return 4
  117. }
  118. }
  1. func main() {
  2. s := "世界" //utf-8 编码
  3. for _, v := range s {
  4. fmt.Printf("%c\n", v)
  5. }
  6. }

字符串的访问

  • 对字符串使用len方法得到的是字节数不是字节数
  • 对字符串直接使用下标访问,得到的是字节
  • 字符串被range遍历,被解码为rune类型的字符
  • UTF-8编解码算法位于runtime/utf8.go

    字符串的切分

  • 需要切分时

    • 转为rune数组
    • 切片
    • 转为string
    • s = string([]rune(s)[:3])

      切片

  • 切片的本质是对数组的引用

    1. type slice struct {
    2. array unsafe.Pointer
    3. len int
    4. cap int
    5. }

    切片的创建

  • 根据数组创建

arr[0:3] or slice[0:3]

  • 字面量:编译时插入创建数组的代码

slice := []int{1,2,3}

  • make:运行时创建数组

slice := make([]int,10)

  1. 0x0014 00020 (代码路径\main2.go:11) LEAQ type.int(SB), AX
  2. 0x001b 00027 (代码路径\main2.go:11) MOVL $3, BX
  3. 0x0020 00032 (代码路径\main2.go:11) MOVQ BX, CX
  4. 0x0023 00035 (代码路径\main2.go:11) PCDATA $1, $0
  5. 0x0023 00035 (代码路径\main2.go:11) CALL runtime.makeslice(SB)

arr := [10]int{0,1,2,3,4,5,6,7,8,9}
slice := arr[1:4]
image.png

切片的访问

  • 下标直接访问元素
  • range遍历元素
  • len(slice)查看切片长度
  • cap(slice)查看切片容量

    切片的追加

  • 不扩容时,只调整len(编译器负责)

image.png

  • 扩容时,编译时转为调用runtime.growslice()
  • 如果期望容量大于当前容量的两倍,就会使用期望容量
  • 如果当前切片的长度小于1024,将容量翻倍
  • 如果当前切片的长度小于1024,每次增加25%
  • 切片扩容时,并发不安全,注意切片并发要加锁

    • 我们在做切片扩容的时候,如果两个协程同时访问切片,第一个访问切片,第二个追加切片,当追加切片的时候会废弃掉原来的数组,追加一个容量是其两倍的数组(当然,遵循切片的扩容机制)。如果第一个协程仍旧在读取老的数组(原来的),就会出现并发的问题。新追加过来的元素,原来老的协程就无法查看,故切片追加/扩容时并发是不安全的。

      总结

  • 字符串与切片都是对底层数组的引用

  • 字符串有UTF-8变长编码的特点
  • 切片的容量和长度不同
  • 切片追加时可能需要重建底层数组

    map:重写Redis能用它吗?

    Redis的本质是一个大的hashmap

HashMap的基本方案

  • 开放寻址法
  • 拉链法

    开放寻址法

    image.png

    拉链法

    image.png

    Go的map

  • runtime.hmap


    1. // A header for a Go map.
    2. type hmap struct {
    3. // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
    4. // Make sure this stays in sync with the compiler's definition.
    5. count int // # live cells == size of map. Must be first (used by len() builtin)
    6. flags uint8
    7. B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    8. noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    9. hash0 uint32 // hash seed
    10. buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    11. oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    12. nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
    13. extra *mapextra // optional fields
    14. }

    buckets指向由很多个bmap组成的数组

image.png

  1. // mapextra holds fields that are not present on all maps.
  2. type mapextra struct {
  3. // If both key and elem do not contain pointers and are inline, then we mark bucket
  4. // type as containing no pointers. This avoids scanning such maps.
  5. // However, bmap.overflow is a pointer. In order to keep overflow buckets
  6. // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
  7. // overflow and oldoverflow are only used if key and elem do not contain pointers.
  8. // overflow contains overflow buckets for hmap.buckets.
  9. // oldoverflow contains overflow buckets for hmap.oldbuckets.
  10. // The indirection allows to store a pointer to the slice in hiter.
  11. overflow *[]*bmap
  12. oldoverflow *[]*bmap
  13. // nextOverflow holds a pointer to a free overflow bucket.
  14. nextOverflow *bmap
  15. }
  1. // A bucket for a Go map.
  2. type bmap struct {
  3. // tophash generally contains the top byte of the hash value
  4. // for each key in this bucket. If tophash[0] < minTopHash,
  5. // tophash[0] is a bucket evacuation state instead.
  6. tophash [bucketCnt]uint8
  7. // Followed by bucketCnt keys and then bucketCnt elems.
  8. // NOTE: packing all the keys together and then all the elems together makes the
  9. // code a bit more complicated than alternating key/elem/key/elem/... but it allows
  10. // us to eliminate padding which would be needed for, e.g., map[int64]int8.
  11. // Followed by an overflow pointer.
  12. }

tophash代表一个数组,数组中有8个hash值 keys字段代表一个数组,数组里面有8个key elems字段代表一个数组,数组里面有8个value overflow是一个指针,指向溢出桶,这个桶里面继续放K-V

map的初始化

  • make
  • 字面量

    map的初始化:make

    m := make(map[string]int,10)

  1. 0x0014 00020 (路径\main3.go:6) LEAQ type.map[string]int(SB), AX
  2. 0x001b 00027 (路径\main3.go:6) MOVL $10, BX
  3. 0x0020 00032 (路径\main3.go:6) XORL CX, CX
  4. 0x0022 00034 (路径\main3.go:6) PCDATA $1, $0
  5. 0x0022 00034 (路径\main3.go:6) CALL runtime.makemap(SB)

image.png

map的初始化:字面量

  • 元素少于25个时,转化为简单赋值

    编译时改成右侧的代码(代码优化?)

image.png

  • 元素多于25个时,转化为循环赋值

image.png

map的访问

image.png

tophash数组中记录的值是八个key的hash值的高八位,为了快速遍历

map的访问:计算桶号

image.png

map的访问:计算tophash

image.png

map的访问:匹配

可能遇到的特殊情况:

  • 哈希碰撞比较厉害,hash的高八位是5c,但是key值hash之后不是5c,往后继续找5c,如果找遍这个tophash都没有找到其他的5c,去查看overflow溢出指针,如果有,说明这八个写满了,然后我们根据溢出指针指向的溢出桶,从溢出的桶中再去遍历,看能不能找到我们要的tophash。如果都没有表明查询的key值不存在在map当中。

image.png

map的写入

image.png

总结

  • Go语言使用拉链实现了hashmap
  • 每一个桶中存储键哈希的前八位(高八位,有助于快速遍历)
  • 桶超出8个数据,就会存储到溢出桶中

    map为什么需要扩容?

    哈希碰撞

    image.png

    map扩容

  • map溢出桶太多时会导致严重的性能下降

  • runtime.mapassign()可能会出发扩容的情况

    • 装载因子超过6.5(平均每个槽6.5个key)
    • 使用太多溢出桶(溢出桶超过普通桶)

      map扩容的类型

  • 等量扩容:数据不多但是溢出桶太多了(整理)

  • 翻倍扩容:数据太多了

    map扩容:Step 1

  • 创建一组新桶

  • oldbuckets指向原有的桶数组
  • buckets指向新的桶数组
  • map标记为扩容状态

image.png

map扩容:Step 2

  • 将所有的数据从旧桶驱逐到新桶
  • 采用渐进式驱逐
  • 每次操作一个旧桶,将旧桶数据驱逐到新桶
  • 读取时不进行驱逐,只判断读取新桶还是旧桶

image.png

map扩容:Step 3

  • 所有的旧桶驱逐完成后
  • oldbuckets回收

image.png

总结

  • 装载系数或者溢出桶的增加,会触发map扩容
  • “扩容”可能并不是增加桶数,而是整理
  • map扩容采用渐进式,桶被操作时才会重新分配

    怎么解决map的并发问题?

    map的并发问题

  • map的读写有并发问题

  • A协程在桶中读数据时,B协程驱逐了这个桶
  • A协程会读到错误的数据或者找不到数据 ```go package main

func main() { m := make(map[int]int) go func() { for { _ = m[1] } }()

  1. go func() {
  2. for {
  3. m[2] = 2
  4. }
  5. }()
  6. select {}

}

  1. ![image.png](https://cdn.nlark.com/yuque/0/2022/png/27391242/1663856262269-7db75f65-7053-48f6-88d6-a107761cc505.png#clientId=u0979010d-4acc-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=453&id=uebbdf966&margin=%5Bobject%20Object%5D&name=image.png&originHeight=679&originWidth=910&originalType=binary&ratio=1&rotation=0&showTitle=false&size=200795&status=done&style=none&taskId=u23b58825-a89c-412b-90ed-b3530b3a91e&title=&width=606.6666666666666)
  2. <a name="jhzbP"></a>
  3. ## map并发问题解决方案
  4. - map加锁(mutex
  5. - 使用sync.Map
  6. <a name="MbgYY"></a>
  7. ## sync.Map
  8. ![image.png](https://cdn.nlark.com/yuque/0/2022/png/27391242/1663906923916-e4c76080-3143-4b0a-850c-deb9eabb48ab.png#clientId=u0979010d-4acc-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=459&id=ub890de35&margin=%5Bobject%20Object%5D&name=image.png&originHeight=688&originWidth=2147&originalType=binary&ratio=1&rotation=0&showTitle=false&size=536455&status=done&style=none&taskId=u30699ff0-28c3-42aa-bae0-a7ca8eea5a9&title=&width=1431.3333333333333)
  9. ```go
  10. type Map struct {
  11. mu Mutex
  12. // read contains the portion of the map's contents that are safe for
  13. // concurrent access (with or without mu held).
  14. //
  15. // The read field itself is always safe to load, but must only be stored with
  16. // mu held.
  17. //
  18. // Entries stored in read may be updated concurrently without mu, but updating
  19. // a previously-expunged entry requires that the entry be copied to the dirty
  20. // map and unexpunged with mu held.
  21. read atomic.Value // readOnly
  22. // dirty contains the portion of the map's contents that require mu to be
  23. // held. To ensure that the dirty map can be promoted to the read map quickly,
  24. // it also includes all of the non-expunged entries in the read map.
  25. //
  26. // Expunged entries are not stored in the dirty map. An expunged entry in the
  27. // clean map must be unexpunged and added to the dirty map before a new value
  28. // can be stored to it.
  29. //
  30. // If the dirty map is nil, the next write to the map will initialize it by
  31. // making a shallow copy of the clean map, omitting stale entries.
  32. dirty map[any]*entry
  33. // misses counts the number of loads since the read map was last updated that
  34. // needed to lock mu to determine whether the key was present.
  35. //
  36. // Once enough misses have occurred to cover the cost of copying the dirty
  37. // map, the dirty map will be promoted to the read map (in the unamended
  38. // state) and the next store to the map will make a new dirty copy.
  39. misses int
  40. }
  41. // readOnly is an immutable struct stored atomically in the Map.read field.
  42. type readOnly struct {
  43. //*entry -> unsafe.Pointer
  44. m map[any]*entry
  45. amended bool // true if the dirty map contains some key not in m.
  46. }
  47. type entry struct {
  48. // p points to the interface{} value stored for the entry.
  49. //
  50. // If p == nil, the entry has been deleted, and either m.dirty == nil or
  51. // m.dirty[key] is e.
  52. //
  53. // If p == expunged, the entry has been deleted, m.dirty != nil, and the entry
  54. // is missing from m.dirty.
  55. //
  56. // Otherwise, the entry is valid and recorded in m.read.m[key] and, if m.dirty
  57. // != nil, in m.dirty[key].
  58. //
  59. // An entry can be deleted by atomic replacement with nil: when m.dirty is
  60. // next created, it will atomically replace nil with expunged and leave
  61. // m.dirty[key] unset.
  62. //
  63. // An entry's associated value can be updated by atomic replacement, provided
  64. // p != expunged. If p == expunged, an entry's associated value can be updated
  65. // only after first setting m.dirty[key] = e so that lookups using the dirty
  66. // map find the entry.
  67. p unsafe.Pointer // *interface{}
  68. }

sync.Map正常读写

“a” -> “AAA”

image.png

read->m(map)->hash->高八位、找桶号(B)-> buckets->bmap->[tophash、key、elems、overflow]->找到值"a"->*entry->Pointer->value->”A”

sync.Map追加

“d” -> “D”

image.png

read->m(map)->没找到->mu加锁->锁住dirty(map)->上锁以后操作dirty(map)->同时只能有一个协程去操作dirty(map)->给dirty(map)做追加->上方的amended的值置为**true**

image.png

Sync.Map追加后读写

image.png

read->m(map)->没查找到"d"->状态量amended = true->读dirty(map)->读取到->misses++ 当misses追加到len(dirty)

Sync.Map dirty提升

image.png

dirty变成新的read map

image.png

misses0

image.png

amended = false nil->dirty(map)后面如果有追加会重建这个dirty(map)

Sync.Map 删除

  • 两个map做删除,相比于查询、修改、新增、删除更麻烦
  • 删除可以分为正常删除和追加后删除
  • 提升后,被删除的key还需要特殊处理

    Sync.Map 正常删除

    image.png

    将*entry指针指向的Pointer置为nil

image.png

Sync.Map 追加后删除

image.png
image.png

Sync.Map 删除后提升dirty

image.png

不重建新的"d" expunged擦除、删去

image.png

普通读写和追加分离

普通的读写不涉及扩容,它一涉及扩容就要有并发问题,所以说普通读写不涉及扩容走上上面的read map,涉及扩容走下面dirty map,走下面的时候要加锁,保证只有一个协程能同时走下边,这样的话就能避免map扩容的并发问题,走完下面之后,你还会出现下面和上面不一致的问题,通过查看amended和misses信号量,来确定是否需要dirty map提升。

总结

  • map在扩容时会有并发问题
  • sync.Map使用两个map,分离了扩容问题
  • 不会引发扩容的操作(查、改、删)使用read map
  • 可能引发扩容的操作(新增)使用dirty map
  • sync.Map在写多读多追加少的情况下,sync.Map性能会好

    接口:隐式更好还是显式更好?

    Go隐式接口特点

  • 只要实现了接口的全部方法,就是自动实现接口

  • 可以在不修改代码的情况下抽象出新的接口 ```go type iface struct { tab *itab data unsafe.Pointer }

type itab struct { inter interfacetype _type type hash uint32 // copy of _type.hash. Used for type switches. [4]byte fun [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter. }

  1. ```go
  2. package main
  3. import (
  4. "fmt"
  5. )
  6. type Car interface {
  7. Driver()
  8. }
  9. type TrafficTool interface {
  10. Driver()
  11. Run()
  12. }
  13. type Truck struct {
  14. Model string
  15. }
  16. func (t Truck) Driver() {
  17. fmt.Println(t.Model)
  18. }
  19. func main() {
  20. //1.一个接口的值在底层是如何表示的
  21. //c的底层表示是iface
  22. //data 指针指向的才是我们新建的结构体
  23. var c Car = Truck{}
  24. truck := c.(Truck)
  25. //tool := c.(TrafficTool)
  26. fmt.Printf("%p\n", c)
  27. fmt.Printf("%+p\n", truck)
  28. //fmt.Printf("%p\n", tool)
  29. switch c.(type) {
  30. //case Truck:
  31. // fmt.Println("Truck")
  32. //case Car:
  33. // fmt.Println("Car")
  34. case TrafficTool:
  35. fmt.Println("TrafficTool")
  36. }
  37. }

接口值的底层表示

  • 接口数据使用runtime.iface表示
  • iface记录了数据的地址
  • iface中也记录了接口类型信息和实现的方法

    类型断言

  • 类型断言是一个使用在接口值上的操作

  • 可以将接口值转换为其他类型值(实现或者兼容接口)
  • 可以配合switch进行类型判断
  • 软式的类型转化

    结构体和指针实现接口

    | | 结构体实现接口 | 结构体指针实现接口 | | —- | —- | —- | | 结构体初始化变量 | 通过 | 不通过 | | 结构体指针初始化变量 | 通过 | 通过 |

空接口值

  • runtime.eface结构体
  • 空接口底层不是普通接口
  • 空接口值可以承载任何数据

    空接口的用途

  • 空接口的最大用途是作为任意类型的函数入参

  • 函数调用时,会新生成一个空接口,再传参

    总结

  • Go的隐式接口更加方便系统的扩展和重构

  • 结构体和指针都可以实现接口
  • 空接口值可以承载任何类型的数据

    nil、空接口、空结构体有什么区别?

    nil

  • nil是空,并不一定是"空指针"

  • nil是6种类型的”零值”
  • 每种类型的nil是不同的,无法比较

    空结构体

  • 空结构体是Go中非常特殊的类型

  • 空结构体的值不是nil
  • 空结构体的指针也不是nil,但是都相同(zerobase)

    空接口

  • 空接口不一定是”nil 接口”


总结

内存对齐是如何优化程序效率的?

总结

本章小结