bytes - 《Golang 源码阅读笔记》

bytes.go
buffer.go
reader.go

这个库是关于 []byte 操作的一些内容，它之于 []byte 就像 strings 库之于 string ，不过好像很多函数挺复杂的，因为还有个 rune 类型混入其中。

bytes.go

这里面全是些功能函数，没有涉及到接口结构体之类的，所以就简单过了一下。

有意思的是，子字符串匹配用的算法是 Rabin-Karp 算法，比如 LastIndex 函数

// src/bytes/bytes.go ---- line 103
// LastIndex returns the index of the last instance of sep in s, or -1 if sep is not present in s.
func LastIndex(s, sep []byte) int {
    n := len(sep)
    switch {
    case n == 0:
        return len(s)
    case n == 1:
        return LastIndexByte(s, sep[0])
    case n == len(s):
        if Equal(s, sep) {
            return 0
        }
        return -1
    case n > len(s):
        return -1
    }
    // Rabin-Karp search from the end of the string
    hashss, pow := hashStrRev(sep)
    last := len(s) - n
    var h uint32
    for i := len(s) - 1; i >= last; i-- {
        h = h*primeRK + uint32(s[i])
    }
    if h == hashss && Equal(s[last:], sep) {
        return last
    }
    for i := last - 1; i >= 0; i-- {
        h *= primeRK
        h += uint32(s[i])
        h -= pow * uint32(s[i+n])
        if h == hashss && Equal(s[i:i+n], sep) {
            return i
        }
    }
    return -1
}
// src/bytes/bytes.go ---- line 1104
// primeRK is the prime base used in Rabin-Karp algorithm.
const primeRK = 16777619
// hashStr returns the hash and the appropriate multiplicative
// factor for use in Rabin-Karp algorithm.
func hashStr(sep []byte) (uint32, uint32) {
    hash := uint32(0)
    for i := 0; i < len(sep); i++ {
        hash = hash*primeRK + uint32(sep[i])
    }
    var pow, sq uint32 = 1, primeRK
    for i := len(sep); i > 0; i >>= 1 {
        if i&1 != 0 {
            pow *= sq
        }
        sq *= sq
    }
    return hash, pow
}
// hashStrRev returns the hash of the reverse of sep and the
// appropriate multiplicative factor for use in Rabin-Karp algorithm.
func hashStrRev(sep []byte) (uint32, uint32) {
    hash := uint32(0)
    for i := len(sep) - 1; i >= 0; i-- {
        hash = hash*primeRK + uint32(sep[i])
    }
    var pow, sq uint32 = 1, primeRK
    for i := len(sep); i > 0; i >>= 1 {
        if i&1 != 0 {
            pow *= sq
        }
        sq *= sq
    }
    return hash, pow
}

这个算法的核心想法是把字符串表示成整数来进行比较，这样效率会比较高，有时间再来写个逐行的分析。

其中还有个这样的数组

// src/bytes/bytes.go ---- line 307
var asciiSpace = [256]uint8{'\t': 1, '\n': 1, '\v': 1, '\f': 1, '\r': 1, ' ': 1}

最开始看到还没看懂，其实字符就是 uint8 嘛，这样写只是为了好理解，把算作空字符的位置置 1，其他都默认 0, 后面用于判断。判断字符 r 是不是 space

// src/bytes/bytes.go ---- line 323
isSpace := int(asciiSpace[r])

buffer.go

这个就是对 []byte 类型的一种封装，也就是 Buffer 结构体，实现了 io.ReadWriter 接口

// src/bytes/buffer.go ---- line 18
// A Buffer is a variable-sized buffer of bytes with Read and Write methods.
// The zero value for Buffer is an empty buffer ready to use.
type Buffer struct {
    buf      []byte // contents are the bytes buf[off : len(buf)]
    off      int    // read at &buf[off], write at &buf[len(buf)]
    lastRead readOp // last read operation, so that Unread* can work correctly.
}

整体都很普通，没有太出彩的地方。学会一个 maxInt 的写法

// src/bytes/buffer.go ---- line 47
const maxInt = int(^uint(0) >> 1)

reader.go

这个其实跟上面那个差不多啦，只不过只实现了 io.Reader 接口，就是用于把 []byte 当作一个 Reader 来使用的一种封装。

// src/bytes/reader.go ---- line 13
// A Reader implements the io.Reader, io.ReaderAt, io.WriterTo, io.Seeker,
// io.ByteScanner, and io.RuneScanner interfaces by reading from
// a byte slice.
// Unlike a Buffer, a Reader is read-only and supports seeking.
// The zero value for Reader operates like a Reader of an empty slice.
type Reader struct {
    s        []byte
    i        int64 // current reading index
    prevRune int   // index of previous rune; or < 0
}