As the package name suggests, bufio is I/O with a buffer.

bufio.go

This file implements two structs: a buffered Reader and a buffered Writer. Their source is as follows:

    // src/bufio/bufio.go ---- line 30
    // Reader implements buffering for an io.Reader object.
    type Reader struct {
        buf          []byte
        rd           io.Reader // reader provided by the client
        r, w         int       // buf read and write positions
        err          error
        lastByte     int // last byte read for UnreadByte; -1 means invalid
        lastRuneSize int // size of last rune read for UnreadRune; -1 means invalid
    }

    // src/bufio/bufio.go ---- line 536
    // Writer implements buffering for an io.Writer object.
    // If an error occurs writing to a Writer, no more data will be
    // accepted and all subsequent writes, and Flush, will return the error.
    // After all data has been written, the client should call the
    // Flush method to guarantee all data has been forwarded to
    // the underlying io.Writer.
    type Writer struct {
        err error
        buf []byte
        n   int
        wr  io.Writer
    }

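The more interesting method is Reader's ReadSlice. Unfortunately it only accepts a single byte as the delimiter. I once wrote a function with the same purpose that accepts a []byte delimiter; had I known this existed, I would have started from it. Note that the byte slice it returns shares its underlying array with the Reader's buffer, which means the contents will be overwritten by a later read, so ReadBytes is generally used instead.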

  1. // src/bufio/bufio.go ---- line 320
  2. // ReadSlice reads until the first occurrence of delim in the input,
  3. // returning a slice pointing at the bytes in the buffer.
  4. // The bytes stop being valid at the next read.
  5. // If ReadSlice encounters an error before finding a delimiter,
  6. // it returns all the data in the buffer and the error itself (often io.EOF).
  7. // ReadSlice fails with error ErrBufferFull if the buffer fills without a delim.
  8. // Because the data returned from ReadSlice will be overwritten
  9. // by the next I/O operation, most clients should use
  10. // ReadBytes or ReadString instead.
  11. // ReadSlice returns err != nil if and only if line does not end in delim.
  12. func (b *Reader) ReadSlice(delim byte) (line []byte, err error) {
  13.     s := 0 // search start index
  14.     for {
  15.         // Search buffer.
  16.         if i := bytes.IndexByte(b.buf[b.r+s:b.w], delim); i >= 0 {
  17.             i += s
  18.             line = b.buf[b.r : b.r+i+1]
  19.             b.r += i + 1
  20.             break
  21.         }
  22.         // Pending error?
  23.         if b.err != nil {
  24.             line = b.buf[b.r:b.w]
  25.             b.r = b.w
  26.             err = b.readErr()
  27.             break
  28.         }
  29.         // Buffer full?
  30.         if b.Buffered() >= len(b.buf) {
  31.             b.r = b.w
  32.             line = b.buf
  33.             err = ErrBufferFull
  34.             break
  35.         }
  36.         s = b.w - b.r // do not rescan area we scanned before
  37.         b.fill() // buffer is not full
  38.     }
  39.     // Handle last byte, if any.
  40.     if i := len(line) - 1; i >= 0 {
  41.         b.lastByte = int(line[i])
  42.         b.lastRuneSize = -1
  43.     }
  44.     return
  45. }
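
The aliasing described above is easy to observe with a small experiment. The following is only a sketch: the helper names (firstLineAfterSecondRead, viaReadSlice, viaReadBytes) and the input are made up for illustration, and the 16-byte buffer is chosen so that each line fills the buffer completely and the second read has to reuse it.

```go
package main

import (
	"bufio"
	"strings"
)

// Two lines of exactly 16 bytes each (15 letters plus '\n').
const input = "aaaaaaaaaaaaaaa\nbbbbbbbbbbbbbbb\n"

// firstLineAfterSecondRead (hypothetical helper) reads two lines with the
// given read function and returns what the first result looks like after
// the second read has happened.
func firstLineAfterSecondRead(read func(r *bufio.Reader) ([]byte, error)) (string, error) {
	// 16 bytes is the smallest buffer bufio.Reader uses, so each line fills
	// it and the second read must refill the same backing array.
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	first, err := read(r)
	if err != nil {
		return "", err
	}
	if _, err := read(r); err != nil {
		return "", err
	}
	return string(first), nil
}

// viaReadSlice returns a slice that points into the Reader's buffer.
func viaReadSlice(r *bufio.Reader) ([]byte, error) { return r.ReadSlice('\n') }

// viaReadBytes returns a freshly allocated copy of the line.
func viaReadBytes(r *bufio.Reader) ([]byte, error) { return r.ReadBytes('\n') }
```

With this input, the first line obtained through viaReadSlice reads as the second line's bytes once the second read has run, while the one obtained through viaReadBytes still holds the original a's.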

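On the whole the file is straightforward; nothing in it is hard to follow.
I also noticed a small inconsistency. Reader's Buffered method returns b.w - b.r and Size returns len(b.buf), that is, the number of bytes currently held in the buffer and the length of the buffer itself. Yet other methods that need b.w - b.r or len(b.buf) sometimes call Buffered or Size and sometimes write the expression out by hand. The style is not uniform and looks odd. Examples are line 30 of the listing above, which calls Buffered() but spells out len(b.buf), and several lines in the Peek method.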
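
To make the two quantities concrete, here is a minimal sketch (the function name bufferedAndSize and the sample input are assumptions for illustration) that fills a 16-byte Reader from an 11-byte input and reads both values through the public methods.

```go
package main

import (
	"bufio"
	"strings"
)

// bufferedAndSize (hypothetical helper) reports Buffered before and after
// consuming 6 bytes, together with Size, for a 16-byte Reader.
func bufferedAndSize() (before, after, size int, err error) {
	r := bufio.NewReaderSize(strings.NewReader("hello world"), 16)

	// Peek forces the Reader to fill its buffer without consuming anything.
	if _, err = r.Peek(1); err != nil {
		return 0, 0, 0, err
	}
	before = r.Buffered() // b.w - b.r: all 11 input bytes are buffered

	// Discard advances the read position past "hello ".
	if _, err = r.Discard(6); err != nil {
		return 0, 0, 0, err
	}
	after = r.Buffered() // 5 bytes ("world") remain unread

	return before, after, r.Size(), nil // Size is len(b.buf) and stays 16
}
```

Buffered shrinks as data is consumed, while Size is fixed when the Reader is created.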


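One more thing to watch out for: when writing to a Writer, data that still fits in the buffer is not flushed automatically. A write can therefore leave a tail of data sitting in the buffer (roughly the part that did not fill it, although the exact amount depends on what was already buffered), and the caller must call Flush explicitly to forward it to the underlying io.Writer.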
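
A short sketch of this behaviour, assuming a bytes.Buffer as the destination and a 16-byte Writer (the function name writeThenFlush is made up for the example):

```go
package main

import (
	"bufio"
	"bytes"
)

// writeThenFlush (hypothetical helper) writes 5 bytes through a 16-byte
// Writer and reports how many bytes have reached the destination before
// Flush, how many are pending in the Writer, and how many arrived after Flush.
func writeThenFlush() (beforeFlush, pending, afterFlush int, err error) {
	var dst bytes.Buffer
	w := bufio.NewWriterSize(&dst, 16)

	// 5 bytes fit in the 16-byte buffer, so nothing is forwarded yet.
	if _, err = w.WriteString("hello"); err != nil {
		return 0, 0, 0, err
	}
	beforeFlush = dst.Len()
	pending = w.Buffered()

	// Flush pushes the buffered bytes to the underlying io.Writer.
	if err = w.Flush(); err != nil {
		return 0, 0, 0, err
	}
	afterFlush = dst.Len()
	return beforeFlush, pending, afterFlush, nil
}
```

Forgetting the final Flush is a common way to end up with truncated output, so a deferred or explicit Flush at the end of writing is the usual pattern.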

scan.go

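I stared at Scanner for quite a while without understanding what it was for. Then I wrote a small test program and it clicked immediately; practice really does bring understanding. The Scanner struct and its Scan method are given below.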

    // src/bufio/scan.go ---- line 14
    // Scanner provides a convenient interface for reading data such as
    // a file of newline-delimited lines of text. Successive calls to
    // the Scan method will step through the 'tokens' of a file, skipping
    // the bytes between the tokens. The specification of a token is
    // defined by a split function of type SplitFunc; the default split
    // function breaks the input into lines with line termination stripped. Split
    // functions are defined in this package for scanning a file into
    // lines, bytes, UTF-8-encoded runes, and space-delimited words. The
    // client may instead provide a custom split function.
    //
    // Scanning stops unrecoverably at EOF, the first I/O error, or a token too
    // large to fit in the buffer. When a scan stops, the reader may have
    // advanced arbitrarily far past the last token. Programs that need more
    // control over error handling or large tokens, or must run sequential scans
    // on a reader, should use bufio.Reader instead.
    //
    type Scanner struct {
        r            io.Reader // The reader provided by the client.
        split        SplitFunc // The function to split the tokens.
        maxTokenSize int       // Maximum size of a token; modified by tests.
        token        []byte    // Last token returned by split.
        buf          []byte    // Buffer used as argument to split.
        start        int       // First non-processed byte in buf.
        end          int       // End of data in buf.
        err          error     // Sticky error.
        empties      int       // Count of successive empty tokens.
        scanCalled   bool      // Scan has been called; buffer is in use.
        done         bool      // Scan has finished.
    }

    // src/bufio/scan.go ---- line 125
    // Scan advances the Scanner to the next token, which will then be
    // available through the Bytes or Text method. It returns false when the
    // scan stops, either by reaching the end of the input or an error.
    // After Scan returns false, the Err method will return any error that
    // occurred during scanning, except that if it was io.EOF, Err
    // will return nil.
    // Scan panics if the split function returns too many empty
    // tokens without advancing the input. This is a common error mode for
    // scanners.
    func (s *Scanner) Scan() bool {
        if s.done {
            return false
        }
        s.scanCalled = true
        // Loop until we have a token.
        for {
            // See if we can get a token with what we already have.
            // If we've run out of data but have an error, give the split function
            // a chance to recover any remaining, possibly empty token.
            if s.end > s.start || s.err != nil {
                advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
                if err != nil {
                    if err == ErrFinalToken {
                        s.token = token
                        s.done = true
                        return true
                    }
                    s.setErr(err)
                    return false
                }
                if !s.advance(advance) {
                    return false
                }
                s.token = token
                if token != nil {
                    if s.err == nil || advance > 0 {
                        s.empties = 0
                    } else {
                        // Returning tokens not advancing input at EOF.
                        s.empties++
                        if s.empties > maxConsecutiveEmptyReads {
                            panic("bufio.Scan: too many empty tokens without progressing")
                        }
                    }
                    return true
                }
            }
            // We cannot generate a token with what we are holding.
            // If we've already hit EOF or an I/O error, we are done.
            if s.err != nil {
                // Shut it down.
                s.start = 0
                s.end = 0
                return false
            }
            // Must read more data.
            // First, shift data to beginning of buffer if there's lots of empty space
            // or space is needed.
            if s.start > 0 && (s.end == len(s.buf) || s.start > len(s.buf)/2) {
                copy(s.buf, s.buf[s.start:s.end])
                s.end -= s.start
                s.start = 0
            }
            // Is the buffer full? If so, resize.
            if s.end == len(s.buf) {
                // Guarantee no overflow in the multiplication below.
                const maxInt = int(^uint(0) >> 1)
                if len(s.buf) >= s.maxTokenSize || len(s.buf) > maxInt/2 {
                    s.setErr(ErrTooLong)
                    return false
                }
                newSize := len(s.buf) * 2
                if newSize == 0 {
                    newSize = startBufSize
                }
                if newSize > s.maxTokenSize {
                    newSize = s.maxTokenSize
                }
                newBuf := make([]byte, newSize)
                copy(newBuf, s.buf[s.start:s.end])
                s.buf = newBuf
                s.end -= s.start
                s.start = 0
            }
            // Finally we can read some input. Make sure we don't get stuck with
            // a misbehaving Reader. Officially we don't need to do this, but let's
            // be extra careful: Scanner is for safe, simple jobs.
            for loop := 0; ; {
                n, err := s.r.Read(s.buf[s.end:len(s.buf)])
                s.end += n
                if err != nil {
                    s.setErr(err)
                    break
                }
                if n > 0 {
                    s.empties = 0
                    break
                }
                loop++
                if loop > maxConsecutiveEmptyReads {
                    s.setErr(io.ErrNoProgress)
                    break
                }
            }
        }
    }
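
A small test of the kind mentioned above might look like the following sketch. It is not the original test program; the function name scanLines is an assumption, and it relies only on the default split function (ScanLines) described in the Scanner comment.

```go
package main

import (
	"bufio"
	"strings"
)

// scanLines (hypothetical helper) collects every token that a Scanner with
// the default split function yields for the given input.
func scanLines(input string) ([]string, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	var lines []string
	// Each successful Scan makes the next token available through Text.
	for s.Scan() {
		lines = append(lines, s.Text())
	}
	// Err returns nil when the scan simply reached io.EOF.
	return lines, s.Err()
}
```

For the input "a\nb\r\nc" this yields the three tokens a, b and c: the line terminators are stripped, and the last line is returned even though it has no trailing newline.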

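Scanner supports custom split functions and also ships with several built-in ones: ScanBytes, ScanRunes, ScanLines, and ScanWords.
Note that once Scan has been called, the Buffer method can no longer be used to change the buffer; doing so panics.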
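
Both points can be tried out with a short sketch. The split function splitComma and the helpers fields and bufferAfterScanPanics are hypothetical names invented for this example; only Split, Buffer, Scan and Text come from the bufio package.

```go
package main

import (
	"bufio"
	"bytes"
	"strings"
)

// splitComma is a hypothetical SplitFunc that yields comma-separated fields.
func splitComma(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, ','); i >= 0 {
		return i + 1, data[:i], nil // skip the comma, return the field before it
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil // last field has no trailing comma
	}
	return 0, nil, nil // no complete field yet: ask the Scanner for more data
}

// fields scans input with the custom split function.
func fields(input string) []string {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(splitComma)
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return out
}

// bufferAfterScanPanics reports whether calling Buffer after Scan panics.
func bufferAfterScanPanics() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	s := bufio.NewScanner(strings.NewReader("x"))
	s.Scan()                          // scanning has started
	s.Buffer(make([]byte, 0, 64), 64) // expected to panic at this point
	return false
}
```

The practical consequence is that Buffer and Split should be configured right after NewScanner, before the first call to Scan.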