bufio.go
This file implements two structs, a buffered Reader and a buffered Writer. Their source is as follows:
```go
// src/bufio/bufio.go ---- line 30
// Reader implements buffering for an io.Reader object.
type Reader struct {
	buf          []byte
	rd           io.Reader // reader provided by the client
	r, w         int       // buf read and write positions
	err          error
	lastByte     int // last byte read for UnreadByte; -1 means invalid
	lastRuneSize int // size of last rune read for UnreadRune; -1 means invalid
}

// src/bufio/bufio.go ---- line 536
// Writer implements buffering for an io.Writer object.
// If an error occurs writing to a Writer, no more data will be
// accepted and all subsequent writes, and Flush, will return the error.
// After all data has been written, the client should call the
// Flush method to guarantee all data has been forwarded to
// the underlying io.Writer.
type Writer struct {
	err error
	buf []byte
	n   int
	wr  io.Writer
}
```
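The more interesting one is Reader's ReadSlice method. Unfortunately it only accepts a single byte as the delimiter. I once wrote a function that does the same thing but takes a []byte delimiter; had I known this existed, I would simply have started from it. Note that the slice returned by ReadSlice shares its underlying array with the Reader's internal buffer, so its contents can be overwritten by the next read. That is why ReadBytes (or ReadString) is normally used instead.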
```go
// src/bufio/bufio.go ---- line 320
// ReadSlice reads until the first occurrence of delim in the input,
// returning a slice pointing at the bytes in the buffer.
// The bytes stop being valid at the next read.
// If ReadSlice encounters an error before finding a delimiter,
// it returns all the data in the buffer and the error itself (often io.EOF).
// ReadSlice fails with error ErrBufferFull if the buffer fills without a delim.
// Because the data returned from ReadSlice will be overwritten
// by the next I/O operation, most clients should use
// ReadBytes or ReadString instead.
// ReadSlice returns err != nil if and only if line does not end in delim.
func (b *Reader) ReadSlice(delim byte) (line []byte, err error) {
	s := 0 // search start index
	for {
		// Search buffer.
		if i := bytes.IndexByte(b.buf[b.r+s:b.w], delim); i >= 0 {
			i += s
			line = b.buf[b.r : b.r+i+1]
			b.r += i + 1
			break
		}

		// Pending error?
		if b.err != nil {
			line = b.buf[b.r:b.w]
			b.r = b.w
			err = b.readErr()
			break
		}

		// Buffer full?
		if b.Buffered() >= len(b.buf) {
			b.r = b.w
			line = b.buf
			err = ErrBufferFull
			break
		}

		s = b.w - b.r // do not rescan area we scanned before

		b.fill() // buffer is not full
	}

	// Handle last byte, if any.
	if i := len(line) - 1; i >= 0 {
		b.lastByte = int(line[i])
		b.lastRuneSize = -1
	}

	return
}
```
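To see the aliasing in practice, here is a sketch that reads two lines through a 16-byte buffer (bufio's minimum). The input is one byte longer than the buffer, so the second ReadSlice must call `fill()`, which slides unread data to the front of `buf` and overwrites the bytes the first slice still points at. The exact garbage you observe depends on this implementation detail; the function names and input are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const input = "hello\n0123456789\n" // 17 bytes, one more than the buffer

// readSliceAlias returns the first line as it looked right after ReadSlice,
// and what the very same slice holds after a second ReadSlice call.
func readSliceAlias() (before, after string) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	first, _ := r.ReadSlice('\n') // points into r's internal buffer
	before = string(first)        // string conversion copies the bytes
	// The second line is only partly buffered, so ReadSlice calls fill(),
	// which shifts "0123456789" to buf[0:] and clobbers first.
	if _, err := r.ReadSlice('\n'); err != nil {
		panic(err)
	}
	return before, string(first)
}

// readBytesCopy does the same with ReadBytes, which returns its own copy.
func readBytesCopy() (before, after string) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	first, _ := r.ReadBytes('\n')
	before = string(first)
	if _, err := r.ReadBytes('\n'); err != nil {
		panic(err)
	}
	return before, string(first)
}

func main() {
	b, a := readSliceAlias()
	fmt.Printf("ReadSlice: %q -> %q\n", b, a) // "hello\n" -> "012345"
	b, a = readBytesCopy()
	fmt.Printf("ReadBytes: %q -> %q\n", b, a) // "hello\n" -> "hello\n"
}
```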
Overall the code is straightforward; nothing in it is hard to follow.
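I also noticed a small inconsistency. Reader's Buffered method returns b.w - b.r and its Size method returns len(b.buf): respectively, the number of bytes currently held in the buffer and the length of the buffer itself. Yet elsewhere in the file, where b.w - b.r or len(b.buf) is needed, some methods call Buffered and Size while others spell the expression out. The convention isn't uniform, which looks odd. Examples are the `if b.Buffered() >= len(b.buf)` check in the ReadSlice code above (one helper called, the other written out by hand) and several lines in the Peek method.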
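To make the two quantities concrete, here is a small sketch; the 16-byte size and the input string are arbitrary choices, and `sizes` is a made-up helper name:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sizes reports Size (len(b.buf)) and Buffered (b.w - b.r) at a few points.
func sizes() (size, before, afterPeek, afterRead int) {
	r := bufio.NewReaderSize(strings.NewReader("hello world"), 16)
	size = r.Size()       // length of the internal buffer
	before = r.Buffered() // 0: nothing has been read from the source yet
	r.Peek(1)             // triggers fill(), which pulls in all 11 bytes
	afterPeek = r.Buffered()
	r.ReadByte() // consumes one buffered byte
	afterRead = r.Buffered()
	return
}

func main() {
	fmt.Println(sizes()) // 16 0 11 10
}
```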
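One more thing to watch: a Writer only forwards data to the underlying io.Writer when a write would overflow the buffer. Whatever fits in the remaining buffer space simply stays there, so after a write some trailing data is generally left in the buffer, and the caller must call Flush explicitly. (The leftover amount is not simply len(data) % len(w.buf): when the buffer is empty, a write larger than the buffer is handed straight to the underlying writer and nothing is kept.)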
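A sketch that traces what reaches the underlying writer, using a deliberately tiny 8-byte buffer and a `bytes.Buffer` as the destination (both choices are just for illustration; `state` and `writerTrace` are made-up names):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// state captures what has reached the underlying writer so far and how
// many bytes are still held in the bufio.Writer's buffer.
type state struct {
	flushed  string
	buffered int
}

func writerTrace() []state {
	var dst bytes.Buffer
	w := bufio.NewWriterSize(&dst, 8)
	var trace []state
	snap := func() { trace = append(trace, state{dst.String(), w.Buffered()}) }

	w.Write([]byte("abc")) // fits: stays in the buffer
	snap()
	w.Write([]byte("defghij")) // overflows: fills to 8, flushes, keeps "ij"
	snap()
	w.Flush() // pushes the remaining "ij"
	snap()

	// A write larger than an empty buffer bypasses the buffer entirely.
	var direct bytes.Buffer
	w2 := bufio.NewWriterSize(&direct, 8)
	w2.Write(bytes.Repeat([]byte("x"), 20))
	trace = append(trace, state{direct.String(), w2.Buffered()})
	return trace
}

func main() {
	for _, s := range writerTrace() {
		fmt.Printf("flushed=%q buffered=%d\n", s.flushed, s.buffered)
	}
}
```

The last step leaves 0 bytes buffered even though 20 % 8 = 4.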
scan.go
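I stared at Scanner for quite a while without understanding what it was for. Then I wrote a bit of test code and it clicked immediately: practice really is the best teacher. Below are the Scanner struct and the Scan method.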
```go
// src/bufio/scan.go ---- line 14
// Scanner provides a convenient interface for reading data such as
// a file of newline-delimited lines of text. Successive calls to
// the Scan method will step through the 'tokens' of a file, skipping
// the bytes between the tokens. The specification of a token is
// defined by a split function of type SplitFunc; the default split
// function breaks the input into lines with line termination stripped. Split
// functions are defined in this package for scanning a file into
// lines, bytes, UTF-8-encoded runes, and space-delimited words. The
// client may instead provide a custom split function.
//
// Scanning stops unrecoverably at EOF, the first I/O error, or a token too
// large to fit in the buffer. When a scan stops, the reader may have
// advanced arbitrarily far past the last token. Programs that need more
// control over error handling or large tokens, or must run sequential scans
// on a reader, should use bufio.Reader instead.
//
type Scanner struct {
	r            io.Reader // The reader provided by the client.
	split        SplitFunc // The function to split the tokens.
	maxTokenSize int       // Maximum size of a token; modified by tests.
	token        []byte    // Last token returned by split.
	buf          []byte    // Buffer used as argument to split.
	start        int       // First non-processed byte in buf.
	end          int       // End of data in buf.
	err          error     // Sticky error.
	empties      int       // Count of successive empty tokens.
	scanCalled   bool      // Scan has been called; buffer is in use.
	done         bool      // Scan has finished.
}

// src/bufio/scan.go ---- line 125
// Scan advances the Scanner to the next token, which will then be
// available through the Bytes or Text method. It returns false when the
// scan stops, either by reaching the end of the input or an error.
// After Scan returns false, the Err method will return any error that
// occurred during scanning, except that if it was io.EOF, Err
// will return nil.
// Scan panics if the split function returns too many empty
// tokens without advancing the input. This is a common error mode for
// scanners.
func (s *Scanner) Scan() bool {
	if s.done {
		return false
	}
	s.scanCalled = true
	// Loop until we have a token.
	for {
		// See if we can get a token with what we already have.
		// If we've run out of data but have an error, give the split function
		// a chance to recover any remaining, possibly empty token.
		if s.end > s.start || s.err != nil {
			advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
			if err != nil {
				if err == ErrFinalToken {
					s.token = token
					s.done = true
					return true
				}
				s.setErr(err)
				return false
			}
			if !s.advance(advance) {
				return false
			}
			s.token = token
			if token != nil {
				if s.err == nil || advance > 0 {
					s.empties = 0
				} else {
					// Returning tokens not advancing input at EOF.
					s.empties++
					if s.empties > maxConsecutiveEmptyReads {
						panic("bufio.Scan: too many empty tokens without progressing")
					}
				}
				return true
			}
		}
		// We cannot generate a token with what we are holding.
		// If we've already hit EOF or an I/O error, we are done.
		if s.err != nil {
			// Shut it down.
			s.start = 0
			s.end = 0
			return false
		}
		// Must read more data.
		// First, shift data to beginning of buffer if there's lots of empty space
		// or space is needed.
		if s.start > 0 && (s.end == len(s.buf) || s.start > len(s.buf)/2) {
			copy(s.buf, s.buf[s.start:s.end])
			s.end -= s.start
			s.start = 0
		}
		// Is the buffer full? If so, resize.
		if s.end == len(s.buf) {
			// Guarantee no overflow in the multiplication below.
			const maxInt = int(^uint(0) >> 1)
			if len(s.buf) >= s.maxTokenSize || len(s.buf) > maxInt/2 {
				s.setErr(ErrTooLong)
				return false
			}
			newSize := len(s.buf) * 2
			if newSize == 0 {
				newSize = startBufSize
			}
			if newSize > s.maxTokenSize {
				newSize = s.maxTokenSize
			}
			newBuf := make([]byte, newSize)
			copy(newBuf, s.buf[s.start:s.end])
			s.buf = newBuf
			s.end -= s.start
			s.start = 0
		}
		// Finally we can read some input. Make sure we don't get stuck with
		// a misbehaving Reader. Officially we don't need to do this, but let's
		// be extra careful: Scanner is for safe, simple jobs.
		for loop := 0; ; {
			n, err := s.r.Read(s.buf[s.end:len(s.buf)])
			s.end += n
			if err != nil {
				s.setErr(err)
				break
			}
			if n > 0 {
				s.empties = 0
				break
			}
			loop++
			if loop > maxConsecutiveEmptyReads {
				s.setErr(io.ErrNoProgress)
				break
			}
		}
	}
}
```
Scanner supports custom split functions (set via the Split method), and the package also provides several built-in ones: ScanBytes, ScanRunes, ScanLines, and ScanWords.
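Note that once Scan has been called, you can no longer call the Buffer method to change the buffer; doing so panics. This is what the scanCalled field in the struct tracks.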
