bufio.go
实现了两个结构体,分别是带 buffer 的 Reader
和带 buffer 的 Writer
源码如下
// src/bufio/bufio.go ---- line 30
// Reader implements buffering for an io.Reader object.
type Reader struct {
buf []byte
rd io.Reader // reader provided by the client
r, w int // buf read and write positions
err error
lastByte int // last byte read for UnreadByte; -1 means invalid
lastRuneSize int // size of last rune read for UnreadRune; -1 means invalid
}
// src/bufio/bufio.go ---- line 536
// Writer implements buffering for an io.Writer object.
// If an error occurs writing to a Writer, no more data will be
// accepted and all subsequent writes, and Flush, will return the error.
// After all data has been written, the client should call the
// Flush method to guarantee all data has been forwarded to
// the underlying io.Writer.
type Writer struct {
err error
buf []byte
n int
wr io.Writer
}
比较有趣的是 Reader
的 ReadSlice
方法,可惜只接受单 byte
作为参数,我曾经实现过一个接受 []byte
的相同功能的函数,早知道这里有,就直接从这里改进了。要注意的是这个函数返回的字节切片指向的底层数组是和 Reader
中 buffer
指向的底层数组相同的,意味着会被覆盖,所以一般使用 ReadBytes
来代替。
// src/bufio/bufio.go ---- line 320
// ReadSlice reads until the first occurrence of delim in the input,
// returning a slice pointing at the bytes in the buffer.
// The bytes stop being valid at the next read.
// If ReadSlice encounters an error before finding a delimiter,
// it returns all the data in the buffer and the error itself (often io.EOF).
// ReadSlice fails with error ErrBufferFull if the buffer fills without a delim.
// Because the data returned from ReadSlice will be overwritten
// by the next I/O operation, most clients should use
// ReadBytes or ReadString instead.
// ReadSlice returns err != nil if and only if line does not end in delim.
func (b *Reader) ReadSlice(delim byte) (line []byte, err error) {
s := 0 // search start index
for {
// Search buffer.
if i := bytes.IndexByte(b.buf[b.r+s:b.w], delim); i >= 0 {
i += s
line = b.buf[b.r : b.r+i+1]
b.r += i + 1
break
}
// Pending error?
if b.err != nil {
line = b.buf[b.r:b.w]
b.r = b.w
err = b.readErr()
break
}
// Buffer full?
if b.Buffered() >= len(b.buf) {
b.r = b.w
line = b.buf
err = ErrBufferFull
break
}
s = b.w - b.r // do not rescan area we scanned before
b.fill() // buffer is not full
}
// Handle last byte, if any.
if i := len(line) - 1; i >= 0 {
b.lastByte = int(line[i])
b.lastRuneSize = -1
}
return
}
大体都很简单,没有很难理解的东西。
还发现一些小瑕疵,结构体 Reader
的 Buffered
方法返回 b.w - b.r
这个值,而 Size
返回 len(b.buf)
值,分别是缓存区中缓存内容的长度和缓存区长度。但是在其他方法中,涉及到 b.w - b.r
和 len(b.buf)
时,有的是直接调用 Buffered
和 Size
方法,有的又显式写出来,规范不统一,看起来很奇怪。比如上述代码块的第 32 行和 Peek
方法中的几行。
还有一点需要注意,在往 Writer
写入数据时,如果写入内容不超过 buffer
长度的部分是不会自动 Flush
的。所以说每次写都会有 len(data) % len(w.buf)
的数据保留在 buffer
中没有 Flush
需要调用者显式 Flush
scan.go
看了好一会都没看懂这个 Scanner
是干嘛的,然后自己写了个测试代码,一下子就搞懂了,果然实践出真知。下面给出 Scanner
结构体和 Scan
方法。
// src/bufio/scan.go ---- line 14
// Scanner provides a convenient interface for reading data such as
// a file of newline-delimited lines of text. Successive calls to
// the Scan method will step through the 'tokens' of a file, skipping
// the bytes between the tokens. The specification of a token is
// defined by a split function of type SplitFunc; the default split
// function breaks the input into lines with line termination stripped. Split
// functions are defined in this package for scanning a file into
// lines, bytes, UTF-8-encoded runes, and space-delimited words. The
// client may instead provide a custom split function.
//
// Scanning stops unrecoverably at EOF, the first I/O error, or a token too
// large to fit in the buffer. When a scan stops, the reader may have
// advanced arbitrarily far past the last token. Programs that need more
// control over error handling or large tokens, or must run sequential scans
// on a reader, should use bufio.Reader instead.
//
type Scanner struct {
r io.Reader // The reader provided by the client.
split SplitFunc // The function to split the tokens.
maxTokenSize int // Maximum size of a token; modified by tests.
token []byte // Last token returned by split.
buf []byte // Buffer used as argument to split.
start int // First non-processed byte in buf.
end int // End of data in buf.
err error // Sticky error.
empties int // Count of successive empty tokens.
scanCalled bool // Scan has been called; buffer is in use.
done bool // Scan has finished.
}
// src/bufio/scan.go ---- line 125
// Scan advances the Scanner to the next token, which will then be
// available through the Bytes or Text method. It returns false when the
// scan stops, either by reaching the end of the input or an error.
// After Scan returns false, the Err method will return any error that
// occurred during scanning, except that if it was io.EOF, Err
// will return nil.
// Scan panics if the split function returns too many empty
// tokens without advancing the input. This is a common error mode for
// scanners.
func (s *Scanner) Scan() bool {
if s.done {
return false
}
s.scanCalled = true
// Loop until we have a token.
for {
// See if we can get a token with what we already have.
// If we've run out of data but have an error, give the split function
// a chance to recover any remaining, possibly empty token.
if s.end > s.start || s.err != nil {
advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
if err != nil {
if err == ErrFinalToken {
s.token = token
s.done = true
return true
}
s.setErr(err)
return false
}
if !s.advance(advance) {
return false
}
s.token = token
if token != nil {
if s.err == nil || advance > 0 {
s.empties = 0
} else {
// Returning tokens not advancing input at EOF.
s.empties++
if s.empties > maxConsecutiveEmptyReads {
panic("bufio.Scan: too many empty tokens without progressing")
}
}
return true
}
}
// We cannot generate a token with what we are holding.
// If we've already hit EOF or an I/O error, we are done.
if s.err != nil {
// Shut it down.
s.start = 0
s.end = 0
return false
}
// Must read more data.
// First, shift data to beginning of buffer if there's lots of empty space
// or space is needed.
if s.start > 0 && (s.end == len(s.buf) || s.start > len(s.buf)/2) {
copy(s.buf, s.buf[s.start:s.end])
s.end -= s.start
s.start = 0
}
// Is the buffer full? If so, resize.
if s.end == len(s.buf) {
// Guarantee no overflow in the multiplication below.
const maxInt = int(^uint(0) >> 1)
if len(s.buf) >= s.maxTokenSize || len(s.buf) > maxInt/2 {
s.setErr(ErrTooLong)
return false
}
newSize := len(s.buf) * 2
if newSize == 0 {
newSize = startBufSize
}
if newSize > s.maxTokenSize {
newSize = s.maxTokenSize
}
newBuf := make([]byte, newSize)
copy(newBuf, s.buf[s.start:s.end])
s.buf = newBuf
s.end -= s.start
s.start = 0
}
// Finally we can read some input. Make sure we don't get stuck with
// a misbehaving Reader. Officially we don't need to do this, but let's
// be extra careful: Scanner is for safe, simple jobs.
for loop := 0; ; {
n, err := s.r.Read(s.buf[s.end:len(s.buf)])
s.end += n
if err != nil {
s.setErr(err)
break
}
if n > 0 {
s.empties = 0
break
}
loop++
if loop > maxConsecutiveEmptyReads {
s.setErr(io.ErrNoProgress)
break
}
}
}
}
Scanner
支持自定义 split
方法。并且提供了内置的几个 split
方法分别是 ScanBytes, ScanRunes, ScanLines, ScanWords
需要注意的是 调用过 Scan
方法之后就不能再调用 Buffer
方法来改变缓存区了,否则会 panic