Go source code learning: bufio package-1.1-bufio.go-(2)

Previous article on bufio package learning: Go source code learning: bufio package-1.1-bufio.go-(1)

7. Read: read data

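// Read reads data into p.
// It returns the number of bytes read into p.
// The bytes are taken from at most one Read on the underlying [Reader],
// hence n may be less than len(p).
// To read exactly len(p) bytes, use io.ReadFull(b, p).
// If the underlying [Reader] can return a non-zero count with io.EOF,
// then this Read method can do so as well; see the [io.Reader] docs.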
func (b *Reader) Read(p []byte) (n int, err error) {
	n = len(p)
	if n == 0 {
		if b.Buffered() > 0 {
			return 0, nil
		}
		return 0, b.readErr()
	}
	if b.r == b.w {
		if b.err != nil {
			return 0, b.readErr()
		}
		if len(p) >= len(b.buf) {
			// Large read, empty buffer.
			// Read directly into p to avoid copy.
			n, b.err = b.rd.Read(p)
			if n < 0 {
				panic(errNegativeRead)
			}
			if n > 0 {
				b.lastByte = int(p[n-1])
				b.lastRuneSize = -1
			}
			return n, b.readErr()
		}
		// One read.
		// Do not use b.fill, which will loop.
		b.r = 0
		b.w = 0
		n, b.err = b.rd.Read(b.buf)
		if n < 0 {
			panic(errNegativeRead)
		}
		if n == 0 {
			return 0, b.readErr()
		}
		b.w += n
	}

	// copy as much as we can
	// Note: if the slice panics here, it is probably because
	// the underlying reader returned a bad count. See issue 49795.
	n = copy(p, b.buf[b.r:b.w])
	b.r += n
	b.lastByte = int(b.buf[b.r-1])
	b.lastRuneSize = -1
	return n, nil
}

Explanation:

  • Read is a method on the Reader struct; its pointer receiver b gives it access to the struct's fields.
  • It reads data into the slice p.
  • The return value n is the number of bytes read into p.
  • If len(p) is zero, the method returns (0, nil) when the buffer still holds data, and (0, b.readErr()) otherwise.
  • If the read index b.r equals the write index b.w, the buffer is empty and data must come from the underlying reader b.rd.
    • If b.err holds a pending error, it is returned through b.readErr().
    • If len(p) is greater than or equal to len(b.buf), the data is read directly into p, which avoids an intermediate copy through the buffer.
    • Otherwise, exactly one read is issued into b.buf. b.fill is not used here because it loops and may call the underlying reader several times.
  • Whatever the buffer now holds is copied into p with copy, as much as fits.
  • b.r advances by n, b.lastByte records the last byte delivered, and b.lastRuneSize is reset to -1.
  • The method returns n, the number of bytes actually read.

Purpose:

  • Read copies data from the Reader into the caller's slice.
  • Buffered data is served first; only when the buffer is empty does the method read from the underlying reader.
  • It returns the number of bytes read and any error encountered.
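
The "at most one underlying read" rule can be observed with a small buffer. The sketch below assumes a 16-byte buffer and a 40-byte input; the helper names twoReads and fullRead are made up for illustration. With the implementation shown above, the second Read only drains the 6 bytes left in the buffer, while io.ReadFull keeps calling Read until the slice is full.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// twoReads issues two 10-byte reads against a 16-byte buffer and
// returns the byte counts of both calls.
func twoReads(input string) (int, int) {
	br := bufio.NewReaderSize(strings.NewReader(input), 16)
	p := make([]byte, 10)
	n1, _ := br.Read(p) // buffer is filled with 16 bytes, 10 are copied out
	n2, _ := br.Read(p) // only the 6 buffered bytes are returned
	return n1, n2
}

// fullRead repeats the first read, then uses io.ReadFull for the second
// one so that the slice is filled completely.
func fullRead(input string) (int, error) {
	br := bufio.NewReaderSize(strings.NewReader(input), 16)
	if _, err := br.Read(make([]byte, 10)); err != nil {
		return 0, err
	}
	return io.ReadFull(br, make([]byte, 10))
}

func main() {
	input := strings.Repeat("x", 40)
	n1, n2 := twoReads(input)
	fmt.Println(n1, n2)
	n, err := fullRead(input)
	fmt.Println(n, err)
}
```

Callers that need exactly len(p) bytes should therefore not rely on a single Read call.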

8. ReadByte: Read a single byte

// ReadByte reads and returns a single byte.
// If no byte is available, returns an error.
func (b *Reader) ReadByte() (byte, error) {
    b.lastRuneSize = -1
    for b.r == b.w {
        if b.err != nil {
            return 0, b.readErr()
        }
        b.fill() // buffer is empty, so fill it
    }
    c := b.buf[b.r]
    b.r++
    b.lastByte = int(c)
    return c, nil
}

Explanation:

  • ReadByte is a method on the Reader struct that reads and returns a single byte.
  • It first resets b.lastRuneSize to -1, so a following UnreadRune call fails.
  • The loop runs while the read index b.r equals the write index b.w, that is, while the buffer is empty.
    • If b.err holds a pending error, it is returned through b.readErr().
    • Otherwise b.fill is called to refill the buffer.
  • The byte c is read from b.buf[b.r], and b.r is incremented.
  • b.lastByte records the value of c, and the method returns c with a nil error.

Purpose:

  • ReadByte reads one byte from the buffer.
  • If the buffer is empty, it is filled first.
  • It returns the byte read, or an error if no byte is available.
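
A typical ReadByte loop runs until io.EOF. The following is a minimal sketch, with countByte as a hypothetical helper that counts how often a byte occurs in the input.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countByte reads input one byte at a time and counts occurrences of target.
func countByte(input string, target byte) (int, error) {
	br := bufio.NewReader(strings.NewReader(input))
	count := 0
	for {
		c, err := br.ReadByte()
		if err == io.EOF {
			return count, nil // normal end of input
		}
		if err != nil {
			return count, err
		}
		if c == target {
			count++
		}
	}
}

func main() {
	n, err := countByte("a,b,,c", ',')
	fmt.Println(n, err) // 3 <nil>
}
```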

9. UnreadByte: Undo the most recently read byte

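// UnreadByte unreads the last byte. Only the most recently read byte can be unread.
//
// UnreadByte returns an error if the most recent method called on the
// [Reader] was not a read operation. Notably, [Reader.Peek], [Reader.Discard],
// and [Reader.WriteTo] are not considered read operations.
func (b *Reader) UnreadByte() error {
    // Fail if there is no byte to unread (b.lastByte < 0), or if the read
    // index is 0 while the buffer holds data, which leaves no slot before
    // b.r to put the byte back into.
    if b.lastByte < 0 || b.r == 0 && b.w > 0 {
        return ErrInvalidUnreadByte
    }
    // Here b.r > 0 || b.w == 0.
    if b.r > 0 {
        // Step the read index back by one byte.
        b.r--
    } else {
        // b.r == 0 && b.w == 0: the buffer is empty, so make room for one byte.
        b.w = 1
    }
    // Write the last byte back at the read index.
    b.buf[b.r] = byte(b.lastByte)
    // Reset the last-byte and last-rune-size state.
    b.lastByte = -1
    b.lastRuneSize = -1
    return nil
}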

Explanation:

  • UnreadByte is a method on the Reader struct that undoes the most recently read byte.
  • It first checks two failure conditions: b.lastByte is negative (there is no byte to unread), or b.r is 0 while b.w is greater than 0 (the buffer holds data but has no slot before the read index). In either case it returns ErrInvalidUnreadByte.
  • If b.r is greater than 0, the read index moves back by one byte. Otherwise both b.r and b.w are 0, meaning the buffer is empty (for example after a large Read that bypassed the buffer), and b.w is set to 1 to make room for the byte.
  • The last byte is written back into the buffer at the read index.
  • Finally, b.lastByte and b.lastRuneSize are reset to -1, and the method returns nil.

Purpose:

  • UnreadByte makes the most recently read byte available again by moving the read index back one position.
  • This is useful when a parser has read one byte too far and needs to step back before continuing.
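
The parser use case can be sketched as follows. The helpers readNumber, parse, and secondUnreadFails are hypothetical names used only for this illustration: readNumber consumes digits and pushes the first non-digit back, and secondUnreadFails checks that only one byte can be unread.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readNumber consumes leading ASCII digits and returns their value.
// The first non-digit byte is pushed back with UnreadByte so that the
// caller can still read it.
func readNumber(br *bufio.Reader) (int, error) {
	n := 0
	for {
		c, err := br.ReadByte()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if c < '0' || c > '9' {
			return n, br.UnreadByte()
		}
		n = n*10 + int(c-'0')
	}
}

// parse returns the leading number of input and everything after it.
func parse(input string) (int, string) {
	br := bufio.NewReader(strings.NewReader(input))
	n, _ := readNumber(br)
	rest, _ := io.ReadAll(br)
	return n, string(rest)
}

// secondUnreadFails reports whether a second consecutive UnreadByte is
// rejected with ErrInvalidUnreadByte.
func secondUnreadFails() bool {
	br := bufio.NewReader(strings.NewReader("xy"))
	br.ReadByte()
	first := br.UnreadByte()
	second := br.UnreadByte()
	return first == nil && second == bufio.ErrInvalidUnreadByte
}

func main() {
	n, rest := parse("123abc")
	fmt.Println(n, rest)             // 123 abc
	fmt.Println(secondUnreadFails()) // true
}
```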

10. ReadRune: Read a single UTF-8 encoded Unicode character

// ReadRune reads a single UTF-8 encoded Unicode character and returns the
// rune and its size in bytes. If the encoded rune is invalid, it consumes one byte
// and returns unicode.ReplacementChar (U+FFFD) with a size of 1.
func (b *Reader) ReadRune() (r rune, size int, err error) {
    // Fill until the buffer holds a complete rune, the buffer is full,
    // or an error is pending.
    for b.r+utf8.UTFMax > b.w && !utf8.FullRune(b.buf[b.r:b.w]) && b.err == nil && b.w-b.r < len(b.buf) {
        b.fill() // b.w-b.r < len(buf) => buffer is not full
    }
    // Reset the size of the last rune.
    b.lastRuneSize = -1
    // The buffer is still empty: return the pending read error.
    if b.r == b.w {
        return 0, 0, b.readErr()
    }
    // Assume a single-byte rune first.
    r, size = rune(b.buf[b.r]), 1
    // A value >= utf8.RuneSelf may start a multi-byte encoding,
    // so decode it with utf8.DecodeRune.
    if r >= utf8.RuneSelf {
        r, size = utf8.DecodeRune(b.buf[b.r:b.w])
    }
    // Advance the read index and record the last byte and rune size.
    b.r += size
    b.lastByte = int(b.buf[b.r-1])
    b.lastRuneSize = size
    return r, size, nil
}

Explanation:

  • ReadRune is a method on the Reader struct that reads a single UTF-8 encoded Unicode character.
  • The loop calls b.fill until the buffer holds a complete rune, the buffer is full, or an error is pending.
  • If the buffer is still empty afterwards, the read error is returned.
  • The first byte is taken as the rune, with a default size of 1.
  • If that value is greater than or equal to utf8.RuneSelf, the rune may occupy several bytes, so it is decoded with utf8.DecodeRune.
  • The read index advances by size, and b.lastByte and b.lastRuneSize are updated.
  • The method returns the rune, its size, and a nil error.

Purpose:

  • ReadRune reads one UTF-8 encoded Unicode character from the buffer.
  • If the buffer does not yet hold a complete rune, b.fill is called to fetch more data.
  • The method also handles error conditions such as an empty buffer or an invalid encoding.
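
The size handling and the replacement of invalid bytes can be checked with a short loop. This is a sketch only; decode is a hypothetical helper, and the input mixes an ASCII byte, a three-byte character, and the invalid byte 0xff.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// decode reads input rune by rune and returns the runes together with
// the number of bytes each one occupied.
func decode(input string) ([]rune, []int) {
	br := bufio.NewReader(strings.NewReader(input))
	var runes []rune
	var sizes []int
	for {
		r, size, err := br.ReadRune()
		if err != nil { // io.EOF ends the loop
			return runes, sizes
		}
		runes = append(runes, r)
		sizes = append(sizes, size)
	}
}

func main() {
	runes, sizes := decode("a世\xff")
	fmt.Println(sizes)                // [1 3 1]
	fmt.Println(runes[2] == '\uFFFD') // true: the invalid byte became U+FFFD
}
```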

11. UnreadRune: Undo the last read rune

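// UnreadRune unreads the last rune. If the most recent method called on
// the [Reader] was not a [Reader.ReadRune], [Reader.UnreadRune] returns an error.
// (In this regard it is stricter than [Reader.UnreadByte], which will unread
// the last byte from any read operation.)
func (b *Reader) UnreadRune() error {
    // Fail if no rune size was recorded, or if the read index is smaller
    // than the size of the last rune.
    if b.lastRuneSize < 0 || b.r < b.lastRuneSize {
        return ErrInvalidUnreadRune
    }
    // Move the read index back over the rune and reset the last-byte
    // and last-rune-size state.
    b.r -= b.lastRuneSize
    b.lastByte = -1
    b.lastRuneSize = -1
    return nil
}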

Explanation:

  • UnreadRune is a method on the Reader struct that undoes the last rune read.
  • It first checks whether b.lastRuneSize is negative or the read index b.r is smaller than b.lastRuneSize. If so, it returns ErrInvalidUnreadRune.
  • Otherwise it moves the read index back by b.lastRuneSize and resets b.lastByte and b.lastRuneSize to -1.
  • It returns nil to indicate success.

Purpose:

  • UnreadRune undoes the last ReadRune, so the next ReadRune returns the same rune again.
  • It checks whether the undo is allowed and returns an error if it is not.
  • Because it resets b.lastByte and b.lastRuneSize, neither UnreadRune nor UnreadByte can be called a second time without another read in between.
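
The difference in strictness between UnreadRune and UnreadByte can be seen in a small experiment. The helper names unreadRuneOK and rereadFirstRune are assumptions made for this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unreadRuneOK reports whether UnreadRune succeeds when the preceding
// call was ReadRune (afterReadRune == true) or ReadByte (false).
func unreadRuneOK(afterReadRune bool) bool {
	br := bufio.NewReader(strings.NewReader("世界"))
	if afterReadRune {
		br.ReadRune()
	} else {
		br.ReadByte() // sets lastRuneSize to -1
	}
	return br.UnreadRune() == nil
}

// rereadFirstRune reads a rune, unreads it, and reads again.
func rereadFirstRune(input string) (rune, rune) {
	br := bufio.NewReader(strings.NewReader(input))
	first, _, _ := br.ReadRune()
	br.UnreadRune()
	second, _, _ := br.ReadRune()
	return first, second
}

func main() {
	fmt.Println(unreadRuneOK(true), unreadRuneOK(false)) // true false
	a, b := rereadFirstRune("世界")
	fmt.Println(a == b) // true
}
```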

12. ReadSlice: Read the slice up to the delimiter

// Buffered returns the number of bytes that can be read from the current buffer.
func (b *Reader) Buffered() int { return b.w - b.r }

// ReadSlice reads until the first occurrence of delim in the input,
// returning a slice pointing at the bytes in the buffer.
// The bytes stop being valid at the next read.
// If ReadSlice encounters an error before finding a delimiter,
// it returns all the data in the buffer and the error itself (often io.EOF).
// ReadSlice fails with error ErrBufferFull if the buffer fills without a delim.
// Because the data returned from ReadSlice will be overwritten
// by the next I/O operation, most clients should use
// [Reader.ReadBytes] or ReadString instead.
// ReadSlice returns err != nil if and only if line does not end in delim.
func (b *Reader) ReadSlice(delim byte) (line []byte, err error) {
    s := 0 // search start index
    for {
        // Search the buffer for the delimiter.
        if i := bytes.IndexByte(b.buf[b.r+s:b.w], delim); i >= 0 {
            i += s
            line = b.buf[b.r : b.r+i+1]
            b.r += i + 1
            break
        }

        // Pending error?
        if b.err != nil {
            line = b.buf[b.r:b.w]
            b.r = b.w
            err = b.readErr()
            break
        }

        // Buffer full?
        if b.Buffered() >= len(b.buf) {
            b.r = b.w
            line = b.buf
            err = ErrBufferFull
            break
        }

        s = b.w - b.r // do not rescan area we scanned before

        b.fill() // buffer is not full
    }

    // Handle last byte, if any.
    if i := len(line) - 1; i >= 0 {
        b.lastByte = int(line[i])
        b.lastRuneSize = -1
    }

    return
}

Explanation:

  • ReadSlice is a method on the Reader struct that reads from the buffer up to and including the given delimiter.
  • The search start index s is initialized to zero.
  • A loop searches the buffer for the delimiter.
    • If the delimiter is found, line is set to a slice pointing at the bytes in the buffer, and the read index b.r is moved past the delimiter.
    • If there is a pending error, all data in the buffer is returned together with the error, and the read index is moved to b.w.
    • If the buffer is full and no delimiter was found, the whole buffer is returned with the error ErrBufferFull.
    • If none of these conditions holds, s is set to b.w - b.r so that already scanned bytes are not scanned again, the buffer is filled, and the search continues.
  • If line is not empty, b.lastByte records its last byte and b.lastRuneSize is reset to -1.
  • The method returns the slice and any error.

Purpose:

  • ReadSlice reads from the buffer up to the given delimiter.
  • It avoids allocation and copying, because the returned slice points into the Reader's internal buffer.
  • Since the returned data will be overwritten by the next I/O operation, most clients should use [Reader.ReadBytes] or ReadString instead.
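
Because the slice returned by ReadSlice aliases the internal buffer, it has to be copied before the next read. The sketch below shows one way to do that, plus the ErrBufferFull case with an assumed 16-byte buffer; fields and overflow are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// fields splits input at delim using ReadSlice. Each slice is converted
// to a string right away, which copies the bytes out of the buffer.
func fields(input string, delim byte) []string {
	br := bufio.NewReader(strings.NewReader(input))
	var out []string
	for {
		line, err := br.ReadSlice(delim)
		if len(line) > 0 {
			out = append(out, string(line))
		}
		if err != nil { // io.EOF after the last field
			return out
		}
	}
}

// overflow reports whether ReadSlice fails with ErrBufferFull when the
// delimiter does not appear within a 16-byte buffer.
func overflow() bool {
	input := strings.Repeat("x", 32) + "\n"
	br := bufio.NewReaderSize(strings.NewReader(input), 16)
	line, err := br.ReadSlice('\n')
	return err == bufio.ErrBufferFull && len(line) == 16
}

func main() {
	fmt.Printf("%q\n", fields("a,b,c", ',')) // ["a," "b," "c"]
	fmt.Println(overflow())                  // true
}
```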

13. ReadLine: Read a line of data

This part of the code defines the ReadLine method, a low-level primitive for reading one line of data. Most callers should use Reader.ReadBytes('\n') or Reader.ReadString('\n') instead, or use a Scanner.

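// ReadLine is a low-level line-reading primitive. Most callers should use
// [Reader.ReadBytes]('\n') or [Reader.ReadString]('\n') instead or use a [Scanner].
//
// ReadLine tries to return a single line, not including the end-of-line bytes.
// If the line was too long for the buffer then isPrefix is set and the
// beginning of the line is returned. The rest of the line will be returned
// from future calls. isPrefix will be false when returning the last fragment
// of the line. ReadLine either returns a non-nil line or it returns an error,
// never both.
//
// The text returned from ReadLine does not include the line end ("\r\n" or "\n").
// No indication or error is given if the input ends without a final line end.
// Calling [Reader.UnreadByte] after ReadLine will always unread the last byte read
// (possibly a character belonging to the line end) even if that byte is not
// part of the line returned by ReadLine.
func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error) {
    line, err = b.ReadSlice('\n')
    if err == ErrBufferFull {
        // Handle the case where "\r\n" straddles the buffer.
        if len(line) > 0 && line[len(line)-1] == '\r' {
            // Put the '\r' back on buf and drop it from line.
            // Let the next call to ReadLine check for "\r\n".
            if b.r == 0 {
                // should be unreachable
                panic("bufio: tried to rewind past start of buffer")
            }
            b.r--
            line = line[:len(line)-1]
        }
        return line, true, nil
    }

    if len(line) == 0 {
        if err != nil {
            line = nil
        }
        return
    }
    err = nil

    if line[len(line)-1] == '\n' {
        drop := 1
        if len(line) > 1 && line[len(line)-2] == '\r' {
            drop = 2
        }
        line = line[:len(line)-drop]
    }
    return
}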

Explanation:

  • ReadLine is a method on the Reader struct that reads a single line.
  • It first calls ReadSlice('\n'). At this point line still includes the line ending, if one was found.
  • If ReadSlice reports ErrBufferFull, the line is longer than the buffer. When the fragment ends with '\r', a "\r\n" pair may straddle the buffer boundary, so the '\r' is put back into the buffer (b.r--) and dropped from line. The fragment is returned with isPrefix set to true.
  • If line is empty and err is not nil, line is set to nil and the error is returned.
  • Otherwise err is cleared, and a trailing '\n' (together with a preceding '\r', if present) is removed from line.
  • The method returns the line data line, the flag isPrefix, and the error err.

Purpose:

  • ReadLine provides a low-level interface for reading one line of data.
  • It is implemented on top of ReadSlice and handles the case where the line ending straddles the buffer boundary.
  • The returned line excludes the line ending, isPrefix indicates whether more of the same line remains to be read, and err indicates whether an error occurred.
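
A caller that uses ReadLine directly has to reassemble long lines from their fragments. The following is a minimal sketch assuming a 16-byte buffer so that a 40-byte line is split; readLongLine and lines are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readLongLine joins ReadLine fragments until isPrefix is false.
func readLongLine(br *bufio.Reader) (string, error) {
	var sb strings.Builder
	for {
		frag, isPrefix, err := br.ReadLine()
		if err != nil {
			return sb.String(), err
		}
		sb.Write(frag) // copy now: frag is overwritten by the next read
		if !isPrefix {
			return sb.String(), nil
		}
	}
}

// lines reads every line of input through a buffer of bufSize bytes.
func lines(input string, bufSize int) []string {
	br := bufio.NewReaderSize(strings.NewReader(input), bufSize)
	var out []string
	for {
		line, err := readLongLine(br)
		if err != nil {
			if line != "" { // keep a partial line cut off by the error
				out = append(out, line)
			}
			return out
		}
		out = append(out, line)
	}
}

func main() {
	input := "short\r\n" + strings.Repeat("x", 40) + "\nend"
	for _, l := range lines(input, 16) {
		fmt.Println(len(l), l)
	}
}
```

In most programs bufio.Scanner or ReadString('\n') is simpler, as the doc comment above recommends.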

14. collectFragments: collect fragments

This part of the code defines the collectFragments method, which reads the input until the first occurrence of delim. It returns a result tuple: (slice of full buffers, remaining bytes before delim, total number of bytes in the first two elements, error).

The complete result is equivalent to bytes.Join(append(fullBuffers, finalFragment), nil), which has length totalLen. The result is structured this way so that callers can minimize allocations and copies.

func (b *Reader) collectFragments(delim byte) (fullBuffers [][]byte, finalFragment []byte, totalLen int, err error) {
    var frag []byte
    // Use ReadSlice to look for delim, accumulating full buffers.
    for {
        var e error
        frag, e = b.ReadSlice(delim)
        if e == nil { // got final fragment
            break
        }
        if e != ErrBufferFull { // unexpected error
            err = e
            break
        }

        // Make a copy of the buffer.
        buf := bytes.Clone(frag)
        fullBuffers = append(fullBuffers, buf)
        totalLen += len(buf)
    }

    totalLen += len(frag)
    return fullBuffers, frag, totalLen, err
}

Explanation:

  • collectFragments is a method on the Reader struct that reads the input until the first occurrence of delim.
  • The variable frag holds the fragment returned by each read.
  • A loop calls ReadSlice to look for delim while accumulating full buffers.
    • If ReadSlice returns no error, the delimiter was found and frag is the final fragment, so the loop ends.
    • If the error is not ErrBufferFull, something unexpected happened; the error is stored in err and the loop ends.
    • If the error is ErrBufferFull, the buffer filled up without a delimiter. A copy of the buffer is appended to fullBuffers and its length is added to totalLen.
  • The length of the final fragment frag is added to totalLen.
  • The method returns the full buffers fullBuffers, the final fragment frag, the combined length totalLen, and the error err.

Purpose:

  • collectFragments accumulates full buffers while it reads.
  • It calls ReadSlice in a loop until delim is found or an error other than ErrBufferFull occurs.
  • The shape of the result lets the caller minimize memory allocation and data copying.
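
collectFragments is unexported, so it cannot be called from outside the package. The sketch below rebuilds the same shape using only the exported ReadSlice, to check the stated equivalence with bytes.Join. The helpers collect and joined are hypothetical, and the 16-byte buffer is an assumption chosen to force two full buffers.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// collect mirrors the result shape of collectFragments with the exported
// API. It is an illustration, not the package's own code.
func collect(br *bufio.Reader, delim byte) (full [][]byte, last []byte, total int, err error) {
	for {
		frag, e := br.ReadSlice(delim)
		if e == bufio.ErrBufferFull {
			// The buffer is reused by the next read, so keep a copy.
			full = append(full, bytes.Clone(frag))
			total += len(frag)
			continue
		}
		// Either delim was found (e == nil) or another error occurred.
		return full, frag, total + len(frag), e
	}
}

// joined reads up to delim through a 16-byte buffer and returns the
// joined result, the number of full buffers, and the reported total.
func joined(input string, delim byte) (string, int, int) {
	br := bufio.NewReaderSize(strings.NewReader(input), 16)
	full, last, total, _ := collect(br, delim)
	whole := bytes.Join(append(full, last), nil)
	return string(whole), len(full), total
}

func main() {
	s, n, total := joined(strings.Repeat("x", 40)+";rest", ';')
	fmt.Println(len(s), n, total) // 41 2 41
}
```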

15. ReadBytes: Read bytes

This part of the code defines the ReadBytes method, which reads the input until the first occurrence of the delimiter delim and returns a slice containing the data up to and including the delimiter. If an error is encountered before the delimiter is found, it returns the data read before the error and the error itself (usually io.EOF). ReadBytes returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.

func (b *Reader) ReadBytes(delim byte) ([]byte, error) {
    // Use collectFragments to read data up to the delimiter.
    full, frag, n, err := b.collectFragments(delim)

    // Allocate a new buffer to hold the full pieces and the fragment.
    buf := make([]byte, n)
    n = 0

    // Copy the full pieces and the fragment in.
    for i := range full {
        n += copy(buf[n:], full[i])
    }
    copy(buf[n:], frag)

    return buf, err
}

Explanation:

  • ReadBytes is a method on the Reader struct that reads the input until the first occurrence of delim.
  • It calls collectFragments to obtain the full buffers full, the final fragment frag, their combined length n, and the error err.
  • A new buffer buf of length n is allocated to hold the full buffers and the final fragment.
  • A loop copies the full buffers into buf, and the final fragment is copied in after them.
  • The method returns buf, which holds the data including the delimiter, together with the error err.

Purpose:

  • ReadBytes reads data up to a given delimiter and returns a slice that includes the delimiter.
  • The actual reading from the input is done by collectFragments.
  • For ease of use, the full buffers and the final fragment are merged into one newly allocated buffer, which stays valid after later reads.
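
Since ReadBytes returns a freshly allocated slice, the results can be kept across calls. A minimal sketch, with records as a hypothetical helper that treats io.EOF as the normal end of input:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// records splits input at delim with ReadBytes and keeps every slice.
func records(input string, delim byte) ([][]byte, error) {
	br := bufio.NewReader(strings.NewReader(input))
	var out [][]byte
	for {
		rec, err := br.ReadBytes(delim)
		if len(rec) > 0 {
			out = append(out, rec) // safe to keep: rec is a new slice
		}
		if err == io.EOF {
			return out, nil // the last record had no trailing delim
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	recs, err := records("k1=v1;k2=v2;tail", ';')
	fmt.Printf("%q %v\n", recs, err) // ["k1=v1;" "k2=v2;" "tail"] <nil>
}
```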

16. ReadString: Read string

This part of the code defines the ReadString method, which reads the input until the first occurrence of the delimiter delim and returns a string containing the data up to and including the delimiter. If an error is encountered before the delimiter is found, it returns the data read before the error and the error itself (usually io.EOF). ReadString returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.

func (b *Reader) ReadString(delim byte) (string, error) {
    // Use collectFragments to read data up to the delimiter.
    full, frag, n, err := b.collectFragments(delim)

    // Allocate a string builder large enough for the full pieces and the fragment.
    var buf strings.Builder
    buf.Grow(n)

    // Write the full pieces and the fragment into the builder.
    for _, fb := range full {
        buf.Write(fb)
    }
    buf.Write(frag)

    return buf.String(), err
}

Explanation:

  • ReadString is a method on the Reader struct that reads the input until the first occurrence of delim.
  • It calls collectFragments to obtain the full buffers full, the final fragment frag, their combined length n, and the error err.
  • A strings.Builder named buf is created, and buf.Grow(n) reserves enough space for the whole result up front.
  • A loop writes the full buffers into the builder, and the final fragment is written after them.
  • The method returns the resulting string, which includes the delimiter, together with the error err.

Purpose:

  • ReadString reads data up to a given delimiter and returns a string that includes the delimiter.
  • The actual reading from the input is done by collectFragments.
  • For ease of use, the full buffers and the final fragment are combined into a single new string.
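
A common use of ReadString is reading text line by line. The sketch below uses a hypothetical helper readLines that strips the line ending itself, since ReadString keeps the delimiter in its result, and that also keeps a final line without a trailing newline.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readLines returns the lines of input without their line endings.
func readLines(input string) []string {
	br := bufio.NewReader(strings.NewReader(input))
	var out []string
	for {
		s, err := br.ReadString('\n')
		if s != "" {
			out = append(out, strings.TrimRight(s, "\r\n"))
		}
		if err != nil { // io.EOF: s held the last, unterminated line
			return out
		}
	}
}

func main() {
	fmt.Printf("%q\n", readLines("alpha\nbeta\r\ngamma")) // ["alpha" "beta" "gamma"]
}
```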

Origin blog.csdn.net/weixin_49015143/article/details/135182305