Golang strings.Builder

Reprinted from: https://liudanking.com/performance/golang-strings-builder-%E5%8E%9F%E7%90%86%E8%A7%A3%E6%9E%90/

For personal backup only; please read the original post.

 

Table of Contents

Original string concatenation

Before Golang 1.10, use bytes.Buffer to optimize

Use strings.Builder for string splicing

strings.Builder principle analysis

Stop here?


#### Original string concatenation

package main

func main() {
    ss := []string{
        "A",
        "B",
        "C",
    }

    var str string
    for _, s := range ss {
        str += s
    }

    print(str)
}

As in many languages that have a string type, strings in Golang are read-only and immutable. Concatenating with += therefore creates a new string on every iteration, which means repeated memory allocation and copying, and leaves the intermediate strings behind as garbage. If you concatenate a lot of strings, this is clearly not the right approach.
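To make that cost visible, here is a minimal sketch that counts heap allocations for the += loop. The helper names (concatPlus, makeParts, plusAllocs, sink) and the test data are made up for illustration; testing.AllocsPerRun comes from the standard library, and the exact count it reports depends on the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the work.
var sink string

// concatPlus joins parts with +=, producing a new string on each iteration.
func concatPlus(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

// makeParts returns n copies of a 16-byte string (hypothetical test data).
func makeParts(n int) []string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = "0123456789abcdef"
	}
	return parts
}

// plusAllocs reports the average number of heap allocations made by one
// call to concatPlus over n parts.
func plusAllocs(n int) float64 {
	parts := makeParts(n)
	return testing.AllocsPerRun(100, func() {
		sink = concatPlus(parts)
	})
}

func main() {
	fmt.Printf("+= over 100 parts: %.0f allocations per call\n", plusAllocs(100))
}
```

The number of allocations grows roughly with the number of parts, because almost every += has to allocate a fresh string large enough for everything accumulated so far.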

#### Before Golang 1.10, use bytes.Buffer to optimize

package main

import (
    "bytes"
    "fmt"
)

func main() {
    ss := []string{
        "A",
        "B",
        "C",
    }

    var b bytes.Buffer
    for _, s := range ss {
        fmt.Fprint(&b, s)
    }

    print(b.String())
}

Here, var b bytes.Buffer stores the string being built. To a certain extent this solves the problem above, where str needed a newly allocated block of memory for the intermediate string on every concatenation.

But there is still a small problem: b.String() performs a []byte -> string type conversion, and that conversion costs one memory allocation and one copy of the contents.
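The copy made by b.String() can be observed directly. In the sketch below (the function name snapshotThenReuse is made up for illustration), the buffer is reset and its storage overwritten after String() has been called; the string obtained earlier is unaffected because it has its own copy of the bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// snapshotThenReuse writes first into a buffer, takes its String(),
// then resets the buffer and writes second into the same storage.
func snapshotThenReuse(first, second string) (snapshot, current string) {
	var b bytes.Buffer
	b.WriteString(first)
	snapshot = b.String() // copies the buffered bytes into a new string
	b.Reset()             // empties the buffer but keeps its storage
	b.WriteString(second) // reuses the storage that held first
	return snapshot, b.String()
}

func main() {
	snapshot, current := snapshotThenReuse("abc", "xyz")
	fmt.Println(snapshot, current) // abc xyz
}
```

This copy is what keeps bytes.Buffer safe to reuse, and it is also the extra cost the paragraph above describes.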

#### Use strings.Builder for string splicing

package main

import (
    "fmt"
    "strings"
)

func main() {
    ss := []string{
        "A",
        "B",
        "C",
    }

    var b strings.Builder
    for _, s := range ss {
        fmt.Fprint(&b, s)
    }

    print(b.String())
}
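The example above writes through fmt.Fprint, which works because strings.Builder implements io.Writer. As a possible variation, the sketch below uses the Builder's own WriteString and Grow methods instead; joinAll is a hypothetical helper name, not something from the original post.

```go
package main

import (
	"fmt"
	"strings"
)

// joinAll concatenates parts using strings.Builder.
func joinAll(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}

	var b strings.Builder
	b.Grow(total) // reserve the final size once, up front
	for _, p := range parts {
		b.WriteString(p) // the returned error is always nil
	}
	return b.String()
}

func main() {
	fmt.Println(joinAll([]string{"A", "B", "C"})) // ABC
}
```

When the final length is known in advance, calling Grow once means the underlying buffer does not have to be reallocated while the parts are appended.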

Since Golang officially introduced strings.Builder as a feature, it must have something going for it. Don't believe it? Here is a simple benchmark:

package ts

import (
    "bytes"
    "fmt"
    "strings"
    "testing"
)

func BenchmarkBuffer(b *testing.B) {
    var buf bytes.Buffer
    for i := 0; i < b.N; i++ {
        fmt.Fprint(&buf, "?")
        _ = buf.String()
    }
}

func BenchmarkBuilder(b *testing.B) {
    var builder strings.Builder
    for i := 0; i < b.N; i++ {
        fmt.Fprint(&builder, "?")
        _ = builder.String()
    }
}

$ go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: test/ts
BenchmarkBuffer-4         300000        101086 ns/op      604155 B/op          1 allocs/op
BenchmarkBuilder-4      20000000            90.4 ns/op        21 B/op          0 allocs/op
PASS
ok      test/ts 32.308s

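The performance improvement is impressive. Note that the benchmark calls String() on an ever-growing buffer in every iteration, so bytes.Buffer copies the entire accumulated content each time while strings.Builder copies nothing; this is where most of the gap comes from. Languages with built-in GC such as C# and Java introduced a string builder very early, whereas Golang only added one in 1.10. That is not exactly early, but the large improvement does not disappoint. Let's take a look at how the standard library does it.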

#### strings.Builder principle analysis

The implementation of strings.Builder is in the file strings/builder.go, which is only about 120 lines long and very compact. The key code is as follows:

type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte // 1
}

// Write appends the contents of p to b's buffer.
// Write always returns len(p), nil.
func (b *Builder) Write(p []byte) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, p...) // 2
    return len(p), nil
}

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))  // 3
}

func (b *Builder) copyCheck() {
    if b.addr == nil {
        // 4
        // This hack works around a failing of Go's escape analysis
        // that was causing b to escape and be heap allocated.
        // See issue 23382.
        // TODO: once issue 7921 is fixed, this should be reverted to
        // just "b.addr = b".
        b.addr = (*Builder)(noescape(unsafe.Pointer(b)))
    } else if b.addr != b {
        panic("strings: illegal use of non-zero Builder copied by value")
    }
}

  1. The idea is similar to bytes.Buffer: since strings would otherwise be destroyed and rebuilt over and over during construction, avoid the problem by storing the content in an underlying buf []byte.
  2. A write operation simply appends the bytes to buf.
  3. To solve the []byte -> string conversion and memory copy in bytes.Buffer.String(), an unsafe.Pointer conversion is used here. It reinterprets buf []byte directly as a string, which avoids the extra memory allocation and copy.
  4. If we implemented strings.Builder ourselves, in most cases we would consider the job done after the first 3 steps. But the standard library goes one step further. In Golang, developers usually do not need to think about the stack, but if work that could be done on the stack escapes to the heap, performance suffers significantly. Therefore copyCheck, which also detects copies by value, contains a line of hacky code that keeps b from escaping to the heap. For more on this part, read Go's hidden #pragmas by Dave Cheney.
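The four points above can be put together in a small imitation. This is only a sketch under stated assumptions: miniBuilder and writeToCopy are hypothetical names, and copyCheck uses the plain b.addr = b form mentioned in the TODO comment, so it does not include the escape-analysis workaround from the standard library.

```go
package main

import (
	"fmt"
	"unsafe"
)

// miniBuilder is a hypothetical, simplified imitation of strings.Builder.
type miniBuilder struct {
	addr *miniBuilder // set on first write, to detect copies by value
	buf  []byte       // accumulated content
}

// copyCheck panics if the builder was copied after its first write.
func (b *miniBuilder) copyCheck() {
	if b.addr == nil {
		b.addr = b // plain form, without the noescape hack
	} else if b.addr != b {
		panic("miniBuilder: illegal use of non-zero builder copied by value")
	}
}

// WriteString appends s to the buffer.
func (b *miniBuilder) WriteString(s string) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
}

// String reinterprets the slice header as a string header without copying.
// This is safe only because bytes already written are never modified later.
func (b *miniBuilder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}

// writeToCopy reports whether writing to a by-value copy panics.
func writeToCopy() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var a miniBuilder
	a.WriteString("x")
	c := a // c.addr still points at a, not at c
	c.WriteString("y")
	return false
}

func main() {
	var b miniBuilder
	b.WriteString("A")
	b.WriteString("B")
	fmt.Println(b.String())    // AB
	fmt.Println(writeToCopy()) // true
}
```

The copy check matters because of point 3: two copies of a builder would share one backing array, and an append through one copy could overwrite bytes that a string returned by the other still refers to.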

#### Stop here?

Generally, techniques used in the Golang standard library spread over time and become best practice in certain scenarios.

The *(*string)(unsafe.Pointer(&b.buf)) conversion used here can also be applied elsewhere. For example: how do you compare a string and a []byte for equality without allocating memory? The hint is probably too obvious and everyone should be able to write it, so here is the code directly:

func unsafeEqual(a string, b []byte) bool {
    bbp := *(*string)(unsafe.Pointer(&b))
    return a == bbp
}
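To try the function above, a self-contained program might look like the following. The function body is unchanged; the package clause, import, and sample values are added for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeEqual compares a string and a byte slice without copying the slice.
func unsafeEqual(a string, b []byte) bool {
	// View the slice's pointer and length as a string header.
	bbp := *(*string)(unsafe.Pointer(&b))
	return a == bbp
}

func main() {
	data := []byte("hello")
	fmt.Println(unsafeEqual("hello", data)) // true
	fmt.Println(unsafeEqual("help", data))  // false
	fmt.Println(unsafeEqual("", nil))       // true
}
```

The temporary string shares memory with the slice, so it should not be kept around after the comparison if the slice may be modified later.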

Origin blog.csdn.net/chushoufengli/article/details/115051337