[Golang] Performance Analysis of String Concatenation Methods

This article is 100% original work by me (Haoxiang Ma). Please credit the source when reprinting.

This article was written on 2019/02/16 and is based on Go 1.11.
For other versions of the Go SDK, please check for any differences yourself.

Overview

The motivation for this article is an unfinished post in the Golang Chinese community, "Go language strings and efficient splicing", which lists several ways to concatenate strings in Golang but never discusses how each one performs, nor offers any best practice. Out of boredom and curiosity, I decided to write my own benchmarks, analyze the results together with the source code, and try to give my view of best practice.

Performance Testing

According to that post, there are five ways to concatenate strings in Golang:

  • Direct concatenation with +

    // Concatenate with the + operator
    func ConcatWithAdd(strs []string) string {
        s := ""
        for _, str := range strs {
            s += str
        }
        return s
    }
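  • fmt.Sprint() concatenation

    // Concatenate with fmt.
    // Note: fmt.Sprint(strs) formats the slice as "[a b c]" rather than "abc",
    // so its output differs from that of the other four methods.
    func ConcatWithFmt(strs []string) string {
        s := fmt.Sprint(strs)
        return s
    }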
  • strings.Join() concatenation

    // Concatenate with strings.Join
    func ConcatWithJoin(strs []string) string {
        s := strings.Join(strs, "")
        return s
    }
  • Buffer concatenation

    // Concatenate with bytes.Buffer
    func ConcatWithBuffer(strs []string) string {
        buf := bytes.Buffer{}
        for _, str := range strs {
            buf.WriteString(str)
        }
        return buf.String()
    }
  • Builder concatenation

    // Concatenate with strings.Builder
    func ConcatWithBuilder(strs []string) string {
        builder := strings.Builder{}
        for _, str := range strs {
            builder.WriteString(str)
        }
        return builder.String()
    }
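Before comparing speed, it can help to confirm that the loop-based methods and strings.Join really build the same string. The following is only a minimal sketch: concatAll is a hypothetical helper that does not appear in the original post, and it leaves fmt.Sprint out on purpose.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// concatAll is a hypothetical helper (not from the original post).
// It runs four of the approaches on the same input and returns
// each result keyed by the method name.
func concatAll(strs []string) map[string]string {
	plus := ""
	var buf bytes.Buffer
	var builder strings.Builder
	for _, s := range strs {
		plus += s              // direct + concatenation
		buf.WriteString(s)     // bytes.Buffer
		builder.WriteString(s) // strings.Builder
	}
	return map[string]string{
		"+":       plus,
		"Join":    strings.Join(strs, ""),
		"Buffer":  buf.String(),
		"Builder": builder.String(),
	}
}

func main() {
	strs := []string{"hello", "hello", "hello"}
	// Map iteration order is random, so the lines may print in any order.
	for name, got := range concatAll(strs) {
		fmt.Printf("%-8s %q\n", name, got)
	}
}
```

All four entries should hold the same string, which means the benchmark compares like with like for these methods.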

To test their performance, I used the benchmark support that ships with Golang (the testing package).

The tests use three data sets and five test groups, i.e. a total of 3 * 5 = 15 independent tests. The three data sets are:

  • an array of strings with size = 10K, each element being "hello"
  • an array of strings with size = 50K, each element being "hello"
  • an array of strings with size = 100K, each element being "hello"

The five test groups are:

  • Direct + concatenation, run on the 10K, 50K and 100K data
  • fmt.Sprint() concatenation, run on the 10K, 50K and 100K data
  • strings.Join() concatenation, run on the 10K, 50K and 100K data
  • Buffer concatenation, run on the 10K, 50K and 100K data
  • Builder concatenation, run on the 10K, 50K and 100K data

Benchmark code is as follows:

package main

import (
    "os"
    "testing"
)

var (
    Strs10K  []string // string slice of length 10K
    Strs50K  []string // string slice of length 50K
    Strs100K []string // string slice of length 100K
    word     = "hello" // the string to be concatenated
)

const (
    ADD = iota
    BUFFER
    BUILDER
    JOIN
    FMT

    _10K  = 10000
    _50K  = 50000
    _100K = 100000
)

// Setup and teardown
func TestMain(m *testing.M) {
    Strs10K = make([]string, 0, _10K)
    Strs50K = make([]string, 0, _50K)
    Strs100K = make([]string, 0, _100K)

    for i := 0; i < _100K; i++ {
        if i < _10K {
            Strs10K = append(Strs10K, word)
            Strs50K = append(Strs50K, word)
        } else if i < _50K {
            Strs50K = append(Strs50K, word)
        }
        Strs100K = append(Strs100K, word)
    }

    exitCode := m.Run()
    os.Exit(exitCode)
}

// Benchmark direct + concatenation
func BenchmarkConcatWithAdd(b *testing.B) {
    b.Run("Concat-10000", GetTestConcat(Strs10K, ADD))
    b.Run("Concat-50000", GetTestConcat(Strs50K, ADD))
    b.Run("Concat-100000", GetTestConcat(Strs100K, ADD))
}

// Benchmark bytes.Buffer concatenation
func BenchmarkConcatWithBuffer(b *testing.B) {
    b.Run("Concat-10000", GetTestConcat(Strs10K, BUFFER))
    b.Run("Concat-50000", GetTestConcat(Strs50K, BUFFER))
    b.Run("Concat-100000", GetTestConcat(Strs100K, BUFFER))
}

// Benchmark strings.Builder concatenation
func BenchmarkConcatWithBuilder(b *testing.B) {
    b.Run("Concat-10000", GetTestConcat(Strs10K, BUILDER))
    b.Run("Concat-50000", GetTestConcat(Strs50K, BUILDER))
    b.Run("Concat-100000", GetTestConcat(Strs100K, BUILDER))
}

// Benchmark strings.Join concatenation
func BenchmarkConcatWithJoin(b *testing.B) {
    b.Run("Concat-10000", GetTestConcat(Strs10K, JOIN))
    b.Run("Concat-50000", GetTestConcat(Strs50K, JOIN))
    b.Run("Concat-100000", GetTestConcat(Strs100K, JOIN))
}

// Benchmark fmt concatenation
func BenchmarkConcatWithFmt(b *testing.B) {
    b.Run("Concat-10000", GetTestConcat(Strs10K, FMT))
    b.Run("Concat-50000", GetTestConcat(Strs50K, FMT))
    b.Run("Concat-100000", GetTestConcat(Strs100K, FMT))
}

// GetTestConcat returns the benchmark function that matches the concatenation type (testType)
func GetTestConcat(strs []string, testType int) func(b *testing.B) {
    concatFunc := func([]string) string { return "" }
    switch testType {
    case ADD:
        concatFunc = ConcatWithAdd
    case BUFFER:
        concatFunc = ConcatWithBuffer
    case BUILDER:
        concatFunc = ConcatWithBuilder
    case JOIN:
        concatFunc = ConcatWithJoin
    case FMT:
        concatFunc = ConcatWithFmt
    }

    return func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            concatFunc(strs)
        }
    }
}

Running the tests (go test -bench=. -benchmem) gives the following results:

......
BenchmarkConcatWithAdd/Concat-10000-4             20      57050217 ns/op     270493320 B/op      9999 allocs/op
BenchmarkConcatWithAdd/Concat-50000-4              2     937660008 ns/op    6435464656 B/op     49999 allocs/op
BenchmarkConcatWithAdd/Concat-100000-4             1    3748714961 ns/op   25388918224 B/op     99999 allocs/op
BenchmarkConcatWithBuffer/Concat-10000-4       10000        138797 ns/op        209376 B/op        12 allocs/op
BenchmarkConcatWithBuffer/Concat-50000-4        3000        481466 ns/op        840160 B/op        14 allocs/op
BenchmarkConcatWithBuffer/Concat-100000-4       2000        966963 ns/op       1659360 B/op        15 allocs/op
BenchmarkConcatWithBuilder/Concat-10000-4      10000        103924 ns/op        227320 B/op        21 allocs/op
BenchmarkConcatWithBuilder/Concat-50000-4       3000        495917 ns/op       1431545 B/op        28 allocs/op
BenchmarkConcatWithBuilder/Concat-100000-4      2000        891950 ns/op       2930682 B/op        31 allocs/op
BenchmarkConcatWithJoin/Concat-10000-4         10000        106288 ns/op        114688 B/op         2 allocs/op
BenchmarkConcatWithJoin/Concat-50000-4          3000        505209 ns/op        507904 B/op         2 allocs/op
BenchmarkConcatWithJoin/Concat-100000-4         2000        990317 ns/op       1015808 B/op         2 allocs/op
BenchmarkConcatWithFmt/Concat-10000-4           1000       1293589 ns/op        227716 B/op     10002 allocs/op
BenchmarkConcatWithFmt/Concat-50000-4            200       6260637 ns/op       1131960 B/op     50003 allocs/op
BenchmarkConcatWithFmt/Concat-100000-4           100      12005780 ns/op       2499702 B/op    100006 allocs/op
......

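From these results we can see:

  • In terms of speed, Builder, Buffer and Join are in the same order of magnitude, and their absolute numbers are also close; fmt is about an order of magnitude slower than they are; direct + concatenation is the slowest.
  • In terms of memory allocation, Join performs best, Buffer comes second and Builder third; fmt and direct + concatenation are the worst, since they perform a very large number of allocations.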

Source Code Analysis

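  • strings.Join(), excellent in both speed and memory allocation

    func Join(a []string, sep string) string {
        // Optimization specifically for short slices
        // See golang.org/issue/6714 for details
        switch len(a) {
        case 0:
            return ""
        case 1:
            return a[0]
        case 2:
            return a[0] + sep + a[1]
        case 3:
            return a[0] + sep + a[1] + sep + a[2]
        }

        // Compute the total length of the separators to insert; n = total separator length
        n := len(sep) * (len(a) - 1)

        // Walk the slice and add up the length of each string.
        // Afterwards n = total separator length + total string length = length of the result
        for i := 0; i < len(a); i++ {
            n += len(a[i])
        }

        // Allocate n bytes in one go and copy the first string to the head of the slice
        b := make([]byte, n)
        bp := copy(b, a[0])

        // Starting from index 1, use the built-in copy function
        // to copy each separator and string into its position in the slice
        for _, s := range a[1:] {
            bp += copy(b[bp:], sep)
            bp += copy(b[bp:], s)
        }

        // Finally convert the byte slice to a string and return it
        return string(b)
    }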

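    This shows why strings.Join() performs so well: there is only one explicit memory allocation (b := make([]byte, n)) and one implicit allocation (return string(b)). It never has to reallocate memory or move data around while concatenating, which saves a lot of memory-management overhead.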

  • bytes.Buffer.WriteString(), slightly behind

    // Try to grow by n bytes via reslicing
    func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
        // If the underlying slice has at least n bytes of spare capacity, no reallocation is needed;
        // just reslice, extending the length of the slice to l + n
        if l := len(b.buf); n <= cap(b.buf)-l {
            b.buf = b.buf[:l+n]
            return l, true
        }

        // The spare capacity is less than n bytes, so give up on reslicing:
        // memory has to be reallocated, a plain reslice is not enough
        return 0, false
    }

    // Grow by n bytes
    func (b *Buffer) grow(n int) int {
        m := b.Len()

        // Edge case: the buffer is empty, so reset its state first
        if m == 0 && b.off != 0 {
            b.Reset()
        }

        // First try to "grow" by reslicing, without actually allocating
        if i, ok := b.tryGrowByReslice(n); ok {
            return i
        }

        // bootstrap is a 64-byte array that already exists once the Buffer is initialized.
        // If n fits within bootstrap, reslice it instead of allocating new memory
        if b.buf == nil && n <= len(b.bootstrap) {
            b.buf = b.bootstrap[:n]
            return 0
        }

        // None of the cases above apply
        c := cap(b.buf)
        if n <= c/2-m {
            // Easier to read as m + n <= c/2.
            // If the length after growing (m + n) is at most c/2, there is still plenty of room,
            // so slide the data starting at b.off down to the front instead of reallocating
            copy(b.buf, b.buf[b.off:])
        } else if c > maxInt-c-n {
            // c + c + n > maxInt: the requested growth is too large to satisfy
            panic(ErrTooLarge)
        } else {
            // Not enough spare room: allocate a new slice of length c + c + n
            buf := makeSlice(2*c + n)
            copy(buf, b.buf[b.off:])
            b.buf = buf
        }
        // Restore b.off and len(b.buf).
        b.off = 0
        b.buf = b.buf[:m+n]
        return m
    }

    // The method that does the concatenation
    func (b *Buffer) WriteString(s string) (n int, err error) {
        b.lastRead = opInvalid

        // First try to get len(s) bytes by reslicing
        m, ok := b.tryGrowByReslice(len(s))
        if !ok {
            // Reslicing failed, so fall back to grow
            m = b.grow(len(s))
        }
        return copy(b.buf[m:], s), nil
    }

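    Why does bytes.Buffer.WriteString() perform worse than Join? Once again the memory allocation strategy is to blame. Join requests memory only twice, whereas Buffer may request it many times. The relevant line is buf := makeSlice(2*c + n): each reallocation asks for only 2 * c + n bytes, and once that space is used up another 2 * c + n has to be requested. When there are many items to concatenate, each new buffer fills up again and the data has to be copied into yet another one, which is why the performance is not as good.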

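  • strings.Builder.WriteString(), also slightly behind

    func (b *Builder) WriteString(s string) (int, error) {
        b.copyCheck()
        b.buf = append(b.buf, s...)
        return len(s), nil
    }

    The code is very concise: it is nothing more than a plain slice append, and it simply keeps appending. Whenever the underlying slice runs out of spare capacity, append has to allocate new memory. Unlike bytes.Buffer, Builder has no growth logic of its own; growth is left entirely to the built-in append function.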

  • fmt.Sprint(), the worst performer

    type buffer []byte

    type pp struct {
        buf buffer
        // ......
    }

    func Sprint(a ...interface{}) string {
        p := newPrinter()
        p.doPrint(a)
        s := string(p.buf)
        p.free()
        return s
    }

    The core data structure inside the printer is buf, and buf is really just a []byte. Strings keep being appended to buf, and whenever it runs out of space new memory has to be allocated, hence the poor performance.

Summary

In practice, performance only becomes a concern when a very large number of strings is being concatenated. Concatenating 10K, 50K or 100K strings in one go, as this article does, should be rare in real business code.

If you really do need to care about performance, refer to the following:

  • Join is among the fastest, but it does not blow Builder and Buffer away: all three are in the same order of magnitude. fmt and direct + concatenation are the slowest.
  • Join has the best memory allocation strategy, with the fewest allocations. The strategies of Builder and Buffer are acceptable, with allocated memory growing roughly linearly with the input size. fmt and direct + concatenation have the worst allocation behavior.

Origin www.cnblogs.com/lijianming180/p/12099473.html