After I converted a string to a byte slice, the slice's capacity turned out to be strange

From the WeChat official account: New World Grocery Store

Puzzling phenomena

Slice, slice, slice again!

The previous article was about slices, and today's puzzling problem is slice-related as well. Take a look at the following phenomena.

Phenomenon One

a := "abc"
bs := []byte(a)
fmt.Println(bs, len(bs), cap(bs))
// Output: [97 98 99] 3 8

Phenomenon Two

a := "abc"
bs := []byte(a)
fmt.Println(len(bs), cap(bs))
// Output: 3 32

Phenomenon Three

bs := []byte("abc")
fmt.Println(len(bs), cap(bs))
// Output: 3 3

Phenomenon Four

a := ""
bs := []byte(a)
fmt.Println(bs, len(bs), cap(bs))
// Output: [] 0 0

Phenomenon Five

a := ""
bs := []byte(a)
fmt.Println(len(bs), cap(bs))
// Output: 0 32

Analysis

At this point my head was already full of question marks.

Converting a string variable to a slice

What happens inside when such a small string is converted to a slice is quite puzzling. So, as in the previous article, I fell back on my usual trick: look for keywords in the assembly code that might help (I hope to have a chance to give a brief introduction to Go's assembly syntax later).

Below is the key part of the assembly code for the conversion in phenomenon one:

"".main STEXT size=495 args=0x0 locals=0xd8
	0x0000 00000 (test.go:5)	TEXT	"".main(SB), ABIInternal, $216-0
	0x0000 00000 (test.go:5)	MOVQ	(TLS), CX
	0x0009 00009 (test.go:5)	LEAQ	-88(SP), AX
	0x000e 00014 (test.go:5)	CMPQ	AX, 16(CX)
	0x0012 00018 (test.go:5)	JLS	485
	0x0018 00024 (test.go:5)	SUBQ	$216, SP
	0x001f 00031 (test.go:5)	MOVQ	BP, 208(SP)
	0x0027 00039 (test.go:5)	LEAQ	208(SP), BP
	0x002f 00047 (test.go:5)	FUNCDATA	$0, gclocals·7be4bbacbfdb05fb3044e36c22b41e8b(SB)
	0x002f 00047 (test.go:5)	FUNCDATA	$1, gclocals·648d0b72bb9d7f59fbfdbee57a078eee(SB)
	0x002f 00047 (test.go:5)	FUNCDATA	$2, gclocals·2dfddcc7190380b1ae77e69d81f0a101(SB)
	0x002f 00047 (test.go:5)	FUNCDATA	$3, "".main.stkobj(SB)
	0x002f 00047 (test.go:6)	PCDATA	$0, $1
	0x002f 00047 (test.go:6)	PCDATA	$1, $0
	0x002f 00047 (test.go:6)	LEAQ	go.string."abc"(SB), AX
	0x0036 00054 (test.go:6)	MOVQ	AX, "".a+96(SP)
	0x003b 00059 (test.go:6)	MOVQ	$3, "".a+104(SP)
	0x0044 00068 (test.go:7)	MOVQ	$0, (SP)
	0x004c 00076 (test.go:7)	PCDATA	$0, $0
	0x004c 00076 (test.go:7)	MOVQ	AX, 8(SP)
	0x0051 00081 (test.go:7)	MOVQ	$3, 16(SP)
	0x005a 00090 (test.go:7)	CALL	runtime.stringtoslicebyte(SB)
	0x005f 00095 (test.go:7)	MOVQ	40(SP), AX
	0x0064 00100 (test.go:7)	MOVQ	32(SP), CX
	0x0069 00105 (test.go:7)	PCDATA	$0, $2
	0x0069 00105 (test.go:7)	MOVQ	24(SP), DX
	0x006e 00110 (test.go:7)	PCDATA	$0, $0
	0x006e 00110 (test.go:7)	PCDATA	$1, $1
	0x006e 00110 (test.go:7)	MOVQ	DX, "".bs+112(SP)
	0x0073 00115 (test.go:7)	MOVQ	CX, "".bs+120(SP)
	0x0078 00120 (test.go:7)	MOVQ	AX, "".bs+128(SP)

Below is the key part of the assembly code for the conversion in phenomenon two:

"".main STEXT size=393 args=0x0 locals=0xe0
	0x0000 00000 (test.go:5)	TEXT	"".main(SB), ABIInternal, $224-0
	0x0000 00000 (test.go:5)	MOVQ	(TLS), CX
	0x0009 00009 (test.go:5)	LEAQ	-96(SP), AX
	0x000e 00014 (test.go:5)	CMPQ	AX, 16(CX)
	0x0012 00018 (test.go:5)	JLS	383
	0x0018 00024 (test.go:5)	SUBQ	$224, SP
	0x001f 00031 (test.go:5)	MOVQ	BP, 216(SP)
	0x0027 00039 (test.go:5)	LEAQ	216(SP), BP
	0x002f 00047 (test.go:5)	FUNCDATA	$0, gclocals·0ce64bbc7cfa5ef04d41c861de81a3d7(SB)
	0x002f 00047 (test.go:5)	FUNCDATA	$1, gclocals·00590b99cfcd6d71bbbc6e05cb4f8bf8(SB)
	0x002f 00047 (test.go:5)	FUNCDATA	$2, gclocals·8dcadbff7c52509cfe2d26e4d7d24689(SB)
	0x002f 00047 (test.go:5)	FUNCDATA	$3, "".main.stkobj(SB)
	0x002f 00047 (test.go:6)	PCDATA	$0, $1
	0x002f 00047 (test.go:6)	PCDATA	$1, $0
	0x002f 00047 (test.go:6)	LEAQ	go.string."abc"(SB), AX
	0x0036 00054 (test.go:6)	MOVQ	AX, "".a+120(SP)
	0x003b 00059 (test.go:6)	MOVQ	$3, "".a+128(SP)
	0x0047 00071 (test.go:7)	PCDATA	$0, $2
	0x0047 00071 (test.go:7)	LEAQ	""..autotmp_5+64(SP), CX
	0x004c 00076 (test.go:7)	PCDATA	$0, $1
	0x004c 00076 (test.go:7)	MOVQ	CX, (SP)
	0x0050 00080 (test.go:7)	PCDATA	$0, $0
	0x0050 00080 (test.go:7)	MOVQ	AX, 8(SP)
	0x0055 00085 (test.go:7)	MOVQ	$3, 16(SP)
	0x005e 00094 (test.go:7)	CALL	runtime.stringtoslicebyte(SB)
	0x0063 00099 (test.go:7)	MOVQ	40(SP), AX
	0x0068 00104 (test.go:7)	MOVQ	32(SP), CX
	0x006d 00109 (test.go:7)	PCDATA	$0, $3
	0x006d 00109 (test.go:7)	MOVQ	24(SP), DX
	0x0072 00114 (test.go:7)	PCDATA	$0, $0
	0x0072 00114 (test.go:7)	PCDATA	$1, $1
	0x0072 00114 (test.go:7)	MOVQ	DX, "".bs+136(SP)
	0x007a 00122 (test.go:7)	MOVQ	CX, "".bs+144(SP)
	0x0082 00130 (test.go:7)	MOVQ	AX, "".bs+152(SP)

Before analyzing the assembly code, let's look at the signature of runtime.stringtoslicebyte:

func stringtoslicebyte(buf *tmpBuf, s string) []byte

At this point keywords alone don't reveal much more; we need to understand a bit of the assembly syntax. Below is a brief annotated walkthrough, after which we can go back to using tricks to dig further.

// Arguments passed to runtime.stringtoslicebyte in phenomenon one
0x002f 00047 (test.go:6)	LEAQ	go.string."abc"(SB), AX // load the address of the string data "abc" into register AX
0x0036 00054 (test.go:6)	MOVQ	AX, "".a+96(SP) // store the pointer in AX into the data field of variable a
0x003b 00059 (test.go:6)	MOVQ	$3, "".a+104(SP) // store the string length 3 into the length field of variable a
0x0044 00068 (test.go:7)	MOVQ	$0, (SP) // pass 0 as the first argument of runtime.stringtoslicebyte(SB) (the author's guess: this corresponds to nil in Go)
0x004c 00076 (test.go:7)	PCDATA	$0, $0 // reportedly GC-related; details unclear, can usually be ignored
0x004c 00076 (test.go:7)	MOVQ	AX, 8(SP) // pass the pointer in AX as the data part of the second argument (s)
0x0051 00081 (test.go:7)	MOVQ	$3, 16(SP) // pass the string length as the length part of the second argument (s)
0x005a 00090 (test.go:7)	CALL	runtime.stringtoslicebyte(SB) // call the function; the lines after this one assign the return value to bs

// Arguments passed to runtime.stringtoslicebyte in phenomenon two
0x002f 00047 (test.go:6)	LEAQ	go.string."abc"(SB), AX // load the address of the string data "abc" into register AX
0x0036 00054 (test.go:6)	MOVQ	AX, "".a+120(SP) // store the pointer in AX into the data field of variable a
0x003b 00059 (test.go:6)	MOVQ	$3, "".a+128(SP) // store the string length 3 into the length field of variable a
0x0047 00071 (test.go:7)	PCDATA	$0, $2
0x0047 00071 (test.go:7)	LEAQ	""..autotmp_5+64(SP), CX // load the address of the compiler-generated temporary autotmp_5 into register CX
0x004c 00076 (test.go:7)	PCDATA	$0, $1
0x004c 00076 (test.go:7)	MOVQ	CX, (SP) // pass CX (the address of autotmp_5) as the first argument of runtime.stringtoslicebyte(SB)
0x0050 00080 (test.go:7)	PCDATA	$0, $0
0x0050 00080 (test.go:7)	MOVQ	AX, 8(SP) // pass the pointer in AX as the data part of the second argument (s)
0x0055 00085 (test.go:7)	MOVQ	$3, 16(SP) // pass the string length as the length part of the second argument (s)
0x005e 00094 (test.go:7)	CALL	runtime.stringtoslicebyte(SB)

From the assembly above, the difference between phenomenon one and phenomenon two is the first argument passed to runtime.stringtoslicebyte: 0 (nil) in phenomenon one, and the address of the temporary autotmp_5 in phenomenon two. Looking at stringtoslicebyte in the runtime package, whether the first parameter is non-nil, together with the length of the string, decides which branch executes and therefore how the slice is created, so it is no surprise that the capacities differ. Here is the source:

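func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		// The caller supplied a buffer that is large enough: zero it and reuse it.
		*buf = tmpBuf{}
		b = buf[:len(s)]
	} else {
		// Otherwise allocate a new byte slice on the heap.
		b = rawbyteslice(len(s))
	}
	copy(b, s)
	return b
}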
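To see why the first branch yields a capacity of 32, here is a minimal user-space sketch that mirrors the two branches. The function name stringToSliceByteSim is hypothetical, and the heap branch uses a plain make instead of the runtime's unexported rawbyteslice, so (unlike the real runtime) it does not round the capacity up.

```go
package main

import "fmt"

// tmpBuf mirrors the runtime's 32-byte temporary buffer type.
type tmpBuf [32]byte

// stringToSliceByteSim is a hypothetical re-implementation of the two
// branches of runtime.stringtoslicebyte, for illustration only.
func stringToSliceByteSim(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}  // zero the caller's buffer
		b = buf[:len(s)] // length is len(s), but capacity is still len(buf) == 32
	} else {
		// The real runtime calls rawbyteslice here, which rounds the
		// capacity up to a size class; make([]byte, n) gives cap == n.
		b = make([]byte, len(s))
	}
	copy(b, s)
	return b
}

func main() {
	var buf tmpBuf
	b := stringToSliceByteSim(&buf, "abc")
	fmt.Println(string(b), len(b), cap(b)) // abc 3 32

	h := stringToSliceByteSim(nil, "abc")
	fmt.Println(string(h), len(h), cap(h)) // abc 3 3
}
```

The key point is that slicing an array keeps the array's full length as the capacity, so buf[:3] on a 32-byte buffer reports cap 32.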

However, we still don't know when the first argument of stringtoslicebyte is non-nil and when it is nil. So I resorted to the brute-force approach of a global search:

# Run the following command in the root directory of the Go source tree
grep stringtoslicebyte -r . | grep -v "//"

This led to the following code block in the Go compiler source, cmd/compile/internal/gc/walk.go:

[Figure: code from cmd/compile/internal/gc/walk.go that builds the stringtoslicebyte call with mkcall; part of it is folded]

From the signature of mkcall we can see that every argument from the fourth one onward is passed as an argument to the function named by the first parameter, and that the result is a *Node. The Node type is documented as follows:

// A Node is a single node in the syntax tree.
// Actually the syntax tree is a syntax DAG, because there is only one
// node with Op=ONAME for a given instance of a variable x.
// The same is true for Op=OTYPE and Op=OLITERAL. See Node.mayBeShared.

From this we conclude that the compiler generates an AST (abstract syntax tree) node for the call to stringtoslicebyte, and that the first argument passed to stringtoslicebyte is the variable a in the figure above (a local variable in the compiler's code, not the a in our test program).

The compiler's a is initialized to the return value of nodnil(), which is nil by default. However, when n.Esc == EscNone, a is replaced by the address of a temporary array (the autotmp_5 seen in the phenomenon two assembly). Let's look at what EscNone means:

// This code is in cmd/compile/internal/gc/esc.go
const (
	// ...
	EscNone           // Does not escape to heap, result, or parameters.
	// ...
)

So EscNone marks a value that does not escape to the heap, a result, or a parameter. That makes the rest easy: we only need to run escape analysis on the code of phenomenon one and phenomenon two.

# Command for escape analysis: go run -gcflags '-m -l' test.go
# Escape analysis for phenomenon one:
./test.go:7:14: ([]byte)(a) escapes to heap
./test.go:8:13: main ... argument does not escape
./test.go:8:13: bs escapes to heap
./test.go:8:21: len(bs) escapes to heap
./test.go:8:30: cap(bs) escapes to heap
[97 98 99] 3 8
# Escape analysis for phenomenon two:
./test.go:7:14: main ([]byte)(a) does not escape
./test.go:8:13: main ... argument does not escape
./test.go:8:17: len(bs) escapes to heap
./test.go:8:26: cap(bs) escapes to heap
3 32

This shows that in phenomenon one the result of ([]byte)(a) escapes to the heap (bs itself is passed to fmt.Println), whereas in phenomenon two it does not escape. In other words, the first argument of stringtoslicebyte is non-nil when the slice does not escape, and nil when it does. With the first parameter understood, let's continue with the internal logic of stringtoslicebyte.

We see in runtime/string.go that the type of the first parameter of stringtoslicebyte is defined as follows:

const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte

To sum up: in phenomenon two, bs does not escape, so the first argument of stringtoslicebyte is non-nil and points to a 32-byte array. Since len("abc") = 3 <= 32, the function takes the first branch and returns buf[:3], whose capacity is that of the whole array. Therefore the slice in phenomenon two has a capacity of 32.

From the source of stringtoslicebyte, phenomenon one (buf == nil) takes the else branch and calls rawbyteslice:

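func rawbyteslice(size int) (b []byte) {
	// Round the requested size up to the allocator's size class.
	cap := roundupsize(uintptr(size))
	// Allocate cap bytes without zeroing them (typ == nil, needzero == false).
	p := mallocgc(cap, nil, false)
	if cap != uintptr(size) {
		// Zero only the tail beyond size; the first size bytes are overwritten by the caller's copy.
		memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
	}

	// Build the slice header directly: len = size, cap = rounded-up size.
	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
	return
}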

As the code shows, the slice's capacity is calculated by the roundupsize function in runtime/msize.go; the _MaxSmallSize and class_to_size it uses are defined in runtime/sizeclasses.go.

func roundupsize(size uintptr) uintptr {
	if size < _MaxSmallSize {
		// Small sizes: look up the size class in precomputed tables.
		if size <= smallSizeMax-8 {
			return uintptr(class_to_size[size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]])
		} else {
			return uintptr(class_to_size[size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]])
		}
	}
	// Large sizes: round up to a multiple of the page size,
	// unless doing so would overflow.
	if size+_PageSize < size {
		return size
	}
	return round(size, _PageSize)
}

Since the length of "abc" (3) is less than _MaxSmallSize (32768), the slice's capacity must be one of the values in the class_to_size array: 0, 8, 16, 32, 48, 64, 80, 96, 112, 128, …. The smallest of these that can hold 3 bytes is 8.
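The table lookup can be approximated by a linear scan. The sketch below is a hypothetical, simplified stand-in for roundupsize: it only knows the first size classes listed for go1.13.4 (newer Go releases may use a different table) and only covers sizes up to 128 bytes, whereas the real function uses the size_to_class8/size_to_class128 lookup tables.

```go
package main

import "fmt"

// smallClasses holds the first size classes listed above (go1.13.4).
var smallClasses = []int{0, 8, 16, 32, 48, 64, 80, 96, 112, 128}

// roundUpSmall returns the smallest listed class that can hold size bytes.
// It reports ok == false for sizes beyond the truncated table.
func roundUpSmall(size int) (class int, ok bool) {
	for _, c := range smallClasses {
		if c >= size {
			return c, true
		}
	}
	return 0, false
}

func main() {
	for _, n := range []int{0, 3, 8, 17, 33} {
		c, _ := roundUpSmall(n)
		fmt.Printf("size %d -> capacity %d\n", n, c)
	}
}
```

With this table, size 3 maps to 8 (phenomenon one) and size 0 maps to 0 (phenomenon four).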

This explains why the capacity is 8 in phenomenon one. By now many readers will see what is going on in phenomena four and five: their logic matches phenomena one and two, respectively. If you are interested, try them on your own computer.

Converting a string literal directly to a slice

After all that, phenomenon three is still unexplained. Don't worry; let's continue the analysis.

Attentive readers may have noticed that part of the code in the cmd/compile/internal/gc/walk.go screenshot above was folded. Now let's unfold that mysterious code:

[Figure: the unfolded part of the same walk.go code, which handles constant strings]

Analyzing this code shows that, when generating the AST, the Go compiler converts a constant string to a byte slice in three steps:

  1. Check whether the operand is a constant string; if it is, use types.NewArray to create an array type whose length equals the string's length.

  2. The slice generated from the constant string also goes through escape analysis, and its size is compared with the maximum size the compiler allows for an implicit variable on the function's stack; together these decide whether the array is allocated on the stack or on the heap.

  3. Finally, if the string's length is greater than 0, copy the string's content into the array and return the resulting byte slice. Because the backing array has exactly as many elements as the string has bytes, the slice capacity in phenomenon three is 3.
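In ordinary Go, the effect of these steps can be imitated by backing the slice with an array whose length is the constant string's length. The function constToBytesSketch below is a hypothetical illustration of the idea, not the compiler's code; it shows why capacity equals length on this path.

```go
package main

import "fmt"

// abc is the constant string from phenomenon three.
const abc = "abc"

// constToBytesSketch imitates the constant-string path: an array sized
// exactly to the string, a copy of its bytes, then a slice of the whole array.
func constToBytesSketch() []byte {
	var arr [len(abc)]byte // array type sized to the constant, like types.NewArray
	if len(abc) > 0 {
		copy(arr[:], abc) // copy the string data into the array
	}
	return arr[:] // slicing the whole array gives len == cap == len(abc)
}

func main() {
	b := constToBytesSketch()
	fmt.Println(string(b), len(b), cap(b)) // abc 3 3
}
```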

Conclusion

Converting a string to a byte slice works as follows (go1.13.4):

  1. If the string is a constant, it is converted to a byte slice whose capacity equals its length.
  2. If it is a variable, the compiler first checks whether the resulting slice escapes:
    • If it escapes, or the string is longer than 32 bytes, the capacity is the string length rounded up to a size class by roundupsize.
    • If it does not escape and the string is at most 32 bytes long, the byte slice's capacity is 32.
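The rules above can be encoded in a small helper that predicts the capacity from three inputs. predictCap is hypothetical (it is not a runtime API), models go1.13.4 only, and uses the truncated size-class table quoted earlier, so it covers strings of at most 128 bytes.

```go
package main

import "fmt"

// sizeClasses lists the small size classes quoted in this article (go1.13.4).
var sizeClasses = []int{0, 8, 16, 32, 48, 64, 80, 96, 112, 128}

const tmpStringBufSize = 32 // same value as in runtime/string.go

// predictCap returns the expected cap([]byte(s)) for a string of n bytes.
func predictCap(isConst, escapes bool, n int) int {
	if isConst {
		return n // array sized exactly to the constant string
	}
	if !escapes && n <= tmpStringBufSize {
		return tmpStringBufSize // the 32-byte temporary buffer is reused
	}
	for _, c := range sizeClasses { // heap allocation rounded up to a size class
		if c >= n {
			return c
		}
	}
	panic("sketch only covers strings up to 128 bytes")
}

func main() {
	fmt.Println(predictCap(false, true, 3))  // phenomenon one: 8
	fmt.Println(predictCap(false, false, 3)) // phenomenon two: 32
	fmt.Println(predictCap(true, false, 3))  // phenomenon three: 3
	fmt.Println(predictCap(false, true, 0))  // phenomenon four: 0
	fmt.Println(predictCap(false, false, 0)) // phenomenon five: 32
}
```

Each printed value matches the output reported for the corresponding phenomenon at the start of the article.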

Going further

Common escape cases

  1. A function returns a pointer to a local variable
  2. The object is too large for the available stack space
  3. Dynamic types: many function parameters have interface type, such as fmt.Println(a ...interface{}); the concrete type of such arguments is hard to determine at compile time, so they escape as well
  4. Objects referenced by a closure
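The following program contains one example of each escape case listed above, so you can inspect them with the same go run -gcflags '-m -l' command used earlier. The function names are made up for illustration, and the exact diagnostics (and whether a given value is reported as escaping) depend on the Go version, so treat the comments as expectations rather than guaranteed compiler output.

```go
package main

import "fmt"

// Case 1: returning a pointer to a local variable.
func newCounter() *int {
	n := 0
	return &n // n must outlive the call, so it is expected to escape
}

// Case 2: an object too large for the stack.
func bigBuffer() int {
	buf := make([]byte, 10<<20) // 10 MiB; expected to be heap-allocated
	return len(buf)
}

// Case 3: passing a value as interface{}, as fmt functions do.
func describe(x int) string {
	return fmt.Sprint(x) // x is converted to interface{}
}

// Case 4: a variable captured by a closure that outlives its function.
func makeIncrementer() func() int {
	count := 0
	return func() int {
		count++ // count is shared with the returned closure
		return count
	}
}

func main() {
	p := newCounter()
	*p = 42
	fmt.Println(*p, bigBuffer(), describe(7)) // 42 10485760 7
	inc := makeIncrementer()
	inc()
	fmt.Println(inc()) // 2
}
```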

Note: the Go version used for this article is go1.13.4.


Life is endless and so is exploration; I will keep posting technical explorations of Go.



Source: blog.csdn.net/u014440645/article/details/108636950