Exploring the three mechanisms behind Go's defer statement

Go 1.13 and 1.14 each optimized defer, greatly reducing its performance overhead in most scenarios. How do these optimizations work?

Each of these two versions added a new mechanism for implementing defer, so that when a defer statement is compiled, the compiler can choose a lighter-weight way to run the deferred call.

Heap allocation

In versions before Go 1.13, every defer record was allocated on the heap, and the compiler did two things:

  1. At the position of the defer statement, it inserts a call to runtime.deferproc. When executed, this saves the deferred call as a _defer record: the entry address of the deferred function and its arguments are copied into the record, which is then linked into the goroutine's list of deferred calls.
  2. Before the function returns, it inserts a call to runtime.deferreturn. When executed, this takes the deferred calls off the goroutine's list and runs them; multiple deferred calls are executed one after another through jmpdefer, in a tail-recursive style.
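The two steps above can be modeled in ordinary Go. The sketch below is a simplified, hypothetical model, not the runtime's code: deferRecord, goroutine, deferproc and deferreturn are illustrative stand-ins for the runtime's _defer record and functions, and the model drains the list with a plain loop rather than jmpdefer.

```go
package main

import "fmt"

// deferRecord is a hypothetical, simplified stand-in for the runtime's
// _defer record: the deferred function (arguments already bound) plus a
// link to the previously registered record.
type deferRecord struct {
	fn   func()
	link *deferRecord
}

// goroutine models only the head of a goroutine's defer list.
type goroutine struct {
	deferHead *deferRecord
}

// deferproc (model) allocates a new record with & (a fresh allocation for
// every call) and pushes it onto the front of the list, as the pre-1.13
// mechanism does at each defer statement.
func (g *goroutine) deferproc(fn func()) {
	g.deferHead = &deferRecord{fn: fn, link: g.deferHead}
}

// deferreturn (model) pops records off the list and runs them, so the
// most recently deferred call runs first (LIFO).
func (g *goroutine) deferreturn() {
	for d := g.deferHead; d != nil; d = g.deferHead {
		g.deferHead = d.link // unlink first so a record is never run twice
		d.fn()
	}
}

// simulateHeapDefers mimics a function with three defer statements and
// returns the order in which the deferred calls ran.
func simulateHeapDefers() []int {
	var order []int
	g := &goroutine{}
	g.deferproc(func() { order = append(order, 1) })
	g.deferproc(func() { order = append(order, 2) })
	g.deferproc(func() { order = append(order, 3) })
	g.deferreturn() // what the compiler inserts before the return
	return order
}

func main() {
	fmt.Println(simulateHeapDefers()) // [3 2 1]
}
```

Every deferproc call in this model allocates a new record, which mirrors the per-defer allocation cost discussed next.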

The main performance problems with this mechanism are the memory allocation needed to create a record for every defer statement, and the runtime overhead of copying the arguments when the call is recorded and moving them again when the call is finally made.
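The argument copying exists because Go evaluates the arguments of a deferred call when the defer statement executes, not when the call runs, so they must be saved somewhere in between. The small program below demonstrates this language rule; the function name argsAtDeferTime is only illustrative.

```go
package main

import "fmt"

// argsAtDeferTime shows that the argument of a deferred call is evaluated
// and copied when the defer statement runs, not when the call is made.
func argsAtDeferTime() (out []int) {
	x := 1
	defer func(v int) { out = append(out, v) }(x) // v receives a copy of x (1) now
	x = 2
	out = append(out, x) // the function body sees the new value, 2
	return out
}

func main() {
	fmt.Println(argsAtDeferTime()) // [2 1]
}
```

The deferred call still appends 1 even though x was changed to 2 before the function returned.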

Stack allocation

Go 1.13 added deferprocStack, which allocates the _defer record on the stack, as an alternative to deferproc. Compared with deferproc, a stack-allocated _defer record is released automatically when the function returns, which saves the cost of memory allocation; the runtime only needs to maintain the _defer linked list properly.

The compiler has its own logic for choosing between deferproc and deferprocStack. In most cases it uses the latter, which improves defer performance by about 30%. However, deferproc is still used when the defer statement appears in a loop, when higher-order compiler optimizations cannot be performed, or when too many defers are used in the same function.
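One way to see why a defer inside a loop is treated differently: it can execute any number of times per call, so the number of pending records is only known at run time, and the function cannot rely on a single fixed slot in its frame. The sketch below (loopDefers is an illustrative name) shows that each iteration registers its own deferred call, and all of them run in reverse order at exit.

```go
package main

import "fmt"

// loopDefers registers one deferred call per loop iteration. How many
// records are pending at return depends on n, which is only known at
// run time.
func loopDefers(n int) (out []int) {
	for i := 0; i < n; i++ {
		defer func(v int) { out = append(out, v) }(i)
	}
	return nil // the deferred calls then append to out in LIFO order
}

func main() {
	fmt.Println(loopDefers(3)) // [2 1 0]
}
```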

Open coding

Go 1.14 went further and added open coding (open-coded defers). This mechanism inserts the deferred calls directly before the function returns, so the deferproc or deferprocStack operations are skipped at runtime. deferreturn also no longer makes tail-recursive calls; instead it iterates over all the deferred functions in a loop and executes them.
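The idea of open coding can be illustrated by writing the transformation by hand. In the sketch below, openCodedByHand is a hypothetical equivalent of withDefer in which the deferred call is placed directly at the function exit, with no record being registered. It is only an illustration of the concept: the real compiler output also records what is needed so the runtime can still run these calls during a panic, which this sketch does not handle.

```go
package main

import "fmt"

// withDefer uses an ordinary defer statement.
func withDefer(log *[]string) {
	defer func() { *log = append(*log, "cleanup") }()
	*log = append(*log, "work")
}

// openCodedByHand is a hypothetical hand-written version of what open
// coding achieves: the deferred call is emitted inline just before the
// return. Unlike the real mechanism, it does not run cleanup on panic.
func openCodedByHand(log *[]string) {
	cleanup := func() { *log = append(*log, "cleanup") }
	*log = append(*log, "work")
	cleanup() // inlined at the single exit point
}

func main() {
	var a, b []string
	withDefer(&a)
	openCodedByHand(&b)
	fmt.Println(a, b) // [work cleanup] [work cleanup]
}
```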

This mechanism makes the overhead of defer almost negligible; the only runtime cost is storing information about the deferred calls involved. However, the mechanism can only be used when certain conditions are met:

  1. Compiler optimizations are not disabled, i.e. -gcflags "-N" is not set;
  2. The function contains no more than 8 defers, and the number of return statements multiplied by the number of defer statements does not exceed 15;
  3. The defer is not inside a loop.
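The three conditions can be restated as a simple predicate. canOpenCode below is a hypothetical helper that only encodes the rules as listed here; the actual decision is made inside the Go compiler and may involve further checks.

```go
package main

import "fmt"

// canOpenCode is a hypothetical helper encoding the three conditions
// listed above for open-coded defers.
func canOpenCode(optimizationsDisabled bool, numDefers, numReturns int, deferInLoop bool) bool {
	if optimizationsDisabled { // built with -gcflags "-N"
		return false
	}
	if numDefers > 8 || numReturns*numDefers > 15 {
		return false
	}
	return !deferInLoop
}

func main() {
	fmt.Println(canOpenCode(false, 2, 3, false)) // true: 2 defers, 2*3 = 6
	fmt.Println(canOpenCode(false, 4, 4, false)) // false: 4*4 = 16 > 15
	fmt.Println(canOpenCode(false, 1, 1, true))  // false: defer in a loop
}
```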

The mechanism also introduces a new element, the defer bit, which records at runtime whether each defer statement was executed (this matters especially for a defer inside a conditional branch), so that at function exit it is easy to determine which deferred functions must be called.

How defer bits work: each defer that appears in a function is assigned 1 bit. If the defer statement is executed, its bit is set to 1; otherwise it stays 0. Before the function returns, when deciding which deferred calls to make, a mask is used to test each bit position: if the bit is 1, the deferred function is called; otherwise it is skipped.

To keep this lightweight, the Go team limits the defer bits to 1 byte, i.e. 8 bits, which is why a function can have at most 8 open-coded defers. If there are more, the compiler falls back to stack allocation, but in most cases a function will not contain more than 8 defers.

The following pseudocode illustrates the idea:

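deferBits = 0  // initial value of the defer bits: 00000000

deferBits |= 1<<0  // the first defer runs: set to 00000001
_f1 = f1  // deferred function
_a1 = a1  // argument of the deferred function
if cond {
    // if the second defer runs, set to 00000011; otherwise it stays 00000001
    deferBits |= 1<<1
    _f2 = f2
    _a2 = a2
}
...
exit:
// Before returning, check the defer bits in reverse order, ANDing with a mask bit by bit to decide whether to call each function

// If deferBits is 00000011, then 00000011 & 00000010 != 0, so f2 is called
// otherwise 00000001 & 00000010 == 0, so f2 is not called
if deferBits & (1<<1) != 0 {  // parentheses needed: in Go, & and << share the same precedence
    deferBits &^= 1<<1  // clear bit 1 before making the call
    _f2(_a2)
}
// Likewise, since 00000001 & 00000001 != 0, f1 is called
if deferBits & (1<<0) != 0 {
    deferBits &^= 1<<0  // clear bit 0 before making the call
    _f1(_a1)
}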

Summary

The performance of Go's defer statement was criticized for a long time, and the recently released Go 1.14 has finally brought this controversy to a close, at least for now. Except in special cases, we no longer need to worry about the performance overhead of defer.

This article is original and was first published on the WeChat public account "Programming for Life". To reprint it, please leave a message through the account.
