Go versions 1.13 and 1.14 each optimized defer, greatly reducing its performance overhead in most scenarios. How do these optimizations work?
Both versions added a new mechanism for defer: when a defer statement is compiled, the compiler can now choose a lighter-weight way to run the deferred call.
Heap allocation
In versions prior to Go 1.13, all defers were allocated on the heap. The compiler did two things:
- At the position of the defer statement, it inserted a call to runtime.deferproc. When executed, this saves the deferred call as a _defer record: the entry address of the deferred function and copies of its arguments are stored, and the record is pushed onto the goroutine's defer linked list.
- Before the function returns, it inserted a call to runtime.deferreturn. When executed, this takes the deferred calls off the goroutine's linked list and runs them; multiple deferred calls are executed one after another via jmpdefer tail calls.
The main performance problems with this mechanism are the memory allocation required each time a defer statement creates a record, and the cost of copying the arguments when the call is registered and moving them again when it is invoked.
Stack allocation
Go 1.13 added deferprocStack, which allocates the _defer record on the stack instead of on the heap as deferproc does. Since a stack-allocated _defer is released automatically when the function returns, this saves the overhead of heap allocation; the runtime only needs to keep the _defer linked list properly maintained.
The compiler has its own logic for choosing between deferproc and deferprocStack. In most cases it uses the latter, improving defer performance by about 30%. However, deferproc is still used when the defer statement appears in a loop, when higher-level compiler optimizations cannot be applied, or when too many defers are used in the same function.
Open coding
Go 1.14 went a step further and added open-coded defers. This mechanism inserts the deferred calls directly before the function's return, eliminating the runtime deferproc or deferprocStack call, and deferreturn no longer makes tail calls at runtime but simply iterates over all the deferred functions in a loop.
This makes the overhead of defer almost negligible; the only remaining runtime cost is recording information about which deferred calls are involved. However, this mechanism can only be used under certain conditions:
- Compiler optimizations are not disabled, i.e. -gcflags "-N" is not set;
- the number of defers in the function does not exceed 8, and the product of the number of return statements and the number of defer statements does not exceed 15;
- the defer does not appear inside a loop statement.
The mechanism also introduces a new element, the defer bit, which records at runtime whether each defer was actually reached (important for defers inside conditional branches), so that the final pass before return knows which deferred functions to execute.
The principle of the defer bits: each defer appearing in a function is assigned one bit. If that defer is reached during execution, its bit is set to 1; otherwise it stays 0. Before the function returns, each bit is tested with a mask: if the bit is 1, the corresponding deferred function is called; otherwise it is skipped.
To keep this lightweight, the implementation limits the defer bits to 1 byte, i.e. 8 bits, which is why the number of defers may not exceed 8. If it does, stack allocation is used instead; in most cases, of course, a function will not exceed 8 defers.
The idea, demonstrated in pseudocode:
deferBits = 0       // defer bits, initial value 00000000
deferBits |= 1<<0   // the first defer is reached, set to 00000001
_f1 = f1            // the deferred function
_a1 = a1            // the deferred function's argument
if cond {
    // if the second defer is reached, set to 00000011; otherwise it stays 00000001
    deferBits |= 1<<1
    _f2 = f2
    _a2 = a2
}
...
exit:
// Before the function returns, check the defer bits in reverse order,
// ANDing each position with a mask to decide whether to call the function.
// If deferBits is 00000011, then 00000011 & 00000010 != 0, so f2 is called;
// otherwise 00000001 & 00000010 == 0, and f2 is skipped.
if deferBits & 1<<1 != 0 {
    deferBits &^= 1<<1 // clear the bit before the next check
    _f2(_a2)
}
// Likewise, since 00000001 & 00000001 != 0, f1 is called
if deferBits & 1<<0 != 0 {
    deferBits &^= 1<<0
    _f1(_a1)
}
Summary
The performance of Go's defer statement has long been criticized, and the recently released version 1.14 finally brings this controversy to a close. Except in special cases, we no longer need to worry about the performance overhead of defer.
References
[1] Ou Changkun, "Go 语言原本": https://changkun.de/golang/zh-cn/part2runtime/ch09lang/defer/
[2] xiaorui.cc, "How Go 1.14 greatly improved defer performance": http://xiaorui.cc/archives/6579
[3] Proposal: 34481-opencoded-defers: https://github.com/golang/proposal/blob/master/design/34481-opencoded-defers.md
This article is original and was first published on the WeChat public account "Programming for Life". For reprint permission, please leave a message via the account.