(Note: the content and principles below all come from "Computer Systems: A Programmer's Perspective, 3rd Edition".)
When code has to make a decision, we write an if statement. How is an if statement implemented in assembly? Let's start with the following code:
absdiff.c
long absdiff(long x, long y)
{
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}
Now let's look at its assembly, compiled at the -Og optimization level:
gcc -Og -S absdiff.c
The assembly generated on x86-64 Linux is as follows:
.file "absdiff.c"
.text
.globl absdiff
.type absdiff, @function
absdiff:
.LFB0:
.cfi_startproc
cmpq %rsi, %rdi
jl .L4
movq %rdi, %rax
subq %rsi, %rax
ret
.L4:
movq %rsi, %rax
subq %rdi, %rax
ret
.cfi_endproc
.LFE0:
.size absdiff, .-absdiff
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",@progbits
We can see that the if is implemented through a transfer of control, that is, a jump: cmpq compares x (%rdi) with y (%rsi), and jl jumps to .L4 when x < y. Modern CPUs execute instructions in a multi-stage pipeline, so later instructions can start before earlier ones have finished. To keep the pipeline full, the CPU has to predict the outcome of the cmpq comparison and start executing one of the two branches speculatively. When the prediction is wrong, the speculative work is discarded and execution restarts at the correct jump target, which wastes many clock cycles. The CPU cannot guarantee a high prediction accuracy here, because the outcome depends on the program's data, which it cannot know in advance.
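The jump structure of the -Og assembly can be sketched back in C using goto. This is only an illustration of the control flow, not source that the compiler produces; the function name gotodiff and the label x_lt_y are names chosen here for the example.

```c
/* Goto-style sketch of the -Og code; the label x_lt_y plays the role of .L4. */
long gotodiff(long x, long y)
{
    long result;
    if (x < y)          /* cmpq %rsi, %rdi ; jl .L4 */
        goto x_lt_y;
    result = x - y;     /* fall-through: movq %rdi, %rax ; subq %rsi, %rax */
    return result;
x_lt_y:
    result = y - x;     /* .L4: movq %rsi, %rax ; subq %rdi, %rax */
    return result;
}
```

Only one of the two subtractions runs on any given call, and which one runs depends on the jump that the CPU has to predict.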
Next we raise the optimization level to -O1:
gcc -O1 -S absdiff.c
The assembly result is as follows:
.file "absdiff.c"
.text
.globl absdiff
.type absdiff, @function
absdiff:
.LFB0:
.cfi_startproc
movq %rsi, %rdx
subq %rdi, %rdx
movq %rdi, %rax
subq %rsi, %rax
cmpq %rsi, %rdi
cmovl %rdx, %rax
ret
.cfi_endproc
.LFE0:
.size absdiff, .-absdiff
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",@progbits
We can see that no jump is used. Instead, the code computes both y-x (in %rdx) and x-y (in %rax), and the conditional move instruction cmovl copies %rdx into %rax only when x < y. The CPU therefore has no branch to predict, which can improve performance considerably when the condition is hard to predict. The principle can be illustrated with the following C code:
long cmovdiff(long x, long y)
{
    long rval = y - x;
    long eval = x - y;
    long ntest = x >= y;
    /* Line below requires
     * single instruction: */
    if (ntest)
        rval = eval;
    return rval;
}
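To get a feel for why avoiding mispredictions matters, a simple cost model helps. This is a sketch under assumed names: t_ok is the cycle count when the branch is predicted correctly, t_mp is the extra penalty paid on a misprediction, p is the probability of a misprediction, and t_ran is the cycle count measured with random data.

```c
/* Expected cycles per call for branching code: t_ok + p * t_mp. */
double avg_branch_cycles(double t_ok, double t_mp, double p)
{
    return t_ok + p * t_mp;
}

/* With random data p is about 0.5, so t_ran = t_ok + 0.5 * t_mp.
   Solving for the penalty gives t_mp = 2 * (t_ran - t_ok). */
double mispredict_penalty(double t_ok, double t_ran)
{
    return 2.0 * (t_ran - t_ok);
}
```

For instance, with illustrative timings of 8 cycles for predictable data and 17.5 cycles for random data, mispredict_penalty(8.0, 17.5) gives a penalty of 19 cycles. The conditional-move version has no such term: its cost stays the same regardless of the data.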
But conditional moves do not always make code more efficient. Both arms of the if are evaluated every time, so when they require a lot of computation, the work done for the arm that is not selected is wasted. The compiler has to weigh wasted computation against the cost of branch misprediction. gcc uses conditional moves only when both expressions are very cheap to evaluate. In practice, gcc often uses conditional control transfers even in cases where the cost of branch mispredictions would exceed that of the more complex computation.
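The trade-off can be made visible with a small sketch. The helper costly and the counter calls are hypothetical, introduced only to count how many arms get evaluated; this mimics the two evaluation strategies in plain C and says nothing about how gcc makes its decision internally.

```c
static int calls = 0;   /* hypothetical counter: number of arms evaluated */

/* Stand-in for an expression that is expensive to compute. */
static long costly(long v)
{
    calls++;
    return v * v + 1;
}

/* Branch style: only the selected arm is evaluated. */
long pick_branch(long x, long y)
{
    if (x < y)
        return costly(y);
    return costly(x);
}

/* Conditional-move style: both arms are evaluated, then one is selected. */
long pick_cmov_style(long x, long y)
{
    long then_val = costly(y);
    long else_val = costly(x);
    return (x < y) ? then_val : else_val;
}
```

Each call to pick_branch increases calls by one, while each call to pick_cmov_style increases it by two. The extra evaluation is paid on every call, whereas a misprediction penalty is paid only when the CPU guesses wrong, which is why the cheaper the arms are, the more attractive the conditional move becomes.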