An over-optimization problem with the -O2 compiler option and its solution

【Problem background】

The -O2 compilation option is widely used in embedded programming because it can greatly reduce CPU time.

However, the higher the optimization level, the more aggressively the toolchain optimizes and reorders the instructions generated from the source code, and the more likely it is that the final assembly deviates from the logic the programmer expected.

【Problem phenomenon】

In practice, we found that a loop compiled with -O2 behaves abnormally. The experimental code snippet is as follows:

code 1

Code 1 polls the value at address 0xf0000000; as soon as the value becomes 1, it exits the loop and continues with "other processing".
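The listing for code 1 is not reproduced in this text. As a rough guide, here is a minimal sketch of what such a polling loop might look like, assuming the pointer is an ordinary (non-volatile) unsigned int pointer. The function name wait_for_flag is hypothetical, and the address is passed in as a parameter (rather than hard-coded as 0xf0000000) so the sketch can also be compiled and exercised on a host machine.

```c
/* Hypothetical sketch of the polling pattern described for code 1.
 * On the target it would be called roughly as
 *     wait_for_flag((unsigned int *)0xf0000000u);
 * NOTE: the pointer is NOT volatile, so at -O2 the compiler may read
 * *addr only once. This is the version that exhibits the problem. */
void wait_for_flag(unsigned int *addr)
{
    while (*addr != 1) {
        /* spin until some other agent (another core, a peripheral) writes 1 */
    }
    /* "other processing" would continue here */
}
```

If the location already holds 1, the function returns immediately. If it does not, an -O2 build may keep spinning even after another agent writes 1.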

In heterogeneous programming, this pattern is often used to wait for the value of a memory location or register to change.

However, after the above code is compiled with -O2, it actually behaves as follows: even when the value at address 0xf0000000 is 1, the loop never exits. This is completely at odds with what the programmer intended.

【Problem root cause】

The assembly generated for code 1 with -O2 is:

 

code 2

Code 2 shows that the value at address 0xf0000000 is read only once. It is not read again inside the loop; the loop simply keeps comparing that first value with an immediate. Therefore, even if the value at 0xf0000000 later changes to 1, the loop cannot exit. The compiler is allowed to do this because addr is not declared volatile: the C language lets it assume that nothing else modifies *addr while the loop runs, so it treats the load as loop-invariant and moves it out of the loop.
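To make the effect of code 2 easier to see at the C level, here is a hand-written sketch of what the optimized loop is effectively equivalent to, assuming the compiler hoists the load out of the loop (loop-invariant code motion). It is an illustration, not the compiler's actual output, and the name wait_for_flag_as_optimized is hypothetical.

```c
/* Hand-written equivalent of the hoisted loop in code 2 (illustration only).
 * The memory read happens once, before the loop; the loop never re-reads it. */
void wait_for_flag_as_optimized(unsigned int *addr)
{
    unsigned int value = *addr;   /* single read, as in code 2 */
    while (value != 1) {
        /* value can never change here, so if it was not 1, this never exits */
    }
    /* "other processing" would continue here */
}
```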

In addition, as a control group, the code was recompiled with optimization disabled and the loop then worked as expected, which confirms that the problem is triggered by enabling -O2. In that case, the assembly for code 1 is:

 

code 3

Code 3 shows that, inside the loop, the value at address 0xf0000000 is read first and then compared with the immediate. Therefore, without -O2, the logic of the assembly matches the programmer's expectation.

【Solution】

The analysis above shows that the problem is triggered by enabling -O2. However, giving up -O2 is unrealistic, especially on platforms with weak CPUs.

To deal with this kind of problem while still compiling all the code with -O2, you can use the volatile keyword to protect the specific accesses that must not be optimized away. Concretely, change the third line of code 1 to:

volatile unsigned int *addr = (volatile unsigned int *)0xf0000000;
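A minimal sketch of the whole corrected loop follows. It assumes a hypothetical function name, wait_for_flag_volatile, and passes the address as a parameter so that it can be tested on a host; only the volatile qualification matters for the fix.

```c
/* Hypothetical sketch of code 1 after the fix. Every evaluation of *addr is a
 * volatile access, so the compiler must emit a fresh load on each iteration
 * and cannot hoist it out of the loop.
 * On the target: wait_for_flag_volatile((volatile unsigned int *)0xf0000000u); */
void wait_for_flag_volatile(volatile unsigned int *addr)
{
    while (*addr != 1) {
        /* spin; *addr is re-read each time round the loop */
    }
    /* "other processing" continues here */
}
```

Note that initializing a pointer from an integer constant such as 0xf0000000 needs an explicit cast; C does not perform that conversion implicitly, and compilers typically warn about or reject the initialization without one.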

With this change, the assembly for code 1 becomes:

 

code 4

Code 4 shows that the value is re-read from 0xf0000000 before every comparison, which matches the expected logic. Code 4 is also leaner and faster than code 3, which was compiled without any optimization at all.
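Changing the declaration of addr is not always convenient, for example when the pointer comes from existing code. A common alternative that uses the same keyword is to apply volatile only at the access itself, through a cast. The following is a minimal sketch under that assumption; the helper read_volatile_u32 and the function wait_for_flag_cast are hypothetical names.

```c
/* Hypothetical helper: performs one volatile read through a pointer
 * that is not itself declared volatile. Reading a non-volatile object
 * through a volatile-qualified lvalue is well defined in C. */
static inline unsigned int read_volatile_u32(const unsigned int *p)
{
    return *(const volatile unsigned int *)p;
}

/* Same polling loop as code 1, with volatile applied only at the read. */
void wait_for_flag_cast(unsigned int *addr)
{
    while (read_volatile_u32(addr) != 1) {
        /* each call performs a fresh volatile load */
    }
    /* "other processing" continues here */
}
```

This keeps the volatile qualification local to the one read that needs it, so the rest of the function is still optimized normally at -O2.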

【Unresolved issues】

So far, this problem has been observed with the cross-compilation toolchains arm-seev100-linux-gnueabihf- and aarch64-none-linux-gnu-. It has not yet been checked whether the same thing happens when compiling with the native gcc, so it is not certain whether the behavior is specific to these toolchains.


Origin: blog.csdn.net/u012824853/article/details/123020384