Arm Compiler for Embedded: the -O optimization options

Arm Compiler for Embedded can apply optimizations that reduce code size and improve application performance. Each optimization level has its own goal, and optimizing for one goal affects the others. For example, reducing the amount of generated code usually costs some performance. Choosing an optimization level is therefore always a trade-off between code size, performance, and the quality of the debug experience.

Table of contents

Optimization level -O0

Optimization level -O1 

Optimization level -O2

Optimization level -O3

Optimization level -Os

Optimization level -Oz

Optimization level -Omin

Optimization level -Ofast

Optimization level -Omax


Arm Compiler for Embedded provides various optimization levels to control different optimization goals:

Optimization goal                                          Available optimization levels
Smaller code size                                          -Oz, -Omin
Faster performance                                         -O2, -O3, -Ofast, -Omax
Reasonable code size with a good debug experience          -O1
Better correlation between source code and generated code  -O0 (no optimization)
Faster compile and build times                             -O0 (no optimization)
Balance between code size and performance                  -Os

  • Using a higher optimization level for performance has a greater impact on the other goals: a degraded debug experience, increased code size, and longer compile and build times.
  • Optimizing for code size also affects the other goals: a degraded debug experience, reduced performance, and longer compile and build times.

Choose the optimization level that matches your own goals. Each level is described below:

Optimization level -O0

-O0 disables all optimizations.

This optimization level is the default. Using -O0 results in faster compile and build times, but produces slower code with significantly larger code size and stack usage than the other optimization levels. Because nothing is optimized, the generated code corresponds closely to the source code, which means significantly more code is generated, including dead code.

Optimization level -O1 

-O1 enables the core optimizations in the compiler. It provides a good debug experience with better code quality than -O0, and stack usage is also improved over -O0. If you need to debug optimized code, -O1 is the recommended level. Using -O1 differs from using -O0 in that:

  • Optimizations are enabled, which may reduce the fidelity of debug information.
  • Inlining is enabled, so a backtrace may not show the same call hierarchy that you see when reading the source code. Inlining replaces a call with the body of the called function.
  • A function without side effects may not be called where you expect it, or may be omitted entirely if its result is not needed.
  • The value of a local variable may not be available within its scope after its last use. For example, its stack location may already have been reused.
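The effects listed above can be pictured with a small sketch. The function names (square, sum_of_squares, scale_reading) are hypothetical and chosen only for illustration. What the compiler actually does with this code depends on the compiler version and target, so compare the -O0 and -O1 assembly output (for example with -S) rather than relying on the comments.

```c
#include <stdint.h>

/* Small helper: at -O1 and above the compiler may inline it, so it
 * might not appear as a separate frame in a debugger backtrace. */
static int32_t square(int32_t v)
{
    return v * v;
}

/* No side effects: the result depends only on the arguments. */
static int32_t sum_of_squares(int32_t a, int32_t b)
{
    return square(a) + square(b);
}

int32_t scale_reading(int32_t raw)
{
    /* The result is never used, so an optimizing compiler may
     * remove this call completely. */
    int32_t unused = sum_of_squares(raw, 3);

    /* After its last use below, the value of offset may no longer
     * be visible in a debugger, even though it is still in scope. */
    int32_t offset = raw + 1;

    (void)unused;
    return offset * 2;
}
```

A breakpoint placed on the sum_of_squares call may never be hit in an optimized build, which is the kind of debugging surprise the list above describes.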

Optimization level -O2

-O2 optimizes more aggressively for performance than -O1. It is the first optimization level at which the compiler may automatically generate vector instructions. It also degrades the debug experience and may increase code size compared to -O1. Using -O2 differs from using -O1 in that:

  • The threshold at which the compiler considers a function call worth inlining may increase.
  • The amount of loop unrolling performed may increase.
  • Vector instructions can be generated for simple loops and for related sequences of independent scalar operations. Vectorization can be disabled with the armclang command-line option -fno-vectorize.
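As an illustration of the kind of "simple loop" meant here, the sketch below adds two arrays element by element. The function name add_arrays is hypothetical. Whether vector instructions are actually emitted is an assumption that depends on the target having a vector extension and on the compiler's own cost model, so check the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise add over independent elements. The restrict
 * qualifiers promise that the arrays do not overlap, which makes
 * it easier for the compiler to prove that vectorizing is safe. */
void add_arrays(int32_t *restrict dst, const int32_t *restrict a,
                const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Building the same file once with -O2 and once with -O2 -fno-vectorize is one way to see the difference the option makes.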

Optimization level -O3

-O3 optimizes more aggressively for performance than -O2. This level requires significant compile-time analysis and resources. -O3 instructs the compiler to optimize for the performance of the generated code and to disregard its size, which may lead to increased code size. It also degrades the debug experience compared to -O2. Using -O3 differs from using -O2 in that:

  • The threshold at which the compiler considers a function call worth inlining may increase.
  • The amount of loop unrolling performed may increase.
  • More aggressive instruction optimizations are enabled in the compiler pipeline.
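Loop unrolling trades code size for fewer loop-control instructions per element. The sketch below shows the idea at source level with two hypothetical functions: a rolled loop as a programmer would write it, and a hand-unrolled version by a factor of 4. The compiler performs this kind of transformation itself on its internal representation, and the factor it picks is its own choice, so the second function is only an analogy.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled form, as normally written in the source. */
int32_t sum_rolled(const int32_t *v, size_t n)
{
    int32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Hand-unrolled by 4: one loop test per four elements, at the
 * cost of a larger loop body plus a remainder loop. */
int32_t sum_unrolled4(const int32_t *v, size_t n)
{
    int32_t total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    }
    for (; i < n; i++) { /* handle the last n % 4 elements */
        total += v[i];
    }
    return total;
}
```

Both functions return the same result. The unrolled one is larger, which is why the size-oriented levels reduce or disable unrolling.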

Optimization level -Os

-Os aims to provide high performance without a significant increase in code size. Depending on your code, -Os may deliver performance similar to -O2 or -O3. Compared with -O3, -Os reduces code size. It also degrades the debug experience compared to -O1. Using -Os differs from using -O3 in that:

  • The threshold at which the compiler considers a function call worth inlining is lowered.
  • The amount of loop unrolling performed is significantly reduced.

Optimization level -Oz

-Oz aims to provide reduced code size without using Link-Time Optimization (LTO). If LTO is not appropriate for your application, Arm recommends -Oz for the best code size. This optimization level degrades the debug experience compared to -O1. Using -Oz differs from using -Os in that:

  • The compiler optimizes for code size only and disregards performance optimizations, which can result in slower code.
  • Function inlining is not disabled. In some cases inlining reduces overall code size, for example when a function is called only once.
  • Optimizations that may increase code size, such as loop unrolling and loop vectorization, are disabled.
  • Outlining is enabled for M-profile AArch32 targets and for AArch64 targets. The outliner searches the generated code for identical sequences, moves each such sequence into a separate function, and replaces the original occurrences with calls to that function. Outlining reduces code size but can increase execution time. Use the -moutline and -mno-outline options to enable or disable it manually.
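Outlining is the reverse of inlining, and a source-level analogy may make it easier to picture. In the sketch below, all function names are hypothetical. The first pair of functions repeats the same clamp-and-scale sequence; the second pair shares it through one helper. The real outliner works on generated instruction sequences rather than on C source, so this is an illustration of the idea under that assumption, not a description of what the compiler emits.

```c
#include <stdint.h>

/* Before: both functions contain the same clamp-and-scale sequence. */
int32_t left_channel(int32_t s)
{
    if (s > 1000) s = 1000;
    if (s < -1000) s = -1000;
    return s * 3 + 1;
}

int32_t right_channel(int32_t s)
{
    if (s > 1000) s = 1000;
    if (s < -1000) s = -1000;
    return s * 3 - 1;
}

/* After (conceptually): the repeated sequence lives in one helper,
 * and each original site becomes a call to it. */
static int32_t clamp_scale(int32_t s)
{
    if (s > 1000) s = 1000;
    if (s < -1000) s = -1000;
    return s * 3;
}

int32_t left_channel_outlined(int32_t s)
{
    return clamp_scale(s) + 1;
}

int32_t right_channel_outlined(int32_t s)
{
    return clamp_scale(s) - 1;
}
```

The shared helper saves space once the sequence is repeated often enough, but every use now pays for a call and return, which matches the size-versus-speed trade-off described above.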

Optimization level -Omin

-Omin aims to provide a smaller code size than -Oz by using a subset of LTO functionality. Using -Omin differs from using -Oz in that:

  • -Omin enables a basic set of link-time optimizations aimed at removing unused code and data, while also trying to optimize global memory accesses.
  • -Omin enables virtual function elimination, which mainly benefits C++ code.

If you wish to compile under -Omin and use separate compile and link steps, you must also include -Omin on the armlink command line.

Optimization level -Ofast

-Ofast performs the optimizations from -O3 together with those enabled by the armclang option -ffast-math. This level also performs other aggressive optimizations that may violate strict compliance with language standards. Compared to -O3, this level degrades the debug experience and may increase code size.
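One reason fast-math optimizations can break strict standards compliance is that floating-point addition is not associative. The sketch below uses two hypothetical functions that group the same three-term sum differently. With strict IEEE 754 double arithmetic they give different answers for the inputs shown in the comments; under -ffast-math the compiler is allowed to treat such groupings as interchangeable, so the result you get is no longer guaranteed.

```c
/* Two groupings of the same three-term sum. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* With strict IEEE 754 double arithmetic:
 *   sum_left(1e20, -1e20, 1.0)  is 1.0, because 1e20 + -1e20 is exactly 0
 *   sum_right(1e20, -1e20, 1.0) is 0.0, because -1e20 + 1.0 rounds to -1e20
 */
```

Code that depends on a particular evaluation order, such as compensated summation, is therefore a poor fit for -Ofast.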

Optimization level -Omax

-Omax performs maximum optimization and specifically targets performance. It enables all the optimizations from -Ofast together with LTO. At this level, Arm Compiler for Embedded may violate strict compliance with language standards. Use -Omax when you want the fastest possible code. This level degrades the debug experience compared to -Ofast and may increase code size. If you compile at -Omax with separate compile and link steps, you must also include -Omax on the armlink command line.

Example

int test(void)
{
    int x=10, y=20;   /* dead: x and y only feed z */
    int z;
    z=x+y;            /* dead: z is never read */
    return 0;
}

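In the code above, the lines int x=10, y=20; and z=x+y; are dead code, because the value of z is never used. With -O0, no optimization is performed, so both lines are still compiled into the generated assembly file: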

armclang --target=arm-arm-none-eabi -march=armv7-a -O0 -S file.c

With -O1, the compiler removes these two lines, so no code is generated for them:

armclang --target=arm-arm-none-eabi -march=armv7-a -O1 -S file.c

Reference: Selecting optimization options, https://developer.arm.com/documentation/100748/0620/Using-Common-Compiler-Options/Selecting-optimization-options?lang=en

Origin blog.csdn.net/luolaihua2018/article/details/130374387