# Linux gcc compiler optimization options: -O0, -O1, -O2, -O3, -Os

gcc provides a number of optimization options that trade off and balance among three dimensions: compile time, object-file size, and execution efficiency.

Common gcc compiler options

  • -c Compile only; generate an object file without linking. 
  • -E Run only the C preprocessor. 
  • -g Generate debugging information that the GNU debugger can use. 
  • -Os Optimize for size; informally "-O2.5". 
  • -o FILE Write output to the specified file; commonly used when producing an executable. 
  • -O0 Do not optimize. 
  • -O or -O1 Perform basic optimization of the generated code. 
  • -O2 Optimize further than -O1. 
  • -O3 Optimize further than -O2, including function inlining. 
  • -shared Generate a shared object file; commonly used when building a shared library. 
  • -W Enable extra warnings beyond those of -Wall. 
  • -w Suppress all warning messages. 
  • -Wall Enable most common warnings.

The next four sections describe the optimization levels -O0, -O1, -O2, and -O3 in detail, along with what each level optimizes.

-O0

-O0: perform no optimization at all; this is the compiler's default. 

-O1

-O1: the basic optimization level; it mainly optimizes branches, constants, and expressions without spending much extra compile time. 

-O and -O1 are equivalent: they perform the basic program optimizations. For large functions this costs a little more compile time and a fair amount of memory. At this level the compiler tries to reduce the size of the generated code and shorten execution time, but does not perform optimizations that require a great deal of compile time. -O1 turns on the following options: 

  • -fdefer-pop: defer popping function-call arguments from the stack. Instead of popping the arguments immediately after each call returns, the arguments of several calls are popped all at once. 
  • -fmerge-constants: attempt to merge identical constants (string constants and floating-point constants) across compilation units. 
  • -fthread-jumps: if a jump branches to a location where another comparison is made that is subsumed by the first one, rewrite the first branch so that, depending on whether the condition is then known to be true or false, it goes either directly to the second branch's destination or to the point immediately after the second branch. 
  • -floop-optimize: perform loop optimizations: move constant expressions out of loops, simplify exit-test conditions, and optionally do strength reduction and loop unrolling. In large, complex loops this optimization is quite significant. 
  • -fif-conversion: attempt to transform conditional jumps into equivalent branch-free forms, using conditional moves, min, max, set-flags, and abs instructions, as well as some arithmetic tricks. 
  • -fif-conversion2: the same basic idea as -fif-conversion, using conditional execution where the target supports it; little further explanation is available. 
  • -fdelayed-branch: attempt to reorder instructions according to instruction cycle timing, to exploit the instruction slots available after delayed branch instructions. It also tries to move conditional branch instructions forward where possible, to make the best use of the processor's cache. 
  • -fguess-branch-probability: when no profiling feedback or __builtin_expect information is available, let the compiler guess which branch is likely with a heuristic (historically randomized) model and move the corresponding assembly code accordingly. This can cause different compilation runs to produce quite different object code. 
  • -fcprop-registers: because functions assign variables to registers, the compiler performs a second copy-propagation pass to reduce scheduling dependencies (two sequences requiring the same register) and to remove unnecessary register-copy operations. 

-O2

-O2: attempts more instruction-level and register-level optimization, and consumes more memory and compile time than -O1. gcc performs almost all supported optimizations that do not involve a space-for-speed tradeoff. At -O2 the compiler does not perform loop unrolling or function inlining. Compared with -O1, -O2 increases compile time but improves the efficiency of the generated code. -O2 turns on all the options of -O1 plus the following: 

  • -fforce-mem: force memory operands to be copied into registers before arithmetic is performed on them. This makes all memory references potential common subexpressions and so yields better code; where no common subexpression arises, instruction combining eliminates the individual register load. For a single instruction the gain is small, but when many instructions perform arithmetic on the same variables the optimization is significant, because reading a value from a register is much faster than reading it from memory. 
  • -foptimize-sibling-calls: optimize sibling calls and tail recursion. A call in tail position can be turned into a jump that reuses the current stack frame, which is faster than keeping each call as a separate branch-and-return operation. 
  • -fstrength-reduce: perform loop strength reduction and eliminate induction variables, i.e. variables tied to the loop counter, such as a variable inside a for loop that is computed by a multiplication involving the loop counter. 
  • -fcse-follow-jumps: during common-subexpression elimination, scan through a jump instruction when its target is not reached by any other path. For example, when CSE encounters an if ... else ... statement, it will follow the jump taken when the condition is false. 
  • -fcse-skip-blocks: similar to -fcse-follow-jumps, except that, under certain conditions, CSE follows jumps that skip over whole blocks. 
  • -frerun-cse-after-loop: re-run common-subexpression elimination after loop optimization has completed. 
  • -frerun-loop-opt: run the loop optimizer twice. 
  • -fgcse: perform a global common-subexpression elimination pass, which also performs global constant and copy propagation. These analyses try to combine and optimize fragments of the generated code and eliminate redundant code. If the code uses computed goto, the gcc manual recommends the -fno-gcse option. 
  • -fgcse-lm: global CSE attempts to move loads that are killed only by stores to the same location, so that a load/store sequence inside a loop can be changed to a load before the loop (performed only once) and a copy/store sequence inside it. Enabled by default when -fgcse is selected. 
  • -fgcse-sm: when a store-motion pass runs after global common-subexpression elimination, it attempts to move store operations out of loops. Used together with -fgcse-lm, a loop containing a load/store sequence is converted to a load before the loop and a store after it, improving efficiency and removing unnecessary operations. 
  • -fgcse-las: the global CSE pass eliminates redundant loads that come after stores to the same memory location (whether the load and store cover the location fully or partially). 
  • -fdelete-null-pointer-checks: use global dataflow analysis to identify and eliminate useless null-pointer checks. The compiler assumes that dereferencing a null pointer halts the program, so a pointer that has already been dereferenced cannot be null. 
  • -fexpensive-optimizations: perform several minor optimizations that are relatively expensive from the compiler's point of view (the original article notes that their concrete benefit to the running program is unclear). 
  • -fregmove: the compiler tries to reassign register numbers in move instructions and in the operands of other simple instructions so as to maximize the amount of register tying. This is especially helpful on machines with two-operand instructions. 
  • -fschedule-insns: the compiler reorders instructions to try to eliminate stalls that occur while waiting for data that is not yet ready. This helps on machines with slow floating point and where memory loads take more than one cycle, by allowing other instructions to execute until the memory load or floating-point instruction needs the CPU again. 
  • -fschedule-insns2: similar to -fschedule-insns, but requests an additional instruction-scheduling pass after register allocation has finished. Particularly effective on machines with relatively few registers and where a memory load takes more than one clock cycle. 
  • -fsched-interblock: allows the compiler to schedule instructions across basic-block boundaries, so instructions can be moved flexibly to maximize the work done during waits. 
  • -fsched-spec-load: allows some speculative motion of load instructions; -fsched-spec-load-dangerous allows even more speculative load motion. Both options are enabled by default when -fschedule-insns is selected. 
  • -fcaller-saves: allow values to be allocated to registers that are clobbered by function calls, by emitting extra code around the calls to save and restore them. This is done only where it appears to produce better code; if a value is used across several calls, it saves time, because the save and restore need not be performed around every individual call. 
  • -fpeephole2: enables machine-specific peephole optimizations. The difference between -fpeephole and -fpeephole2 is how they are implemented in the compiler: some targets use -fpeephole, some use -fpeephole2, and some use both. 
  • -freorder-blocks: reorder the basic blocks of the compiled function so as to reduce the number of taken branches and improve code locality. 
  • -freorder-functions: reorder functions in the object file so as to improve code locality. This optimization depends on profile information being present: the section .text.hot marks frequently accessed functions, and .text.unlikely marks functions that are essentially never executed. 
  • -fstrict-aliasing: enforce the strictest aliasing rules applicable to the source language. For C and C++ programs, objects of different data types are assumed never to share memory; for example, an integer variable and a single-precision floating-point variable are assumed not to occupy the same memory location. 
  • -funit-at-a-time: analyze the entire compilation unit before generating code. This makes some additional optimizations possible, but the compiler needs much more memory during compilation. (Some sources say this lets the compiler rearrange code, at little cost, to optimize instruction-cache behavior.) 
  • -falign-functions: align the start of each function to a specific memory boundary. Most processors fetch memory in pages, and keeping a function's code within a single page means the code does not need to pull in an extra page. 
  • -falign-jumps: align branch targets that can only be reached by jumping to a power-of-two boundary; in that case no dummy operations (no-ops) ever need to be executed to reach the alignment. 
  • -falign-loops: align loops to a power-of-two boundary, on the expectation that the loop executes many times, which compensates for the time taken to run the dummy operations inserted for alignment. 
  • -falign-labels: align all branch targets to a power-of-two boundary. This option easily makes code slower, because dummy operations must be inserted wherever a label is reached in the normal flow of the code. 
  • -fcrossjumping: a cross-jumping transformation that merges identical code scattered throughout the program. This reduces code size but may have no direct effect on performance. 

-O3

-O3: further optimization on top of -O2, for example pseudo-register webs, inlining of simple functions, and more loop optimization. It includes all the optimizations of -O2 and additionally turns on the following options: 

  • -finline-functions: inline simple functions into their callers.
  • -fweb: build a web of pseudo-registers to hold variables. The pseudo-registers behave as if they were real registers, so other optimization techniques, such as CSE and loop optimization, can work on them. This makes debugging much harder, because variables no longer live in their original registers. 
  • -frename-registers: after register allocation, use the registers left over to avoid false dependencies in the scheduled code. This also makes debugging very difficult, because variables no longer live in their original registers. 
  • -funswitch-loops: move conditional branches whose condition does not change inside the loop out of the loop, replacing them with duplicated copies of the loop body on each branch.

-Os

-Os: informally "-O2.5". It enables all the -O2 optimizations except those that typically increase code size, along with further transformations aimed at reducing code size.


There is also a note on -Os in GCC's official documentation:
http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html#Optimize-Options
 



Origin blog.csdn.net/xiaoting451292510/article/details/104977828