Introduction to armlink optimization features

Elimination of common section groups

The linker can detect multiple copies of a section group and discard the additional copies.

Arm® Compiler for Embedded generates complete objects for linking. Therefore:

If there are inline functions in C and C++ sources, each object contains the out-of-line copies of the inline functions that the object requires.
If templates are used in C++ sources, each object contains the template functions that the object requires.

When these functions are declared in a common header file, they may be defined multiple times in separate objects that are subsequently linked together. To eliminate duplication, the compiler compiles these functions into separate instances of the common section group.

Individual instances of a common section group might not be identical. For example, some copies might be in libraries that were built with different but compatible build options, such as different optimization or debug options.

If the copies are not identical, armlink retains the best available variant of each common section group, based on the attributes of the input objects. armlink discards the rest.

If the copies are identical, armlink retains the first section group that it locates.

You can control this optimization using the following linker options:

Use the --bestdebug option to use the largest common data (COMDAT) group (likely to give the best debug view).
Use the --no_bestdebug option to use the smallest COMDAT group (likely to give the smallest code size). This is the default.

If you compile all files containing a COMDAT group A with -g, the image changes even if --no_bestdebug is used.
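To make the selection between non-identical copies concrete, here is a minimal C model. It assumes that the total size of a copy is the only property compared: the largest copy wins with --bestdebug and the smallest with --no_bestdebug. armlink also weighs other attributes of the input objects. The names struct comdat_copy and select_comdat are hypothetical and are not part of any Arm tool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not armlink source: choose one copy of a COMDAT
   group. Only the group size is compared here. */
struct comdat_copy {
    const char *object;   /* object file the copy came from */
    size_t size;          /* total size of the group in bytes */
};

/* Returns the index of the retained copy. bestdebug = true models
   --bestdebug (largest copy); false models --no_bestdebug (smallest). */
size_t select_comdat(const struct comdat_copy *copies, size_t n, bool bestdebug)
{
    size_t chosen = 0;
    for (size_t i = 1; i < n; i++) {
        bool better = bestdebug ? copies[i].size > copies[chosen].size
                                : copies[i].size < copies[chosen].size;
        if (better)
            chosen = i;   /* only a strictly better copy replaces the current one */
    }
    return chosen;        /* ties therefore keep the first copy found */
}
```

With copies of 120, 80, and 200 bytes, the model keeps index 1 under --no_bestdebug and index 2 under --bestdebug. If all copies are the same size, it keeps the first one, which mirrors the rule for identical copies.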

Elimination of unused sections

Unused section elimination is the most significant optimization that the linker performs on image size.

Unused section elimination:

Removes unreachable code and data from the final image.
Is suppressed in cases that might result in the removal of all sections.

To control this optimization, use the armlink options --remove, --no_remove, --first, --last, and --keep.

Unused section elimination requires an entry point. Therefore, if no entry point is specified for an image, use the armlink option --entry to specify one.

Use the armlink option --info unused to instruct the linker to generate a list of the unused sections that it eliminates.

Note
armlink reports Error: L6218E: Undefined symbol <symbol> even if the section that references <symbol> is removed as unused. This behavior differs from the GNU linker, ld.

An input section is retained in the final image if:

It contains an entry point or an externally accessible symbol. For example, an entry function into the secure code for the Arm®v8-M Security Extension.
It is an SHT_INIT_ARRAY, SHT_FINI_ARRAY, or SHT_PREINIT_ARRAY section.
It is specified as the first or last input section, either by the --first or --last option or by a scatter-loading equivalent.
It is marked as unremovable by the --keep option.
It is referred to, directly or indirectly, by a non-weak reference from an input section that is retained in the image.
Its name matches the name referred to by an input section symbol, and that symbol is referenced from a section that is retained in the image.
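The retention rules above amount to a reachability pass: start from the root sections and follow only non-weak references. The following C sketch models that pass as a simple mark-and-sweep. It is a simplification. The struct section layout, MAX_REFS, and eliminate_unused are hypothetical, and a single is_root flag stands in for the entry-point, --keep, init/fini array, and --first/--last cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of unused section elimination, not armlink source. */
#define MAX_REFS 4

struct section {
    bool is_root;              /* entry point, --keep, init/fini array, ... */
    size_t nrefs;              /* number of outgoing references */
    size_t refs[MAX_REFS];     /* indices of referenced sections */
    bool weak[MAX_REFS];       /* true if that reference is weak */
    bool kept;                 /* output: retained in the image */
};

/* Mark section i and everything it reaches through non-weak references. */
static void mark(struct section *s, size_t i)
{
    if (s[i].kept)
        return;
    s[i].kept = true;
    for (size_t r = 0; r < s[i].nrefs; r++)
        if (!s[i].weak[r])     /* weak references do not keep their target */
            mark(s, s[i].refs[r]);
}

/* Marks retained sections and returns how many would be removed. */
size_t eliminate_unused(struct section *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i].kept = false;
    for (size_t i = 0; i < n; i++)
        if (s[i].is_root)
            mark(s, i);
    size_t removed = 0;
    for (size_t i = 0; i < n; i++)
        if (!s[i].kept)
            removed++;
    return removed;
}
```

For example, if a root section holds a strong reference to section 1 and a weak reference to section 2, and section 3 is unreferenced, the model keeps sections 0 and 1 and removes 2 and 3.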

Note

Compilers usually collect functions and data together and emit one section for each category. The linker can eliminate a section only if it is entirely unused.

You can mark a function or variable in source code with the __attribute__((used)) attribute. This attribute causes armclang to generate the symbol __tagsym$$used.<num> for each such function or variable, where <num> is a counter that distinguishes the symbols. Unused section elimination does not remove a section that contains a __tagsym$$used.<num> symbol.
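The following is a minimal sketch of the attribute in use. The names build_id and debug_hook are hypothetical. Nothing in the file refers to either of them, so without __attribute__((used)) their sections could be eliminated as unused.

```c
#include <stdint.h>

/* Hypothetical build identifier, read only by external tools. The used
   attribute makes armclang emit a __tagsym$$used.<num> symbol for it, so
   armlink keeps its section even though no code references it. */
__attribute__((used)) static const uint32_t build_id = 0x12345678u;

/* Hypothetical hook that is only called from a debugger, never from C code. */
__attribute__((used)) static void debug_hook(void)
{
}
```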

You can also use the armclang option -ffunction-sections to instruct the compiler to generate one ELF section for each function in the source file.
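The benefit of -ffunction-sections is easiest to see with two functions in one file, only one of which is called. The function names below are hypothetical. The section names in the comment follow the usual clang convention of .text.<function> and are given only as an example.

```c
/* Compiled with -ffunction-sections, each function gets its own section,
   for example .text.used_helper and .text.unused_helper. If nothing
   references unused_helper, armlink can remove its section on its own.
   Without per-function sections, both functions share one section, and
   that section is kept as long as run() calls used_helper. */
int used_helper(int x) { return x * 2; }

int unused_helper(int x) { return x * 3; }

int run(void) { return used_helper(21); }
```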

Optimization with RW data compression

RW data areas typically contain a large number of repeated values, such as zeros, which makes them suitable for compression.

By default, RW data compression is enabled to minimize ROM size.

The linker compresses the data, and the data is then decompressed on the target at run time.

The Arm libraries contain several decompression algorithms. The linker chooses the optimal one to add to the image, and it decompresses the data areas when the image is executed. You can override the algorithm that the linker chooses.

How the linker selects a compressor

armlink gathers information about the content of data sections before choosing the most appropriate compression algorithm to generate the smallest image.

If compression is appropriate, armlink can use only one data compressor for all the compressible data sections in the image. Different compression algorithms might be tried on these sections to produce the best overall size. Compression is applied automatically if:

Compressed data size + Size of decompressor < Uncompressed data size
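The condition above can be written as a one-line predicate. In this sketch, should_compress is a hypothetical name, and the sizes would be measured values, not anything armlink exposes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compression pays off only if the compressed data plus the decompressor
   that must be added to the image is smaller than the original data. */
bool should_compress(size_t compressed_size, size_t decompressor_size,
                     size_t uncompressed_size)
{
    return compressed_size + decompressor_size < uncompressed_size;
}
```

With made-up figures, a 4096-byte RW region that compresses to 600 bytes with a 200-byte decompressor qualifies (800 < 4096). A 64-byte region that compresses to 20 bytes does not (220 is not less than 64), because the decompressor costs more than the compression saves.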

After a compressor has been chosen, armlink adds the decompressor to the code area of the image. If the final image does not contain any compressed data, no decompressor is added.

Options available to override the compression algorithm used by the linker

The linker has options for disabling compression or specifying the compression algorithm to use.

The compression algorithm used by the linker can be overridden in any of the following ways:

Use the --datacompressor off option to turn off compression.
Specify the compression algorithm.

To specify a compression algorithm, use the number of the desired compressor on the linker command line, for example:

armlink --datacompressor 2 ...

Use the command-line option --datacompressor list to get a list of the compression algorithms available in the linker:

armlink --datacompressor list
...
Num     Compression algorithm
========================================================
0       Run-length encoding
1       Run-length encoding, with LZ77 on small-repeats
2       Complex LZ77 compression

When choosing a compression algorithm, note the following:

Compressor 0 performs well on data with large areas of zero bytes but few nonzero bytes.
Compressor 1 performs well on data where the nonzero bytes are repeating.
Compressor 2 performs well on data that contains repeated values.
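To see why run-length encoding suits zero-heavy data, consider a simplified run-length encoder. This sketch is not the format that armlink's compressor 0 uses. It assumes a plain (count, value) byte-pair scheme with counts from 1 to 255, and rle_encode is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified run-length encoder (illustrative only, NOT armlink's format).
   Each run of identical bytes becomes a (count, value) pair, count 1..255.
   out must have room for 2 * n bytes in the worst case. Returns the
   encoded length. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t value = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == value && run < 255)
            run++;
        out[o++] = (uint8_t)run;   /* run length */
        out[o++] = value;          /* repeated byte */
        i += run;
    }
    return o;
}
```

1000 zero bytes encode to 8 bytes (three runs of 255 and one of 235). Four distinct bytes grow to 8 bytes, which is why data with few repeats is better served by an LZ77-style compressor.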

The linker prefers compressor 0 or 1 where the data contains mostly zero bytes (>75%). Compressor 2 is chosen where the data contains few zero bytes (<10%). If the image is made up only of A32 code, the A32 decompressor is used automatically. If the image contains any T32 code, the T32 decompressor is used. If there is no clear preference, all compressors are tested to produce the best overall size.
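The thresholds in the preceding paragraph can be restated as a small classification function. This is a hypothetical sketch: the enum and both function names are invented for illustration, and the real linker uses more information than the fraction of zero bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum compressor_pref { PREFER_0_OR_1, PREFER_2, TRY_ALL };

/* Hypothetical restatement of the documented thresholds. Integer
   arithmetic avoids floating point: zeros / n > 75% becomes
   zeros * 100 > 75 * n. */
enum compressor_pref preferred_compressor(const uint8_t *data, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (data[i] == 0)
            zeros++;
    if (zeros * 100 > 75 * n)
        return PREFER_0_OR_1;   /* mostly zero bytes */
    if (zeros * 100 < 10 * n)
        return PREFER_2;        /* very few zero bytes */
    return TRY_ALL;             /* no clear preference: test every compressor */
}

/* The decompressor variant depends on the code in the image, not the data. */
const char *decompressor_variant(bool image_contains_t32)
{
    return image_contains_t32 ? "T32" : "A32";
}
```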

Considerations when using RW data compression

There are some considerations to be aware of when using RW data compression.

When using RW data compression:

Use the linker option --map to see where compression has been applied to regions in your code.
If there is a reference from a compressed region to a linker-defined symbol that uses a load address, the linker turns off RW compression.
If you are using an Arm® processor with on-chip cache, enable the cache after decompression to avoid code coherency problems.

Compressed data sections are automatically decompressed at run time, provided __main is executed, using code from the Arm libraries. This code must be placed in a root region. This is best done using InRoot$$Sections in a scatter file.

If you are using a scatter file, you can specify that a load or execution region is not to be compressed by adding the NOCOMPRESS attribute.

Function inlining with the linker

The linker's inlining capabilities depend on the options you specify and the content of the input files.

The linker can inline a small function in place of a branch instruction to that function. For the linker to be able to do this, the function (without its return instruction) must fit in the four bytes of the branch instruction.

Use the --inline and --no_inline command-line options to control branch inlining. However, --no_inline only turns off inlining for user-supplied objects. By default, the linker still inlines functions from the Arm standard libraries.
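The size limit is easier to judge with a concrete candidate. In the sketch below, the function names are hypothetical, and the instruction sequence in the comment is what a T32 compiler typically emits at optimization levels above -O0. It is an assumption, not guaranteed output.

```c
/* Typically compiled for T32 as "MOVS r0, #8" followed by "BX lr". Without
   the return, the body is 2 bytes, which fits in the 4 bytes of the BL that
   calls it. If the function is defined in a different object from its
   caller, so the compiler cannot inline it, armlink --inline can replace
   the BL with the MOVS. */
int get_flag_mask(void) { return 8; }

/* A loop body is far larger than 4 bytes, so calls to it remain branches. */
unsigned checksum(const unsigned char *p, unsigned n)
{
    unsigned sum = 0;
    while (n--)
        sum += *p++;
    return sum;
}
```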

If branch inlining optimization is enabled, the linker scans every function call in the image and inlines it as necessary. When the linker finds a suitable function to inline, it replaces the function call with instructions from the function being called.

The linker applies branch inlining optimization before any unused sections are eliminated, so that inlined sections can also be removed if they are no longer called.

Note

For Arm®v7-A, the linker can inline two 16-bit encoded Thumb® instructions in place of the 32-bit encoded Thumb BL instruction.

For Armv8-A and Armv8-M, the linker can inline two 16-bit T32 instructions in place of the 32-bit T32 BL instruction.

Use the --info=inline command-line option to list all the inlined functions.

About branches that optimize to a NOP

Although the linker can replace branches with a NOP, there might be situations where you want to prevent this.

By default, the linker replaces any branch with a relocation that resolves to the next instruction with a NOP instruction. This optimization can also be applied if the linker reorders tail-calling sections.

However, there are cases where you might want to disable this optimization, for example, when performing verification or pipeline flushes.

To control this optimization, use the --branchnop and --no_branchnop command-line options.
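The condition for the branch-to-NOP replacement is a simple address comparison. Here it is as a hypothetical C predicate, branch_becomes_nop, that models the rule rather than armlink's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* A branch of branch_size bytes at branch_addr whose relocation resolves
   to target is redundant when target is the address of the next
   instruction. By default armlink then substitutes a NOP; --no_branchnop
   keeps the branch. */
bool branch_becomes_nop(uint32_t branch_addr, uint32_t branch_size,
                        uint32_t target)
{
    return target == branch_addr + branch_size;
}
```

For example, a 4-byte branch at 0x8000 that targets 0x8004 qualifies, whereas one that targets 0x8010 does not.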

Linker reordering of tail-calling sections

There are some situations in which you might want the linker to reorder tail-calling sections.

A tail-calling section is a section that contains a branch instruction at the end of the section. If that branch instruction has a relocation that targets a function at the start of another section, the linker can place the tail-calling section immediately before the called section. The linker can then optimize the branch instruction at the end of the tail-calling section to a NOP instruction.
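As an illustration, the C below contains a call in tail position. The names are hypothetical. Whether the compiler actually emits a tail call (a final B instead of BL followed by a return) depends on the optimization level, so treat this as a likely case rather than a guarantee.

```c
/* With -ffunction-sections, scale and log_and_scale land in separate
   sections. If log_and_scale ends with "B scale", its section is a
   tail-calling section. With --tailreorder, armlink can place it
   immediately before scale's section, so the final branch targets the
   next instruction and can become a NOP. */
static int counter;

int scale(int x) { return x * 4; }

int log_and_scale(int x)
{
    counter++;          /* some work before the call */
    return scale(x);    /* call in tail position */
}
```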

To take advantage of this behavior, use the command-line option --tailreorder to move tail-calling sections immediately before their targets.

Use the --info=tailreorder command-line option to display information about any tail call optimizations performed by the linker.

Restrictions on reordering of tail-calling sections

There are some restrictions on the reordering of tail-calling sections.

The linker:

Can move only one tail-calling section for each tail call target. If there are multiple tail calls to a single section, the tail-calling section with an identical section name is moved before the target. If no tail-calling section has a matching name, the linker moves the first section it encounters.
Cannot move a tail-calling section out of its execution region.
Does not move tails before inline veneers.
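The first restriction, choosing one section among several that tail-call the same target, can be modeled as a name lookup with a fallback. In this hypothetical sketch, pick_tail_section is not an armlink function, and the section names in the usage note are examples only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: given the names of the n (n >= 1) tail-calling
   sections that branch to one target, return the index of the section
   that is moved in front of the target. */
size_t pick_tail_section(const char *const *caller_names, size_t n,
                         const char *target_name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(caller_names[i], target_name) == 0)
            return i;   /* a section with an identical name wins */
    return 0;           /* otherwise, the first section encountered */
}
```

With callers named .text.a, .text, and .text.b and a target section named .text, the model picks index 1. For a target named .text.z, it falls back to index 0.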

Merging identical constants

The linker can attempt to merge identical constants in objects targeted at AArch32 state. The objects must be produced with Arm® Compiler for Embedded 6. Merging is more efficient if you compile with the armclang -ffunction-sections option. This option is the default.

About this task

The following procedure is an example that shows the merging feature.

Note
If you use a scatter file, any regions that are marked with the OVERLAY or PROTECTED attribute affect the behavior of the armlink --merge_litpools option.

Procedure

Create a C source file, litpool.c, containing the following code:

int f1() {
    return 0xdeadbeef;
}
int f2() {
    return 0xdeadbeef;
}

Compile the source with -S to create an assembly file:

armclang -c -S -target arm-arm-none-eabi -mcpu=cortex-m0 -ffunction-sections \
    litpool.c -o litpool.s

Note
-ffunction-sections is the default.

Because 0xdeadbeef is a difficult constant to create using instructions, a literal pool is created, for example:

...
f1:
    .fnstart
@ BB#0:
    ldr    r0, __arm_cp.0_0
    bx     lr
    .p2align    2
@ BB#1:
__arm_cp.0_0:
    .long    3735928559              @ 0xdeadbeef
...
    .fnend

...
    .code    16                      @ @f2
    .thumb_func
f2:
    .fnstart
@ BB#0:
    ldr    r0, __arm_cp.1_0
    bx     lr
    .p2align    2
@ BB#1:
__arm_cp.1_0:
    .long    3735928559              @ 0xdeadbeef
...
    .fnend
...

Note
There is one copy of the constant for each function, because armclang cannot share these constants between the two functions.

Compile the source to create an object:

armclang -c -target arm-arm-none-eabi -mcpu=cortex-m0 litpool.c -o litpool.o

Link the object file using the --merge_litpools option:
```
armlink --cpu=Cortex-M0 --merge_litpools litpool.o -o litpool.axf
```
Note
--merge_litpools is the default.

Run fromelf to view the image structure:

fromelf -c -d -s -t -v -z litpool.axf

The following example shows the merged result:

...
    f1
        0x00008000:    4801        .H      LDR      r0,[pc,#4] ; [0x8008] = 0xdeadbeef
        0x00008002:    4770        pG      BX       lr
    f2
        0x00008004:    4800        .H      LDR      r0,[pc,#0] ; [0x8008] = 0xdeadbeef
        0x00008006:    4770        pG      BX       lr
    $d.4
    __arm_cp.1_0
        0x00008008:    deadbeef    ....    DCD    3735928559
...
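The effect in the listing is that both LDR instructions now read the single word at 0x8008. The bookkeeping can be modeled as de-duplicating the literal pool entries. This sketch is hypothetical. merge_literals is an invented name, and it does not model how armlink rewrites the PC-relative LDR offsets shown for f1 and f2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of literal pool merging: copy each distinct 32-bit
   constant from pool into merged once, and report the bytes saved.
   merged must have room for n entries. Returns the merged entry count. */
size_t merge_literals(const uint32_t *pool, size_t n, uint32_t *merged,
                      size_t *saved_bytes)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < m && merged[j] != pool[i])
            j++;                    /* search for an existing identical entry */
        if (j == m)
            merged[m++] = pool[i];  /* first occurrence: keep it */
    }
    *saved_bytes = (n - m) * sizeof(uint32_t);
    return m;
}
```

For the two copies of 0xdeadbeef in this example, the model keeps one entry and saves 4 bytes, which matches the single DCD at 0x8008 in the fromelf output.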