Elimination of common section groups
The linker can detect multiple copies of a section group and discard the additional copies.
Arm® Compiler for Embedded generates complete objects for linking. Therefore:
- If inline functions are present in C and C++ source code, each object contains an out-of-line copy of the inline function required by the object.
- If you use templates in C++ source code, each object contains the template functions that the object requires.
When these functions are declared in a common header file, they may be defined multiple times in separate objects that are subsequently linked together. To eliminate duplication, the compiler compiles these functions into separate instances of the common section group.
The separate instances of a common section group might not be identical. For example, some copies might come from libraries built with different but compatible build options, such as different optimization or debug options.
If the copies are not identical, armlink retains the best available variant of each common section group, based on the attributes of the input objects, and discards the rest.
If the copies are identical, armlink retains the first section group located.
You can control this optimization using the following linker options:
- Use the --bestdebug option to use the largest common data (COMDAT) group, which is likely to give the best debug view.
- Use the --no_bestdebug option to use the smallest COMDAT group, which is likely to give the smallest code size. This is the default setting. If you compile all files containing a COMDAT group A with -g, the image changes even if you use --no_bestdebug.
Elimination of unused sections
Elimination of unused sections is the most significant optimization on image size that the linker performs.
Unused section elimination:
- Removes unreachable code and data from the final image.
- Is suppressed in cases that might result in the removal of all sections.
To control this optimization, use the armlink options --remove, --no_remove, --first, --last, and --keep.
Unused section elimination requires an entry point. Therefore, if no entry point is specified for the image, use the armlink option --entry to specify one.
Use the armlink option --info unused to instruct the linker to generate a list of the unused sections that it eliminates.
Notice
armlink reports Error: L6218E: Undefined symbol <symbol> even if unused section elimination has removed the requirement for that symbol. This behavior is different from the GNU linker, ld.
An input section is retained in the final image if:
- It contains an entry point or an externally accessible symbol. For example, an entry function into the secure code for the Arm®v8-M Security Extension.
- It is an SHT_INIT_ARRAY, SHT_FINI_ARRAY, or SHT_PREINIT_ARRAY section.
- It is specified as the first or last input section, either by the --first or --last option or by a scatter-loading equivalent.
- It is marked as unremovable by the --keep option.
- It is referred to, directly or indirectly, by a non-weak reference from an input section that is retained in the image.
- Its name matches the name referred to by an input section symbol, and that symbol is referenced from a section that is retained in the image.
Notice
Compilers usually collect functions and data together and emit one section for each category. The linker can only eliminate a section if it is entirely unused.
You can mark a function or variable in your source code with the __attribute__((used)) attribute. This attribute causes armclang to generate the symbol __tagsym$$used.<num> for each function or variable, where <num> is a counter that distinguishes each symbol. Unused section elimination does not remove a section that contains __tagsym$$used.<num>.
You can also use the armclang option -ffunction-sections to instruct the compiler to generate one ELF section for each function in the source file.
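As a minimal sketch of the used attribute described above, the following C fragment marks a variable and a function that nothing else in the program refers to. The names build_id and debug_hook are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical data that no other code refers to by name.
 * The used attribute asks the toolchain to keep it anyway. */
__attribute__((used)) static const uint32_t build_id = 0x20240101u;

/* Hypothetical hook, for example one that is only ever called from a
 * debugger. Without the attribute, an unreferenced static function
 * can be dropped. */
__attribute__((used)) static int debug_hook(int value)
{
    return value + 1;
}
```

Marking only the items that must survive keeps the rest of the image eligible for unused section elimination.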
Optimize using RW data compression
RW data areas often contain a large number of repeating values (such as zeros), which makes them suitable for compression.
By default, RW data compression is enabled to minimize ROM size.
The linker compresses the data. This data is then decompressed on the target at runtime.
The Arm library contains a number of decompression algorithms and the linker chooses the best algorithm to add to the image in order to decompress the data region when the image is executed. The algorithm selected by the linker can be overridden.
How the linker selects a compressor
armlink gathers information about the content of the data sections before choosing the most appropriate compression algorithm to produce the smallest image.
If compression is appropriate, armlink can use only one data compressor for all the compressible data sections in the image. Different compression algorithms might be tried on these sections to produce the best overall size. Compression is applied automatically if:
Compressed data size + Size of decompressor < Uncompressed data size
When a compressor has been chosen, armlink adds the decompressor to the code area of the image. If the final image does not contain any compressed data, no decompressor is added.
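The size test in this section can be sketched as a small C predicate. This only illustrates the inequality; the function name is hypothetical and the code does not describe how armlink is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when compressing pays off according to the rule
 *   compressed data size + size of decompressor < uncompressed data size.
 * All sizes are in bytes. The function name is hypothetical. */
bool compression_is_worthwhile(size_t compressed_size,
                               size_t decompressor_size,
                               size_t uncompressed_size)
{
    return compressed_size + decompressor_size < uncompressed_size;
}
```

With hypothetical sizes of 300 bytes compressed, 100 bytes of decompressor, and 1024 bytes uncompressed, the predicate holds. A 64-byte region that compresses to 40 bytes would not justify a 100-byte decompressor.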
Options available to override the compression algorithm used by the linker
The linker has options for disabling compression or specifying the compression algorithm to use.
The compression algorithm used by the linker can be overridden in any of the following ways:
- Use the --datacompressor off option to turn off compression.
- Specify a compression algorithm.
To specify a compression algorithm, use the number of the desired compressor on the linker command line, for example:
armlink --datacompressor 2 ...
Use the command-line option --datacompressor list to get a list of the compression algorithms available in the linker:
armlink --datacompressor list
...
Num Compression algorithm
========================================================
0 Run-length encoding
1 Run-length encoding, with LZ77 on small-repeats
2 Complex LZ77 compression
When choosing a compression algorithm, be aware that:
- Compressor 0 performs well on data with large areas of zero bytes but few nonzero bytes.
- Compressor 1 performs well on data where the nonzero bytes are repeating.
- Compressor 2 performs well on data that contains repeated values.
The linker prefers compressor 0 or 1 where the data contains mainly zero bytes (>75%). Compressor 2 is chosen where the data contains few zero bytes (<10%). If the image consists only of A32 code, the A32 decompressors are used automatically. If the image contains any T32 code, the T32 decompressors are used. If there is no clear preference, all compressors are tested to produce the best overall size.
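The zero-byte thresholds quoted above can be restated as a short C sketch. It models only the 75% and 10% figures from the text. The enum, the function name, and the idea of deciding from a single byte count are assumptions for illustration, not armlink's actual logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three outcomes described in the text. */
typedef enum {
    HINT_COMPRESSOR_0_OR_1, /* data is mainly zero bytes (>75%) */
    HINT_COMPRESSOR_2,      /* data has few zero bytes (<10%)   */
    HINT_TRY_ALL            /* no clear preference              */
} compressor_hint;

/* Counts zero bytes and applies the two thresholds.
 * Assumes len is small enough that len * 100 does not overflow. */
compressor_hint hint_from_zero_ratio(const uint8_t *data, size_t len)
{
    size_t zeros = 0;

    if (len == 0) {
        return HINT_TRY_ALL;
    }
    for (size_t i = 0; i < len; i++) {
        if (data[i] == 0) {
            zeros++;
        }
    }
    /* Integer form of zeros/len > 75% and zeros/len < 10%. */
    if (zeros * 100 > len * 75) {
        return HINT_COMPRESSOR_0_OR_1;
    }
    if (zeros * 100 < len * 10) {
        return HINT_COMPRESSOR_2;
    }
    return HINT_TRY_ALL;
}
```

In practice the quickest check is to link with each --datacompressor value in turn and compare the resulting image sizes.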
Things to note when using RW data compression
There are some considerations when using RW data compression.
When using RW data compression:
- Use the linker option --map to see where compression has been applied to regions in your code.
- If there is a reference from a compressed region to a linker-defined symbol that uses a load address, the linker turns off RW compression.
- If you are using an Arm® processor with on-chip cache, enable the cache after decompression to avoid code coherency problems.
Compressed data sections are automatically decompressed at run time, provided that __main is executed, using code from the Arm libraries. This code must be placed in a root region. This is best done using InRoot$$Sections in a scatter file.
If you are using a scatter file, you can specify that a load or execution region is not to be compressed by adding the NOCOMPRESS attribute.
Function inlining with the linker
Linker inlining capabilities depend on the options you specify and the contents of the input files.
The linker can inline a small function in place of the branch instruction for that function. For the linker to be able to do this, the function (without a return instruction) must fit within the four bytes of the branch instruction.
Use the --inline and --no_inline command-line options to control branch inlining. However, --no_inline only turns off inlining for user-supplied objects. By default, the linker still inlines functions from the Arm standard libraries.
If branch inlining optimization is enabled, the linker scans each function call in the image and inlines as appropriate. When the linker finds a suitable function to inline, it replaces the function call with the instructions from the function being called.
The linker applies branch inlining optimization before eliminating any unused sections, so that inlined sections can also be removed when they are no longer called.
Notice
- For Arm®v7-A, the linker can inline two 16-bit encoded Thumb® instructions in place of the 32-bit encoded Thumb BL instruction.
- For Armv8-A and Armv8-M, the linker can inline two 16-bit T32 instructions in place of the 32-bit T32 BL instruction.
Use the --info=inline command-line option to list all the inlined functions.
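As a rough illustration of the four-byte limit, the C functions below show a likely and an unlikely candidate for linker inlining. Whether a function actually fits depends on the target, the compiler options, and the code that armclang generates, so the comments are assumptions rather than guarantees. The function names are hypothetical.

```c
/* Likely candidate: the body is typically a single move of a constant
 * into the return register, followed by the return instruction. */
int feature_enabled(void)
{
    return 1;
}

/* Unlikely candidate: the computation needs several instructions, so
 * the body is not expected to fit in the four bytes of the branch. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Linking with --info=inline is the way to confirm which functions the linker really inlined.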
About branches that optimize to a NOP
Although the linker can replace branches with a NOP, in some cases you might want to prevent this from happening.
By default, the linker replaces any branch with a relocation that resolves to the next instruction with a NOP instruction. This optimization can also be applied if the linker reorders tail calling sections.
However, in some cases you might want to disable this optimization, for example when performing verification or pipeline flushes.
To control this optimization, use the --branchnop and --no_branchnop command-line options.
Linker reordering of tail calling sections
In some cases you might want the linker to reorder tail calling sections.
A tail calling section is a section that contains a branch instruction at the end of the section. If the branch instruction has a relocation that targets a function at the start of another section, the linker can place the tail calling section immediately before the called section. The linker can then optimize the branch instruction at the end of the tail calling section to a NOP instruction.
To take advantage of this behavior, use the command-line option --tailreorder to move tail calling sections immediately before their target.
Use the --info=tailreorder command-line option to display information about any tail call optimizations performed by the linker.
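The following C sketch shows source code that can give rise to a tail calling section. It assumes that each function is placed in its own section (the -ffunction-sections behavior mentioned elsewhere in this document) and that the compiler turns the final call into a branch, which depends on the optimization level. The function names are hypothetical.

```c
/* Target of the tail call. With one section per function, this function
 * starts its own section. The noinline attribute keeps the compiler from
 * folding it into the caller, which would remove the branch entirely. */
__attribute__((noinline)) int scale(int value)
{
    return value * 2;
}

/* The last action is a call to scale(), so an optimizing compiler may end
 * this function with a plain branch to scale instead of a call followed
 * by a return. */
int adjust_and_scale(int value)
{
    value += 1;
    return scale(value);
}
```

If the linker places the section for adjust_and_scale immediately before the section for scale, the final branch resolves to the next instruction, which is the case that the NOP replacement applies to.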
Restrictions on reordering of tail calling sections
There are some restrictions on the reordering of tail calling sections.
The linker:
- Can move only one tail calling section for each tail call target. If there are multiple tail calls to a single section, the tail calling section with an identical section name is moved before the target. If no tail calling section has a matching name, the linker moves the first section it encounters.
- Cannot move a tail calling section out of its execution region.
- Does not move tail calling sections before inline veneers.
Merge identical constants
The linker can attempt to merge identical constants in objects targeting AArch32 state. The objects must be produced with Arm® Compiler for Embedded 6. Merging is more efficient if the objects are compiled with the armclang -ffunction-sections option. This option is the default.
About this task
The following procedure is an example showing the merge functionality.
Notice
If you use a scatter file, any regions marked with the OVERLAY or PROTECTED attribute affect the behavior of the armlink --merge_litpools option.
Procedure
- Create a C source file, litpool.c, containing the following code:
    int f1() {
        return 0xdeadbeef;
    }
    int f2() {
        return 0xdeadbeef;
    }
- Compile the source with -S to create an assembly file:
    armclang -c -S --target=arm-arm-none-eabi -mcpu=cortex-m0 -ffunction-sections \
      litpool.c -o litpool.s
Notice
-ffunction-sections is the default.
Because 0xdeadbeef is a constant that is difficult to create using instructions, a literal pool is created, for example:
    ...
    f1:
            .fnstart
    @ BB#0:
            ldr     r0, __arm_cp.0_0
            bx      lr
            .p2align        2
    @ BB#1:
    __arm_cp.0_0:
            .long   3735928559              @ 0xdeadbeef
    ...
            .fnend
    ...
            .code   16                      @ @f2
            .thumb_func
    f2:
            .fnstart
    @ BB#0:
            ldr     r0, __arm_cp.1_0
            bx      lr
            .p2align        2
    @ BB#1:
    __arm_cp.1_0:
            .long   3735928559              @ 0xdeadbeef
    ...
            .fnend
    ...
Notice
Each function has its own copy of the constant, because armclang cannot share these constants between the two functions.
- Compile the source to create an object:
    armclang -c --target=arm-arm-none-eabi -mcpu=cortex-m0 litpool.c -o litpool.o
- Link the object file using the --merge_litpools option:
    armlink --cpu=Cortex-M0 --merge_litpools litpool.o -o litpool.axf
Notice
--merge_litpools is the default.
- Run fromelf to view the image structure:
    fromelf -c -d -s -t -v -z litpool.axf
The following example shows the result of the merge:
    ...
        f1
            0x00008000:    4801        .H      LDR      r0,[pc,#4] ; [0x8008] = 0xdeadbeef
            0x00008002:    4770        pG      BX       lr
        f2
            0x00008004:    4800        .H      LDR      r0,[pc,#0] ; [0x8008] = 0xdeadbeef
            0x00008006:    4770        pG      BX       lr
        $d.4
        __arm_cp.1_0
            0x00008008:    deadbeef    ....    DCD    3735928559
    ...