GNU C Compiler Extended Syntax Study Notes (2): Attribute Declarations


1. Storage segment: section

1.1 GNU C compiler extension keyword: `__attribute__`

GNU C adds the `__attribute__` keyword for declaring special properties of a function, variable, or type. Its main purpose is to tell the compiler to apply a particular optimization or check when compiling the program. For example, an attribute declaration can specify the data alignment of a variable.
Using `__attribute__` is simple: when defining a function, variable, or type, add the attribute declaration next to its name, in the form `__attribute__((ATTRIBUTE))`.
Note that `__attribute__` is followed by two pairs of parentheses; writing only one pair causes a compile error. The ATTRIBUTE inside the parentheses is the attribute being declared. `__attribute__` supports more than a dozen attribute declarations, including:

section.
aligned.
packed.
format.
weak.
alias.
noinline.
always_inline.

aligned and packed explicitly specify the storage alignment of a variable. Normally, when we define a variable, the compiler allocates storage of the appropriate size for its type and places it at an address that follows the default alignment. Using an `__attribute__` declaration tells the compiler to allocate storage for the variable according to the alignment we specify instead.
Some attributes take parameters. For example, aligned(8) aligns the variable on an 8-byte boundary; the parameter must be enclosed in parentheses. If the parameter is a string, it must also be enclosed in double quotation marks.
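The syntax rules above can be sketched as follows. The variable names and the helper `is_aligned()` are hypothetical, chosen only for illustration; the sketch assumes GCC (or a compatible compiler) on an ELF target such as Linux.

```c
#include <stdint.h>

/* Attribute with a numeric parameter: align on an 8-byte boundary. */
int counter __attribute__((aligned(8)));

/* Attribute with a string parameter: the section name is double-quoted. */
int flags __attribute__((section(".data"))) = 1;

/* Several attributes can share one declaration, separated by commas. */
char buffer[3] __attribute__((aligned(16), section(".data"))) = {0};

/* Hypothetical helper: returns 1 when p lies on an n-byte boundary
 * (n is assumed to be a power of 2). */
int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

Calling `is_aligned(&counter, 8)` or `is_aligned(buffer, 16)` from a test program is one way to confirm that the compiler honored the requested alignment.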

1.2 Attribute declaration: `section`

The main function of the section attribute is to place a function or variable in a specified section when the program is compiled. An executable file is mainly composed of a code segment, a data segment, and a BSS segment. In compiler terminology these are sections, and besides these three the executable contains several others, such as the read-only data section and the symbol table.
In a Linux environment, compile a program with GCC to produce the executable a.out, then use the readelf command to view basic information about each section in it, such as its size and starting address. Among these sections, .text is what we usually call the code segment, .data is the data segment, and .bss is the BSS segment.

The compiler works one source file at a time, compiling each source file into an object file. During compilation it follows default rules to place functions and variables into different sections, and the sections together form the object file. After compilation, the linker assembles, merges, and relocates the object files to generate the executable.
In GNU C, the section attribute of `__attribute__` lets us explicitly place a function or variable in a specified section at compile time. Uninitialized global variables go into the .bss section by default. With an attribute declaration we can put an uninitialized global variable into the .data section instead: declare `uninit_val` with `__attribute__((section(".data")))`, then view the symbol table with the readelf command. The uninitialized global variable now sits in the .data section, just like an initialized global variable.
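A minimal sketch of the declaration described above. Only `uninit_val` comes from the text; the other variable names and the helper `sum_vals()` are assumptions added so the example is complete.

```c
/* Initialized global: placed in .data by default. */
int init_val = 8;

/* Uninitialized global: placed in .bss by default. */
int default_val;

/* Uninitialized global explicitly moved into .data by the attribute. */
int uninit_val __attribute__((section(".data")));

/* Hypothetical helper so that all three variables are referenced. */
int sum_vals(void)
{
    return init_val + default_val + uninit_val;
}
```

After compiling this file, the symbol table printed by `readelf -s` should list `uninit_val` under the same section index as `init_val`, while `default_val` stays with the uninitialized data.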

1.3 U-boot image self-replication analysis

With the section attribute declaration in hand, we can try to analyze how U-boot loads its own code into RAM during startup.
In embedded Linux, the main job of U-boot is to load the Linux kernel image into memory, pass the boot parameters to the kernel, and then boot the Linux operating system. U-boot is generally stored on NOR Flash or NAND Flash. Whether it boots from NOR Flash or NAND Flash, U-boot loads its own code from Flash into memory during the boot process, then relocates and jumps to RAM to execute.
So how does U-boot copy its own code? In other words, how does it copy itself from Flash to memory?
The key questions in this self-copy are: how does U-boot identify its own code, where does it start copying, and where does it stop? To answer them, we need to look at two zero-length arrays in the U-boot source code.
Two lines in the source each define a zero-length array, `__image_copy_start` and `__image_copy_end`, and instruct the compiler to place them in the sections .__image_copy_start and .__image_copy_end respectively.
When the linker links the object files, it assembles the sections into the executable in the order given by the link script.
In the link script, the .__image_copy_start section is placed before the code section .text and the .__image_copy_end section after the data section .data, so the two symbols serve as the start and end addresses for U-boot's copy of its own code. **These two sections contain nothing except the two zero-length arrays, and zero-length arrays occupy no storage.** The two arrays therefore mark the start address and end address of the image that U-boot copies. Whether the image is stored in NOR Flash or NAND Flash, once these two addresses are known, the copy code can be called directly.
On ARM, the LDR pseudo-instruction loads the start address of the image to be copied directly into the R1 register; an array name by itself denotes an address. In this way, U-boot completes the self-copy early in startup: it copies its image from Flash to memory, relocates, and finally jumps to memory to execute.
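The idea of copying everything between two marker addresses can be sketched in plain C. This is not U-boot's code (U-boot does this step in ARM assembly); `copy_image()` is a hypothetical helper, and the marker declarations in the comment are reconstructed from the description above rather than quoted from a specific U-boot version.

```c
#include <stddef.h>
#include <string.h>

/*
 * Markers as described in the text (shown as a comment because their
 * placement only makes sense together with a matching link script):
 *
 *   char __image_copy_start[0] __attribute__((section(".__image_copy_start")));
 *   char __image_copy_end[0]   __attribute__((section(".__image_copy_end")));
 */

/* Hypothetical helper: copy the bytes in [start, end) to dst and
 * return how many bytes were copied. */
size_t copy_image(void *dst, const char *start, const char *end)
{
    size_t len = (size_t)(end - start);   /* image size in bytes */

    memcpy(dst, start, len);
    return len;
}
```

With the markers in place, the call would look like `copy_image(ram_base, __image_copy_start, __image_copy_end)`, where `ram_base` is an assumed destination address in RAM.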

2. Attribute declaration: aligned

2.1 Address alignment: aligned

GNU C specifies the alignment of a variable or type by declaring the aligned and packed attributes through `__attribute__`. These two attributes tell the compiler which address alignment to use when allocating storage for a variable. For example, an int variable a can be defined so that it is aligned on an 8-byte boundary in memory.

Through the aligned attribute we can explicitly specify the address alignment of a variable in memory. aligned takes one parameter, the number of bytes to align to. This number must be a power of 2, otherwise compilation fails.

Normally, when we define a variable, the compiler assigns it an address according to the default alignment. Consider a program that defines two int variables and two char variables and prints their addresses. The int variables are placed at addresses that are multiples of 4, while the char variables are aligned to 1 byte: c2 is placed in the storage unit immediately after c1 and does not need 4-byte alignment the way int data does. Next, we modify the program so that c2 is aligned to 4 bytes.

Because c2 is now declared to be aligned on a 4-byte boundary, the compiler cannot assign it the address 0x00402009, which is not 4-byte aligned. It skips 3 storage units and allocates c2 at address 0x0040200C. The aligned attribute lets us specify the alignment of a variable explicitly, but boundary alignment also leaves memory holes and wastes memory. In this program, the three storage units at addresses 0x00402009 to 0x0040200B are not used.
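Question: Address alignment creates memory holes, so why use it?
Answer: Alignment simplifies the interface and hardware design between the CPU and RAM. To match the hardware, the compiler aligns basic data types such as int, char, short, and float according to their size, so that the CPU can read or write a value stored at such an address in a single access. Boundary alignment does create some holes and waste some memory, but it greatly simplifies the hardware design.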
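The experiment described in this subsection can be sketched as below. The variable names follow the text; `addr_mod()` and `print_addresses()` are hypothetical helpers, and the actual addresses printed depend on the platform and toolchain.

```c
#include <stdint.h>
#include <stdio.h>

int  a = 1;
int  b = 2;
char c1 = 3;
char c2 __attribute__((aligned(4))) = 4;   /* forced onto a 4-byte boundary */

/* Hypothetical helper: address of p modulo n bytes (0 means aligned). */
unsigned addr_mod(const void *p, unsigned n)
{
    return (unsigned)((uintptr_t)p % n);
}

/* Print the four addresses so the alignment can be inspected by eye. */
void print_addresses(void)
{
    printf("a:  %p\n", (void *)&a);
    printf("b:  %p\n", (void *)&b);
    printf("c1: %p\n", (void *)&c1);
    printf("c2: %p\n", (void *)&c2);
}
```

Removing the attribute from `c2` and calling `print_addresses()` again shows the default placement for comparison.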

2.2 Structure alignment: aligned

A structure is a compound data type. When the compiler allocates storage space for a structure variable, it must not only consider the address alignment of each basic member in the structure, but also consider the alignment of the structure as a whole. In order to align the address of each member in the structure, the compiler may fill some space in the structure; in order to align the structure as a whole, the compiler may fill some space at the end of the structure.
For example, define a structure with a char member a, an int member b, and a short member c, and print the size of the structure and the address of each member.

Member b needs 4-byte alignment, so after allocating 1 byte for member a the compiler skips 3 bytes and allocates 4 bytes for b at the next 4-byte-aligned address, 0x0028FF34. Member c, of type short, then occupies 2 bytes, so the three members take 1 + 3 + 4 + 2 = 10 bytes in total. However, the structure as a whole must be aligned to the largest alignment among its members, and its total size must be an integer multiple of that alignment; if it is not, padding is added. **Because the largest member, the int, is 4 bytes, the structure is aligned to 4 bytes and its size must be a multiple of 4, so 2 bytes are added at the end and the final size is 12 bytes.**
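The layout just described can be checked with `offsetof`. This is a sketch assuming a typical ABI with 4-byte int and 2-byte short (for example GCC on x86 or ARM Linux); `print_layout()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

struct data {
    char  a;   /* offset 0, then 3 bytes of padding */
    int   b;   /* offset 4 */
    short c;   /* offset 8, then 2 bytes of tail padding */
};

/* Print the size and member offsets so the padding becomes visible. */
void print_layout(void)
{
    printf("size: %zu\n", sizeof(struct data));
    printf("a: %zu, b: %zu, c: %zu\n",
           offsetof(struct data, a),
           offsetof(struct data, b),
           offsetof(struct data, c));
}
```

Offsets are used here instead of raw addresses because they are the same on every run, whereas stack addresses vary.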
Laying out the members in a different order may change the overall size of the structure. For example, reorder the members as char a, short b, int c.
Now the char member a and the short member b share the first 4 bytes of the structure, and each satisfies its own alignment. The whole structure is 8 bytes, with only a 1-byte memory hole. Next, modify the program so that the short member b is aligned to 4 bytes.

The size of the structure becomes 12 bytes again. Because the short member b is explicitly aligned to a 4-byte address, 3 bytes of padding are inserted after member a. The int member c is also 4-byte aligned, so 2 bytes of padding follow member b, and the whole structure is 12 bytes.
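Both variants can be written side by side. The structure names are hypothetical, and the sizes in the comments assume a typical ABI with 4-byte int and 2-byte short.

```c
#include <stddef.h>

/* Reordered members: char and short share the first 4 bytes. */
struct reordered {
    char  a;   /* offset 0, then 1 byte of padding */
    short b;   /* offset 2 */
    int   c;   /* offset 4 */
};             /* 8 bytes in total */

/* Same order, but b is forced onto a 4-byte boundary. */
struct reordered_b4 {
    char  a;                               /* offset 0, then 3 bytes of padding */
    short b __attribute__((aligned(4)));   /* offset 4, then 2 bytes of padding */
    int   c;                               /* offset 8 */
};                                         /* 12 bytes in total */
```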
We can explicitly specify not only the alignment of an individual member of a structure but also the alignment of the entire structure, for example by declaring the whole structure with aligned(16).
In this structure, the members occupy 8 bytes in total. As we saw earlier, a structure normally only needs to be aligned to the alignment of its largest member, so by default the structure is 4-byte aligned and 8 bytes long. Here, however, the structure as a whole is explicitly aligned to 16 bytes, so the compiler pads 8 bytes at the end of the structure to satisfy the 16-byte alignment, and the total size becomes 16 bytes.
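A sketch of a structure aligned as a whole, under the same ABI assumptions as before. The name `struct data16` is hypothetical.

```c
#include <stddef.h>

/* Members need only 8 bytes, but the whole type is aligned to 16. */
struct data16 {
    char  a;   /* offset 0, then 1 byte of padding */
    short b;   /* offset 2 */
    int   c;   /* offset 4, then 8 bytes of tail padding */
} __attribute__((aligned(16)));
```

The size is rounded up to a multiple of the alignment so that every element of an array of `struct data16` stays 16-byte aligned.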
Question: Will the compiler always align a variable the way aligned specifies?
Answer: Not necessarily. The attribute declaration only asks the compiler to use that alignment, and the request cannot exceed the maximum the toolchain allows. If it does, the compiler falls back to the largest alignment it supports when assigning the address. The limit depends on the compiler, linker, and object file format, so it differs between environments.
In this program, the char variable c2 is declared with 16-byte alignment, and the address the compiler allocates to c2 is indeed 16-byte aligned. If we change c2 to 32-byte alignment, the result in the author's environment no longer changes: the compiler still allocates a 16-byte-aligned address, because 32-byte alignment exceeds the maximum allowed there. Other toolchains may honor larger values.

2.3 Attribute declaration: packed

The aligned attribute is generally used to increase the alignment of a variable, and the resulting gaps between elements create memory holes. The packed attribute does the opposite: it reduces alignment, so that the specified variable or type uses the smallest possible alignment.
In the sample program, members b and c of the structure are declared with the packed attribute, telling the compiler to use the smallest possible alignment for them and reduce memory holes as far as possible.
This feature is very useful in low-level driver development. For example, when a structure has to match a fixed, contiguous layout such as a block of hardware registers, padding inserted for alignment would make the structure disagree with the real layout. Using packed avoids this problem: the members sit next to each other at consecutive addresses, with no holes caused by member alignment. The packed attribute can also be applied to the entire structure, which has the same effect as adding it to each member separately.
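Both ways of applying packed can be sketched as follows. The structure names are hypothetical, and the 7-byte size assumes 1-byte char, 2-byte short, and 4-byte int with GCC's default structure layout.

```c
#include <stddef.h>

/* packed applied to individual members */
struct members_packed {
    char  a;                            /* offset 0 */
    short b __attribute__((packed));    /* offset 1, no padding before it */
    int   c __attribute__((packed));    /* offset 3 */
};

/* packed applied to the whole structure */
struct whole_packed {
    char  a;
    short b;
    int   c;
} __attribute__((packed));
```

Note that members of a packed structure may sit at misaligned addresses, so taking a pointer to such a member and dereferencing it can be slow or unsupported on some CPUs.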

2.4 aligned and packed declarations in the kernel

In the Linux kernel source code, aligned and packed are often used together on the same variable or type. This avoids the memory holes caused by aligning each member of a structure while still specifying the alignment of the structure as a whole.

Although the structure data is declared with the packed attribute, so that its members occupy 7 bytes in total, it is also declared with aligned(8), which aligns the structure on an 8-byte boundary. The compiler therefore pads 1 byte at the end of the structure, and the whole structure becomes 8 bytes in size and 8-byte aligned.
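A minimal sketch of the combined declaration, assuming the same member types as in the earlier examples (the member list is an assumption; the text only states the 7-byte total).

```c
#include <stddef.h>

/* packed removes the padding between members (7 bytes of data);
 * aligned(8) then rounds the whole structure up to 8 bytes. */
struct data {
    char  a;   /* offset 0 */
    short b;   /* offset 1 */
    int   c;   /* offset 3, then 1 byte of tail padding */
} __attribute__((packed, aligned(8)));
```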

3、format

3.1 Format check of variable parameter function

GNU C uses the format attribute of `__attribute__` to enable argument format checking for variadic functions. The general form is `__attribute__((format(archetype, string-index, first-to-check)))`.
In some commercial projects we implement custom debug printing functions, or even a standalone logging module. These custom printing functions are usually variadic, so the arguments passed at each call site vary. How does the compiler know whether the arguments match the format when compiling the program?
This is where the format attribute comes in handy. Suppose we define a variadic function LOG() to implement log printing.
To have the compiler check the arguments of LOG(), add the attribute declaration `__attribute__((format(printf, 1, 2)))` to the LOG() function. This tells the compiler: you know how to check the arguments of printf(), so check LOG() the same way.
The attribute format(printf, 1, 2) has 3 parameters. The first, printf, tells the compiler to check according to the rules of printf(). The second is the position of the format string in the parameter list of LOG(). The third is the position of the first argument to check against the format string. Positions are counted from 1.
In the example call, LOG() receives 2 arguments: the first is the format string, and the second is the constant 0 to be printed, which matches the placeholder in the format string.
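To show what the two index parameters mean, here is a sketch in which the format string is not the first parameter. The function `log_at()` and its `level` parameter are hypothetical; for the LOG() function in the text, the indices would be 1 and 2.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * The format string is parameter 2 and the values to check start at
 * parameter 3, hence format(printf, 2, 3).
 */
int log_at(int level, const char *fmt, ...)
    __attribute__((format(printf, 2, 3)));

int log_at(int level, const char *fmt, ...)
{
    va_list args;
    int n;

    printf("[%d] ", level);
    va_start(args, fmt);
    n = vprintf(fmt, args);   /* number of characters written for fmt */
    va_end(args);
    return n;
}
```

With the attribute in place and `-Wall` enabled, a mismatched call such as `log_at(1, "%s\n", 42)` should produce a format warning at compile time, just as it would for printf().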

3.2 Try to implement variable parameter printing function

Try to implement a variadic printing function `my_printf(char *fmt, ...)`.
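One possible sketch of `my_printf()` walks the format string by hand and fetches each argument with `va_arg`. It is a simplified illustration, not the original author's code: it understands only `%d`, `%s`, and `%%`, and prints any other conversion character literally without consuming an argument.

```c
#include <stdarg.h>
#include <stdio.h>

/* Returns the number of characters written. */
int my_printf(char *fmt, ...)
{
    va_list args;
    int count = 0;

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            putchar(*p);               /* ordinary character */
            count++;
            continue;
        }
        p++;                           /* look at the conversion character */
        if (*p == 'd') {
            count += printf("%d", va_arg(args, int));
        } else if (*p == 's') {
            count += printf("%s", va_arg(args, char *));
        } else if (*p == '\0') {
            break;                     /* lone '%' at the end of the string */
        } else {
            putchar(*p);               /* "%%" or unsupported: print as-is */
            count++;
        }
    }
    va_end(args);
    return count;
}
```

A shorter alternative is to forward the `va_list` to `vprintf()`, which supports every printf() conversion.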

3.3 Try to implement the log printing function

Try to implement a variadic log function `LOG(char *fmt, ...)`. Although the C standard library has ready-made printing functions, they cannot fully meet our debugging needs. In embedded debugging we may need our own output format, control over the output mode, an on/off switch, and priority control, with room to add further features as needed. During debugging we also print intermediate values to help locate problems, and deleting those print statements afterwards is tedious. A macro switch that turns the output on or off makes maintenance much easier.
When the DEBUG macro is defined in the program, LOG() prints normally; when the DEBUG macro is removed, LOG() becomes an empty function. Printing levels can also be set through macros, for example ERROR, WARNING, and INFO, so that the log output of a module depends on the configured level.
We can wrap three printing functions, INFO(), WARN(), and ERR(), which print log information at different priorities. In actual debugging, set the level that matches the information needed, and the output is then controlled level by level.
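The macro switch and the print levels can be combined in one sketch. The level numbers, `PRINT_LEVEL`, and the helper `log_write()` are assumptions made for this illustration; only DEBUG, INFO(), WARN(), and ERR() are named in the text.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG                 /* delete this line to turn all logging off */

/* Hypothetical levels: a message prints when its level <= PRINT_LEVEL. */
#define ERR_LEVEL    1
#define WARN_LEVEL   2
#define INFO_LEVEL   3
#define PRINT_LEVEL  WARN_LEVEL

/* Shared back end; returns the number of characters written. */
static int log_write(int level, const char *tag, const char *fmt, va_list args)
{
#ifdef DEBUG
    int n;

    if (level > PRINT_LEVEL)
        return 0;                      /* filtered out by the level setting */
    n = printf("[%s] ", tag);
    n += vprintf(fmt, args);
    return n;
#else
    (void)level; (void)tag; (void)fmt; (void)args;
    return 0;                          /* logging compiled out */
#endif
}

int ERR(const char *fmt, ...)  __attribute__((format(printf, 1, 2)));
int WARN(const char *fmt, ...) __attribute__((format(printf, 1, 2)));
int INFO(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

int ERR(const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = log_write(ERR_LEVEL, "ERR", fmt, args);
    va_end(args);
    return n;
}

int WARN(const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = log_write(WARN_LEVEL, "WARN", fmt, args);
    va_end(args);
    return n;
}

int INFO(const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = log_write(INFO_LEVEL, "INFO", fmt, args);
    va_end(args);
    return n;
}
```

With `PRINT_LEVEL` set to `WARN_LEVEL`, calls to ERR() and WARN() print while INFO() is silent; raising it to `INFO_LEVEL` enables all three.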

4、weak

4.1 Strong and weak symbols

GNU C can convert a strong symbol into a weak symbol through the weak attribute declaration.
Whether it is a variable name or a function name, to the compiler it is just a symbol. Symbols are divided into strong symbols and weak symbols.

Strong symbols: function names and initialized global variable names.
Weak symbols: uninitialized global variable names.
For global variables and functions with the same name, there are three possible scenarios.
Strong symbol + strong symbol.
Strong symbol + weak symbol.
Weak symbol + weak symbol.
Strong and weak symbols mainly resolve conflicts between global variables or functions of the same name during linking. Three rules generally apply.
One mountain cannot hold two tigers. Two strong symbols with the same name cannot coexist in a project. If a multi-file project defines two functions or two initialized global variables with the same name, the linker reports a redefinition error.
Strong and weak can coexist. A strong symbol and a weak symbol with the same name may both exist in a project, for example an initialized global variable and an uninitialized global variable.
The bigger one wins. When all the symbols with the same name are weak, the linker chooses the one that occupies the most storage.
Note that treating uninitialized global variables as weak symbols relies on the compiler's -fcommon behavior. GCC 10 and later default to -fno-common, under which a second uninitialized definition of the same global variable is reported as a multiple definition at link time.
Consider a sample project in which main.c and func.c each define global variables a and b with the same names, where one definition of each is a strong symbol and the other is a weak symbol. During linking, the linker chooses the strong symbol when it sees conflicting symbols of the same name, so both the main() function and the func() function print the value of the strong symbol.

4.2 Strong and weak symbols for functions

The linker follows the same rules for conflicts between functions of the same name. A function name is a strong symbol by default, so if two functions with the same name are defined in one project, a redefinition error is reported at link time. But we can convert one of the function names into a weak symbol through the weak attribute declaration.

In the sample program, main.c defines a func() function with the same name as the one in func.c and converts it into a weak symbol through the weak attribute declaration. The linker selects the strong symbol in func.c, so when main() calls func(), it actually calls the func() defined in func.c. The global variable a is the opposite case: func.c defines a as a weak symbol, so func() prints the value of the global variable a defined in main.c.
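The weak side of this arrangement can be sketched in a single file. The return type and the wrapper `call_func()` are assumptions made for the example; the strong definition that would override this one lives in another source file and is not shown.

```c
#include <stdio.h>

/* Weak definition: used only when no other file in the project
 * provides a strong definition of func(). */
__attribute__((weak)) int func(void)
{
    printf("weak func() from this file\n");
    return 0;
}

/* Hypothetical caller standing in for main() in the text. */
int call_func(void)
{
    return func();
}
```

If another file in the project defines `int func(void)` without the attribute, the linker should bind the call in `call_func()` to that strong definition instead.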

4.3 Purpose of weak functions

When a source file references a variable or function and the compiler sees only its declaration but not its definition, compilation normally succeeds: the compiler assumes the symbol may be defined in another file. At the linking stage, the linker looks for the definition in the other files and reports an undefined-symbol error if it cannot find one.
A function declared as a weak symbol behaves differently: when the linker cannot find its definition, it does not report an error. Instead, the linker sets the weak symbol to 0 or a special value. Only when the program runs and calls the function, jumping to address zero or the special address, does a memory error occur.

In the sample program, func() is not defined; it is only declared in main.c as a weak symbol. The project compiles and links, but the program fails at run time with a segmentation fault.
To prevent this, check whether the address of the function is 0 before calling it, and call it only when it is not. This avoids the segmentation fault.
With that check in place, the program compiles and runs normally, and the segmentation fault no longer occurs.
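The check described above can be sketched as follows, assuming GCC on an ELF target such as Linux. The wrapper `call_func_if_present()` is a hypothetical name.

```c
#include <stdio.h>

/* Declared weak and deliberately not defined anywhere in this file. */
void func(void) __attribute__((weak));

/* Returns 1 if func() was available and called, 0 if it was skipped. */
int call_func_if_present(void)
{
    if (func) {                /* address is 0 when no definition was linked */
        func();
        return 1;
    }
    puts("func() is not defined, skipping the call");
    return 0;
}
```

Linking this file together with one that defines `void func(void)` makes the same code call the function without any source change.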

This feature of weak symbols is widely used in libraries. Suppose you are developing a library whose basic functions are implemented but whose advanced functions are not yet. You can declare those advanced functions as weak symbols through the weak attribute. Then, even though the functions are not defined, the application only needs to check that the address is non-zero before calling them, and the program runs normally. When a later version of the library implements the advanced functions, the application does not need to be modified and can call them directly.

4.4 Attribute declaration: alias

GNU C provides an alias attribute, which is simple to use and mainly defines an alias for a function.
Through the alias attribute declaration, we can define an alias f() for the __f() function. From then on, __f() can be called directly through f().
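A minimal sketch of the alias declaration, assuming GCC on an ELF target (the attribute requires the target function to be defined in the same file). The counter `call_count` is a hypothetical addition that makes the effect observable.

```c
#include <stdio.h>

int call_count;            /* hypothetical counter, only for this demo */

void __f(void)
{
    call_count++;
    printf("__f() called\n");
}

/* f is a second name for the same function body as __f. */
void f(void) __attribute__((alias("__f")));
```

Calling `f()` runs the body of `__f()`, and both names refer to the same address.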
In the Linux kernel, the alias attribute is sometimes used together with the weak attribute. For example, when the name of a function interface changes between kernel versions, the alias attribute can wrap the old function and expose it under a new interface name.
If we define a new f() function in main.c, then main() calls that newly defined f(). If f() is not defined, the call goes to __f() instead.
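The combination can be sketched in one file, again assuming GCC on an ELF target. The return values and the wrapper `call_f()` are assumptions made so the behavior can be observed.

```c
#include <stdio.h>

/* Default implementation. */
int __f(void)
{
    printf("__f() called\n");
    return 1;
}

/* f is a weak alias of __f: a strong f() defined in another file
 * would take precedence at link time. */
int f(void) __attribute__((weak, alias("__f")));

/* Hypothetical caller standing in for main() in the text. */
int call_f(void)
{
    return f();
}
```

When no other file defines f(), `call_f()` ends up in `__f()`; adding a file with its own `int f(void)` should redirect the call to that definition.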