GNU C Compiler Extension Syntax Study Notes (1): A Detailed Look at GNU C Special Syntax

1. Designated initializers

1. Array initialization

  In standard C, an array is usually defined and initialized by listing its elements in order.

int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8};

  When the array is large and its non-zero elements are not contiguous, initializing it in a fixed order is tedious. The C99 standard improved array initialization with designated initializers: using an element's index, we can assign a value directly to that element. Elements that are not named are initialized to zero.

int b[100] = { [10] = 1, [30] = 2 };

  To initialize every element in a range of indices, GNU C provides the following form. [10 ... 30] denotes an index range, so this assigns 1 to the 21 elements from b[10] to b[30] inclusive. Note the spaces on both sides of the ... token.

int b[100] = { [10 ... 30] = 1, [50 ... 60] = 2 };

  The ... range notation is a GNU C extension. It can be used not only in array initialization but also in switch-case statements. If the same branch should run for every case value from 2 to 8, the labels can be collapsed into the single form case 2 ... 8:, which simplifies the code. One detail needs attention: there must be a space on each side of the ..., between it and the bounds 2 and 8. Writing 2...8 causes a compilation error.

2. Designated initialization of structure members

  As with arrays, structure variables were traditionally initialized in the fixed order of their members. With designated initializers (standardized in C99 and also accepted by GNU C as an extension in other modes), a specific member can be initialized through its field name.
  Take a student structure with name and age members and two variables, stu1 and stu2. stu1 is initialized positionally, while stu2 is initialized through the field designators .name and .age, which assign values directly to the named members. When the structure has many members, the second method is more convenient.

3. Benefits of designated initialization

  Designated initialization is not only flexible, it also makes code easier to maintain. Take the kernel's file_operations structure as an example. If variables of this type are initialized positionally in the fixed order, then whenever the structure type changes (a member is added, removed, or moved), the initialization sequence must be readjusted in the many C files that define variables of this type. One small change ripples through the whole code base.
  Designated initialization avoids this problem. However the file_operations structure changes, adding, removing, or reordering members does not affect the files that use it.

2. A powerful tool for building macros: statement expressions

1. Expressions, statements and code blocks

  Expressions and statements are fundamental concepts in C. What is an expression? An expression is a formula made up of operators and operands. The operators can be any of the arithmetic, logical, assignment, and comparison operators defined by the C standard; the operands can be constants or variables. An expression may also have no operator at all: a single constant, or even a string, is an expression. Each of the following character sequences is an expression.

2 + 3
2
i = 2 + 3
i = i++ + 3
"Follow Owl City, like and bookmark so you don't get lost"

  Expressions are generally used to compute data or to implement the algorithm behind some piece of functionality. (i = i++ + 3 is an expression syntactically, but it modifies i twice without an intervening sequence point, so its behavior is undefined and it should not appear in real code.) An expression has two basic properties: a value and a type.
  Expressions can be classified by kind, for example:

  • relational expression
  • logical expression
  • conditional expression
  • assignment expression
  • arithmetic expression

  An expression followed by a semicolon forms a statement, and several statements enclosed in braces {} form a code block. C allows a variable to be defined inside a code block, and the scope of that variable is limited to the block, because the compiler manages variable scope according to {}. A variable declared in an inner block can even reuse the name of a variable in the enclosing scope; once the block ends, the outer variable is visible again with its value unchanged.

2. Statement expression

  GNU C extends the C standard by allowing statements to be embedded in an expression, so that local variables, for loops, and goto statements can be used inside an expression. Such an expression is called a statement expression. Its general form is ({ statement; statement; expression; }).
  A statement expression is enclosed in parentheses () on the outside, with a pair of braces {} inside forming a code block in which various statements may appear: ordinary expression statements, loops, or jumps. The value of the statement expression is the value of the last expression in the block.
  For example, the sum of 1 to 10 can be computed by a statement expression that declares a local variable s, accumulates into it in a for loop, and ends with the statement s;. That final s; is needed because it supplies the value of the whole statement expression. If the block ends with the for loop instead, the statement expression has type void and yields no value, so it cannot be assigned to sum. If the final statement is changed to 100;, sum becomes 100, because the value of a statement expression is always the value of its last expression.

3. Statement expression in macro definition

  The main use of statement expressions is in defining macros with complex behavior. A macro built on a statement expression can implement complex functionality while avoiding the ambiguities and loopholes that macro definitions otherwise invite.
  An exercise: define a macro that returns the larger of two numbers.

  Entry-level version

#define MAX(x,y) x > y ? x : y

  This is the most basic version, and it has problems. In actual use, operator precedence can make the result differ from what was intended.
  When an argument of the macro is itself an expression, for instance one built with !=, a test can print max=0 where max=1 was expected. The arguments are pasted into the expansion as text, without parentheses, so they mix with the macro's own > operator.
  The relational operator > has precedence level 6 and binds more tightly than != (level 7), so in the expanded expression the order of evaluation changes and the result is not what was expected.

  Intermediate-level version
  To avoid this expansion error, we can wrap each macro parameter in parentheses () so that the order of evaluation inside the arguments cannot change after expansion.

#define MAX(x,y) (x) > (y) ? (x) : (y)

  This is still not enough when the macro is used inside a larger expression, as in 3 + MAX(1, 2), where + binds more tightly than >. Wrapping the whole definition in parentheses as well, ((x) > (y) ? (x) : (y)), keeps the expansion from disturbing the order of evaluation when an expression contains both the macro and other higher-precedence operators.

  Good-level version
  The fully parenthesized macro solves the operator-precedence problem, but a loophole remains. Suppose we define two variables i and j and compare them while incrementing both, as in MAX(i++, j++). The actual result is max=7 rather than the expected max=6. After macro expansion each argument appears twice in the text, so the larger variable is incremented twice and the value returned is the already incremented one.
  This is where statement expressions come in. We can define the macro as a statement expression that declares two temporary variables to hold the values of i and j, then compares the temporaries, so that each argument is evaluated only once.

#define MAX(x,y) ({ \
	int _x = (x); \
	int _y = (y); \
	_x > _y ? _x : _y; \
})

  In the statement expression above, the local variables _x and _y store the values of the macro parameters x and y, and the comparison uses _x and _y, which avoids incrementing i and j twice.

  Excellent-level version
  In the good-level macro the two temporaries are declared as int, so only integers can be compared. Every other type would need a macro of its own, which is tedious. We can extend the macro so that it supports comparing data of any type.

#define MAX(type,x,y) ({ \
	type _x = (x); \
	type _y = (y); \
	_x > _y ? _x : _y; \
})

  In the macro above, the added parameter type specifies the type of the temporaries _x and _y. Data of any type can then be compared, as long as the type is passed to the macro as an argument.

  Top-level optimized version
  In the excellent-level macro we added a type parameter to handle different data types. Now, in the interest of a better salary, we should try to optimize further and drop that parameter. How? With typeof. typeof is a keyword added by GNU C that yields the type of its operand, so there is no need to pass the type in; typeof obtains it directly.

#define MAX(x,y) ({ \
	typeof(x) _x = (x); \
	typeof(y) _y = (y); \
	(void) (&_x == &_y); \
	_x > _y ? _x : _y; \
})

  In the macro definition above, the typeof keyword obtains the types of the two macro arguments automatically. The harder line to understand is (void) (&_x == &_y);. It looks redundant, but it serves two purposes:

  • First, it produces a warning for the user. Comparing pointers of different types makes the compiler issue a warning, which signals that the two arguments have different types.
  • Second, the comparison yields a result that is never used, which some compilers also warn about. The (void) cast suppresses that warning.

3. typeof and container_of macros

1. The typeof keyword

  GNU C adds the keyword typeof, which yields the type of a variable or expression. Its argument takes one of two forms: an expression or a type.
  For example, if the variable i has type int, then typeof(i) is int, so typeof(i) j = 20; is equivalent to int j = 20;, and typeof(int *) a; is equivalent to int *a;. If the function f() returns int, then typeof(f()) k; is equivalent to int k;.
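  Advanced usage of typeof: it can also be combined with the * and [] declarators, or nested, to build pointer and array types from an existing type or expression.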

2. The container_of macro in the Linux kernel

  Let's analyze what is often called the number one macro of the Linux kernel: container_of. It is used throughout the kernel.
  The macro combines several GNU C extensions, with one macro nested inside another, and it is hard not to admire the kernel developers' design.
  Its main job is to obtain the starting address of a structure from the address of one of its members. The macro takes three parameters: ptr is the address of the member, type is the structure type, and member is the name of the member within the structure. In other words, if we know the type of a structure and the address of one of its members, container_of returns the starting address of the structure.
  Application scenario: in the kernel we often pass a member of some structure to a function, and inside that function the other members of the same structure are needed as well. This is what container_of is for. With it we first find the starting address of the structure and then reach the other members through ordinary member access.
  For example, given a structure variable stu and the address of its member math, &stu.math, the container_of macro yields the starting address of stu directly. Through the resulting pointer stup, the other members can be accessed as stup->age and stup->num.

3. Implementation analysis of container_of macro in Linux kernel

  For a given structure type in a given compilation environment, the offset of each member from the start of the structure is fixed. So once the address of a member is known, subtracting the member's offset from it gives the starting address of the structure.
  The offsets can be observed without defining a structure variable at all: cast the number 0 to a pointer to the student structure type, then print the address of each member reached through that pointer.
  Because the pointer's value is 0, the structure can be regarded as starting at address 0, so the address of each member equals its offset from the start of the structure. The container_of macro is implemented with this technique.
  With this foundation, the implementation of container_of is fairly easy to analyze. Syntactically, the macro is a single statement expression, whose value is the value of its last expression.
  That last expression takes the address of the structure member and subtracts the member's offset within the structure type; the result is the starting address of the structure. Because the value of a statement expression equals the value of its last expression, this result is also the value of the whole statement expression, and container_of returns this address to the code that invoked the macro.
  How is the offset of a member within the structure computed? The kernel defines the offsetof macro for this purpose.
  It takes two parameters: the structure type TYPE and a member MEMBER of that structure. The technique is the zero-address pointer described above: cast 0 to a pointer to TYPE, access the member through that pointer, and take the member's address, which is numerically equal to the offset of MEMBER within TYPE.
  The members of a structure can be of any data type. To make the macro work with all of them, it defines a temporary pointer variable __mptr that stores the address of the member, that is, the value of the macro parameter ptr. The pointer type for __mptr is obtained as follows.
  The macro parameter ptr is the address of the structure member, so its type is pointer to the member's data type. When __mptr stores the value of ptr, its pointer type must match. The expression typeof(((type *)0)->member) uses typeof to obtain the data type of the member, and the declaration typeof(((type *)0)->member) *__mptr then defines a pointer to that type.
  At the end of the statement expression, since the starting address of the structure is what gets returned, the address is cast to type *, a pointer to the structure type.

4. Zero-length array

  Zero-length arrays and variable-length arrays are both array types supported by the GNU C compiler.

1. Definition

   A zero-length array is an array whose length is 0. It is defined as follows.

int a[0];

   One peculiarity of a zero-length array is that it occupies no storage: its size is 0. A zero-length array is rarely used on its own. It usually appears as the last member of a structure, forming a variable-length structure.
   Consider a structure buffer whose members are an int len followed by a zero-length array a. The zero-length array occupies no storage inside the structure either, so the size of the buffer structure is 4, the size of its int member.

2. Example of using a zero-length array

  Zero-length arrays are typically used in special applications in the form of variable-length structures. In a variable-length structure the zero-length array occupies none of the structure's storage, yet the memory that follows the structure can be accessed through the member a, which is very convenient.
  For example, a program can use malloc to request sizeof(buffer) + 20 bytes, which is 24 bytes. Of these, 4 bytes hold the length value 20 and the remaining 20 bytes are the space that can actually be used, accessed directly through the structure member a.

3. Zero-length arrays in the kernel

  In the kernel, zero-length arrays generally appear as part of variable-length structures. Let's look at how the kernel USB driver uses one. Anyone who has worked on network card drivers will know the socket buffer, which carries network packets. The USB driver has a similar object called the URB, short for USB Request Block, which carries USB data packets.
  The URB structure describes the transfer direction, transfer address, transfer size, transfer mode, and so on of a USB data packet. We will not go into those details and look only at its last member.
  A zero-length array is defined at the end of the URB structure, and it is used mainly for USB isochronous transfers. USB has four transfer modes: interrupt, control, bulk, and isochronous. Different USB devices have different requirements for transfer speed and data integrity, so they use different modes. A USB camera needs video or images delivered in real time and cares little about dropped frames; if one frame is lost, transmission simply carries on with the next. USB cameras therefore use the isochronous transfer mode.
  USB cameras generally support several resolutions, from 16*16 up to high-definition 720P. At different resolutions a frame of image data has a different size, and so the size and number of USB data packets required also differ. How can the USB driver accommodate transfers of different sizes without affecting the other transfer modes? The answer lies in the zero-length array inside the structure.
   When the user selects a video format with a different resolution, the USB driver transmits a frame of video using data packets of a different size and number, and the variable-length structure built on the zero-length array meets this need. The driver can request memory according to the size of a frame of image data and so handle transfers of different sizes. The zero-length array occupies no storage in the structure, so when the other transfer modes are in use they are not affected at all and can proceed as if the array did not exist. It has to be said that this is an ingenious design.

4. Food for thought: pointers and zero-length arrays

   Why not use a pointer instead of a zero-length array?
  It is often said that an array name passed as a function argument is equivalent to a pointer. Do not be misled by this: when passed as an argument the array name does yield an address, but an array name is not a pointer, and the two are not the same thing. The array name stands for the address of a contiguous block of memory, whereas a pointer is a variable for which the compiler must allocate separate storage, holding the address of whatever it points to.
  For a pointer variable, the compiler allocates storage for the pointer itself and stores the address of another variable there; we then say that the pointer points to that variable. For an array name, the compiler allocates no separate storage. It is just a symbol that denotes an address, much like a function name.
  The reasons for choosing a zero-length array over a pointer are therefore:

  • A pointer occupies storage of its own, while a zero-length array occupies none.
  • As the USB driver case above shows, a pointer would be far less elegant: a zero-length array adds no redundancy to the structure definition and is very convenient to use.

Origin blog.csdn.net/qq_41866091/article/details/130542873