Array storage and pointer study notes (1): data types and storage, data alignment, data portability, typedef

1. Data type and storage

  A type is a set of values together with a collection of operations on that set. The same type of data may be stored differently on different processor platforms, and different types of data may have different storage methods and calculation rules on the same platform. Many people say that pointers are the soul of the C language, but I think storage is its essence and soul.
  It is worth getting to know the ANSI C keywords and the keywords added in C99/C11. Apart from the keywords that control program structure, most of them are related to data types and storage.

1.1 Big endian mode and little endian mode

  In a computer, a bit is the smallest unit of storage, usually represented by a capacitor: when charged, a high potential means 1, and when discharged, a low potential means 0. Eight bits form a byte, which is the most basic storage unit of a computer and also the smallest addressable unit: computers usually address memory in bytes. A word is the number of binary digits the computer handles at a time when processing instructions or data, and is the unit in which it stores and processes data. In a 32-bit system, 4 bytes usually form a word, which is a storage unit commonly used by software developers. (Note: a word does not always occupy 4 bytes; its size depends on the system hardware, such as the bus width and the CPU instruction word length. In a 16-bit system such as the 8086, 1 word = 2 bytes = 16 bits; in a 32-bit system such as Win32, 1 word = 4 bytes = 32 bits; in a 64-bit system such as Win64, 1 word = 8 bytes = 64 bits.)

  • Endianness
      The order in which the bytes of multi-byte data are stored in memory is called endianness (byte order). According to the byte order, storage modes are generally divided into big-endian mode and little-endian mode.

    • Big-endian mode: the low address stores the high-order byte, and the high address stores the low-order byte.
    • Little-endian mode: the low address stores the low-order byte, and the high address stores the high-order byte.
        Among commonly used processors, ARM, x86, and DSPs generally use little-endian mode, while IBM, Sun, and PowerPC architecture processors generally use big-endian mode.
        How can we tell whether the platform the program is running on is big-endian or little-endian? It is simple: store a known multi-byte integer and look, through a character pointer, at the byte stored at its lowest address. Note that merely assigning the integer to a character variable is not enough: that conversion truncates the value and always yields the low-order byte 0x44, whatever the byte order.
    #include <stdio.h>

    int main(void)
    {
    	int a = 0x11223344;
    	/* Point at the byte stored at the lowest address of a. */
    	unsigned char *p = (unsigned char *)&a;

    	if (*p == 0x44)
    		printf("Little endian\r\n");
    	else
    		printf("Big endian\r\n");
    	return 0;
    }
    
  • Bit order
      Bit order refers to the storage order of the bits within a byte. Taking the hexadecimal value 0x78 = 01111000(B) as an example, there are two possible ways to store it in memory. In general, bit order follows byte order. In little-endian mode, the low address stores the low-order byte, and within a byte, position bit0 stores bit0 (the least significant bit) of that byte. Big-endian mode is the opposite: position bit0 stores the most significant bit of the byte.
      Generally speaking, little-endian mode stores the low-order byte at the low address, which matches the way people think about the correspondence between addresses and data. Big-endian mode suits the way computers process data: there is no need to consider the correspondence between addresses and data, and the bytes can be read and written directly in address order, from low to high and from left to right. Big-endian mode is generally used for network byte order and in various codecs.
      An embedded engineer needs to master both storage modes. During development and porting, for example when configuring registers, transmitting data over a network, or porting a network stack, conversion between big-endian and little-endian has to be considered. In embedded software the conversion can be implemented as in the following sample code:

#define swap_endian_u16(A) \
    ((((A) & 0xFF00) >> 8) | (((A) & 0x00FF) << 8))

1.2 Signed and unsigned numbers

  To represent negative numbers, the C language introduces the concepts of signed and unsigned numbers, and the keywords signed and unsigned are used to modify a data type accordingly. If a variable is defined without signed or unsigned, integer types such as short, int and long default to signed; whether plain char is signed is left to the implementation.
  For a signed number of character type, the highest bit, bit7, is the sign bit: 0 indicates a positive number, 1 indicates a negative number, and the remaining bits represent the magnitude. For an unsigned number of character type, all bits represent the magnitude. Signed and unsigned numbers therefore cover different ranges: for character data, a signed number can represent [-128, 127], while an unsigned number can represent [0, 255]. Signed and unsigned numbers can be printed with the %d and %u format specifiers, respectively.
  A value stored in physical memory can be regarded as either a signed or an unsigned number, depending on how you interpret it: printed with the format specifier %d, printf() treats it as a signed number; printed with the unsigned format specifier %u, printf() treats the same bits as a different number.
As the saying goes, "Looking at mountains is mountains, looking at water is water; looking at mountains is not mountains, and seeing water is not water".

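  •   When unsigned numbers are stored in computer memory, all bits represent the magnitude of the number; the notions of sign-magnitude code (original code) and two's complement (complement code) do not apply.
  •   Signed numbers are stored in two's complement form. A signed number has a sign-magnitude code, a ones' complement (inverse code), and a two's complement.
Ones' complement (of a negative number) = sign bit unchanged, all data bits inverted.
Two's complement = ones' complement + 1
Two's complement of a positive number = its sign-magnitude code
Two's complement of a negative number = ones' complement + 1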

  Question: Why is two's complement used for storage, rather than sign-magnitude everywhere?
  Answer: 1. It solves the encoding problem of 0. If all data were encoded in sign-magnitude, +0 and -0 would be encoded as 00000000 and 10000000 respectively, so one number would have two encodings. Two's complement avoids this: +0 and -0 are both represented by 00000000, and the freed code 10000000 can represent one more number, -128. Note that -128 only has a two's complement encoding; it has no 8-bit sign-magnitude or ones' complement encoding.
  2. It turns subtraction into addition, which saves a dedicated subtraction circuit in the CPU. The CPU only needs a full adder and a complement circuit to support both addition and subtraction. Take 7 - 3 as an example: computed normally, 00000111 - 00000011 = 00000100. Rewritten as the addition 7 + (-3), it becomes 00000111 + 11111101 = 1 00000100; discarding the carry out of the highest bit leaves 00000100 = 4, so the subtraction circuit can be omitted and the adder used directly.
  During operations on signed numbers, the sign bit also takes part in the calculation and follows the same rules and carry handling as the other data bits. When numbers in two's complement are added and the highest bit produces a carry, the carry is simply discarded.

1.3 Data overflow

  Each data type has a range of values it can represent.

  • When an unsigned number overflows, the result is reduced modulo the range and the values continue in a "periodic cycle".
      For example, unsigned char data can represent the range [0, 255]. When the value reaches the maximum of 255 and 1 is added again, it becomes 0 and a new cycle starts, round and round.

  • For signed numbers, the loose rules of the C language perform no safety check on the data when overflow occurs, so no exception is triggered; instead, the behavior is undefined. Undefined behavior, in plain terms, means that the C standard does not specify what happens in this situation, so compilers have no reference standard and each handles it in its own way, without treating it as an error. As a result, the outcome of a signed overflow is indeterminate and may differ between compilers.

  Therefore, data overflow may cause the program to behave differently than expected.
  Methods to detect and guard against data overflow:

  • 1. Adding two signed numbers. If the sum of two positive numbers comes out less than 0, data overflow occurred during the operation. Similarly, if the sum of two negative numbers comes out non-negative, the data has also overflowed. Because signed overflow is undefined behavior, a check made after the addition is not guaranteed to work; the portable form tests before adding, for example a > INT_MAX - b when b is positive.
  • 2. Adding two unsigned numbers. If the sum is less than either of the addends, data overflow occurred during the calculation.

1.4 Data type conversion

  There are two kinds of data type conversion: implicit type conversion and explicit (forced) type conversion. If the programmer does not explicitly cast a value in the program, the compiler automatically performs implicit type conversion when compiling it.
  Implicit type conversion occurs in a C program mainly in the following situations.

  • When the operands on the two sides of an operator in an arithmetic, logical, or assignment expression have different data types.
  • During a function call, when the type of the actual argument does not match the type of the formal parameter.
  • When the type of the returned value does not match the return type declared by the function.
      When the compiler encounters these situations, it converts the data type automatically, that is, it performs an implicit type conversion. The rules generally convert from low precision to high precision and from signed to unsigned. When a signed and an unsigned number of the same width are compared, the compiler converts both to unsigned.
      One point needs attention: the value of the data may change during conversion. When char data is converted to int, the value stays the same but the storage format changes: the char data occupies the low 8 bits of the 32-bit space and, for a signed char, the remaining upper 24 bits are filled with the sign bit. When a signed number is converted to an unsigned number of the same width, the storage format does not change but the value may, because the sign bit of the signed number becomes a data bit of the unsigned number.

2. Data Alignment

2.1 Why is address alignment necessary?

  The principle of data alignment is that the basic data types of the C language should be aligned on their natural boundaries: a variable of type char is aligned to 1 byte, a variable of type short to sizeof(short int) bytes, and a variable of type int to sizeof(int) bytes. The number of bytes to which each data type is aligned is generally called its alignment modulus.
  Why is address alignment required? It is mainly determined by the CPU hardware. Different processor platforms manage storage space differently. To simplify the circuit design, some CPUs simplify address access and only support accesses at aligned addresses. The compiler therefore chooses a suitable address alignment for each processor platform, so that the CPU can access these storage locations normally.

2.2 Structure Alignment

   Not only the basic data types of the C language are aligned on their natural boundaries; composite data types (such as structures and unions) are also aligned according to their own alignment principles.

  • The members of a structure are aligned according to the alignment modulus of their respective data types.
  • The overall alignment of the structure: the structure is aligned to the alignment modulus of its largest member, and its total size is an integer multiple of that modulus.
      Because each member must be aligned according to the alignment modulus of its own data type, "holes" inevitably appear inside the structure, so the same members can produce structures of different sizes depending on their order. The fundamental reason for aligning structures is to speed up CPU access to memory. In practice, the default alignment modulus sizeof(type) of each data type is generally used.
      If other structures are embedded in a structure, the embedded structure, as a member, is also aligned according to the alignment modulus of its own type. The alignment modulus of a structure is that of its largest member, and the size of the structure is an integer multiple of it.

2.3 Union Alignment

  Unions also have their own alignment principles.

  • The overall size of the union: at least the size of its largest member, rounded up to an integer multiple of the largest alignment modulus among its members.
  • The alignment principle of the union: align according to the largest alignment modulus among its members.

  When compiling a C program, whether for a basic data type or a composite data type, the compiler aligns the address of each variable according to the default alignment modulus of its type when allocating address space. In addition, we can specify the alignment explicitly through the #pragma preprocessing directive or the aligned/packed attribute declarations of the GNU C compiler.

3. Data portability

  We can use the sizeof keyword to check how much memory int data occupies. Compile and run such a program in different build environments and the results may differ. In a cross-platform program, we sometimes need storage of a fixed size, that is, a fixed-length data type.
  Today's operating systems generally support multiple CPU architectures and processor platforms. To run across platforms, an operating system has to consider data portability: big-endian and little-endian storage, data alignment, word length, and so on. When programming, we can isolate the parts of the program that depend on the system and platform and encapsulate them in a separate header or configuration file, so that the portable and non-portable parts of the program are clearly separated, which makes later management, maintenance, and upgrades easier.

4. The size_t type in the Linux kernel

  Many variables are defined in the Linux kernel, and various data types are used. In general, they can be divided into three categories.

  • C language basic data types: int, char, short.
  • The data type whose length is determined: long.
  • Data types for specific kernel objects: pid_t, size_t.
      Data types such as size_t are generally defined with typedef and carry a _t suffix, which marks them as types used for a specific purpose in the Linux kernel.
      The size_t data type is generally used for lengths, sizes, and other quantities that have no sign, such as array indexes and the length of a data copy. Using size_t takes the portability of the data type into account. Another advantage is that its size is not fixed: it is chosen to represent the maximum length on a given platform. When we use the unsigned type size_t to represent an address length or the length of a data copy, we do not have to worry about whether its value range is large enough.

5. typedef

5.1 Basic usage of typedef

  Using the typedef keyword, you can declare an alias student_t for the structure type student and a structure pointer type student_ptr. You can then use student_t directly to define a structure variable without writing struct, which makes the code more concise.
  Besides structures, typedef can also be combined with arrays. To define an array we usually write int array[10];. We can also use typedef to declare an array type array_t first and then use that type to define the array; the effect is the same as int array[10];.
  typedef can also be combined with pointers. If PCHAR is declared as an alias for char*, a variable str defined with PCHAR is actually a pointer of type char*.
  typedef can also be combined with function pointers: declare a function pointer type first, and then define function pointer variables of that type.
  In actual programming, typedef can also be combined with enumerations, in much the same way as with structures: declare a new name color_t for the enumeration type color, and then use color_t directly to define an enumeration variable.

5.2 Advantages of typedefs

  • It makes the code clearer and more concise.

  • It increases code portability.
      If we want a 32-bit fixed-length unsigned type in the code, we can use typedef to declare a U32 data type and then use U32 safely throughout the program. When porting the code to a different platform, it is enough to modify that one declaration.

  • It works better than macro definitions.
      The C preprocessing directive #define defines a macro, while typedef declares an alias for a type. Unlike a macro, typedef is not simple textual replacement: the alias can be used to define several objects of the same type in one declaration.

  • It makes complex pointer declarations more concise.
       We can simplify such declarations with typedef: first declare a function pointer type func_ptr_t, then define an array of that type. The result is clearer, simpler, and much more readable.
       typedef is also, syntactically, a storage class keyword. As with the common storage class keywords (auto, register, static, extern), a declaration cannot use more than one storage class keyword at the same time; otherwise compilation reports an error.

5.3 Scope of typedefs

   Compared with the global nature of macros, typedef, as a storage class keyword, has scope. Types declared with typedef follow the same scope rules as ordinary variables, including block scope and file scope.
A macro definition is replaced in the preprocessing stage and is global: it can be referenced anywhere after the point of its definition.

5.4 Scope of application of typedef

Generally speaking, when encountering the following situations, it may be more appropriate to use typedef, otherwise it may be counterproductive.

  • Create a new data type.
  • Cross-platform specified length type, such as U32/U16/U8.
  • Data types related to the operating system, BSP, and network word width, such as size_t, pid_t, etc.
  • An opaque data type that needs to hide the details of the structure and can only be accessed through a function interface.

Origin blog.csdn.net/qq_41866091/article/details/130576158