A simple analysis of why memory alignment is needed and of the alignment rules

Environment: Ubuntu 16.04.2; compiler arm-linux 3.4.5; kernel linux-2.6.22.6

The first thing to know is that the CPU does not fetch data or instructions from memory one byte at a time and splice the bytes together, as one might imagine. It reads memory in blocks of its word length, that is, the amount of data it can process at once. A 32-bit processor, for example, fetches memory in 4-byte blocks. This raises a question: what if only two bytes are needed? The answer is that four bytes are still fetched, and the memory-handling hardware then picks out the needed data and passes it to the CPU.

In short, the CPU reads memory in the data length it is "most comfortable" with, which leads to another question.

If a 4-byte instruction is about to be read into the CPU, two cases can occur:

    1. The 4 bytes start exactly at an address the CPU reads from. In this case the CPU can read the instruction in a single access and execute it. [Figure: memory layout of the aligned case]

    2. The 4 bytes are distributed as shown in the following figure. [Figure: the 4 bytes spread across two adjacent 4-byte blocks]

   

 

Suppose the CPU again fetches from the same address. The first 4-byte read yields only bytes 1 and 2 of the data, which is not enough, so the CPU goes on to the next block: a second 4-byte read yields bytes 3 and 4, which are joined to the earlier two to assemble the complete 4-byte data. This takes two memory reads, one more than in the first case. At a glance one extra operation seems to matter little, but the CPU performs a great many data and computing operations. If this happens often, the amount of "extra work" becomes non-negligible and seriously affects processing speed.

 

Therefore memory needs to be aligned in order to increase CPU processing speed. This task is left to the compiler, which allocates and optimizes addresses according to the alignment parameters it is given or the target environment. (There are also hardware reasons for alignment: some hardware can only read from certain addresses, and if an instruction's address does not meet those requirements, the hardware may crash or behave in unknown ways.)

 

Memory alignment is divided into natural alignment and rule-based alignment.

Natural alignment means placing a variable at an address that matches its type, i.e. the data is stored at an address that is a multiple of the size of its data type. For example, char occupies 1 byte, and every number is a multiple of 1, so a char may be placed at any address. int occupies 4 bytes, so it must be placed at an address that is a multiple of 4, such as 0, 4 or 8. The compiler assigns addresses by natural alignment first.

 

Rule-based alignment, taking a structure as an example, builds on natural alignment: the compiler fills the gaps left by natural alignment with padding (invalid data), and after padding the structure occupies a size that is an integer multiple of the size of the largest member type in the structure. The rules above are explained next:

(Note: unless stated otherwise, everything assumes a 32-bit environment, selected with the compiler option gcc -m32.)

First, the test code used below is:

#include <stdio.h>
int main(int argc,char** argv)
{
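    /* struct name stands for the structure under test; sizeof yields a size_t, so print it with %zu */
    printf("%zu\n",sizeof(struct name));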
    return 0;
}

The output is the memory size occupied by the structure.

 

 

1. First, look at this structure:

typedef struct test_32
{
	char a;
	short b;
	short c;
	char d;
}test_32;

First, natural alignment gives the memory layout below (the first cell is address 0 and addresses increase from there; the same holds for the later tests): a at address 0, b at 2-3, c at 4-5 and d at 6, leaving addresses 1 and 7 empty.

 

Then, following the alignment rules, the compiler fills the gaps with padding, so the structure finally occupies 8 bytes, which is an integer multiple of 2 bytes, the size of the largest member type, short. Compiling and running the program confirms the result: 8 bytes.

 

 

2. Now change the order of the members slightly:

typedef struct test_32
{
	char a;
	char b;
	short c;
	short d;
}test_32;

Again, natural alignment gives the layout below: a at address 0, b at 1, c at 2-3 and d at 4-5.

 

As can be seen, under natural alignment there are no gaps between the variables, so no padding is needed. The members occupy six coloured cells, i.e. 6 bytes, and by the alignment rules 6 bytes is already an integer multiple of the size of the largest member type, short. So this structure is 6 bytes, and the blank cells after it can be ignored. Compiling and running the program gives 6 bytes, consistent with the analysis.

 

 

The two examples above give a rough idea of how memory alignment works. One more case needs to be added: double. A 32-bit processor handles 32 bits, i.e. 4 bytes, of data at a time, while double is an 8-byte type, so how is it handled? A 64-bit processor can process the 8 bytes at once. A 32-bit processor, in order to handle an 8-byte double, splits it into two 4-byte parts. This leads to the following situation:

typedef struct test_32
{
	char a;
	char b;
	double c;
}test_32;  

In the 32-bit environment this structure occupies 12 bytes, while in a 64-bit environment it occupies 16 bytes. The reason is the difference in processing described above: in 32-bit mode the double is split into two 4-byte parts, so the alignment rule takes the largest unit in the structure to be 4 bytes, and the total length is an integer multiple of 4 bytes, i.e. 12 bytes. In 64-bit mode the largest unit is 8 bytes, so the result is an integer multiple of 8 bytes: 16 bytes. Here the double is not placed at an address that is a multiple of 8, as natural alignment would suggest in theory. I believe the compiler's alignment rules are optimized accordingly, which saves the extra 4 bytes. Readers can test and analyze this part themselves using the rules above.

 

(Note: the following is more hardware-related and has less to do with memory alignment itself; feel free to skip it.)

To expand further: the alignment above applies to the C language. In assembly, however, addresses are much more directly under the programmer's control and can be chosen almost arbitrarily, so one can only align as far as possible. Normally things end up aligned, but I once ran into a problem in startup code written in assembly on the ARM architecture. The assembly code called the printf function as a test, using something like

.ascii  "Hello ARM!\0"

to define a string, with a label placed in front of it so that it could be passed to printf as a parameter. The instruction right after this directive was my bl main jump to the main function.

In the actual test the system crashed, even though the output was normal. Disassembly showed that, because of the addresses assigned to the string, the instructions after it could not be aligned.

As I remember, the address at that point was not a multiple of 4. Adding the alignment directive .align 4 in front of the instruction solved the problem. (In GNU as for ARM the argument of .align is a power of two, so .align 4 aligns to 16 bytes, which also satisfies the 4-byte requirement.)

So for some hardware, memory alignment is very important.

This is also a reminder to myself to keep these low-level aspects in mind when writing programs, so that the code is better optimized and closer to perfect. Conclusion: keep learning.

 

 

Discussion and corrections are welcome! Let's improve together!


Origin blog.csdn.net/G_METHOD/article/details/79535178