Stack Overflow: Purpose of memory alignment (original text with translator's notes)

Original: https://stackoverflow.com/questions/381244/purpose-of-memory-alignment
Translator: joey

The memory subsystem on a modern processor is restricted to accessing memory at the granularity and alignment of its word size; this is the case for a number of reasons.

Speed

Modern processors have multiple levels of cache memory that data must be pulled through. Supporting single-byte reads would make the memory subsystem throughput tightly bound to the execution unit throughput (aka CPU-bound). This is reminiscent of how PIO (programmed I/O) mode was surpassed by DMA (direct memory access) in hard drives, for many of the same reasons.

The CPU always reads at its word size (4 bytes on a 32-bit processor), so when you do an unaligned address access on a processor that supports it, the processor is going to read multiple words. The CPU will read each word of memory that your requested address straddles. This causes an amplification of up to 2X in the number of memory transactions required to access the requested data.

(Translator's note: on a 64-bit processor a word is 8 bytes. Some processors do not support unaligned memory access at all. The requested data can be any multi-byte value, such as a float or a double.)
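The "up to 2X" figure can be checked with a little arithmetic. The following is a minimal sketch, assuming the simplified model described above (memory is transferred only in whole, aligned words); the function name `words_touched` and its parameters are hypothetical and are not part of the original answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of aligned words that an access of `size` bytes starting at
   address `addr` touches, when memory is transferred only in whole,
   aligned words of `word` bytes. Assumes size >= 1 and word >= 1.
   (Illustrative model only; real hardware also has caches and buses.) */
static size_t words_touched(uintptr_t addr, size_t size, size_t word)
{
    uintptr_t first = addr / word;              /* index of the word holding the first byte */
    uintptr_t last  = (addr + size - 1) / word; /* index of the word holding the last byte */
    return (size_t)(last - first) + 1;
}
```

With a 4-byte word, `words_touched(0, 4, 4)` gives 1, while the unaligned `words_touched(1, 4, 4)` gives 2. A 2-byte access that straddles a word boundary, `words_touched(3, 2, 4)`, also gives 2, so under this model it costs more transactions than an aligned 4-byte read.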

Because of this, it can very easily be slower to read two bytes than four. (Translator's note: if the two bytes are split across two words, two reads are needed, whereas four bytes that all sit in one aligned word need only one read.) For example, say you have a struct in memory that looks like this:

struct mystruct {
    char c; // one byte
    int i; // four bytes
    short s; // two bytes
};
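How a compiler actually lays this struct out can be inspected with `offsetof` and `sizeof`. The sketch below is an illustration only: the helper names `padding_before_i` and `padding_after_s` are hypothetical, and the results depend on the platform ABI, so no particular layout is assumed.

```c
#include <stddef.h>

struct mystruct {
    char c;   /* one byte */
    int i;    /* four bytes on most platforms */
    short s;  /* two bytes on most platforms */
};

/* Bytes the compiler left unused between `c` and `i`
   so that `i` starts at an address suitable for an int. */
static size_t padding_before_i(void)
{
    return offsetof(struct mystruct, i) - sizeof(char);
}

/* Bytes the compiler appended after `s` so that every element
   of an array of struct mystruct keeps its members aligned. */
static size_t padding_after_s(void)
{
    return sizeof(struct mystruct)
         - (offsetof(struct mystruct, s) + sizeof(short));
}
```

On a typical ABI with a 4-byte, 4-byte-aligned `int` and a 2-byte `short`, these helpers would likely report 3 and 2 bytes of padding, making the struct 12 bytes rather than 7. Printing the values on the target platform is the reliable way to confirm.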
