I/O registers and conventional memory

Despite the strong similarity between hardware registers and memory, a programmer accessing I/O registers must be careful to avoid being tricked by CPU (or compiler) optimizations that can alter the expected I/O behavior.

 

The main difference between I/O registers and RAM is that I/O operations have side effects, while memory operations have none: the only effect of a memory write is storing a value at a location, and a memory read returns the last value written there. Because memory access speed is critical to CPU performance, this no-side-effects case has been optimized in many ways: values are cached, and read/write instructions are reordered.

 

The compiler can cache data values in CPU registers without writing them to memory, and even when it does store them, both write and read operations can work on cache memory without ever reaching physical RAM. Reordering takes place at both the compiler level and the hardware level: a sequence of instructions can often execute more quickly if run in an order different from that in the program text, for example to avoid pipeline interlocks on RISC processors; on CISC processors, operations that take a significant amount of time can be executed concurrently with other, quicker ones.

 

When applied to conventional memory (at least on uniprocessor systems), these optimizations are transparent and beneficial, but they can be fatal to correct I/O operations, because they interfere with those "side effects" that are the main reason a driver accesses I/O registers in the first place. The processor cannot anticipate a situation in which some other entity (a process running on a separate processor, or something happening inside an I/O controller) depends on the order of memory accesses; the compiler or the CPU may simply try to outsmart you and reorder the operations you request. The result can be strange errors that are very difficult to debug. Therefore, a driver must ensure that no caching is performed and no read or write reordering takes place when accessing registers.

 

The problem of hardware caching is the easiest one to face: the underlying hardware is already configured (either automatically or by Linux initialization code) to disable any hardware caching when accessing I/O regions (whether they are memory or port regions).

 

The solution to compiler optimization and hardware reordering is to place a memory barrier between operations that must be visible to the hardware (or to another processor) in a particular order. Linux provides four macros to cover all possible ordering needs:

 

#include <linux/kernel.h>
void barrier(void)

 

This function tells the compiler to insert a memory barrier but has no effect on the hardware. Compiled code stores to memory all values that are currently modified and resident in CPU registers, and rereads them later when they are needed. A call to barrier prevents compiler optimizations across the barrier, while leaving the hardware free to do its own reordering.
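As a minimal sketch (the flag and function names are invented for illustration), a busy-wait loop can rely on barrier() to keep the compiler from caching the flag in a register:

/* Hypothetical flag, set to 1 by the device's interrupt handler. */
static int done;

static void wait_for_done(void)
{
        while (!done)
                barrier();      /* compiler barrier only: forces "done" to be
                                   reread from memory on each pass, but emits
                                   no hardware barrier instruction */
}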

 

#include <asm/system.h>
void rmb(void);
void read_barrier_depends(void);
void wmb(void);
void mb(void);

 

These functions insert hardware memory barriers in the compiled instruction flow; their actual instantiation is platform dependent. An rmb (read memory barrier) guarantees that any reads appearing before the barrier complete before any subsequent read is executed; wmb guarantees the same ordering for write operations, and mb guarantees it for both. Each of these functions is a superset of barrier.
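As a rough sketch, assuming a device that sets a "data ready" bit in a status register before its data register becomes valid (the register offsets, bit name, and the mapped base pointer dev->base below are made up), rmb keeps the two reads in order:

unsigned int status, data;

status = readl(dev->base + REG_STATUS);        /* hypothetical status register */
if (status & STATUS_DATA_READY) {
        rmb();                                 /* status read completes before the data read */
        data = readl(dev->base + REG_DATA);    /* hypothetical data register */
}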

 

read_barrier_depends is a special, weaker form of read barrier. Whereas rmb blocks the reordering of all reads across the barrier, read_barrier_depends blocks only the reordering of reads that depend on data from another read. The distinction is subtle, and it does not exist on all architectures. Unless you understand exactly what is going on, and you have reason to believe that a full read barrier is exacting an excessive performance cost, you should probably stick with rmb.
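The classic dependent-read case is fetching a pointer and then dereferencing it; the second read cannot be issued until the first has produced the pointer value. A hedged sketch, with invented structure and variable names:

struct my_desc {
        int length;
};

static struct my_desc *current_desc;   /* published by some other context */

static int read_desc_length(void)
{
        struct my_desc *p = current_desc;   /* first read: the pointer        */

        read_barrier_depends();             /* orders only the dependent read */
        return p ? p->length : 0;           /* second read depends on p       */
}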

 

void smp_rmb(void);
void smp_read_barrier_depends(void);
void smp_wmb(void);
void smp_mb(void);

 

These versions of the barriers insert hardware barriers only when the kernel is compiled for SMP systems; otherwise, they all expand to a simple barrier call.
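A hedged producer/consumer sketch (the variable and function names are invented) shows where the SMP variants fit; on a uniprocessor build these calls cost no more than a compiler barrier:

static int item_payload;
static int item_valid;

/* Runs on one CPU: publish the payload, then the flag. */
static void producer(int value)
{
        item_payload = value;
        smp_wmb();              /* payload store ordered before the flag store */
        item_valid = 1;
}

/* Runs on another CPU: check the flag, then read the payload. */
static int consumer(void)
{
        if (!item_valid)
                return -1;
        smp_rmb();              /* flag read ordered before the payload read */
        return item_payload;
}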

 

A typical use of memory barriers in a device driver may have this form:

 

writel(dev->registers.addr, io_destination_address);
writel(dev->registers.size, io_size);
writel(dev->registers.operation, DEV_READ);
wmb();
writel(dev->registers.control, DEV_GO);

 

In this case, it is important to be sure that all of the device registers controlling a particular operation have been properly set before the device is told to begin. The memory barrier enforces completion of the writes in the required order.

 

Because memory barriers affect performance, they should be used only where they are really needed. The different types of barriers also have different performance characteristics, so it is worthwhile to use the most specific type possible. For example, on the x86 architecture, wmb() currently does nothing, since writes outside the processor are not reordered. Reads are reordered, however, so mb() is slower than wmb().

 

It is worth noting that most of the other kernel primitives dealing with synchronization, such as spinlock and atomic_t operations, also function as memory barriers. Also worth noting is that some peripheral buses (such as the PCI bus) have caching issues of their own; we discuss those in later chapters when we encounter them.
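For instance, code that updates shared state only under a spinlock does not need an explicit barrier of its own, since the lock and unlock operations already provide the required ordering. A minimal, hypothetical sketch:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(dev_lock);
static int dev_state;

static void set_dev_state(int new_state)
{
        spin_lock(&dev_lock);
        dev_state = new_state;  /* ordering provided by the lock operations */
        spin_unlock(&dev_lock);
}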

 

Some architectures allow the efficient combination of an assignment and a memory barrier; the kernel provides several macros that perform this combination. In the default case, they are defined as follows:

 

#define set_mb(var, value)  do {var = value; mb();}  while (0)
#define set_wmb(var, value) do {var = value; wmb();} while (0)
#define set_rmb(var, value) do {var = value; rmb();} while (0)

 

Where appropriate, <asm/system.h> defines these macros to use architecture-specific instructions that accomplish the task more quickly. Note that set_rmb is defined by only a small number of architectures. (The use of a do...while construct is a standard C idiom that makes the expanded macro work as a normal C statement in all contexts.)
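As a small, hypothetical usage sketch (the flag and function names are invented), setting a flag with set_mb is equivalent to the assignment followed by mb():

static int cmd_ready;

static void signal_command_ready(void)
{
        set_mb(cmd_ready, 1);   /* same effect as: cmd_ready = 1; mb(); */
}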
