5. DMA and Cache Consistency

Series contents

1. Basic principles of Linux virtual memory, MMU, and paging
2. OOM scoring factor, oom_adj and oom_score
3. Page alloc and free, the Buddy algorithm, and CMA
4. page_fault, memory/IO interaction, VSS, LRU
5. DMA and Cache consistency

=================================================================================

The role of the cache:

  • When the CPU accesses memory, it first checks whether the requested content is already in the cache. If it is, this is called a "hit", and the CPU reads the content directly from the cache; otherwise it is a "miss", and the CPU has to fetch the required data or instructions from memory.
  • The CPU can not only read directly from the cache, it can also write directly into it.
  • Cache accesses are much faster than memory accesses, which greatly improves CPU utilization and, in turn, the performance of the whole system.
  • Cache consistency means that the data in the cache agrees with the data in the corresponding memory.

The role of DMA:

  • DMA operates directly on bus addresses, which for now can be treated as physical addresses (a system bus address and a physical address differ only in the point from which memory is observed). If the memory region covered by the cache does not include the region allocated for DMA, there is no consistency problem. But if the cache does cover the DMA target address, consistency becomes a problem: after the DMA operation, the memory backing that cached region has been modified, but the CPU does not know this (the DMA transfer does not pass through the CPU), so it still assumes the data in the cache matches memory. Later accesses to that memory region hit the stale cache data, producing an "inconsistency" between cache and memory.
    A bus address is memory as seen from the device's point of view; a physical address is the untranslated memory address as seen from the CPU's point of view (the translated one is the virtual address).

=============================================================

Reasons for memory inconsistency

There are two ways for the CPU to write to memory:

  1. write through: every CPU write is propagated straight through to memory, so memory always holds the latest data.
  2. write back: the CPU writes only into the cache; the cache hardware later writes dirty lines back to memory when they are evicted (for example by an LRU-style replacement policy). This is the usual mode.

DMA can transfer data directly between memory and a peripheral, but DMA cannot access the CPU's cache. When the CPU reads memory and the cache hits, it reads only from the cache, not from memory. When the CPU writes to memory, it may not actually write to memory at all, but only into the cache.

So if DMA writes data from a peripheral into memory, any copy of that data in the CPU's cache is now stale; when the CPU later reads that memory and hits the cache, it reads the old data. Conversely, when the CPU writes data that initially lands only in the cache, the data in memory is stale; if the device's DMA then reads from memory, it reads the old data. In both directions a cache consistency problem exists. For example, when a network card transmits a packet, the CPU writes the data (which may stay in the cache) while the card's DMA reads it from memory, so the wrong data is sent.
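To make the transmit case concrete, here is a conceptual sketch of the broken pattern on a non-cache-coherent system. The device, its register offsets, and the buffer handling are all invented for illustration; no real driver API is implied.

```c
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

/* Invented register offsets, purely for illustration. */
#define HYPO_TX_ADDR_REG   0x00
#define HYPO_TX_LEN_REG    0x04
#define HYPO_TX_START_REG  0x08

/*
 * BROKEN on non-coherent hardware: no cache maintenance happens between
 * the CPU's writes and the device's DMA read of the same buffer.
 */
static void broken_tx(void __iomem *regs, void *buf, dma_addr_t buf_dma, size_t len)
{
	/* CPU fills the buffer; with a write-back cache the fresh bytes may
	 * still live only in the cache, not yet in DRAM. */
	memset(buf, 0xAB, len);

	/* The device's DMA engine fetches directly from DRAM, bypassing the
	 * CPU cache, so it may transmit whatever stale data DRAM still holds. */
	writel(lower_32_bits(buf_dma), regs + HYPO_TX_ADDR_REG);
	writel(len, regs + HYPO_TX_LEN_REG);
	writel(1, regs + HYPO_TX_START_REG);
}
```

The fix is either a coherent (non-cached) buffer or explicit cache maintenance, which is exactly what the two approaches below provide.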

How to solve the consistency problem

1. Coherent DMA buffers

The memory used for DMA is allocated by the kernel, and the kernel may need to remap it. The key property is that the pages are marked non-cacheable when they are mapped; this attribute is stored in the page table entries.
"May" need to be remapped: if the kernel allocates the memory from highmem and maps it into the vmalloc area via vmap, new page table entries must be created and marked non-cacheable. If the kernel allocates from lowmem, that memory is already covered by the linear mapping, so no new mapping is needed; the existing page table entries just have to be changed to non-cacheable.
The related interfaces are dma_alloc_coherent() and dma_free_coherent(). dma_alloc_coherent() takes a device structure to indicate which device the coherent DMA memory is for. It produces two addresses, one for the CPU and one for the device. The CPU must access this memory through the returned virtual address, which is non-cached. The internal implementation of dma_alloc_coherent() does not usually need attention; it depends on how the architecture provides non-cached mappings (for example kseg1 on MIPS) and possibly on hardware features (such as whether CMA is supported).
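A minimal sketch of how the dma_alloc_coherent()/dma_free_coherent() pair is typically used; the driver structure, function names, and buffer size are placeholders, not code from the original article.

```c
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* Address the CPU uses (non-cached virtual mapping). */
static void *buf_cpu;
/* Address the device uses (DMA/bus address). */
static dma_addr_t buf_dma;

static int example_setup_coherent(struct device *dev, size_t size)
{
	/* One call returns both addresses; no explicit cache maintenance is
	 * needed afterwards because the CPU mapping is non-cached (or the
	 * platform is hardware coherent). */
	buf_cpu = dma_alloc_coherent(dev, size, &buf_dma, GFP_KERNEL);
	if (!buf_cpu)
		return -ENOMEM;

	/* ... program the device with buf_dma ... */
	return 0;
}

static void example_teardown_coherent(struct device *dev, size_t size)
{
	dma_free_coherent(dev, size, buf_cpu, buf_dma);
}
```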
When the DMA buffer is cacheable (as with the streaming mappings described below), cache maintenance is required. Before DMA reads memory (memory-to-device direction), there may be newer data in the cache, so the cache must first be written back (writeback) to memory. Before DMA writes memory (device-to-memory direction), there may still be dirty data in the cache; to prevent a later writeback from overwriting what DMA has just written, the cache must first be invalidated. Note that the vaddr parameter of the streaming mapping function (dma_map_single()) takes a kernel virtual address.

2. DMA Streaming Mapping

The related interfaces are
dma_map_sg(), dma_unmap_sg(),
dma_map_single(), dma_unmap_single().
Streaming DMA mappings are more complicated to implement and have a shorter life cycle; the buffer itself stays cacheable, and on non-coherent hardware the kernel performs the required cache maintenance (writeback or invalidate) when the buffer is mapped and unmapped. Some hardware is optimized for streaming mappings. To establish a streaming DMA mapping you must tell the kernel the direction of the data flow (DMA_TO_DEVICE, DMA_FROM_DEVICE, and so on). A related interface is dma_alloc_writecombine(), which allocates memory with a write-combining (uncached but write-buffered) mapping.
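A minimal sketch of a streaming mapping for the memory-to-device direction, with the sync calls used to pass ownership back and forth while the buffer stays mapped; the function name and buffer are placeholders, not code from the original article.

```c
#include <linux/dma-mapping.h>

/* Map a buffer the device will read (e.g. a network TX buffer). The
 * direction tells the kernel which cache maintenance to perform:
 * writeback for DMA_TO_DEVICE, invalidate for DMA_FROM_DEVICE. */
static int example_streaming_tx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... program the device with "dma" and wait for the transfer ... */

	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}

/* If the CPU must touch the buffer while it is still mapped, ownership is
 * handed back and forth explicitly instead of unmapping:
 *   dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);
 *   ... CPU reads the data the device wrote ...
 *   dma_sync_single_for_device(dev, dma, len, DMA_FROM_DEVICE);
 */
```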

=============================================================

Reference

For more details, see: Linux memory management: DMA and coherent cache

Source: blog.csdn.net/baidu_38410526/article/details/104111786