Will memset cause a large chunk of memory to be cached?

On Arm, most Normal memory is configured as write-allocate and read-allocate: when a read or a write misses in the cache, the line is fetched from memory and allocated into the cache. This raises a question. If I execute memset(a, b, len) with a large len, for example to zero a large block of memory, does all of that data (now all zeros) have to be brought into the cache? Doesn't that waste cache capacity and fill the cache almost immediately?
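To get a feel for the scale of the concern, here is a minimal sketch in C. The helper names (lines_touched, clear_and_check) are made up for illustration, and the 64-byte line size is an assumption passed in by the caller; check the line size of your own core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Number of cache lines covered by len bytes, assuming the buffer starts on a
 * line boundary. line_size is an assumption supplied by the caller (64 bytes
 * is common on Cortex-A cores). */
size_t lines_touched(size_t len, size_t line_size)
{
    assert(line_size != 0);
    return (len + line_size - 1) / line_size;
}

/* Allocates len bytes, clears them with memset(), and returns how many bytes
 * are NOT zero afterwards (0 is expected). Returns (size_t)-1 if the
 * allocation fails. */
size_t clear_and_check(size_t len)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL) {
        return (size_t)-1;
    }

    memset(buf, 0, len);   /* the large write discussed in the text */

    size_t nonzero = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            nonzero++;
        }
    }
    free(buf);
    return nonzero;
}
```

With these assumptions, lines_touched((size_t)64 << 20, 64) is 1048576 lines for a 64 MiB buffer, while a hypothetical 64 KiB L1 data cache holds only 1024 lines of 64 bytes. If every written line were allocated, the memset() would displace the whole L1 contents many times over.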

On older microarchitectures this problem could really occur. Recent microarchitectures address it with "write streaming mode". The description below uses the Cortex-A720 as an example.

The Cortex-A720 core supports write streaming mode, sometimes called read-allocate mode, for both the L1 and L2 caches.
On a read miss or a write miss, a cache line is normally allocated into the L1 or L2 cache. However, writing large blocks of data can pollute the cache with unnecessary data. It also wastes power and performance when a linefill is performed only for the fetched data to be discarded, because the entire line is then overwritten by the writes (for example, by memset() or memcpy()). In such cases the cache line does not need to be allocated on a write, for example when the C standard library's memset() function clears a large block of memory to a known value.
To prevent unnecessary cache line allocation, the memory system detects when the core has written a full cache line before the linefill completes. If this is detected on a configurable number of consecutive linefills, it switches to write streaming mode.

In write streaming mode, loads behave as normal and can still cause linefills.
Writes still look up in the cache, but if they miss, they are written out to the L2 or L3 cache instead of starting an L1 linefill.

More than the specified number of linefills might be observed on the CHI or AXI master interface before the memory system switches to write streaming mode.

Write streaming mode remains enabled until one of the following conditions occurs:
• A cacheable write burst that is not a full cache line is detected.
• There is a subsequent load operation whose target is the same as the outstanding write stream.
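The entry and exit rules can be pictured as a small state machine. The following C sketch is only a toy software model of the behaviour described in the text, not the actual hardware logic: the ws_model type, its functions, the 64-byte line size and the threshold value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WS_LINE_SIZE 64u   /* assumed cache line size in bytes */

/* Toy model of one cache level's write streaming detector. */
typedef struct {
    unsigned threshold;        /* hypothetical: full-line writes needed to enter */
    unsigned consecutive_full; /* full-line writes seen in a row so far */
    bool     streaming;        /* true while in write streaming mode */
} ws_model;

void ws_init(ws_model *m, unsigned threshold)
{
    assert(m != NULL && threshold > 0);
    m->threshold = threshold;
    m->consecutive_full = 0;
    m->streaming = false;
}

/* Models a cacheable write of `bytes` bytes that misses in this cache.
 * Returns true if the write would allocate a line, false if it would be
 * passed to the next level without a linefill. */
bool ws_store(ws_model *m, size_t bytes)
{
    if (bytes < WS_LINE_SIZE) {
        /* Exit condition 1: a write that is not a full cache line. */
        m->consecutive_full = 0;
        m->streaming = false;
        return true;
    }
    if (m->streaming) {
        return false;          /* streaming: no allocation on a write miss */
    }
    m->consecutive_full++;
    if (m->consecutive_full >= m->threshold) {
        m->streaming = true;   /* later full-line writes will not allocate */
    }
    return true;
}

/* Models a load. Exit condition 2 applies only when the load targets the
 * same location as the outstanding write stream; other loads change nothing. */
void ws_load(ws_model *m, bool same_target_as_stream)
{
    if (same_target_as_stream) {
        m->consecutive_full = 0;
        m->streaming = false;
    }
}
```

With a threshold of 4, the first four full-line stores allocate and the fifth does not. A partial-line store, or a load that targets the write stream, drops the model back to normal allocation, and the count starts again from zero.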

After the Cortex-A720 core has switched to write streaming mode, the memory system continues to monitor bus traffic. When it observes a further run of full cache line writes, it signals the L2 or L3 cache to enter write streaming mode as well.
The write streaming threshold defines how many consecutive cache lines can be written before store operations stop causing cache allocation. You can configure the threshold for each cache level (L1, L2 and L3) by writing to the IMP_CPUECTLR_EL1 register.
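One way to picture the per-level thresholds is as a staircase: the longer the run of full-line writes, the further out the data is allowed to allocate. The sketch below is a simplified model under stated assumptions. The function name is hypothetical, the thresholds are plain line counts measured from the start of the stream rather than real IMP_CPUECTLR_EL1 field encodings, and the actual hardware may count differently.

```c
#include <assert.h>

/* Returns the cache level (1, 2 or 3) in which the n-th consecutive full-line
 * write of a stream would still allocate, or 0 if no level allocates it.
 * n is 1-based. t_l1, t_l2 and t_l3 are hypothetical thresholds in lines,
 * assumed to satisfy t_l1 <= t_l2 <= t_l3. */
int allocating_level(unsigned n, unsigned t_l1, unsigned t_l2, unsigned t_l3)
{
    assert(n >= 1);
    assert(t_l1 <= t_l2 && t_l2 <= t_l3);

    if (n <= t_l1) {
        return 1;   /* L1 has not entered write streaming mode yet */
    }
    if (n <= t_l2) {
        return 2;   /* L1 is streaming, L2 still allocates */
    }
    if (n <= t_l3) {
        return 3;   /* L1 and L2 are streaming, L3 still allocates */
    }
    return 0;       /* every level is streaming */
}
```

With made-up thresholds of 4, 16 and 128 lines, the 4th line still allocates in L1, the 5th through 16th allocate in L2, the 17th through 128th in L3, and from the 129th line on nothing is allocated. A large memset() therefore leaves most of the cache hierarchy untouched in this model.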

Origin blog.csdn.net/weixin_42135087/article/details/133560137