Cache coherency

References

Cache coherence - Wikipedia

Bus snooping - Wikipedia

Directory-based cache coherence - Wikipedia

Cache coherency protocols (examples) - Wikipedia

Cache Coherence - an overview | ScienceDirect Topics

Overview

In a shared memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data: one copy in the main memory and one in the local cache of each processor that requested it. When one of the copies of data is changed, the other copies must reflect that change. Cache coherence is the discipline which ensures that the changes in the values of shared operands (data) are propagated throughout the system in a timely fashion.[1]

The following are the requirements for cache coherence:[2]

Write Propagation

Changes to the data in any cache must be propagated to other copies (of that cache line) in the peer caches.

For a shareable memory location, if one core changes it, all other cores/requestors must see that change (a concrete sketch follows these requirements).

Transaction Serialization

Reads/Writes to a single memory location must be seen by all processors in the same order.

For reads and writes of the same shareable memory location, all cores must see the same order; a read that follows a write must return the most recently written data.
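
To make write propagation concrete, here is a minimal C++ sketch (the names `shared_data`, `ready`, `writer`, and `reader` are ours, not from any source): one thread stores a value and raises a flag, and cache coherence is what lets that store propagate so the spinning reader on another core eventually observes it.

```cpp
// A minimal sketch of write propagation, assuming C++11 threads/atomics.
// All names here (shared_data, ready, writer, reader) are illustrative.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> shared_data{0};
std::atomic<bool> ready{false};

void writer() {
    shared_data.store(42, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release);   // publish the write
}

void reader() {
    // Spins until the writer's store has propagated to this core's cache;
    // cache coherence guarantees the new value eventually becomes visible.
    while (!ready.load(std::memory_order_acquire)) {}
    std::printf("reader sees %d\n", shared_data.load(std::memory_order_relaxed));
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
    return 0;
}
```

Without write propagation, the reader's cache could keep returning its stale copy indefinitely.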

The following conditions elaborate on write propagation:

In a multiprocessor system, consider that more than one processor has cached a copy of the memory location X. The following conditions are necessary to achieve cache coherence:[4]

  1. In a read made by a processor P to a location X that follows a write by the same processor P to X, with no writes to X by another processor occurring between the write and the read instructions made by P, X must always return the value written by P.
  2. In a read made by a processor P1 to location X that follows a write by another processor P2 to X, with no other writes to X made by any processor occurring between the two accesses and with the read and write being sufficiently separated, X must always return the value written by P2. This condition defines the concept of coherent view of memory. Propagating the writes to the shared memory location ensures that all the caches have a coherent view of the memory. If processor P1 reads the old value of X, even after the write by P2, we can say that the memory is incoherent.

Transaction serialization is a prerequisite for coherency; the explanation is as follows:

Therefore, in order to satisfy Transaction Serialization, and hence achieve Cache Coherence, the following condition along with the previous two mentioned in this section must be met:

  • Writes to the same location must be sequenced. In other words, if location X received two different values A and B, in this order, from any two processors, the processors can never read location X as B and then read it as A. The location X must be seen with values A and B in that order.
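
The bullet above is essentially a litmus test, and a hypothetical harness for it might look like the sketch below (assuming C++11 threads; the outcome check is ours). The writer stores 1 and then 2 into a single location; under write serialization the two observers can never see those writes in opposite orders, a guarantee C++ mirrors by giving each atomic object a single modification order, even with `memory_order_relaxed`.

```cpp
// A litmus-test sketch of write serialization (hypothetical harness,
// assuming C++11 threads). Location x receives 1 and then 2; coherence
// forbids two observers from seeing those writes in opposite orders.
#include <atomic>
#include <cstdio>
#include <thread>

int main() {
    for (int trial = 0; trial < 10000; ++trial) {
        std::atomic<int> x{0};
        int r1a = 0, r1b = 0, r2a = 0, r2b = 0;

        std::thread w([&] {
            x.store(1, std::memory_order_relaxed);   // value A
            x.store(2, std::memory_order_relaxed);   // value B
        });
        std::thread o1([&] {
            r1a = x.load(std::memory_order_relaxed);
            r1b = x.load(std::memory_order_relaxed);
        });
        std::thread o2([&] {
            r2a = x.load(std::memory_order_relaxed);
            r2b = x.load(std::memory_order_relaxed);
        });
        w.join(); o1.join(); o2.join();

        // Forbidden outcome: observer 1 sees A then B while observer 2
        // sees B then A, i.e. they disagree on the order of the writes.
        if (r1a == 1 && r1b == 2 && r2a == 2 && r2b == 1) {
            std::printf("serialization violated at trial %d\n", trial);
            return 1;
        }
    }
    std::printf("no serialization violation observed\n");
    return 0;
}
```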

Coherence mechanisms

The two most common mechanisms for ensuring coherency are snooping and directory-based, each with its own benefits and drawbacks. Snooping-based protocols tend to be faster, if enough bandwidth is available, since every transaction is a request/response seen by all processors. The drawback is that snooping isn't scalable: every request must be broadcast to all nodes in the system, so as the system grows, the size of the (logical or physical) bus and the bandwidth it provides must grow too. Directories, on the other hand, tend to have longer latencies (with a three-hop request/forward/respond) but use much less bandwidth, since messages are point-to-point rather than broadcast. For this reason, many of the larger systems (>64 processors) use this type of cache coherence.

Snooping

Main article: Bus snooping  

First introduced in 1983,[7] snooping is a process where the individual caches monitor address lines for accesses to memory locations that they have cached.[4] The write-invalidate protocols and write-update protocols make use of this mechanism.

For the snooping mechanism, a snoop filter reduces the snooping traffic by maintaining a plurality of entries, each representing a cache line that may be owned by one or more nodes. When replacement of one of the entries is required, the snoop filter selects for the replacement of the entry representing the cache line or lines owned by the fewest nodes, as determined from a presence vector in each of the entries. A temporal or other type of algorithm is used to refine the selection if more than one cache line is owned by the fewest nodes.[8]
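
The replacement policy just described is straightforward to sketch in C++ (the structure and function names here are illustrative, not from the cited design): each filter entry carries a presence vector with one bit per node, the victim is the entry owned by the fewest nodes, and a recency stamp breaks ties.

```cpp
// A sketch of the replacement policy described above; the structure and
// function names are illustrative, not from the cited design. Assumes a
// non-empty set and at most 64 nodes (one presence bit per node).
#include <bit>       // std::popcount (C++20)
#include <cstddef>
#include <cstdint>
#include <vector>

struct FilterEntry {
    uint64_t tag = 0;        // identifies the cached line
    uint64_t presence = 0;   // bit i set => node i may hold a copy
    uint64_t last_used = 0;  // recency stamp, used to refine the choice
};

// Picks the entry owned by the fewest nodes; breaks ties by recency.
std::size_t pick_victim(const std::vector<FilterEntry>& set) {
    std::size_t victim = 0;
    for (std::size_t i = 1; i < set.size(); ++i) {
        int owners_i = std::popcount(set[i].presence);
        int owners_v = std::popcount(set[victim].presence);
        if (owners_i < owners_v ||
            (owners_i == owners_v && set[i].last_used < set[victim].last_used))
            victim = i;
    }
    return victim;
}
```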

Types of snooping protocols

There are two kinds of snooping protocols, depending on how a local copy is managed on a write operation:

Write-invalidate

When a processor writes on a shared cache block, all the shared copies in the other caches are invalidated through bus snooping. This method ensures that only one copy of a datum can be exclusively read and written by a processor. All the other copies in other caches are invalidated. This is the most commonly used snooping protocol. The MSI, MESI, MOSI, MOESI, and MESIF protocols belong to this category.
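
As a rough illustration, a write-invalidate step in the style of MSI could be modeled as below (a deliberately simplified model, not a complete protocol): the writer's store is snooped on the bus, every other cache drops its copy, and the writer's line becomes the single Modified copy.

```cpp
// A deliberately simplified write-invalidate step in the style of MSI
// (not a complete protocol). Each element models one core's copy of the
// same cache line.
#include <cstddef>
#include <cstdint>
#include <vector>

enum class State { Invalid, Shared, Modified };

struct CacheLine {
    uint64_t addr = 0;
    State state = State::Invalid;
};

void write_invalidate(std::vector<CacheLine>& copies, std::size_t writer,
                      uint64_t addr) {
    // Every other cache snoops the write on the bus and drops its copy.
    for (std::size_t i = 0; i < copies.size(); ++i) {
        if (i != writer && copies[i].addr == addr)
            copies[i].state = State::Invalid;
    }
    // The writer is left with the only valid, writable copy.
    copies[writer].addr = addr;
    copies[writer].state = State::Modified;
}
```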

Write-update

When a processor writes on a shared cache block, all the shared copies in the other caches are updated through bus snooping. This method broadcasts the written data to all caches over the bus. It incurs more bus traffic than the write-invalidate protocol, which is why this method is uncommon. The Dragon and Firefly protocols belong to this category.[3]
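
For contrast, the write-update counterpart of the same simplified model pushes the new value into every sharer's copy instead of invalidating it, which is exactly why it consumes more bus bandwidth.

```cpp
// The write-update counterpart of the same simplified model: the new
// value is broadcast and every sharer's copy is refreshed in place.
#include <cstddef>
#include <cstdint>
#include <vector>

struct UpdLine {
    uint64_t addr = 0;
    uint64_t data = 0;
    bool valid = false;
};

void write_update(std::vector<UpdLine>& copies, std::size_t writer,
                  uint64_t addr, uint64_t value) {
    // Every cache holding the line snoops the broadcast and updates.
    for (auto& line : copies) {
        if (line.valid && line.addr == addr)
            line.data = value;
    }
    copies[writer] = {addr, value, true};
}
```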

Directory-based

Main article: Directory-based cache coherence

In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.
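
A minimal sketch of such a directory entry might look like this (illustrative only, not tied to any real machine): the presence bits record which nodes cache the line, so on a write the directory sends point-to-point invalidations only to the actual sharers rather than broadcasting.

```cpp
// A sketch of a directory entry (illustrative, not tied to any real
// machine). Assumes at most 64 nodes, one presence bit per node.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct DirEntry {
    uint64_t sharers = 0;   // bit i set => node i caches the line
    bool dirty = false;     // true => one node holds a modified copy
};

struct Directory {
    std::unordered_map<uint64_t, DirEntry> entries;  // keyed by line address

    // On a write, returns the nodes that must be invalidated before
    // 'writer' may modify the line; messages go point-to-point.
    std::vector<int> handle_write(uint64_t addr, int writer) {
        DirEntry& e = entries[addr];
        std::vector<int> invalidate;
        for (int node = 0; node < 64; ++node) {
            if (((e.sharers >> node) & 1) && node != writer)
                invalidate.push_back(node);
        }
        e.sharers = 1ULL << writer;   // writer becomes the sole owner
        e.dirty = true;
        return invalidate;
    }
};
```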

A snoop filter determines whether a snooper needs to check its cache tag or not. A snoop filter is a directory-based structure that monitors all coherent traffic in order to keep track of the coherency states of cache blocks. This means the snoop filter knows which caches have a copy of a cache block, so it can prevent the caches that do not have a copy from performing unnecessary snooping. There are three types of filters, depending on where the snoop filter is located:

  • A source filter is located at the cache side and performs filtering before coherence traffic reaches the shared bus.
  • A destination filter is located at the receiver caches and prevents unnecessary cache-tag look-ups at the receiver core, but it cannot prevent the initial coherence message from being sent by the source.
  • In-network filters prune coherence traffic dynamically inside the shared bus.[5]

Snoop filters are also categorized as inclusive or exclusive. An inclusive snoop filter keeps track of the presence of cache blocks in caches, whereas an exclusive snoop filter monitors their absence. In other words, a hit in an inclusive snoop filter means that the corresponding cache block is held by some cache, while a hit in an exclusive snoop filter means that no cache has the requested block.
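
The inclusive/exclusive distinction comes down to what a filter hit means, which the following hypothetical helpers make explicit: with an inclusive filter a hit says some cache holds the block (so a snoop is needed), while with an exclusive filter a hit says no cache holds it (so the snoop can be skipped).

```cpp
// Hypothetical helpers contrasting the two filter kinds: a hit in an
// inclusive filter means some cache holds the block (snoop needed); a
// hit in an exclusive filter means no cache holds it (snoop skipped).
#include <cstdint>
#include <unordered_set>

bool needs_snoop_inclusive(const std::unordered_set<uint64_t>& present,
                           uint64_t addr) {
    return present.count(addr) != 0;   // miss => no cache has it, skip
}

bool needs_snoop_exclusive(const std::unordered_set<uint64_t>& absent,
                           uint64_t addr) {
    return absent.count(addr) == 0;    // hit in 'absent' => skip the snoop
}
```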
 

Conclusion

Cache coherency means that the data seen by all cores is consistent. It has two aspects: 1) the data written by one core must be seen by other cores; 2) the read and write order seen by all cores is the same.

There are two implementation mechanisms: 1) snooping; 2) directory-based (snoop filter). In practice both involve snooping; the directory-based approach simply filters which caches need to be snooped.

There are two types of snooping protocols: 1) write-invalidate; 2) write-update. The former is simpler and more widely used.

Origin: blog.csdn.net/m0_38037810/article/details/126448686