Common caching algorithms and caching strategies

Cache algorithm: a cache improves the hit rate of cached content through well-designed techniques such as data partitioning, prefetching, sequential prefetching, and cache replacement. Caching algorithms can be divided into strategies based on access time, strategies based on access frequency, strategies that consider both access time and frequency, and time-distance distribution strategies.

Caching strategy: a caching strategy has three main aspects:

  • What to cache
  • When to cache
  • How to replace when the cache space is full, i.e. the cache replacement algorithm

For the second aspect, most cache algorithms use a prefetch strategy that loads some disk data into the cache in advance, further reducing disk I/O and increasing the cache hit rate. By recording and analyzing past request patterns, the strategy predicts which data segments are likely to be requested in the future and places the segments with the highest access probability into the cache, as in the sketch below.
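
A minimal sketch of sequential prefetching layered on an LRU cache, assuming fixed-size blocks addressed by integer id; the class name, the depth parameter, and the read_from_disk callback are illustrative assumptions, not from the source:

```python
from collections import OrderedDict

class SequentialPrefetcher:
    """On a miss for block i, also fetch blocks i+1 .. i+depth,
    betting that access is sequential (as in VoD playback)."""

    def __init__(self, capacity, depth=2):
        self.capacity = capacity
        self.depth = depth           # read-ahead window in blocks
        self.cache = OrderedDict()   # block id -> data, LRU order

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)          # evict LRU block

    def read(self, block_id, read_from_disk):
        if block_id in self.cache:                  # hit: no disk I/O
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = read_from_disk(block_id)             # miss: fetch block...
        self._insert(block_id, data)
        for b in range(block_id + 1, block_id + 1 + self.depth):
            if b not in self.cache:
                self._insert(b, read_from_disk(b))  # ...plus read-ahead
        return data

cache = SequentialPrefetcher(capacity=8, depth=2)
data = cache.read(0, lambda b: f"block-{b}")        # fetches blocks 0, 1, 2
```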

Data block segmentation in caching strategies:

Header caching and block caching are the strategies commonly used for VoD movie files. Header caching puts the first part of a movie file into the cache to reduce the start-up delay for on-demand users; accesses to the rest of the file read the disk directly.

Block caching divides the movie file into small blocks and performs cache operations in units of blocks. It comes in fixed-length and variable-length variants. Fixed-length blocking divides the file into blocks of equal size. Variable-length blocking is based on the observation that the later parts of a movie file have a lower probability of being accessed: the file is divided into blocks from front to back, with block sizes growing exponentially, as in the sketch below.
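
A minimal sketch of variable-length blocking; the 1 MiB initial block size and the doubling growth factor are illustrative assumptions, not values from the source:

```python
def variable_length_blocks(file_size, first_block=1 << 20, growth=2):
    """Split a file into blocks whose sizes grow geometrically, so the
    frequently watched early part of a movie maps to small blocks that
    are cheap to cache. Returns a list of (offset, length) pairs."""
    blocks, offset, size = [], 0, first_block
    while offset < file_size:
        length = min(size, file_size - offset)
        blocks.append((offset, length))
        offset += length
        size *= growth
    return blocks

# A 100 MiB file yields blocks of 1, 2, 4, 8, 16, 32 and 37 MiB.
print(variable_length_blocks(100 * (1 << 20)))
```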

However, both fixed-length and variable-length blocking ignore two points:

  • There will be some "hot clips" in a movie file, and these hot clips are not all at the beginning of the movie.
  • The popularity of the hot clips within one movie changes over time, and the popularity of different movies also changes over time.

A well-designed algorithm is required to adapt to the varying locations of video hotspots and to their changes over time.

Classification of caching strategies:

Because different systems exhibit different data access patterns, it is difficult for a single caching strategy to perform well under all of them. Researchers have therefore proposed different caching strategies for different needs.

Caching strategies can be divided into the following categories:

  • Based on access time: such algorithms organize the cache queue according to each item's access time and use it to decide which object to replace, e.g. LRU.
  • Based on access frequency: such algorithms organize the cache by how often cache items are accessed, e.g. LFU, LRU-2, 2Q, LIRS.
  • Both access time and frequency: by taking both into account, the caching strategy retains good performance when the data access pattern changes, e.g. FBR, LRFU, ALRFU. Most of these algorithms have a tunable or adaptive parameter through which the strategy strikes a balance between access time and access frequency.
  • Based on access pattern: some applications have clear data access characteristics from which a suitable caching strategy can be derived, e.g. the A&L strategy designed specifically for VoD systems, and the SARC strategy, which adapts to random and sequential access patterns at the same time.

Cache strategy based on access time:

LRU (Least Recently Used) is a widely used cache algorithm. It maintains a queue of cache items sorted by each item's last access time. When the cache space is full, the item at the tail of the queue, i.e. the one that has gone unaccessed the longest, is deleted, and the newly cached item is placed at the head of the queue. However, LRU only maintains the access time of each cache block; it does not consider access frequency or other factors, so under some access patterns it cannot achieve an ideal hit rate. For example, in a VoD system without VCR operations, data is accessed sequentially from front to back, and data that has been accessed will not be accessed again. LRU keeps the most recently accessed data the longest, which is exactly the wrong behavior for this pattern, so it is not suitable for VoD systems.
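
A minimal LRU sketch in Python, using an ordered dictionary as the access-time queue (the class and method names are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """LRU: evict the item whose last access is the oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # key -> value, oldest access first

    def get(self, key):
        if key not in self.items:
            return None                     # miss
        self.items.move_to_end(key)         # hit: now most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop least recently used
```
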
Cache strategy based on access frequency:

LFU (Least Frequently Used) sorts the blocks in the cache by each block's access frequency. When the cache space is full, the item with the lowest access frequency in the cache queue is replaced. LFU's shortcoming mirrors LRU's: it only maintains the access frequency of each item. If an item had a very high access frequency in the past but is accessed rarely now, it is still hard to evict from the cache when the space fills up, which drags down the hit rate.
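
A minimal LFU sketch; the lazy-heap eviction used here is one possible implementation and is an assumption, not the canonical one:

```python
import heapq
import itertools

class LFUCache:
    """LFU: evict the item with the lowest access count. Stale heap
    entries are skipped lazily at eviction time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}             # key -> value
        self.counts = {}             # key -> access count
        self.heap = []               # (count, age, key) candidates
        self.age = itertools.count()

    def _touch(self, key):
        self.counts[key] += 1
        heapq.heappush(self.heap, (self.counts[key], next(self.age), key))

    def get(self, key):
        if key not in self.values:
            return None              # miss
        self._touch(key)
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            while True:              # pop until a live, up-to-date entry
                count, _, victim = heapq.heappop(self.heap)
                if victim in self.values and self.counts[victim] == count:
                    del self.values[victim]
                    del self.counts[victim]
                    break
        self.values[key] = value
        self.counts.setdefault(key, 0)
        self._touch(key)
```
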
The LRU-2 [2, 3] algorithm records the times of the last two accesses to each cached page. When replacing a page, the item with the longest time since its penultimate access is replaced. Under the IRM (Independent Reference Model) access pattern, LRU-2 has the best expected hit rate. Since LRU-2 needs to maintain a priority queue, the complexity of the algorithm is log N (N being the number of items in the cache queue).
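
A compact LRU-2 sketch; a linear scan stands in for the priority queue that gives the real algorithm its log N cost:

```python
class LRU2Cache:
    """LRU-2: remember the last two access times per key and evict
    the key whose penultimate access is oldest. Keys seen only once
    get -infinity as their penultimate time, so they go first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.times = {}      # key -> (penultimate, last) access time
        self.clock = 0

    def access(self, key, value):
        self.clock += 1
        if key in self.values:
            _, last = self.times[key]
            self.times[key] = (last, self.clock)
        else:
            if len(self.values) >= self.capacity:
                # victim: smallest penultimate access time
                victim = min(self.values, key=lambda k: self.times[k][0])
                del self.values[victim]
                del self.times[victim]
            self.times[key] = (float("-inf"), self.clock)
        self.values[key] = value
```
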
2Q [4] (2 Queues) replaces the priority queue in LRU-2 with LRU queues, reducing the time complexity of the algorithm from log N to a constant; the space required to maintain cache item information is also smaller than LRU-2's.
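
A simplified 2Q sketch; the full algorithm also keeps a ghost queue (A1out) of recently evicted keys, which is omitted here, and the 25% probation-queue share is an illustrative assumption:

```python
from collections import OrderedDict

class TwoQCache:
    """Simplified 2Q: first-time keys enter a FIFO probation queue
    (A1); a repeat access promotes a key to the main LRU queue (Am),
    so one-shot keys never pollute the main queue."""

    def __init__(self, capacity, a1_ratio=0.25):
        self.a1_cap = max(1, int(capacity * a1_ratio))
        self.am_cap = max(1, capacity - self.a1_cap)
        self.a1 = OrderedDict()   # FIFO probation queue
        self.am = OrderedDict()   # main LRU queue

    def get(self, key):
        if key in self.am:                     # hit in main queue
            self.am.move_to_end(key)
            return self.am[key]
        if key in self.a1:                     # second access: promote
            self.am[key] = self.a1.pop(key)
            if len(self.am) > self.am_cap:
                self.am.popitem(last=False)    # evict LRU end of Am
            return self.am[key]
        return None                            # miss

    def put(self, key, value):
        if key in self.am:
            self.am[key] = value
            self.am.move_to_end(key)
        elif key in self.a1:
            self.a1[key] = value
        else:
            self.a1[key] = value               # first access: probation
            if len(self.a1) > self.a1_cap:
                self.a1.popitem(last=False)    # FIFO eviction from A1
```
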
LIRS [5] (Low Inter-Reference Recency Set) maintains a variable-length LRU queue whose LRU end holds the L_lirs items that have recently been accessed at least twice (L_lirs is an algorithm parameter). LIRS can achieve a high hit rate under the IRM access pattern, but its efficiency drops significantly under the SDD access pattern.

For a VoD system, frequency-based policies can capture the hot movie clips, so that the large number of requests for those clips avoids slow disk I/O. But video hotspots change over time, and such strategies cannot exploit VoD's sequential access pattern, so performing replacement purely on access frequency is also unsuitable for VoD systems.

Taking both access time and frequency into account:

FBR [6] (Frequency Based Replacement) maintains an LRU queue divided into three segments: New, Middle, and Old. A count value is maintained for each cache item in the queue. When an item in the cache is hit, it is moved to the MRU end of the New segment; if the item was in the Old or Middle segment, its count is incremented by 1, while an item already in the New segment keeps its count unchanged. When performing a replacement, the item with the smallest count in the Old segment (at the LRU end) is deleted.
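
A simplified FBR sketch; the 25% shares for the New and Old segments and the initial count are illustrative assumptions, and segments are derived from queue position rather than stored explicitly:

```python
from collections import OrderedDict

class FBRCache:
    """Simplified FBR: one LRU queue, segmented by position into
    Old / Middle / New (LRU end -> MRU end). A hit moves the item
    to the MRU end; its count grows only if the hit fell outside
    the New segment. The victim is the lowest-count item in Old."""

    def __init__(self, capacity, new_frac=0.25, old_frac=0.25):
        self.capacity = capacity
        self.new_size = max(1, int(capacity * new_frac))
        self.old_size = max(1, int(capacity * old_frac))
        self.queue = OrderedDict()   # key -> count, LRU end first

    def _in_new_segment(self, key):
        keys = list(self.queue)
        return keys.index(key) >= len(keys) - self.new_size

    def access(self, key):
        """Returns True on a hit, False on a miss."""
        if key in self.queue:
            if not self._in_new_segment(key):
                self.queue[key] += 1          # count only outside New
            self.queue.move_to_end(key)       # to MRU end of New
            return True
        if len(self.queue) >= self.capacity:
            old_keys = list(self.queue)[: self.old_size]
            victim = min(old_keys, key=lambda k: self.queue[k])
            del self.queue[victim]            # cheapest item in Old
        self.queue[key] = 1                   # initial count: an assumption
        return False
```
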
LRFU [7] (Least Recently Frequently Used) maintains a weight C(x) for each cache item x, with initial value 0. At each time t, C(x) is updated as:

  C(x) = 1 + 2^(-λ) · C(x)   if x is accessed at time t
  C(x) = 2^(-λ) · C(x)       if x is not accessed at time t

When performing a replacement, the item with the smallest C(x) is deleted. When λ = 0, the behavior of LRFU is similar to LFU; as λ approaches 1, its behavior approaches LRU. The algorithm balances the time and frequency factors by choosing an appropriate value of λ.

Although LRFU takes both access time and frequency into account through λ, the value is fixed; when the access pattern changes, the algorithm cannot adjust accordingly and its performance degrades. ALRFU [8] (Adaptive LRFU) improves LRFU in this respect: by monitoring the history of the data access pattern, ALRFU adjusts λ dynamically to follow changes in the access pattern, showing better adaptability than LRFU.
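
A minimal LRFU sketch implementing the update rule above with lazy decay; λ = 0.5 and the method names are illustrative choices, not from the source:

```python
class LRFUCache:
    """LRFU: each item carries a weight C(x), boosted on access and
    decayed by 2^(-lambda) per time step; the smallest C(x) is
    evicted. lambda = 0 degenerates to LFU, lambda = 1 to LRU."""

    def __init__(self, capacity, lam=0.5):
        self.capacity = capacity
        self.decay = 2 ** (-lam)     # per-step decay factor
        self.values = {}
        self.weights = {}            # key -> C(x) at last update
        self.stamp = {}              # key -> time of last update
        self.clock = 0

    def _weight_now(self, key):
        # apply the decay lazily for the steps since the last update
        return self.weights[key] * self.decay ** (self.clock - self.stamp[key])

    def access(self, key, value):
        self.clock += 1
        if key in self.values:
            self.weights[key] = 1 + self._weight_now(key)
        else:
            if len(self.values) >= self.capacity:
                victim = min(self.values, key=self._weight_now)
                del self.values[victim]
                del self.weights[victim], self.stamp[victim]
            self.weights[key] = 1.0
        self.stamp[key] = self.clock
        self.values[key] = value
```
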
Cache strategy based on access pattern:

The A&L algorithm is designed around the characteristics of VoD systems. It estimates the future access frequency of each cache item from the item's recorded historical access times and access counts. Using this frequency value as the metric, the item with the smallest value is replaced during cache replacement. Because the algorithm accounts for the data access characteristics of VoD systems, it is widely used in them.

However, the A&L algorithm generates cache weights directly from the total number and frequency of accesses since a cache segment was created, and does not consider that the access hotspots of VoD movies shift over time. When a cache segment's historical access frequency is high but its recent access frequency has dropped, its weight stays large, which blocks the caching of new hotspots: the algorithm cannot adapt to the dynamic changes of movie hotspots.
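
A toy version of an A&L-style weight that illustrates the limitation above: because the weight is total accesses divided by segment age, a formerly hot segment decays only slowly. The function name and formula details are assumptions, not the published algorithm:

```python
def al_style_weight(access_count, created_at, now):
    """Estimated future access frequency of a cache segment:
    total accesses so far divided by the segment's age.
    The segment with the smallest weight is the eviction victim."""
    age = max(now - created_at, 1e-9)   # avoid division by zero
    return access_count / age

# A segment hit 1000 times in its first hour keeps a large weight even
# after a quiet second hour (1000/7200 ~ 0.14), beating a new segment
# hit 100 times in that hour (100/3600 ~ 0.028) despite being stale.
print(al_style_weight(1000, 0, 7200), al_style_weight(100, 3600, 7200))
```
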
SARC [10], proposed by IBM, is a caching algorithm for large-scale servers that dynamically adapts to both random and sequential data access patterns. SARC achieves this adaptation by splitting random and sequential accesses into two queues and managing them separately. By analyzing simulated cache-size versus hit-rate curves, it derives cost functions for replacing items from the random queue and from the sequential queue. When performing cache replacement, the two queues' cost functions are compared and the replacement is taken from the lower-cost queue.
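
A rough two-queue sketch in the spirit of SARC; real SARC derives its cost functions from measured cache-size/hit-rate curves, whereas the per-queue miss costs below are fixed, illustrative constants:

```python
from collections import OrderedDict

class SarcLikeCache:
    """SARC-flavored sketch: random and sequential blocks live in
    separate LRU queues, and the eviction victim comes from the
    queue with the lower assumed miss cost."""

    def __init__(self, capacity, seq_miss_cost=1.0, rand_miss_cost=4.0):
        self.capacity = capacity
        self.cost = {"seq": seq_miss_cost, "rand": rand_miss_cost}
        self.queues = {"seq": OrderedDict(), "rand": OrderedDict()}

    def access(self, key, kind, value):
        """kind is 'seq' for detected sequential streams, else 'rand'."""
        queue = self.queues[kind]
        if key in queue:
            queue.move_to_end(key)            # hit: refresh LRU position
            return
        total = sum(len(q) for q in self.queues.values())
        if total >= self.capacity:
            # evict from the non-empty queue whose misses cost less
            victim_kind = min(
                (k for k, q in self.queues.items() if q),
                key=lambda k: self.cost[k],
            )
            self.queues[victim_kind].popitem(last=False)
        queue[key] = value
```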

http://blog.csdn.net/it_yuan/article/details/8489125
