How to implement a general-purpose cache with ReadWriteLock?

Abstract: For concurrent scenarios with many reads and few writes, the Java SDK provides ReadWriteLock.

This article is shared from the Huawei Cloud Community post "[High Concurrency] Create a High-Performance Cache Based on ReadWriteLock", by Binghe.

Preface

A very common concurrency pattern in real-world work is the read-heavy, write-light workload. In this situation we often introduce a cache to improve application performance, because caching is particularly well suited to data that is read far more often than it is written. For concurrent access, the Java SDK provides ReadWriteLock, which is designed exactly for this many-readers, few-writers case. In this article, we will look at how to use ReadWriteLock to implement a general-purpose cache.

Read-write locks

Read-write locks should be familiar to most readers. In general, a read-write lock follows these rules:

  • A shared variable may be read by multiple reader threads at the same time.
  • A shared variable may be written by only one writer thread at a time.
  • While a writer thread is writing a shared variable, no reader thread may read it.

Note the important difference between a read-write lock and a mutex: a read-write lock allows multiple threads to read a shared variable concurrently, while a mutex does not. This is why, in high-concurrency read-heavy scenarios, a read-write lock outperforms a mutex. Write operations, however, remain exclusive: while a writer thread holds the write lock, no reader thread can read the shared variable.
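These three rules can be observed directly with ReentrantReadWriteLock's tryLock(). The sketch below probes the lock from a second thread; the class and method names are my own, not part of the original article.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwLockRulesDemo {
    // Try to acquire the given lock from a second thread and report success.
    static boolean tryFromOtherThread(Lock lock) throws InterruptedException {
        final boolean[] acquired = {false};
        Thread t = new Thread(() -> {
            acquired[0] = lock.tryLock();
            if (acquired[0]) {
                lock.unlock();
            }
        });
        t.start();
        t.join();
        return acquired[0];
    }

    // While one thread holds the read lock, another thread can still read...
    public static boolean readerAllowedDuringRead() throws InterruptedException {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        try {
            return tryFromOtherThread(rwl.readLock());
        } finally {
            rwl.readLock().unlock();
        }
    }

    // ...but no thread can acquire the write lock while a read lock is held.
    public static boolean writerAllowedDuringRead() throws InterruptedException {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        try {
            return tryFromOtherThread(rwl.writeLock());
        } finally {
            rwl.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readerAllowedDuringRead());  // true
        System.out.println(writerAllowedDuringRead()); // false
    }
}
```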

A read-write lock supports both fair and non-fair modes, controlled by a boolean argument to the ReentrantReadWriteLock constructor.

public ReentrantReadWriteLock(boolean fair) {
    sync = fair ? new FairSync() : new NonfairSync();
    readerLock = new ReadLock(this);
    writerLock = new WriteLock(this);
}

Also note that calling newCondition() on the read lock throws UnsupportedOperationException; in other words, the read lock does not support condition variables.
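Both points can be verified in a few lines: construct the lock in fair mode and call newCondition() on each side. The demo class below is my own naming.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConditionDemo {
    // Returns true only if the read lock supports newCondition().
    public static boolean readLockSupportsCondition() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair mode
        try {
            rwl.readLock().newCondition();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // The write lock, by contrast, does support condition variables.
    public static boolean writeLockSupportsCondition() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(false); // non-fair mode
        try {
            rwl.writeLock().newCondition();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readLockSupportsCondition());  // false
        System.out.println(writeLockSupportsCondition()); // true
    }
}
```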

Cache implementation

Here, we use ReadWriteLock to quickly implement a simple, reusable cache utility class. The full code is shown below.

public class ReadWriteLockCache<K,V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();
    // Read from the cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }
    // Write to the cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}

As you can see, ReadWriteLockCache declares two type parameters: K for the cache key and V for the cached value. Internally, the class stores its data in a Map. Since HashMap is not thread-safe, we use a read-write lock to guarantee thread safety: get() acquires the read lock, so multiple threads can read from the cache concurrently, while put() acquires the write lock, so only one thread at a time can write to the cache.

Note that for both the read lock and the write lock, the unlock call must be placed in a finally block.
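To see the cache in action, here is a minimal usage sketch; the class from the article is repeated as a nested class so that the snippet is self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CacheDemo {
    // Same cache as in the article, nested here so the demo compiles standalone.
    static class ReadWriteLockCache<K, V> {
        private final Map<K, V> m = new HashMap<>();
        private final ReadWriteLock rwl = new ReentrantReadWriteLock();
        private final Lock r = rwl.readLock();
        private final Lock w = rwl.writeLock();

        public V get(K key) {
            r.lock();
            try { return m.get(key); }
            finally { r.unlock(); }
        }

        public V put(K key, V value) {
            w.lock();
            try { return m.put(key, value); }
            finally { w.unlock(); }
        }
    }

    public static void main(String[] args) {
        ReadWriteLockCache<String, Integer> cache = new ReadWriteLockCache<>();
        cache.put("answer", 42);
        System.out.println(cache.get("answer"));  // 42
        System.out.println(cache.get("missing")); // null
    }
}
```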

In my experience, there are two ways to load data into a cache: load the full data set when the project starts, or load the required data on demand while the project is running.

Let's look at full loading and on-demand loading in turn.

Full cache loading

Full loading is relatively simple: all data is loaded into the cache in one pass when the project starts. It suits scenarios where the amount of cached data is small and changes infrequently, for example data dictionaries and similar system information. The general loading flow is as follows.

Once the full data set has been loaded into the cache, subsequent reads can be served directly from the cache.

The implementation of full loading is straightforward; the following code demonstrates it.

public class ReadWriteLockCache<K,V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    // Read lock
    private final Lock r = rwl.readLock();
    // Write lock
    private final Lock w = rwl.writeLock();
    public ReadWriteLockCache(){
        // Query the database
        List<Field<K, V>> list = .....;
        if(!CollectionUtils.isEmpty(list)){
            // Populate with a sequential forEach: a parallelStream() writing
            // into a non-thread-safe HashMap would be a data race.
            list.forEach((f) -> m.put(f.getK(), f.getV()));
        }
    }
    // Read from the cache
    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }
    // Write to the cache
    public V put(K key, V value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
}

On-demand cache loading

On-demand loading is also called lazy loading: data is loaded into the cache only when it is needed. Concretely, nothing is loaded into the cache at startup. When the program needs to query some data, it first checks whether the data exists in the cache; if so, it reads it from the cache directly. If not, it queries the database and writes the result into the cache. Subsequent reads of that data are then served directly from the cache.

This pattern suits most caching scenarios.

The on-demand lookup logic can be expressed with the following code.

class ReadWriteLockCache<K,V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();
    V get(K key) {
        V v = null;
        // Read from the cache
        r.lock();
        try {
            v = m.get(key);
        } finally {
            r.unlock();
        }
        // Cache hit: return the value
        if(v != null) {
            return v;
        }
        // Cache miss: query the database
        w.lock();
        try {
            // Check the cache again in case another thread has filled it
            v = m.get(key);
            if(v == null) {
                // Query the database
                v = ...; // value loaded from the database
                m.put(key, v);
            }
        } finally {
            w.unlock();
        }
        return v;
    }
}

In get(), we first try to read the data from the cache under the read lock, unlocking as soon as the lookup returns. If the cached value is non-null, we return it immediately. If it is null, we acquire the write lock and read from the cache again; if the data is still missing, we query the database, write the result into the cache, release the write lock, and finally return the result.

At this point you may ask: since the program already holds the write lock, why query the cache once more inside the write lock?

Because in high-concurrency scenarios multiple threads may compete for the write lock. For example, suppose the cache is empty when get() is first called, and three threads call get() simultaneously, all reaching w.lock() at the same time. Since the write lock is exclusive, only one thread acquires it; the other two block at w.lock(). The thread holding the write lock queries the database, writes the data into the cache, and releases the lock.

The other two threads then compete for the write lock, and one of them acquires it and continues. Without the second v = m.get(key) after w.lock(), that thread would query the database again, write the data into the cache, and release the write lock, and the last thread would then do the same.

But the first thread has already queried the database and populated the cache, so there is no need for the other two threads to hit the database again; they can read the value directly from the cache. Re-checking the cache with v = m.get(key) after w.lock() therefore avoids redundant database queries under high concurrency and improves system performance.

Lock upgrading and downgrading

Regarding lock upgrading and downgrading, note that ReadWriteLock does not support lock upgrading: if a thread attempts to acquire the write lock while still holding the read lock, the write lock can never be granted, and the thread blocks forever without being woken up.
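This can be demonstrated safely with tryLock(): while holding the read lock, the same thread cannot obtain the write lock. A blocking writeLock().lock() here would deadlock, so the sketch below (class name my own) probes instead of blocking.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UpgradeDemo {
    // Attempt a read-to-write "upgrade" on the same thread.
    public static boolean canUpgrade() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        try {
            // tryLock() returns false instead of deadlocking like lock() would.
            boolean acquired = rwl.writeLock().tryLock();
            if (acquired) {
                rwl.writeLock().unlock();
            }
            return acquired;
        } finally {
            rwl.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(canUpgrade()); // false
    }
}
```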

Although lock upgrading is not supported, ReadWriteLock does support lock downgrading. Consider the official example from the ReentrantReadWriteLock documentation, shown below.

class CachedData {
    Object data;
    volatile boolean cacheValid;
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    void processCachedData() {
        rwl.readLock().lock();
        if (!cacheValid) {
            // Must release read lock before acquiring write lock
            rwl.readLock().unlock();
            rwl.writeLock().lock();
            try {
                // Recheck state because another thread might have
                // acquired write lock and changed state before we did.
                if (!cacheValid) {
                    data = ...
                    cacheValid = true;
                }
                // Downgrade by acquiring read lock before releasing write lock
                rwl.readLock().lock();
            } finally {
                rwl.writeLock().unlock(); // Unlock write, still hold read
            }
        }
        try {
            use(data);
        } finally {
            rwl.readLock().unlock();
        }
    }
}

Data synchronization

The data synchronization discussed here is synchronization between the data source and the cache, or, more directly, between the database and the cache.

Several approaches can solve this synchronization problem; two common ones are described below.

Timeout mechanism

This one is easy to understand: when data is written to the cache, it is given a timeout. When the entry times out, it is automatically removed from the cache. The next time the program accesses that data, the cache misses, so the program queries the database and writes the fresh data back into the cache.
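A minimal sketch of this idea, building on the ReadWriteLock cache above: each entry records an expiry timestamp, and an expired entry is treated as a miss and evicted. The Entry wrapper and TTL parameter are my own additions, not from the article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ExpiringCache<K, V> {
    // Pairs a cached value with its absolute expiry time.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> m = new HashMap<>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    public void put(K key, V value, long ttlMillis) {
        w.lock();
        try {
            m.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
        } finally {
            w.unlock();
        }
    }

    public V get(K key) {
        Entry<V> e;
        r.lock();
        try {
            e = m.get(key);
        } finally {
            r.unlock();
        }
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() >= e.expiresAtMillis) {
            // Expired: evict under the write lock, double-checking the entry.
            w.lock();
            try {
                Entry<V> cur = m.get(key);
                if (cur != null && System.currentTimeMillis() >= cur.expiresAtMillis) {
                    m.remove(key);
                }
            } finally {
                w.unlock();
            }
            return null; // treated as a miss; caller reloads from the database
        }
        return e.value;
    }
}
```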

Periodic cache refresh

This scheme is an enhanced version of the timeout mechanism. Data written to the cache is still given a timeout, but in addition a dedicated background thread periodically queries the database and writes the data back into the cache, which mitigates the cache penetration problem to some extent.
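A sketch of the background-refresh idea using a ScheduledExecutorService: a loader function stands in for the database query, and a scheduled task replaces the cache contents under the write lock. The Supplier-based loader and the class name are my own illustrative choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

public class RefreshingCache<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();
    private final Supplier<Map<K, V>> loader; // stands in for the database query
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public RefreshingCache(Supplier<Map<K, V>> loader, long periodMillis) {
        this.loader = loader;
        // Background thread re-queries the source on a fixed schedule.
        scheduler.scheduleAtFixedRate(this::refresh, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Re-query the source and replace the cache contents under the write lock.
    public void refresh() {
        Map<K, V> fresh = loader.get();
        w.lock();
        try {
            m.clear();
            m.putAll(fresh);
        } finally {
            w.unlock();
        }
    }

    public V get(K key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```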

 
