Using Java to Design and Implement an Efficient and Scalable Calculation Result Cache

Overview

Almost every application in modern software uses some form of caching. Reusing previous calculation results reduces latency and improves throughput at the cost of extra memory: a classic space-for-time trade. Like many frequently reinvented wheels, a cache looks simple. It is nothing more than saving every calculation result; on the next request, a result already in the cache is returned directly, and the value is recomputed only when no saved result exists. However, an unreasonable caching design can hurt a program's performance. This article walks through the iterative design of a calculation result cache, analyzes the concurrency defects of each version and how to fix them, and ends with an efficient and scalable calculation result cache.

1. Cache implementation

For demonstration, we define a computing interface Computable<A,V> that declares a single method compute(A arg), which takes an argument of type A and returns a result of type V. The interface is defined as follows:

 
 

```java
public interface Computable<A, V> {
    V compute(A arg) throws InterruptedException;
}
```
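To make the interface concrete, here is a minimal hypothetical implementation (the class name SlowSquare and the 100 ms delay are our own illustration, not from the article; the interface is repeated so the snippet compiles on its own). It stands in for the kind of expensive calculation the rest of the article caches:

```java
import java.util.concurrent.TimeUnit;

// Repeated here so the snippet stands alone.
interface Computable<A, V> {
    V compute(A arg) throws InterruptedException;
}

// Hypothetical expensive computation: squares its argument after a delay.
public class SlowSquare implements Computable<Integer, Integer> {
    @Override
    public Integer compute(Integer arg) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100); // simulate expensive work
        return arg * arg;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new SlowSquare().compute(7)); // prints 49
    }
}
```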

1.1 Using HashMap + synchronized to implement caching

The first approach uses a HashMap as the cache container. Because HashMap is not thread-safe, we add the synchronized keyword to guarantee safe access to the data.

The code is as follows:

 
 

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapMemoizer<A, V> implements Computable<A, V> {

    private final Map<A, V> cache = new HashMap<>();
    private final Computable<A, V> computable;

    public HashMapMemoizer(Computable<A, V> computable) {
        this.computable = computable;
    }

    @Override
    public synchronized V compute(A arg) throws InterruptedException {
        V res = cache.get(arg);
        if (res == null) {
            res = computable.compute(arg);
            cache.put(arg, res);
        }
        return res;
    }
}
```

As the code shows, we use a HashMap to store previous calculation results. Each call first checks whether the result is already in the cache; if so, it is returned directly, otherwise it is computed, put into the cache, and then returned. Since HashMap is not thread-safe, we cannot allow two threads to access it at the same time, so we add the synchronized keyword to the entire compute method. This guarantees thread safety, but it has an obvious problem: only one thread can execute compute at a time. If one thread is in the middle of a time-consuming calculation, every other caller of compute may block for a long time. With several threads queued up behind results that have not yet been calculated, compute can end up taking longer than performing the calculation with no cache at all, and the cache loses its purpose.
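The serialization cost described above can be observed directly. The sketch below is our own illustration (the class, method names, and 200 ms delay are invented; the inner memoizer mirrors HashMapMemoizer above): two threads compute two different keys, yet because the whole method holds one lock, the total time is roughly the sum of the two computations rather than their maximum:

```java
import java.util.HashMap;
import java.util.Map;

public class SerializedCacheDemo {
    interface Computable<A, V> { V compute(A arg) throws InterruptedException; }

    // Same structure as HashMapMemoizer above: one lock around the whole method.
    static class HashMapMemoizer<A, V> implements Computable<A, V> {
        private final Map<A, V> cache = new HashMap<>();
        private final Computable<A, V> computable;
        HashMapMemoizer(Computable<A, V> computable) { this.computable = computable; }
        @Override
        public synchronized V compute(A arg) throws InterruptedException {
            V res = cache.get(arg);
            if (res == null) {
                res = computable.compute(arg);
                cache.put(arg, res);
            }
            return res;
        }
    }

    // Returns the wall-clock milliseconds for two threads computing two DIFFERENT keys.
    public static long timedRun() throws InterruptedException {
        Computable<Integer, Integer> slow = arg -> { Thread.sleep(200); return arg * arg; };
        Computable<Integer, Integer> memo = new HashMapMemoizer<>(slow);
        long start = System.nanoTime();
        Thread t1 = new Thread(() -> { try { memo.compute(1); } catch (InterruptedException ignored) { } });
        Thread t2 = new Thread(() -> { try { memo.compute(2); } catch (InterruptedException ignored) { } });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // The synchronized method serializes the two 200 ms computations even
        // though their keys differ, so this prints roughly 400, not 200.
        System.out.println("elapsed ms: " + timedRun());
    }
}
```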

1.2 Using ConcurrentHashMap instead of HashMap to improve concurrency

Because ConcurrentHashMap is thread-safe, access to the underlying Map needs no external synchronization. This avoids the problem of multiple threads queuing up behind a single synchronized compute method while results are being calculated.

The improved code looks like this:

 
 

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapMemoizer<A, V> implements Computable<A, V> {

    private final Map<A, V> cache = new ConcurrentHashMap<>();
    private final Computable<A, V> computable;

    public ConcurrentHashMapMemoizer(Computable<A, V> computable) {
        this.computable = computable;
    }

    @Override
    public V compute(A arg) throws InterruptedException {
        V res = cache.get(arg);
        if (res == null) {
            res = computable.compute(arg);
            cache.put(arg, res);
        }
        return res;
    }
}
```

Note: this approach has much better concurrent behavior than the first one, and multiple threads can use it concurrently. But as a cache it still has a flaw: when two threads call compute at the same time with the same argument, both may perform the calculation and arrive at the same value. Since the whole point of a cache is to avoid computing the same data more than once, this is wasteful. For a more general-purpose caching mechanism the situation is worse, and for objects that are supposed to be initialized exactly once, it even becomes a safety risk.
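The duplicate-computation window can be demonstrated deterministically. In this sketch (the class and method names are our own; the inner Memoizer mirrors ConcurrentHashMapMemoizer above), the underlying computation only completes once two threads are both inside it, which proves the same key really was computed twice:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DuplicateComputeDemo {
    interface Computable<A, V> { V compute(A arg) throws InterruptedException; }

    // Same structure as ConcurrentHashMapMemoizer above: no locking, but a
    // non-atomic check-then-act between get() and put().
    static class Memoizer<A, V> implements Computable<A, V> {
        private final Map<A, V> cache = new ConcurrentHashMap<>();
        private final Computable<A, V> computable;
        Memoizer(Computable<A, V> computable) { this.computable = computable; }
        @Override
        public V compute(A arg) throws InterruptedException {
            V res = cache.get(arg);
            if (res == null) {
                res = computable.compute(arg);
                cache.put(arg, res);
            }
            return res;
        }
    }

    // Counts how many times the underlying computation runs for ONE key.
    public static int raceSameKey() throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        CyclicBarrier bothComputing = new CyclicBarrier(2);
        Computable<Integer, Integer> slow = arg -> {
            calls.incrementAndGet();
            try {
                // Only releases once BOTH threads are inside the computation,
                // i.e. once the duplicate computation has actually happened.
                bothComputing.await(2, TimeUnit.SECONDS);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return arg * arg;
        };
        Computable<Integer, Integer> memo = new Memoizer<>(slow);
        Runnable task = () -> { try { memo.compute(10); } catch (InterruptedException ignored) { } };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("underlying computations for one key: " + raceSameKey());
    }
}
```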

1.3 The final solution: a scalable and efficient cache

The problem with the ConcurrentHashMap version is that when one thread starts an expensive calculation, other threads do not know the calculation is in progress and are very likely to repeat it. What we want is a way to express "thread X is currently performing the time-consuming calculation of f(10)", so that another thread looking for f(10) can discover that the value it wants is already being computed. The most efficient response is then to wait for thread X to finish and fetch the result of f(10) from the cache. FutureTask provides exactly this: it represents a computation that may already be finished or may still be in progress. If the result is available, FutureTask.get() returns it immediately; otherwise get() blocks until the computation completes and then returns the result.
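A minimal sketch of this FutureTask behavior (the class name, string value, and 200 ms delay are illustrative): one thread runs the task while another blocks in get() until the result is ready:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static String runAndGet() throws InterruptedException, ExecutionException {
        FutureTask<String> task = new FutureTask<>(() -> {
            Thread.sleep(200); // simulate the expensive part of f(10)
            return "result of f(10)";
        });
        new Thread(task).start(); // another thread performs the computation
        // get() blocks until the computation completes, then returns its value;
        // further calls to get() return the same value immediately.
        return task.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndGet());
    }
}
```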

We redefine the Map used to cache values, replacing the original ConcurrentHashMap<A, V> with ConcurrentHashMap<A, Future<V>>. The code is as follows:

 
 

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class PerfectMemoizer<A, V> implements Computable<A, V> {

    private final ConcurrentHashMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Computable<A, V> computable;

    public PerfectMemoizer(Computable<A, V> computable) {
        this.computable = computable;
    }

    @Override
    public V compute(final A arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                Callable<V> eval = new Callable<V>() {
                    @Override
                    public V call() throws Exception {
                        return computable.compute(arg);
                    }
                };
                FutureTask<V> ft = new FutureTask<>(eval);
                f = cache.putIfAbsent(arg, ft); // atomic check-then-act
                if (f == null) {
                    // We won the race: run the computation in this thread.
                    f = ft;
                    ft.run();
                }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg); // computation was cancelled; retry
            } catch (ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```

As the code shows, compute first checks whether the corresponding calculation has been started. If not, it creates a FutureTask, registers it in the Map, and starts the calculation. If the calculation has already started, it waits for the result, which may be available almost immediately or may still be in progress; either way, this is transparent to the caller of Future.get().

Note: the code uses ConcurrentHashMap's putIfAbsent(arg, ft) method. Why can't we simply use put? Because with put, two threads could still end up computing the same value: the if block inside compute is not atomic, as shown below:

 
 

```java
// The non-atomic check inside compute()
if (f == null) {
    Callable<V> eval = new Callable<V>() {
        @Override
        public V call() throws Exception {
            return computable.compute(arg);
        }
    };
    FutureTask<V> ft = new FutureTask<>(eval);
    f = cache.putIfAbsent(arg, ft);
    if (f == null) {
        f = ft;
        ft.run();
    }
}
```

So two threads may still call compute at the same time to calculate the same value, although the probability is low: both threads fail to find the expected value in the cache, and both start computing. The root cause is that the compound action (put-if-absent) is performed on the underlying Map object, whose atomicity we cannot guarantee by adding our own lock; therefore we use ConcurrentHashMap's atomic putIfAbsent method to close this race window.
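The return-value contract of putIfAbsent is what makes this work: it returns null when it installs the new value (the caller won the race and should run the task), and it returns the existing value otherwise (the caller should reuse it). A small illustrative sketch, with plain strings standing in for Futures:

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        // First caller: key absent, value is installed and null is returned,
        // which in the memoizer means "we won the race, run the task".
        String first = map.putIfAbsent("f(10)", "task-A");
        // Second caller: key present, the EXISTING value is returned and the
        // map is unchanged, so the second thread reuses task-A.
        String second = map.putIfAbsent("f(10)", "task-B");
        System.out.println(first);            // null
        System.out.println(second);           // task-A
        System.out.println(map.get("f(10)")); // task-A
    }
}
```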

1.4 Test code

I originally wanted to include an animated comparison of the speed with and without the cache, but the resulting image was too large to upload, so here is the test code for readers to verify themselves:

 
 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumes the Computable interface and PerfectMemoizer class defined above.
public static void main(String[] args) throws InterruptedException {
    Computable<Integer, List<String>> cache = arg -> {
        List<String> res = new ArrayList<>();
        for (int i = 0; i < arg; i++) {
            Thread.sleep(50); // simulate a time-consuming step
            res.add("zhongjx==>" + i);
        }
        return res;
    };
    PerfectMemoizer<Integer, List<String>> memoizer = new PerfectMemoizer<>(cache);
    new Thread(new Runnable() {
        @Override
        public void run() {
            List<String> compute = null;
            try {
                compute = memoizer.compute(100);
                System.out.println("zxj first computation of 100: " + Arrays.toString(compute.toArray()));
                compute = memoizer.compute(100);
                System.out.println("zxj second computation of 100: " + Arrays.toString(compute.toArray()));
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }).start();
    System.out.println("zxj====>start===>");
}
```

In the test code, we use Thread.sleep() to simulate a time-consuming operation. To test the behavior without the cache, simply comment out the line f = cache.putIfAbsent(arg, ft); in PerfectMemoizer, so that no FutureTask is ever registered in the Map.

Conclusion: with the cache, repeated calculations return their results quickly; without it, every calculation pays the full cost.

2. Summary of concurrency techniques

We have now designed a scalable and efficient cache. To wrap up, here is a summary of the concurrent-programming techniques used along the way:

1. Declare fields as final unless they need to be mutable; when designing a field, decide explicitly whether it is mutable or immutable.
2. Immutable objects are always thread-safe and can be shared freely, without locks or defensive copies.
3. Guard each mutable variable with a lock.
4. Guard all variables involved in the same invariant with the same lock.
5. Hold the lock for the entire duration of a compound operation.
6. Consider thread safety during the design process, not as an afterthought.

Origin: blog.csdn.net/BASK2312/article/details/131305725