Dark Horse Review Study Notes

I went through the Dark Horse Review (HMDP) project quite a while ago; while reorganizing my resume recently, I also reorganized and expanded these notes.

1. SMS login

1.1 Login

Optimize the returned information (that is, the information stored in the session)

1.1.1 Why optimize?

  1. Memory pressure: the session lives in Tomcat's memory. The more information stored in it, the more pressure on the whole service, so there is no need to keep unimportant, irrelevant information there.
  2. Sensitive information: generally, after a successful login we only need to return the user's account, avatar, and nickname. Sensitive fields such as the creation time, password, and phone number should not be returned, because they risk being leaked.

1.1.2 How to optimize?

Return only the necessary information.

Concretely, define a UserDTO that stores only the necessary fields.

We convert the User to a UserDTO before storing it in the session.

How do we convert?

Normally we could create a new DTO object and copy each field over by hand.

But there is a ready-made utility class, BeanUtil (cn.hutool.core.bean), whose copyProperties method does exactly that: it copies the properties.
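As a sketch of what such a property copy does (the `User`/`UserDTO` shapes here are hypothetical, and Hutool's real implementation is far more general), a minimal JDK-only version could look like:

```java
import java.lang.reflect.Field;

// hypothetical entity with a sensitive field that must NOT reach the session
class User {
    public Long id = 1L;
    public String nickName = "tom";
    public String icon = "a.png";
    public String password = "secret";
}

// the DTO keeps only what the front end needs
class UserDTO {
    public Long id;
    public String nickName;
    public String icon;
}

public class CopyDemo {
    // copy same-named public fields, similar in spirit to BeanUtil.copyProperties
    static void copyProperties(Object source, Object target) {
        for (Field targetField : target.getClass().getDeclaredFields()) {
            try {
                Field sourceField = source.getClass().getDeclaredField(targetField.getName());
                targetField.set(target, sourceField.get(source));
            } catch (ReflectiveOperationException ignored) {
                // field missing on the source: just skip it
            }
        }
    }

    public static void main(String[] args) {
        UserDTO dto = new UserDTO();
        copyProperties(new User(), dto);
        // only id, nickName and icon are copied; password never leaves the entity
        System.out.println(dto.id + " " + dto.nickName + " " + dto.icon); // prints "1 tom a.png"
    }
}
```

In the project itself the single hutool call `BeanUtil.copyProperties(user, userDTO)` replaces all of this.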

Later we will also need to convert a user object into a map, using beanToMap.

1.2 Verify login status

In the login-status check we need to determine whether the user exists. If so, we can let the request through directly, but the check should not be wasted work: subsequent operations may need the logged-in user's information, so we cache it for later business use.

1.2.1 Where is user information stored locally?

Generally the user is saved into a ThreadLocal, so that subsequent business code can fetch it directly from the ThreadLocal.

What is ThreadLocal?

ThreadLocal is a thread-scoped object. In our business, every request that reaches the service (enters Tomcat) runs on its own thread.

What can go wrong if ThreadLocal is not used?

If the user were saved in a shared variable, concurrent threads could modify it at the same time, causing thread-safety issues.

With ThreadLocal, the data is saved inside each thread: every thread holds an internal map used for this storage, so each thread (and therefore each request) has its own independent storage space and they do not interfere with each other (this is thread isolation).
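A small JDK-only sketch of that isolation (the `handleRequest` method name is hypothetical; it stands in for one request's processing on one thread):

```java
public class ThreadLocalDemo {
    // each thread gets its own independent slot in this ThreadLocal
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // simulates one request: the thread stores its user, later code reads it back
    static String handleRequest(String user) {
        CURRENT_USER.set(user);
        try {
            // downstream business code would call get() like this
            return CURRENT_USER.get();
        } finally {
            CURRENT_USER.remove(); // clean up so a pooled thread keeps nothing
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = new String[2];
        Thread a = new Thread(() -> seen[0] = handleRequest("alice"));
        Thread b = new Thread(() -> seen[1] = handleRequest("bob"));
        a.start(); b.start();
        a.join(); b.join();
        // each thread only ever saw its own value
        System.out.println(seen[0] + " " + seen[1]); // prints "alice bob"
    }
}
```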

1.2.2 Supplementary knowledge points: ThreadLocal can easily cause memory leaks!

How does the ThreadLocal memory leak arise?

The key used in ThreadLocalMap is a weak reference to the ThreadLocal, while the value is a strong reference. So if the ThreadLocal has no external strong reference, the key will be cleaned up during garbage collection, but the value will not.

ThreadLocalMap will then contain an Entry whose key is null. If we take no measures, that value can never be reclaimed by GC, and a memory leak may occur.

Simply put

Because ThreadLocal is backed by a ThreadLocalMap in which the ThreadLocal itself is the key (a weak reference) and the user object is the value (a strong reference), the JVM will not reclaim the strongly referenced value, so the value is never released.

How to solve the problem of memory leak?

The ThreadLocalMap implementation has anticipated this situation: calls to set(), get(), and remove() clear out entries whose key is null. Even so, it is best to call remove() manually once you are done with a ThreadLocal.
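A sketch of why remove() matters on pooled threads (plain JDK; the single-thread pool stands in for a reused Tomcat worker thread):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RemoveDemo {
    private static final ThreadLocal<String> HOLDER = new ThreadLocal<>();

    // returns {value seen by the next task without remove(), value after remove()}
    static String[] demo() {
        ExecutorService pool = Executors.newFixedThreadPool(1); // one reused worker
        try {
            pool.submit(() -> HOLDER.set("request-1")).get();
            // a later "request" on the same pooled thread still sees the old value
            String stale = pool.submit(() -> HOLDER.get()).get();
            pool.submit(() -> HOLDER.remove()).get();
            String clean = pool.submit(() -> HOLDER.get()).get();
            return new String[]{stale, clean};
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // prints "request-1": leaked across "requests"
        System.out.println(r[1]); // prints "null": cleaned up by remove()
    }
}
```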

What are weak references?

An object that has only weak references is like a dispensable household item.

The difference between a weak reference and a soft reference is that an object with only weak references has an even shorter life cycle.

While the garbage collector thread scans the memory it manages, once it finds an object with only weak references, it reclaims that object's memory regardless of whether memory is currently scarce.

However, since the garbage collector runs on a very low-priority thread, objects with only weak references are not necessarily discovered quickly.

A weak reference can be used in conjunction with a reference queue (ReferenceQueue). If the object referenced by the weak reference is garbage collected, the Java virtual machine will add the weak reference to the reference queue associated with it.
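A sketch with plain java.lang.ref (note that System.gc() is only a hint, so the post-GC behavior shown is typical rather than guaranteed):

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    // runs the demo; returns {aliveWhileStrong, collectedAfterGc, enqueuedAfterGc}
    static boolean[] demo() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong, queue);

        boolean aliveWhileStrong = weak.get() != null; // strong ref keeps it alive

        strong = null;   // drop the only strong reference
        System.gc();     // request a collection (a hint, not a guarantee)
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // typically collected now: get() is null and the reference was enqueued
        boolean collected = weak.get() == null;
        boolean enqueued = queue.poll() == weak;
        return new boolean[]{aliveWhileStrong, collected, enqueued};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0]); // prints "true"
        System.out.println(r[1] + " " + r[2]); // usually "true true" after the GC hint
    }
}
```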

1.3 Login Interceptor

We wrote the login-status check in UserController, but many other business endpoints also need to check the user's login status, and we cannot copy that pile of verification code into every controller.

This is where the SpringMVC interceptor comes in: it can run before any controller executes.

If we use an interceptor, a user's request can no longer reach our controllers directly. Every request must first pass through the interceptor, which then decides whether to let it through to the controller.

So with the interceptor, we can put all the login-verification code in one place, and no controller needs to contain verification code.

But this approach has a small problem:

The interceptor can indeed verify the user's login for us, but subsequent business code may need the user information after verification.

We obtain the user information during the verification step, so how do subsequent business methods get it?

We need to hand the user information captured in the interceptor over to the controller, and the hand-off must be thread-safe. What should we use?

Overall, it is still the ThreadLocal mentioned above: in the interceptor, we save the captured user information into a ThreadLocal.

1.3.1 How to achieve it?

  1. Implement the HandlerInterceptor interface
  2. Override its three methods: preHandle (runs before the controller), postHandle (runs after the controller executes), and afterCompletion (runs after view rendering, before the response returns to the user)

For saving the user information, we put the ThreadLocal-related code into a utility class, UserHolder.
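A minimal UserHolder along the lines described (the UserDTO here is a hypothetical reduced shape with only non-sensitive fields):

```java
// hypothetical minimal DTO holding only non-sensitive fields
class UserDTO {
    public Long id;
    public String nickName;
    public String icon;
}

// utility class wrapping ThreadLocal so any layer can access the current user
public class UserHolder {
    private static final ThreadLocal<UserDTO> tl = new ThreadLocal<>();

    public static void saveUser(UserDTO user) { tl.set(user); }

    public static UserDTO getUser() { return tl.get(); }

    // call this in afterCompletion so pooled Tomcat threads don't keep stale users
    public static void removeUser() { tl.remove(); }
}
```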

In preHandle, the paths to exclude are generally stored in the database for easier management (but for now we write them out by hand).

1.4 The operating principle of Tomcat

  • When a user sends a request, it arrives at the port we registered with Tomcat (any running server needs a thread listening on its port, and Tomcat is no exception)
  • When the listening thread sees that a user wants to connect to Tomcat, it creates a socket connection. Sockets come in pairs, and the two ends exchange data through them.
  • When the socket on the Tomcat side receives data, the listening thread takes a thread out of Tomcat's thread pool to execute the user's request
  • Since our service is deployed in Tomcat, that thread finds the project the user wants to access and then walks the request through the project's controller, service, and dao, accessing the corresponding DB
  • Once the request has been handled, the response is returned: the socket on the Tomcat side is found and the data is written back to the socket on the user side, completing the request and response

That is to say

Each user request is actually handled by a thread taken from the Tomcat thread pool and returned to the pool afterwards, and each of these threads is independent (this is why ThreadLocal is used later to achieve thread isolation: each thread operates on its own data).

1.5 session sharing problem

Problem: multiple Tomcat instances do not share session storage, so data is lost when requests are routed to a different Tomcat instance.

Specifically

Each Tomcat has its own sessions. Suppose a user first visits the first Tomcat, and their information is stored in a session on that first server.

But if the same user's second request goes to the second Tomcat, that server certainly does not have the session stored on the first one.

So at this point the whole login-interception feature breaks (since the user's session data exists only on the first server).

1.5.1 How can we solve this problem?

  • Early solution: session replication

    That is, whenever the session on any server is modified, it is synchronized to the sessions of all the other Tomcat servers

    But this solution has two problems:

    • Memory waste: every server holds a complete copy of the session data, which puts too much pressure on the servers
    • Synchronization problem: there may be delays while session data is being copied
  • Redis-based solution: save the information that used to live in the session into Redis instead

1.5.2 Session sharing should satisfy the following three aspects

  • data sharing
  • Memory storage (sessions are memory-based, so their read/write efficiency is high; something like login verification is hit very frequently, and low read/write efficiency would make it hard to meet high-concurrency requirements)
  • kv structure

1.6 Implementation of redis instead of session

1.6.1 How to achieve it?

  1. Data Type Selection

    Since the stored data is fairly simple, we can consider either String or hash, as shown below. With String, note that the value takes up a bit more space; with hash, the value stores only the data itself. If you do not care about memory, String works too.

    There are two common ways to save a single object in redis:

    • String structure:

      It simply serializes our Java object into a JSON string

      It looks more intuitive, but the whole object becomes one string and the fields are coupled into a single unit, so you can only do CRUD on the object as a whole

      It takes up more memory, because the JSON formatting (braces, colons, quotation marks, etc.) is stored too; the longer the data and the more such symbols it contains, the more extra storage is used


    • hash structure

      The hash structure differs quite a bit from the String structure

      Each field of our Java object is saved as a field/value pair inside the value

      Each field is independent, so we can do CRUD on a single field

      Memory usage is lower, because the hash structure only needs to store the data itself


    Using the String structure is just a simple key-value pair.

  2. How to design keys?

    When we saved the verification code in the session, we used "code" as the key. Can we also use "code" as the key in Redis?

    Obviously not.

    Because the session has a special property: each browser gets its own independent session when it makes requests; that is, Tomcat maintains many sessions internally. So when different browsers submit different phone numbers, each goes into its own independent session, and they can all use "code" as the key without interfering with each other.

    But Redis is a shared memory space: no matter who makes the request, there is only one Redis behind the scenes, and everyone writes into it. If different phone numbers all used "code" as the key, they would keep overwriting one another, and most of the data would be lost.

    Therefore we must ensure that each phone number is saved under a different key.

    Since each phone number needs a distinct key, we can simply use the phone number itself as the key.

    This has two advantages:

    • Make sure each phone number has its own unique key
    • It also helps us to obtain verification codes for verification later

    Questions about fetching data:

    Because Tomcat maintains sessions for us automatically: when a browser makes a request, Tomcat creates a session for it (or reuses the existing one). How does Tomcat know which session is yours? When the session is created, a sessionid is generated automatically and written into the browser's cookie. Every subsequent request carries that cookie, and hence the sessionid, so the session is found naturally and the data is retrieved from it for us. We never have to worry about fetching the data ourselves.

    But how to fetch data in redis now?

    In Redis we store the verification code with the phone number as the key (the value is the code), and the user must bring that information when logging in. During SMS login/registration the user submits the phone number and the code, so we can look up the data in Redis by phone number (this is why the phone number is the key).

    For the logged-in user's data we could likewise use a key such as phone:{phone number}, and it would work, but it is not appropriate to use such sensitive information as a Redis key and pass it back and forth from the page.

    So we use a random token as the key for the user data.

    A random token is just a random string; for example, it can be generated from a UUID.

    There are two UUID utility classes, one in java.util and one in cn.hutool.core.lang (the latter is used here).

    UUID.randomUUID().toString(true) produces a random value without dashes.

    Unlike the session case, where Tomcat automatically writes the sessionid to the browser, we must return the token to the front end manually (that is, after saving the user to Redis we return the token to the client/browser). The client (browser) then saves this token and carries it on every subsequent request; when the server receives the token, it can fetch the data from Redis based on it.

    We return the token to the front end; how does the front end make sure the token is carried on every request?

    After the front end receives the token, it saves it in sessionStorage.

    sessionStorage is one of the browser's storage mechanisms.

    The concrete implementation of carrying it every time:

    • First, read the saved token from sessionStorage
    • Then use an interceptor (here, an axios interceptor) to attach this token as a request header on every outgoing request (under some header name, e.g. authorization)
    • From then on, every request initiated through axios, i.e. every ajax request, carries the authorization header containing the token
    • On the server side we can then read the authorization request header to get the token and verify the login
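The token generation and key layout can be sketched with plain java.util.UUID (hutool's toString(true) similarly just drops the dashes; the LOGIN_USER_KEY prefix is an assumed convention):

```java
import java.util.UUID;

public class TokenDemo {
    static final String LOGIN_USER_KEY = "login:token:"; // assumed key prefix

    // a random token, like hutool's UUID.randomUUID().toString(true)
    static String newToken() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String token = newToken();                // returned to the front end
        String redisKey = LOGIN_USER_KEY + token; // key under which the user hash is stored
        System.out.println(token.length());      // prints "32": hex chars, no dashes
        System.out.println(redisKey.startsWith("login:token:")); // prints "true"
        // the browser keeps the token in sessionStorage and sends it back
        // in the "authorization" request header on every subsequent request
    }
}
```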
  3. The validity period of the key

    We'd better set an expiration date for the key

    Just like we usually get the verification code when we log in, the system will prompt that the verification code is valid within five minutes or two minutes

    So why do we have to add this limit?

    Because if we impose no limit, a verification code just sits in Redis forever once stored. Every time someone logs in or registers and a code is sent, another entry lands in Redis; none of them is ever deleted, and over time countless entries pile up until Redis is full.

    So in order to avoid such problems, we must set an expiration date for the key we store in redis

    There are two ways to set the validity period:

    • Specify a timeout together with a time unit
    • Pass a Duration directly
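With StringRedisTemplate these would be `opsForValue().set(key, value, timeout, unit)` and the `Duration` overload. The effect of such a TTL can be sketched with a plain-Java stand-in (a toy, not the real Redis mechanism):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// toy stand-in for "SET key value EX seconds": entries vanish after their TTL
public class ExpiringCache {
    private static class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> map = new ConcurrentHashMap<>();

    public void set(String key, String value, long ttlMillis) {
        map.put(key, new Entry(value, System.currentTimeMillis() + ttlMillis));
    }

    // an expired entry behaves as absent, like a timed-out Redis key
    public String get(String key) {
        Entry e = map.get(key);
        if (e == null || System.currentTimeMillis() > e.expiresAt) {
            map.remove(key);
            return null;
        }
        return e.value;
    }
}
```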

1.7 Solving the login-state refresh problem

1.7.1 Solution 1:

At first we simply set an expiration time in the login code, but what is the problem with that?

The login expires 30 minutes after a successful login, no matter what.

But the user may well still be active after 30 minutes (i.e. still visiting different pages), so the expiry also needs to be reset to 30 minutes on each activity; otherwise the login could expire while the user is still using the app!

Only if the user performs no operation at all for 30 minutes should the login expire; otherwise each operation should refresh the expiry.

The concrete implementation is to put the expiry-related code into the login interceptor.

The problem of injecting into manually created objects

In LoginInterceptor we need a StringRedisTemplate to set the expiry and fetch the user information, i.e. we need StringRedisTemplate injected. But we cannot use annotations such as @Autowired or @Resource here; we can only inject it through the constructor.

That is because the LoginInterceptor object is created manually with new, rather than being built by Spring through @Component or similar annotations. Only objects created by Spring get dependency injection (e.g. via @Autowired), so the interceptor cannot use those annotations.

If we use constructor injection instead, who supplies the dependency?

Whoever creates the object of this class.

It is created in MvcConfig (new LoginInterceptor()), so we inject it there.

Then how do we get a StringRedisTemplate inside MvcConfig?

Because MvcConfig is annotated with @Configuration, the class itself will be instantiated by Spring, and objects built by Spring can use dependency injection.

@Configuration
public class MvcConfig implements WebMvcConfigurer {

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // login interceptor
        registry.addInterceptor(new LoginInterceptor())
                .excludePathPatterns(
                        "/shop/**",
                        "/voucher/**",
                        "/shop-type/**",
                        "/upload/**",
                        "/blog/hot",
                        "/user/code",
                        "/user/login"
                );
    }
}

Then why can't we just put @Component on the interceptor?

Because an interceptor is a very lightweight component that is only called when needed; it does not need to be available application-wide like a controller or service, so declaring it as a Spring Bean brings little benefit.

Moreover, the interceptor in the MVC configuration class is created with new, so adding @Component achieves nothing, and relying on annotation-based field injection inside it would lead to a NullPointerException.

One more thing to pay attention to is:

StringRedisTemplate requires both the key and the value to be of type String.

So if some of our data is not a String, writing it to Redis through StringRedisTemplate will fail. For example, here we want to store a UserDTO in Redis, but its id field is a Long, so there is a type-conversion problem.

Therefore, when we store this data as a map, we must ensure that every value in it is a String, i.e. both the keys and the values in the map are of type String.

There are two approaches:

  • Don't use the BeanUtil utility class; build the map yourself

  • Keep using BeanUtil, which lets you customize how the keys and values are produced

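What the custom conversion needs to achieve can be sketched in plain Java (with hutool this is done via beanToMap plus a CopyOptions field-value editor; the field values here are hypothetical):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ToStringMapDemo {
    // convert every value to a String so the map is safe for a String-only hash
    static Map<String, String> stringify(Map<String, Object> raw) {
        Map<String, String> out = new HashMap<>();
        raw.forEach((k, v) -> out.put(k, v == null ? null : v.toString()));
        return out;
    }

    public static void main(String[] args) {
        // what a bean-to-map conversion of a UserDTO might produce
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("id", 5L);           // Long: would break a String-typed hash write
        raw.put("nickName", "tom");

        Map<String, String> userMap = stringify(raw);
        System.out.println(userMap.get("id")); // prints "5", now a String
    }
}
```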

Summary

But this solution still has a problem.
The interceptor does intercept the configured paths and refreshes the login token's TTL, but it only runs on the paths that need interception. If the user only visits paths that are excluded, the interceptor never fires, and the token-refresh action never actually executes.

1.7.2 Solution 2:

Since the previous interceptor does not fire on paths that are excluded, we add another interceptor that intercepts all paths and move the shared work into it, refreshing the token there. Because this first (all-path) interceptor puts the user into ThreadLocal, the second interceptor only needs to check whether the user object exists in ThreadLocal, which completes the overall refresh-plus-login-check behavior.

The first interceptor:

public class RefreshTokenInterceptor implements HandlerInterceptor {

    private StringRedisTemplate stringRedisTemplate;

    public RefreshTokenInterceptor(StringRedisTemplate stringRedisTemplate) {
        this.stringRedisTemplate = stringRedisTemplate;
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // 1. Get the token from the request header
        String token = request.getHeader("authorization");
        if (StrUtil.isBlank(token)) {
            return true;
        }
        // 2. Fetch the user from Redis based on the token
        String key = LOGIN_USER_KEY + token;
        Map<Object, Object> userMap = stringRedisTemplate.opsForHash().entries(key);
        // 3. Check whether the user exists
        if (userMap.isEmpty()) {
            return true;
        }
        // 4. Convert the queried hash data into a UserDTO
        UserDTO userDTO = BeanUtil.fillBeanWithMap(userMap, new UserDTO(), false);
        // 5. The user exists: save the user info into ThreadLocal
        UserHolder.saveUser(userDTO);
        // 6. Refresh the token's TTL
        stringRedisTemplate.expire(key, LOGIN_USER_TTL, TimeUnit.MINUTES);
        // 7. Let the request through
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // Remove the user so the pooled thread holds no stale data
        UserHolder.removeUser();
    }
}

The second interceptor:

public class LoginInterceptor implements HandlerInterceptor {

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // 1. Decide whether to intercept (is there a user in ThreadLocal?)
        if (UserHolder.getUser() == null) {
            // No user: intercept and set the status code
            response.setStatus(401);
            return false;
        }
        // A user exists: let the request through
        return true;
    }
}

Register two interceptors

@Configuration
public class MvcConfig implements WebMvcConfigurer {

    @Resource
    private StringRedisTemplate stringRedisTemplate;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // login interceptor
        registry.addInterceptor(new LoginInterceptor())
                .excludePathPatterns(
                        "/shop/**",
                        "/voucher/**",
                        "/shop-type/**",
                        "/upload/**",
                        "/blog/hot",
                        "/user/code",
                        "/user/login"
                ).order(1);
        // token-refreshing interceptor
        registry.addInterceptor(new RefreshTokenInterceptor(stringRedisTemplate)).addPathPatterns("/**").order(0);
    }
}

ps: the larger the order value, the lower the execution priority; order(0) runs before order(1), so the token-refreshing interceptor executes first.

2. Cache

The read and write performance of the cache is high, which is why it is used as a buffer for data exchange

A common example is the computer

The main components of a computer are the CPU, memory, and disk

The computing power of the CPU far exceeds the read/write speed of memory and disk.

Yet every computation the CPU performs needs data read from memory or disk into its registers first. Precisely because this data read/write speed is far below the CPU's computing speed, the computer's overall performance is limited.

So to solve this problem, caches were added inside the CPU.

That is, the CPU keeps frequently read and written data in its cache, so during high-speed computation it does not have to wait ages for data to arrive from memory or disk; it fetches the data directly from the cache and computes.

In this way, the computing power of the CPU can be fully released

Therefore, one of the criteria to measure whether the CPU is powerful is the size of the CPU's cache.

The larger the cache, the more data that can be cached, and the better the processing performance will be

2.1 Web application development is also inseparable from caching

For example, in a web application, the user initiates requests to us through a browser.

  1. browser cache

    Then at this time, the browser can first create a cache

    What can the browser cache?

    For example, some of the page's static resources: a page contains lots of CSS, JS, and images that generally do not change, so the browser can cache them locally. Then they do not have to be loaded on every visit, which greatly reduces network latency and improves page response time. That is the browser cache.

    Requests that miss the browser cache go on to our Tomcat, i.e. the Java application we wrote.

  2. application layer cache

    In tomcat, which is our Java application, we can also add application layer cache

    What is application layer caching?

    To put it simply, we create a map, put the data we fetched from the database into it, and then read it directly from the map, which reduces the number of database queries.

    So this is also an application layer cache

    Of course, in general we don't use a plain map for caching; we use Redis.

    Because Redis itself has strong read/write performance and is fast, with read/write latencies often at the microsecond level, it is very well suited as an application-layer cache.

  3. Database layer caching

    When the cache misses, the request will still fall to the database. Then it can also add cache at the database level

    What does the database cache?

    cache index

    A MySQL database uses a clustered index: it builds an index on the id, and this index data can be cached.

    Then, when we query against these indexes, the results can be retrieved quickly from memory instead of reading the disk every time, so efficiency improves greatly.

    This is the cache at the database level

  4. CPU's multi-level cache and disk

    Of course, a data lookup may ultimately still reach the disk, and complex sorting or table joins are computed by the CPU.

    So the final database will also access our CPU and disk

    At this time, we will naturally use the multi-level cache of the CPU we mentioned before, and the disk can also create a read-write cache.


In conclusion

Caching can be added at every stage of web development, so its application scenarios are very rich. But caching cannot be used indiscriminately: everything is a double-edged sword, and introducing a cache brings benefits but also costs.

2.2 The role and cost of caching

2.2.1 Function:

  1. Reduce backend load

    Before, after a request entered our Tomcat, we always queried the database first. The database itself is relatively slow because it has to read and write data on disk, which gives the whole business a relatively high latency; complicated SQL in particular is slow to query and often puts a lot of pressure on the database.

    With a cache, a request that enters Tomcat can have its data found directly in the cache and returned to the front end without touching the database, which greatly reduces the pressure on the back end.

  2. Improve read and write efficiency and reduce response time

    Database reads and writes usually mean disk reads and writes, which take a relatively long time. If we use a cache such as Redis, whose read/write latency is often at the microsecond level, response times shrink and read/write efficiency rises greatly, letting us cope with higher concurrency. So using a cache in businesses with many users and high concurrency helps solve exactly those high-concurrency problems.

2.2.2 Cost:

  1. Data Consistency Cost

    The data originally lives in the database and is now also cached in memory, e.g. in Redis, so a user's query hits Redis first, which reduces the pressure on the database. But if the data in the database changes while the data in Redis (the cache) is still the old data, then what is read is stale, and the two are inconsistent. For important data, such inconsistency may even cause serious problems. This is the cost of data consistency.

  2. code maintenance cost

    Solving the consistency problem brings a large code-maintenance cost, because it requires some very complicated business logic; and along the way, cache-consistency handling runs into further problems such as cache breakdown. Solving those raises the complexity of the code considerably, so development and maintenance costs keep growing.

  3. Operation and maintenance cost

    To avoid problems such as cache avalanche, the cache also needs to be highly available.

    The cache often has to be deployed as a cluster, and deploying and maintaining such cache storage carries extra personnel costs.

    There are also hardware costs in deploying these clusters.

2.3 How to solve some cache problems?

2.3.1 Cache update strategy

Cache updating exists because Redis needs to conserve memory: memory is precious, and if we keep inserting data into Redis, the cache may end up holding too much, so Redis will update some of the data, or rather evict it.

In enterprise practice, the three main cache update strategies are memory eviction, timeout removal, and active update.

  • Memory eviction:

    Redis is used to solve the problem of insufficient memory

    Because redis is based on memory storage, and memory is not like disk, it is limited and more precious, so often redis memory will set an upper limit.
    In other words, the more and more data we store may eventually lead to insufficient memory.
    So there will be a memory elimination mechanism in redis, which we can configure by ourselves.

    内存淘汰这种机制**默认是开启的**,不需要我们进行管理,我们也不用考虑redis内存不足

    • data consistency

      This mechanism can also ensure data consistency to a certain extent
      because when the memory is insufficient, it eliminates part of the data, and this part of the data is gone in redis. At this time, if the user queries this part of the data, if the cache misses It will go to the database to check, and then write the data in the database to the cache, then the data will remain consistent

      But this kind of consistency is beyond our control. Which data is eliminated when it is eliminated, and when it is eliminated, it is very likely that these data will not be eliminated for a long time, because the memory is always sufficient, so Firstly, every time these data that will not be eliminated are queried, all the queried data are old data, so the consistency of the data cannot be guaranteed

      Its consistency is therefore relatively poor.

    • maintenance cost

      The maintenance cost of this mechanism is almost 0, because it is controlled by redis itself

  • Timeout removal

    Timeout removal is slightly better than memory eviction, because it uses the expire command in redis to attach a tag to our data: an expiration time.

    • data consistency

      Because the data is automatically deleted once its time expires, the next user query misses the cache and the cache is refreshed, which ensures consistency.

      How strong this mechanism's consistency is depends on the length of the expiration time we set.

      If the expiration time is set shorter, say 30 minutes, the update frequency is actually decent.

      If it is set longer, say one day, the update efficiency may be relatively low.

      This is consistency that we can control.

      But it is still not complete consistency.
      For example, with a 30-minute expiration, if the database changes within those 30 minutes, the redis data and the database data are inconsistent during that window.

      Therefore this is not a strong mechanism; its consistency is mediocre, but certainly better than memory eviction.

    • maintenance cost

      The maintenance cost is relatively low, because we only need to add an expiration time to the logic that originally sets the cache.

  • active update

    Active update simply means that we write our own business logic: when we modify the database data, we also modify the cache at the same time.

    • data consistency

      Since modifying the database also modifies the cache, data consistency can be ensured.

      The consistency of this mechanism is relatively good, but not absolutely guaranteed, because the program is not guaranteed to run healthily and accidents can happen.

    • maintenance cost

      The maintenance cost of this mechanism is relatively high, because we have to code it ourselves. Originally we only wrote to the database (plain CRUD was enough), but now we must also update the cache whenever we update the database. The business code is therefore more complex and the maintenance cost higher.
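The memory-eviction idea behind the first strategy can be illustrated with a minimal sketch (plain Python, with an `OrderedDict` standing in for Redis; the class name and capacity are made up for illustration). It drops the least-recently-used key when the capacity is exceeded, similar in spirit to redis's `allkeys-lru` maxmemory policy:

```python
from collections import OrderedDict

class LRUCache:
    """Toy stand-in for a capacity-bounded cache: when full,
    the least recently used key is evicted (like allkeys-lru)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None              # cache miss -> caller falls back to the DB
        self.data.move_to_end(key)   # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a", so "b" becomes least recently used
cache.put("c", 3)      # over capacity -> evicts "b"
print(cache.get("b"))  # None: the evicted data is simply gone from the cache
print(cache.get("a"))  # 1
```

Note that, as the notes say, which key gets evicted and when is decided by the eviction policy, not by our business code, which is why its consistency is uncontrollable.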

Where is the complexity of active updates? (i.e. the solutions to database/cache inconsistency)
  • The first is the Cache Aside Pattern,
    which is simply a manual coding approach: the cache caller updates the cache after updating the database, also known as the double-write scheme.

    It is a bit more complicated for the caller, but we can control it manually

  • The second is Read/Write Through Pattern

    It is handled by the system itself: our cache and database are integrated into a single service. Callers do not care what its underpinnings are; it is a transparent service to the outside world. Because this service handles the cache and the database together internally, it can make both operations succeed or fail together, and thereby maintains consistency between the two.

    That is to say, as callers we only need to invoke the service; we do not need to worry about consistency at all, just call it directly. (This is an advantage over option 1.)

    The biggest problem with this approach is that maintaining such a service is complicated, and a ready-made service of this kind is not easy to find on the market, so the development cost is still relatively high (its disadvantage)

  • The third is the Write Behind Caching Pattern

    This pattern is similar to the second one: both aim to simplify development for the caller, who does not need to care about consistency.

    The difference is that in the second approach a service integrating the cache and the database is in control; the caller does not even know it is operating a cache, as both are transparent to the outside.

    In this third, write-behind approach, the caller only operates the cache: it does not care about the database, does not touch the database, and does not handle consistency. All additions, deletions, changes, and queries are done in the cache.

    So who will ensure the consistency of the data?

    Another thread asynchronously persists the cached data to the database, guaranteeing eventual consistency.

    That is to say, all additions, deletions, queries, and modifications are done only in the cache, so the cached data is always the latest. A dedicated thread then periodically checks whether the cache has changed; if it has, it writes the cached data back to the database for us. This write-back is asynchronous, executed once every so often.

    What are the benefits of asynchronous write operations?

    For example, suppose we perform ten write operations in the cache, and right after those ten writes the asynchronous flush happens to run: it merges the ten operations into a single write to the database, as one batch.
    This merges many database writes into a single write, which greatly improves efficiency.

    Likewise, between two of its asynchronous flushes, if we update a certain key in the cache n times, only the last update is actually meaningful, so when the flush does its batch of update operations, it only needs to write that final result to the database once.

    What is the biggest problem with this scheme?

    • Maintaining such an asynchronous task is complicated, since it must monitor changes to the cached data in real time

    • Consistency here is hard to guarantee. Because we operate the cache first and sync to the database asynchronously, if the cache has undergone hundreds of operations without an asynchronous flush being triggered, then during that whole period the cache data and the database are completely inconsistent. Moreover, the cache lives mostly in memory: if the cache goes down at that point, the un-flushed data is lost, and that piece of data is gone for good.

      So both its consistency and its reliability have certain problems.
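The write-merging behavior described above can be sketched like this (plain Python; the class and variable names are made up for illustration, and a real implementation would run the flush on a background thread against Redis and the database). Only the latest value of each dirty key is written out, so n updates to one key become a single database write:

```python
class WriteBehindCache:
    """Toy write-behind cache: all writes go to the cache only;
    dirty keys are flushed to the 'database' in one merged batch."""

    def __init__(self, database: dict):
        self.cache = {}
        self.dirty = set()        # keys changed since the last flush
        self.database = database  # dict standing in for the real DB

    def put(self, key, value):
        self.cache[key] = value   # the caller only ever touches the cache
        self.dirty.add(key)

    def flush(self):
        """Asynchronous in a real system; called explicitly here.
        Writes only the final value of each dirty key, once."""
        writes = 0
        for key in self.dirty:
            self.database[key] = self.cache[key]
            writes += 1
        self.dirty.clear()
        return writes

db = {}
wb = WriteBehindCache(db)
for i in range(10):
    wb.put("stock:1", 100 - i)   # ten cache writes to the same key
print(wb.flush())                # 1  -> merged into a single DB write
print(db["stock:1"])             # 91 -> only the last value is persisted
```

The downside is also visible here: between `put` and `flush`, `db` and `wb.cache` disagree, and if the process dies before `flush`, the dirty data is lost.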

    To sum up, although the first approach requires us, the callers, to write the code ourselves, it is relatively the most controllable.
    Therefore, in general, it is precisely this scheme that enterprises use the most.

The best-practice solution for the cache update strategy:
  • Low consistency requirements: use the built-in memory-eviction mechanism of redis
  • High consistency requirements: active update, with timeout removal as the fallback
    • Read operations: if the cache hits, return directly; if it misses, query the database, write the result into the cache, and set a timeout
    • Write operations: write the database first, then delete the cache; ensure the consistency of the database and cache operations
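The read and write paths above can be sketched as follows (plain Python, with dicts standing in for Redis and the database; in the actual project this would be done in Java with StringRedisTemplate, and the key name `shop:1` is made up for illustration):

```python
import time

db = {"shop:1": {"name": "cafe", "score": 45}}  # stands in for MySQL
cache = {}          # key -> (value, expires_at); stands in for Redis

CACHE_TTL = 30 * 60  # 30-minute timeout, as in the notes

def query(key):
    """Read path: cache hit -> return; miss -> read DB, write cache with TTL."""
    entry = cache.get(key)
    if entry is not None and time.monotonic() < entry[1]:
        return entry[0]                      # cache hit: return directly
    value = db.get(key)                      # cache miss: fall back to the DB
    if value is not None:
        cache[key] = (value, time.monotonic() + CACHE_TTL)
    return value

def update(key, value):
    """Write path: write the database first, THEN delete the cache."""
    db[key] = value          # 1. update the database
    cache.pop(key, None)     # 2. delete (not update) the cache

print(query("shop:1"))       # miss -> reads the DB and fills the cache
update("shop:1", {"name": "cafe", "score": 50})
print("shop:1" in cache)     # False -> the cache was invalidated
print(query("shop:1"))       # the next read rebuilds the cache with the new value
```

Note the order inside `update`: database first, cache delete second, matching the analysis below about which ordering is safer.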

What issues need to be considered when coding the Cache Aside Pattern? (i.e. which schemes handle database/cache inconsistency?)
  1. Delete the cache or update the cache?
  2. How to ensure the cache and database operations succeed or fail together?
  3. Operate the cache first or the database first?
  1. Delete the cache or update the cache?
  • Updating the cache means updating the cache every time the database is updated
  • Deleting the cache means updating the database and, instead of updating the cache, deleting it directly
  • The difference between the two:
    Updating the cache means every database update triggers a cache update: a hundred database operations require a hundred cache operations. If nobody queries in between (writes are frequent, reads are few), those n cache updates are all wasted.
    Deleting the cache does not have this problem: updating the database invalidates (deletes) the cache, so a hundred updates need only one delete, and if nobody reads in between, the cache is simply not rebuilt. The cache is rebuilt only when someone queries it, which is essentially a lazy-loading mode. Put simply: delete the cache when updating the database, rebuild it on query. This scheme writes the cache less often, and every write is effective.
  • Therefore, we generally choose to delete the cache.
  2. How to ensure the cache and database operations succeed or fail together?
    Since we delete the cache while updating the database, we must ensure the two operations succeed or fail together, i.e. we must guarantee their atomicity (if we update the database but deleting the cache fails, the whole thing is pointless)

How to guarantee success or failure at the same time?

  • Monolithic system: the cache and the database are in one project, even in one method, so we can wrap the whole thing in a transaction and use the transaction's own properties to guarantee that both succeed or fail together
  • Distributed system: in a distributed system the cache operation and the database operation are likely in two different services, so how do we keep the two consistent? This requires a distributed-transaction scheme such as TCC; distributed transactions are
    covered in the spring cloud material
  3. Operate the cache first or the database first?
    Even if we can ensure this atomicity, does that mean our update is guaranteed to succeed?
    Not yet; we still have to consider thread-safety issues

Because there are two operations, a cache operation and a database operation, under multi-threaded concurrency other threads may arrive between these two operations, and whichever operation they interleave with brings different thread-safety issues

So which should we do first?
In fact, either order is possible; choosing one requires comparing them first. Operating the database first and then deleting the cache gives a relatively lower probability of thread-safety problems.
Specific analysis:

  • Delete the cache before operating the database

1. Under normal circumstances,
two threads execute concurrently.
Thread 1 wants to update: it first deletes the cache, then updates the database, setting the value to 20.
At this point, whichever thread queries encounters only one situation: no cache found, i.e. a cache miss.
It then checks the database, which holds 20, and after obtaining the data the thread writes it into the cache; the cache value becomes 20, and the two are consistent.

2. Abnormal situation
While thread 1 is executing, another thread also comes in and runs; since we have not locked, they execute in parallel.
Thread 1 wants to update, so it first deletes the cache.
The cache is now gone, and thread 1 is about to update the database; but because the update logic is complex, thread 2 sneaks in with a query.
Because thread 1 has deleted the cache, thread 2's query is a cache miss. On a miss, thread 2 queries the database; since thread 1 has not finished updating, the database still holds the old value.
So thread 2 reads the old value and writes the old data into the cache.
Only then does thread 1 finally execute its update, changing the database value to 20.
The result is that the database data and the cache data are inconsistent,
and the probability of this inconsistency is actually quite high: thread 1 deletes the cache first and then updates the database, and deleting the cache is fast, while updating the database is slow (the data must be organized first, and it is a write operation).
Thread 2 only queries the cache and then writes directly to it. Because writing the cache means writing to redis, and redis writes are often very fast (microsecond level), this write is much faster than a database write.
So it is easy for thread 2 to execute between thread 1's two operations,
and the probability of this happening is therefore quite high

  • Operate the database first and then delete the cache

1. Under normal circumstances,
it is assumed that thread 2 is going to complete the update. The operation it needs to complete the update is to operate the database first and then cache. Now update the database and change the value to 20. At this time, the cache will be deleted after updating the database, so it will go to Delete the cache.
At this time, any thread to query will miss the cache. At this time, go back to the database to query the data and write the obtained data into the cache. At this time, the query is an updated database, which is 20, so the database and the cache are consistent
.
2. Abnormal situation
Suppose a thread 1 comes to query, and the cache happens to have just become invalid (perhaps its expiration time was up).
After the invalidation, thread 1's query is a cache miss, so it reads the database, where the value is 10, and prepares to write that data into the cache.
But at this moment another thread 2 cuts in. This thread updates the database, changing the value to 20, and then deletes the cache; since the cache had already expired, this delete is a no-op. Thread 2 has now completed all of its operations.
Immediately afterwards, thread 1 executes its cache write, writing the old data (another thread has updated the database and thread 1 does not know it).
At this point the two are inconsistent,
but the possibility of this happening is not high, because it requires all of the following conditions:
first, the two threads execute in parallel;
second, the cache happens to expire exactly when thread 1 queries; and then, in the instant after thread 1 reads the database and before it writes the cache (note that a cache write is often microsecond-level), thread 2 must suddenly squeeze into that microsecond window, update the database (updating a database is often relatively slow), and delete the cache, before thread 1's write finally runs.
The probability is small because completing that many database write operations within that microsecond window is unlikely: the cache is far faster than the database
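The higher-probability race from the first ordering (delete the cache first) can be reproduced deterministically (plain Python, interleaving the two threads' steps by hand rather than using real threads; key `x` and values 10/20 follow the example above):

```python
db = {"x": 10}     # stands in for the database (old value 10)
cache = {"x": 10}  # stands in for Redis

# Thread 1 (updater): delete cache first, then update the database.
# Thread 2 (reader): sneaks in between thread 1's two steps.

cache.pop("x", None)        # T1 step 1: delete the cache

# --- thread 2 runs here, before T1 has updated the DB ---
value = cache.get("x")      # T2: cache miss (T1 just deleted the key)
if value is None:
    value = db["x"]         # T2: reads the OLD value 10 from the database
    cache["x"] = value      # T2: writes the old value back into the cache

db["x"] = 20                # T1 step 2: finally updates the database

print(db["x"])              # 20 -> the database holds the new value
print(cache["x"])           # 10 -> the cache holds stale data: inconsistent
```

Because T1's slow step (the database write) sits between its cache delete and the end of the update, T2's fast read-then-cache-write fits into that gap easily, which is exactly why this ordering fails often.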

Small details that need to be paid attention to when implementing the update strategy:

If an exception is thrown when deleting the cache, the transaction around this logic must be rolled back, so this whole method should share one unified transaction, i.e. add @Transactional on the method that implements the update strategy

Because we are currently writing a monolithic project, the database operations and cache operations are all in one method, and we can control their atomicity through a transaction.
However, in a distributed system, achieving atomicity is more troublesome: you may update the database while deleting the cache is done by another system. In that case you may need to notify the other side asynchronously through mq, and let it complete the cache handling; to guarantee consistency on both sides, a scheme like TCC must be used.


Origin blog.csdn.net/weixin_52055811/article/details/131847523