Enterprise-level solutions

Cache warm-up

Problem

"Sudden downtime": the server crashes shortly after it is started

Troubleshooting

  • The number of requests is high
  • The data traffic between master and slave is large, and data synchronization operations are frequent

Solution

Preparatory work

1. Routinely collect statistics on data access records and identify the hot data that is accessed most frequently (a sketch of collecting such statistics follows this list)

2. Build a data retention queue using the LRU eviction strategy
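For illustration, a minimal Python sketch (assuming redis-py; the sorted-set name stats:access_count and the helper names are hypothetical) of how per-key access frequency could be collected and the hottest IDs read back:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

HOT_STATS_KEY = "stats:access_count"  # hypothetical sorted set holding access counters

def record_access(data_id: str) -> None:
    # Increment the access counter for this id each time it is read.
    r.zincrby(HOT_STATS_KEY, 1, data_id)

def top_hot_ids(n: int = 100) -> list:
    # Return the n most frequently accessed ids, highest count first.
    return r.zrevrange(HOT_STATS_KEY, 0, n - 1)
```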

Preparation:

3. Classify the data in the statistics results; Redis preferentially loads the hot data with higher levels

4. Use multiple distributed servers to read data concurrently to speed up the data loading process

Implementation:

1. Use a script to trigger the data warm-up process at fixed times (see the sketch after this list)

2. If conditions permit, using a CDN (Content Delivery Network) will give even better results
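A minimal sketch of such a warm-up script, assuming redis-py, the stats:access_count sorted set from the earlier sketch, and a hypothetical load_from_database helper standing in for the real MySQL query:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_from_database(data_id: str):
    # Hypothetical placeholder: query MySQL (or another store) for this id.
    return None

def warm_up(top_n: int = 100, ttl_seconds: int = 3600) -> None:
    # Preload the most frequently accessed data into Redis before traffic arrives.
    hot_ids = r.zrevrange("stats:access_count", 0, top_n - 1)
    for data_id in hot_ids:
        value = load_from_database(data_id)
        if value is not None:
            r.set(f"data:{data_id}", value, ex=ttl_seconds)

if __name__ == "__main__":
    warm_up()  # trigger from a cron job or a startup hook
```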

Summary: Cache warm-up means loading the relevant cache data into the cache system before the system starts, so that when users make requests the system does not have to query the database first and then cache the data; users directly query the pre-warmed cache data.

Cache avalanche

Database server crashed

1. While the system is running smoothly, the number of database connections suddenly spikes

2. The application server cannot process requests in time

3. A large number of 408 and 500 error pages appear

4. Users repeatedly refresh their pages trying to get data

5. The database crashes

6. The application server crashes

7. Restarting the application server has no effect

8. The Redis server crashes

9. The Redis cluster crashes

10. After the database is restarted, it is overwhelmed again by the instantaneous traffic

Troubleshooting

1. Within a short period of time, a relatively large number of keys in the cache expire at the same time

2. Requests during this period access the expired data; Redis misses and goes to the database for the data

3. The database receives a large number of requests at the same time and cannot process them in time

4. A large number of Redis requests pile up and timeouts begin to occur

5. Database traffic surges and the database crashes

6. After the restart, there is still no data available in the cache

7. The Redis server's resources are heavily occupied and the Redis server crashes

8. The Redis cluster collapses and falls apart

9. The application server cannot get responses from the database in time, client requests keep piling up, and the application server crashes

10. The application server, Redis, and the database are all restarted, but the effect is still not ideal

Solution (strategy)

1. More static page processing

2. Build a multi-level cache architecture: Nginx cache + Redis cache + Ehcache cache

3. Detect seriously time-consuming MySQL operations and optimize them

4. Disaster warning mechanism: monitor the Redis server's performance indicators

5. Rate limiting and service degradation: sacrifice some user experience for a short period, restrict access for some requests, and reduce the pressure on the application server (a sketch of a simple Redis rate limiter follows this list)
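As a rough illustration of point 5, a minimal fixed-window rate limiter in Python using redis-py (the key names and limits are made up for the example); requests over the limit would get a degraded or static response instead of reaching the cache and the database:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def allow_request(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    # Fixed-window limit: at most `limit` requests per user per window.
    key = f"ratelimit:{user_id}"
    count = r.incr(key)                    # atomically count this request
    if count == 1:
        r.expire(key, window_seconds)      # the first request opens the window
    return count <= limit

# Usage: if allow_request(uid) returns False, serve a static/degraded page
# instead of letting the request reach Redis and the database.
```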

Solution (tactics)

1. Switch between the LRU and LFU eviction policies

2. Adjust the data expiration strategy

  • Classify business data by validity period and stagger the expiration peaks: e.g., 90 minutes for class A, 80 minutes for class B, and 70 minutes for class C
  • Set expiration times as a fixed base time plus a random value, to spread out the number of keys that expire at the same moment (see the sketch after this list)

3. Use permanent keys for super-hot data

4. Regular maintenance: analyze the access volume of data that is about to expire and decide whether to extend it; combine this with access statistics to extend the lifetime of hot data

5. Locking: it works, but use it with caution!
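A minimal sketch of points 2 and 3 above, assuming redis-py; the class-to-TTL mapping and key names are only examples:

```python
import random
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Staggered base TTLs per data class, in seconds (point 2 above).
BASE_TTL = {"A": 90 * 60, "B": 80 * 60, "C": 70 * 60}

def cache_with_jitter(key: str, value: str, data_class: str = "A") -> None:
    # Fixed base time plus a random offset, so keys of the same class
    # do not all expire at the same moment.
    ttl = BASE_TTL[data_class] + random.randint(0, 300)  # up to 5 extra minutes
    r.set(key, value, ex=ttl)

def cache_forever(key: str, value: str) -> None:
    # Super-hot data: store it as a permanent key (point 3 above).
    r.set(key, value)
```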

Summary: A cache avalanche happens when too much cached data expires at the same instant, which puts pressure on the database server. If concentrated expiration times can be effectively avoided, the avalanche problem can largely be prevented (roughly a 40% effect); combine this with the other strategies, monitor the server's performance indicators, and adjust quickly based on the runtime records.

Cache breakdown

Database server crashed

1. During the smooth operation of the system

2. Database connections surge in an instant

3. There is no large-scale key expiration on the Redis server

4. Redis memory is stable without fluctuations

5. The Redis server CPU is normal

6. The database crashes

Troubleshooting

1. A key in Redis expires, and that key is receiving a huge number of accesses

2. Multiple data requests hit the Redis server directly, and none of them are cache hits

3. A large number of accesses to the same piece of data are sent to the database in a short period of time

Problem analysis

A single hot key

The key expires at that moment

Solution

1. Pre-set

Taking e-commerce as an example, each merchant designates several flagship products according to its store level, and the expiration time of the keys for this information is extended during a shopping festival.

Note: a shopping festival does not only mean the day itself; in the days that follow, the visit peak shows a gradually decreasing trend

2. On-site adjustment

Monitor traffic and, for data whose traffic surges naturally, extend the expiration time or set the key to never expire

3. Refresh data in the background

Start a scheduled task that refreshes the data's expiration time before the peak period, ensuring it is not evicted

4. Level 2 cache

Set different expiration times to ensure that they will not be eliminated at the same time

5. Locking

A distributed lock can prevent breakdown, but it is also a performance bottleneck; use it with caution! (a minimal sketch follows below)
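A minimal sketch of the locking idea in point 5, assuming redis-py and a hypothetical load_from_database helper; a production version would release the lock with a Lua script that checks the token atomically:

```python
import time
import uuid
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_from_database(key: str):
    # Hypothetical placeholder for the real database query.
    return None

def get_with_mutex(key: str, ttl: int = 3600, lock_ttl: int = 10):
    # On a cache miss, only the caller that grabs the lock rebuilds the key;
    # everyone else waits briefly and retries the cache.
    value = r.get(key)
    if value is not None:
        return value

    lock_key = f"lock:{key}"
    token = str(uuid.uuid4())
    if r.set(lock_key, token, nx=True, ex=lock_ttl):  # SET NX EX
        try:
            value = load_from_database(key)
            if value is not None:
                r.set(key, value, ex=ttl)
            return value
        finally:
            if r.get(lock_key) == token:  # naive release, see note above
                r.delete(lock_key)
    time.sleep(0.05)                      # someone else is rebuilding the key
    return get_with_mutex(key, ttl, lock_ttl)
```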

Summary: A cache breakdown happens at the moment a single piece of hot data expires: the data is being accessed heavily, and once Redis misses, a large number of accesses for the same data are sent to the database, which puts pressure on the database server. Countermeasures should focus on analyzing business data in advance and preventing the problem, combined with runtime monitoring and real-time expiration adjustments; since monitoring the expiration of a single key is relatively difficult, combine this with the avalanche handling strategies.

Cache penetration

Database server crashed

1. During the smooth operation of the system

2. Application server traffic increases with time

3. The redis server hit rate gradually decreases over time

4. Redis memory is stable and there is no pressure on memory

5. Redis server CPU usage surged

6. Database server pressure surges

7. The database crashes

Troubleshooting

Large-scale misses in Redis

Abnormal URL accesses appear

Problem analysis

  1. The requested data does not exist in the database, so the database query returns nothing
  2. Redis does not cache the null result and returns it directly
  3. The next time the same data is requested, the process above repeats
  4. An attacker is attacking the server

Solution

1. Cache null values: cache data whose query result is null (if used long-term, clean it up regularly) and set a short time limit, e.g., 30-60 seconds, at most 5 minutes (see the sketch after this list)

2. Whitelist strategy

  • Pre-warm a bitmap of the IDs of each category of data, using the ID as the bitmap offset; this effectively builds a data whitelist. Requests for normal data are let through, while requests for abnormal data are intercepted directly (see the sketch after this list)
  • Use a Bloom filter

3. Implement monitoring

Monitor in real time the Redis hit rate and the proportion of null-data hits

  • Fluctuations during non-activity periods: fluctuations of 3-5 times the baseline are common; anything above 5 times is flagged for key investigation
  • Fluctuations during activity periods: fluctuations of 10-50 times the baseline are common; anything above 50 times is flagged for key investigation

Depending on the multiple, start different investigation procedures, and then use a blacklist for prevention and control (operational measures)

4. Key encryption

  • When the problem occurs, temporarily enable disaster-prevention business keys: encrypt the keys at the business layer during transmission, set up a verification procedure, and verify every incoming key first
  • For example, randomly generate 60 encrypted strings each day, pick 2 or 3 of them and mix them into the page data IDs; if an access key does not match the rules, reject the data access
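A combined sketch of points 1 and 2 (cache null plus a bitmap whitelist), assuming redis-py, numeric IDs, and hypothetical key names; a real system might use a Bloom filter module instead of a plain bitmap:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

WHITELIST_BITMAP = "whitelist:product_ids"   # hypothetical bitmap key

def whitelist_add(numeric_id: int) -> None:
    # Pre-warm the whitelist: mark this id as existing (point 2 above).
    r.setbit(WHITELIST_BITMAP, numeric_id, 1)

def load_from_database(numeric_id: int):
    # Hypothetical placeholder for the real database query.
    return None

def get_product(numeric_id: int):
    # 1) Whitelist check: ids never written to the bitmap are rejected outright.
    if not r.getbit(WHITELIST_BITMAP, numeric_id):
        return None

    cache_key = f"product:{numeric_id}"
    value = r.get(cache_key)
    if value is not None:
        return None if value == "" else value   # "" marks a cached null

    value = load_from_database(numeric_id)
    if value is None:
        # 2) Cache the null result with a short TTL (30-60s, at most 5 minutes).
        r.set(cache_key, "", ex=60)
        return None

    r.set(cache_key, value, ex=3600)
    return value
```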

Summary: Cache penetration refers to accessing data that does not exist at all, which skips the Redis caching stage that legitimate data goes through; every such request hits the database and puts pressure on the database server. Usually the proportion of this kind of data is low; when it rises, treat it as a possible attack and raise an alarm promptly, and focus the response on temporary preventive measures. Whether a blacklist or a whitelist is used, it adds pressure to the overall system, so remove it as soon as the alarm is lifted.

Origin blog.csdn.net/kidchildcsdn/article/details/113923168