How the retail industry prepares its systems for promotional events

New Titanium Cloud Service has shared 743 technical articles with you.

Background

The retail industry regularly runs major promotional events such as 618, Double Eleven, and anniversary sales. Ahead of these important events, teams usually worry: do resources need to be scaled out? Can the application withstand a surge of concurrent requests?

I have supported customers through events with thousands of concurrent requests and learned from the problems that came up along the way. I hope the optimizations and improvements drawn from those lessons can help more companies ease the anxiety of running promotions.

Solution

Some preparatory work before an event can effectively prevent a range of problems in the application while the event is running.

First, prepare an environment consistent with production for load testing. The purpose of the load test is to simulate real event traffic and see whether the system can withstand the concurrency the event brings. Note that the load test must be designed around the requests that will actually occur. Simply hammering the few interfaces involved in the event cannot fully expose the problems that real traffic will reveal.
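One way to make a load test resemble real traffic is to build the request plan from the expected mix of operations instead of hitting each endpoint in isolation. The sketch below assumes hypothetical endpoint names and shares, which in practice would come from production logs of a past event. It only builds the ordered plan. A load-testing tool would then replay it.

```python
import random
from typing import Dict, List

def build_request_plan(mix: Dict[str, float], total: int, seed: int = 42) -> List[str]:
    """Expand a traffic mix (endpoint -> share of all requests) into a shuffled plan.

    Shares are assumed to sum to 1.0. Any rounding difference is absorbed by the
    most common endpoint, so the plan always contains exactly `total` requests.
    """
    counts = {endpoint: round(total * share) for endpoint, share in mix.items()}
    busiest = max(mix, key=lambda endpoint: mix[endpoint])
    counts[busiest] += total - sum(counts.values())

    plan = [endpoint for endpoint, n in counts.items() for _ in range(n)]
    # Interleave operations the way concurrent users would, not endpoint by endpoint.
    random.Random(seed).shuffle(plan)
    return plan

# Hypothetical mix for an event day (illustrative shares, not measured data).
EVENT_MIX = {
    "GET /product/detail": 0.55,
    "POST /cart/add": 0.20,
    "POST /coupon/claim": 0.15,
    "POST /order/submit": 0.10,
}

plan = build_request_plan(EVENT_MIX, total=1000)
print(len(plan), plan.count("POST /coupon/claim"))  # 1000 150
```

The shuffle matters. Real users interleave browsing, cart, coupon and order calls, and contention between those paths is often what a per-endpoint test misses.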

Second, make the following preparations at the resource and architecture level:

01

DevOps, the load testers, and the R&D leads evaluate all resources based on the load-test results and then specify the resource configuration, including container resource limits.
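The arithmetic below is a rough illustration of turning load-test results into a resource plan. It estimates a replica count from two inputs: the peak QPS forecast for the event and the QPS that one container sustained within its limits during the load test. The 30% headroom factor and the example numbers are assumptions, not recommendations.

```python
import math

def replicas_needed(peak_qps: float, qps_per_replica: float, headroom: float = 1.3) -> int:
    """Estimate how many replicas are needed to serve `peak_qps`.

    qps_per_replica: throughput one container sustained in the load test
                     while staying within its CPU/memory limits.
    headroom: safety multiplier for traffic above the forecast (assumed 30%).
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    return math.ceil(peak_qps * headroom / qps_per_replica)

# Illustrative numbers: 6,000 QPS forecast, 400 QPS per container.
print(replicas_needed(6000, 400))  # 6000 * 1.3 / 400 = 19.5 -> 20
```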

02

Enable auto-scaling for both virtual machines and the Kubernetes (K8S) environment. DevOps and the R&D director specify the auto-scaling configuration after evaluation.
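For Kubernetes, the Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It skips scaling when that ratio is within a tolerance of 1.0, which is 0.1 by default. The sketch below reproduces that single-metric rule, clamped to the configured min/max, so that proposed settings can be sanity-checked against load-test numbers. The 60% CPU target and the replica bounds are assumed example values.

```python
import math

def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Approximate the Kubernetes HPA scaling rule for a single metric.

    desired = ceil(current_replicas * current_util / target_util), skipped when the
    ratio is within `tolerance` of 1.0, then clamped to [min_replicas, max_replicas].
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 10 pods at 90% CPU with a 60% target -> ceil(10 * 1.5) = 15 pods.
print(desired_replicas(10, 90, 60, min_replicas=4, max_replicas=40))
```

If the load test shows that even `max_replicas` pods cannot absorb the forecast peak, the maximum is too low. Raise it before the event rather than discovering the ceiling during it.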

03

If the database or other middleware runs on shared instances, evaluate how large the impact would be if something went wrong. If the impact would be large, migrate to a dedicated instance at least one week before the event and migrate back after it. If events will be frequent throughout the year, a permanent dedicated instance, even with a lower specification, is recommended.

04

Enable read-write separation for the cloud database, and test it about a week in advance to confirm that the data stays consistent.
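One possible consistency test after enabling read-write separation works as follows. Write a unique marker through the primary and measure how long it takes to become visible on the read replica. In the sketch below, `write_primary` and `read_replica` are hypothetical callables standing in for your database driver, not a cloud vendor's API. The in-memory dictionaries only make the example runnable.

```python
import time
from typing import Callable, Dict, Optional

def measure_replication_lag(write_primary: Callable[[str, str], None],
                            read_replica: Callable[[str], Optional[str]],
                            timeout_s: float = 5.0,
                            poll_interval_s: float = 0.05) -> Optional[float]:
    """Write a unique marker through the primary and poll the replica for it.

    Returns the observed lag in seconds, or None if the marker never appeared on
    the replica within `timeout_s`, which is a consistency problem to investigate.
    """
    key = f"lag-probe-{time.time_ns()}"
    start = time.monotonic()
    write_primary(key, "ok")
    while time.monotonic() - start < timeout_s:
        if read_replica(key) == "ok":
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None

# In-memory stand-ins for a primary and a replica that replicates instantly.
primary: Dict[str, str] = {}
replica: Dict[str, str] = {}

def write_both(key: str, value: str) -> None:
    primary[key] = value
    replica[key] = value  # a real replica would receive this asynchronously

print(measure_replication_lag(write_both, replica.get))
```

The lag matters for reads that immediately follow a write, such as showing an order right after it is placed. Those reads may need to go to the primary.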

05

If load testing cannot settle the right configuration for the cloud database, enable elastic scaling and confirm that the application can tolerate the interruption an elastic upgrade causes. Enable this for dedicated instances and disable it for shared instances.

Finally, make the following preparations at the application level:

01

If you use a container application gateway, enable circuit breaking and rate limiting, then run a test to evaluate the impact when they trigger. This ensures the application is not overwhelmed. DevOps, the load testers, and the R&D lead set the thresholds after evaluation.
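Gateways implement circuit breaking in their own ways, so the following is only a minimal Python sketch of the idea, not any particular gateway's configuration. After a threshold of consecutive failures the breaker opens and rejects calls immediately. After a cool-down it lets one trial call through. The threshold and cool-down values are assumed. In practice they come from the evaluation described above. The class is single-threaded and not safe for concurrent use as written.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: fail fast instead of piling more load on a struggling backend.
                raise CircuitOpenError("circuit open")
            # Cool-down elapsed (half-open): one more failure reopens immediately.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the breaker fully
        return result
```

Injecting `clock` keeps the breaker testable. The trigger test recommended above can advance time without actually waiting.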

02

If abnormal requests appear, block them at the application level. For example, if the same user ID sends more than 10 coupon requests to an API within the same second, we consider it a non-human operation and block the account.
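The blocking rule in the example above can be sketched as a per-user counter over one-second windows. This is an in-process illustration with a hypothetical `CouponGuard` class. With several application instances, the counters would need to live in a shared store such as Redis, for example using `INCR` plus `EXPIRE` on a per-user, per-second key.

```python
import time
from typing import Callable, Dict, Set, Tuple

class CouponGuard:
    """Block a user ID that sends more than `limit` requests within one second."""

    def __init__(self, limit: int = 10, clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.clock = clock
        self.windows: Dict[str, Tuple[int, int]] = {}  # user_id -> (second, count)
        self.blocked: Set[str] = set()

    def allow(self, user_id: str) -> bool:
        if user_id in self.blocked:
            return False
        second = int(self.clock())
        last_second, count = self.windows.get(user_id, (second, 0))
        # Reset the counter when a new one-second window starts.
        count = count + 1 if last_second == second else 1
        self.windows[user_id] = (second, count)
        if count > self.limit:
            # More than `limit` calls in the same second: treat as non-human, block.
            self.blocked.add(user_id)
            return False
        return True
```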

03

If a large volume of legitimate requests hits the application, the application layer can put users on a queuing page and release their requests to the backend cache or database in batches, in the order they arrived. For example, release 500 requests at a time, and release the next 500 once those have been processed. This keeps the application from crashing and also prevents the backend cache or database from being overwhelmed.
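The batch-release idea can be sketched as a FIFO queue that hands requests to the backend 500 at a time. Each batch is released only after the previous one has finished. `process_batch` is a hypothetical stand-in for the real cache or database work. In a live system the queue would keep filling from incoming traffic while waiting users see the queuing page.

```python
from collections import deque
from typing import Callable, Iterable, List

def drain_in_batches(requests: Iterable[str],
                     process_batch: Callable[[List[str]], None],
                     batch_size: int = 500) -> int:
    """Release queued requests to the backend in FIFO batches of `batch_size`.

    Returns the number of batches processed. Each batch is handled completely
    before the next is released, so the backend never sees more than
    `batch_size` of these requests at once.
    """
    queue = deque(requests)  # arrival order is preserved
    batches = 0
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        process_batch(batch)  # blocks until this batch is done
        batches += 1
    return batches

# 1,200 queued requests are released as batches of 500, 500 and 200.
sizes: List[int] = []
print(drain_in_batches((f"req-{i}" for i in range(1200)), lambda b: sizes.append(len(b))))
```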

04

If a large number of legitimate requests fetch the same data from the database, put that data into the cache after the first request. Requests check the cache first and go to the database only on a cache miss. When the database is updated, the next request reads the fresh data from the database and writes it back to the cache, after which requests are served from the cache again. This reduces the pressure on the database.
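The read path described above is commonly called the cache-aside pattern. Below is a minimal sketch with a plain dict standing in for Redis and a hypothetical `load_from_db` function. On a database update it deletes the cached entry. The next read then repopulates the cache from the database, which matches the flow above.

```python
from typing import Callable, Dict, Optional

class CacheAside:
    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self.cache: Dict[str, str] = {}  # stand-in for Redis
        self.load_from_db = load_from_db
        self.db_reads = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:            # 1. try the cache first
            return self.cache[key]
        self.db_reads += 1
        value = self.load_from_db(key)   # 2. miss: fall back to the database
        if value is not None:
            self.cache[key] = value      # 3. store it for the next request
        return value

    def on_db_update(self, key: str) -> None:
        # Invalidate so the next get() re-reads fresh data from the database.
        self.cache.pop(key, None)
```

Deleting rather than overwriting the cached value on update is a common choice. It avoids writing a stale value into the cache when two updates race.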

05

Based on the slow SQL queries reported during load testing, create the necessary indexes in advance.
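To confirm that an index actually removes a full-table scan, compare the query plan before and after creating it. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `orders` table purely for illustration. On MySQL or PostgreSQL, the equivalent check is running `EXPLAIN` on the slow query reported by the load test.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query-plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column is the detail text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                 [(i % 1000, i * 1.5) for i in range(10_000)])

slow_sql = "SELECT amount FROM orders WHERE user_id = 42"
before = query_plan(conn, slow_sql)  # expected: a full SCAN of orders
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = query_plan(conn, slow_sql)   # expected: SEARCH using idx_orders_user_id

print("before:", before)
print("after: ", after)
```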

06

Load likely hot data into Redis in advance, or extend its expiration time.
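Warming the cache can be as simple as loading a list of expected hot keys before traffic arrives, using a longer TTL for the event window. The sketch below uses a small in-memory TTL cache as a stand-in for Redis. With a real Redis client, each write would correspond to `SET key value EX seconds`. The key list, loader and TTL values are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

class TTLCache:
    """Tiny in-memory cache with per-key expiry (a stand-in for Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str, ttl_s: float) -> None:
        self.data[key] = (value, self.clock() + ttl_s)

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # missing or expired
        return entry[0]

def warm_up(cache: TTLCache, hot_keys: Iterable[str],
            load: Callable[[str], str], ttl_s: float) -> int:
    """Preload hot keys before the event; returns how many keys were loaded."""
    count = 0
    for key in hot_keys:
        cache.set(key, load(key), ttl_s)
        count += 1
    return count

# Assumed event TTL of 2 hours, longer than a typical everyday TTL.
EVENT_TTL_S = 2 * 60 * 60
```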

07

For keys that exist neither in Redis nor in the database, one strategy is to write a null value back to Redis, so that malicious requests using non-existent IDs cannot overwhelm the database.
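Caching a null marker is a common defence against this kind of cache penetration. The sketch stores a sentinel for IDs the database does not have. It gives the sentinel a much shorter TTL than real data, so a record created later becomes visible soon. A dict with expiry timestamps stands in for Redis. The `find_in_db` callable, the sentinel value and both TTLs are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

NULL_MARKER = "__null__"  # hypothetical sentinel meaning "known not to exist"

class PenetrationGuardedCache:
    def __init__(self, find_in_db: Callable[[str], Optional[str]],
                 ttl_s: float = 600, null_ttl_s: float = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.find_in_db = find_in_db
        self.ttl_s, self.null_ttl_s = ttl_s, null_ttl_s
        self.clock = clock
        self.store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.db_hits = 0

    def get(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        if entry is not None and entry[1] > self.clock():
            # Cache hit; a cached null means "not in the database either".
            return None if entry[0] == NULL_MARKER else entry[0]
        self.db_hits += 1
        value = self.find_in_db(key)
        if value is None:
            # Write the null back so repeated lookups of a bogus ID skip the database.
            self.store[key] = (NULL_MARKER, self.clock() + self.null_ttl_s)
        else:
            self.store[key] = (value, self.clock() + self.ttl_s)
        return value
```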

Summary

Event preparation covers two aspects: resource architecture and the application. Prioritize optimization at the application level, supplemented by optimization of the resource architecture.

Resources alone cannot solve high-concurrency problems; they only provide a hosting environment. If there are serious slow SQL queries, the system will eventually be overwhelmed no matter how well the resource architecture is optimized.

Therefore, focus on optimizing the application architecture. With both in place, we no longer need to feel anxious about running events and can concentrate on promoting the business.

Source: blog.csdn.net/NewTyun/article/details/130417944