Spike system solutions

Below, solutions for the spike (flash-sale) scenario, which generalize to any high-concurrency scenario, are summarized at four levels: architecture, product, front end, and back end.

Summary of key points:

1. Architecture: capacity expansion, system isolation, data isolation

2. Product: order-button control, spike answering for peak clipping, simplified page design

3. Front end: rate limiting (anti-cheating), static pages

4. Back end: in-memory queue, program counter, distributed lock

1. A spike generally brings two problems:

1. High concurrency

A reasonably popular spike starts at around 100,000 (10w) users online. Such a high number of concurrent users tests the website architecture from front to back.

2. Oversold

Every product has a limited quantity. Ensuring that the number of successful orders does not exceed the available stock is a hard problem that every flash-buying event must face.

Strictly speaking, overselling is a sub-problem of high concurrency, but because it is so critical, its solutions are treated separately.
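To make the oversell bug concrete, here is a minimal sketch (the class name is hypothetical) of the naive check-then-act stock decrement that causes it. The code is correct single-threaded, but under concurrency another thread can interleave between the check and the decrement:

```java
// Naive check-then-act stock decrement. Works single-threaded, but between the
// `stock > 0` check and the `stock--` another thread can run the same check,
// letting more buyers succeed than there is stock: the oversell bug.
class NaiveStock {
    private int stock;

    NaiveStock(int stock) { this.stock = stock; }

    boolean tryBuy() {
        if (stock > 0) {   // check
            stock--;       // act: not atomic with the check above
            return true;
        }
        return false;
    }

    int stock() { return stock; }
}
```

The fixes discussed later (Redis atomic decrements, JUC atomic counters, queues, locks) all exist to make this check-and-decrement a single atomic step.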

2. How to solve it?

1. Architecture level:

Spike architecture design principles:

  1. Intercept requests as far upstream in the system as possible

  2. The workload is read-heavy and write-light, so lean heavily on caching

Capacity expansion

Put bluntly: add more machines.

System isolation

To keep the short burst of traffic from affecting the existing website business, deploy the spike system independently. System isolation is mostly runtime isolation: group deployment separates the spike system from the other 99% of traffic. The spike system also gets its own domain name so that its requests land on a different cluster; even if the spike system crashes, the rest of the website is unaffected.

Data isolation

Keep the hot data about to be spiked in Redis. Most of the data the spike touches is hot data, so store it in a dedicated cache cluster or MySQL database. The point is that 0.01% of the data should not be allowed to affect the other 99.99%.
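The preload idea can be sketched as follows. This is a stand-in for the separate hot-data cache the article describes (in practice Redis; a `ConcurrentHashMap` is used here only so the sketch is self-contained, and the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a dedicated hot-data cache: spike SKUs are preloaded before
// the event starts, so spike-page reads never touch the main database.
class HotDataCache {
    private final Map<String, Integer> stockBySku = new ConcurrentHashMap<>();

    // Called before the spike begins, e.g. by an admin job.
    void preload(String sku, int stock) { stockBySku.put(sku, stock); }

    // Read path for the spike page: serve from the cache only.
    Integer stockOf(String sku) { return stockBySku.get(sku); }
}
```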

When to decrement inventory

One option is to decrement inventory when the order is placed; the other is to decrement on payment. Decrementing at order time takes only an instant and gives a better user experience.

2. Product level:

1. Control the enabled/disabled state of the purchase button on the spike product page.

The purchase button lights up only when the spike starts; before that it is grayed out, indicating that the activity has not begun.

2. Add spike answering questions, clipping the peak by time slicing

One important purpose of the spike answer is to defeat automated spike tools. Another is to stretch the order peak from about 1 s to 2~10 s: spreading the request peak across time slices greatly relieves pressure on the server. In addition, because requests arrive in order, the later requests naturally find the stock exhausted and never reach the final ordering step, so truly concurrent writes are very limited. This design idea is common today; examples include Alipay's "Xiu Yi Xiu" (咻一咻) and WeChat's "Shake" (摇一摇).
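The peak-clipping effect of time slicing can be sketched as below: each client's submit moment is assigned uniformly to one of the one-second slices in the answer window, so the per-second peak drops by roughly the window length. The class name is hypothetical:

```java
import java.util.Random;

// Sketch of peak clipping by time slicing: instead of all answer submissions
// landing in the same second, each of n requests is assigned uniformly to one
// of `seconds` one-second buckets in the answer window.
class TimeSlicer {
    static int[] spread(int n, int seconds, Random rng) {
        int[] buckets = new int[seconds];
        for (int i = 0; i < n; i++) {
            buckets[rng.nextInt(seconds)]++;  // pick a slice in [0, seconds)
        }
        return buckets;
    }
}
```

With 100,000 requests and a 10-second window, the expected per-second load falls from 100,000 to about 10,000.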

3. Simplified design of spike page:

The business needs of the spike scenario differ from ordinary shopping: users care more about grabbing the goods than about the experience. The spike product page should therefore be as simple as possible, and post-order details such as the shipping address should fall back to the user's default information to reduce system load during the spike; anything that needs changing can be changed after the spike ends.

3. Front-end level

Static and page caching

Make the static parts of the page truly static and cache the static pages on the CDN and reverse proxy servers; if necessary, rent extra servers temporarily.

Page staticization, data staticization, and reverse proxying relieve bandwidth and SQL pressure, but they introduce a problem: the buy button on a cached page will not refresh. The js that controls the button can be placed on a separate js server, with a scheduled task on another server controlling when the js is pushed. A further problem is that most browsers cache js files; appending a random version parameter (xxx.js?v=<random number>) prevents the js from being cached.

Current limit (anti-cheating)

1. For the same user id: front-end js allows a client to send only one request every few seconds, and the back end deduplicates by uid, returning the same page for repeat requests within that window

2. For the same ip: detect the ip and, within a window of a few seconds, drop repeat requests or return the same page

3. Multi-user, multi-ip attacks can only be caught through data analysis

4. To prevent users from hitting the order page's URL directly, make the URL dynamic; even the developers of the spike system cannot access the order-page URL before the spike starts. The trick is to append a server-generated random number to the order-page URL as a parameter, released only when the spike begins.
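The uid-based limiting in point 1 can be sketched as a fixed-window limiter (a minimal, single-node illustration; the class name and the choice of a fixed window are assumptions, not the article's prescription):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-uid fixed-window rate limiter: each uid may send at most
// `limit` requests per `windowMillis`; extra requests in the window are rejected
// (the back end would return the same cached page for them).
class UidRateLimiter {
    private static class Window { long start; int count; }

    private final Map<String, Window> windows = new ConcurrentHashMap<>();
    private final long windowMillis;
    private final int limit;

    UidRateLimiter(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    // `nowMillis` is passed in to keep the sketch deterministic and testable.
    synchronized boolean allow(String uid, long nowMillis) {
        Window w = windows.computeIfAbsent(uid, k -> new Window());
        if (nowMillis - w.start >= windowMillis) {  // window expired: reset it
            w.start = nowMillis;
            w.count = 0;
        }
        return ++w.count <= limit;
    }
}
```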

4. Back-end level:

1. Add a Redis cache:

The spike is a typical read-heavy, write-light scenario, so it suits operating on memory rather than on disk. Individual Redis commands execute atomically, so stock-decrement writes sent to Redis are thread-safe.

2. Add a message queue and use it to clip the peak:

Place user requests into one or more queues whose total capacity equals the product stock; any request that fails to enter a queue fails immediately. Multiple threads poll the queues, take out user requests, and decrement stock in Redis; on a successful decrement they return success and push the user and product information into a second queue that generates the orders. Processing the business asynchronously through the two queues reduces server load during the spike peak.
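The admission half of this scheme can be sketched with a bounded queue whose capacity equals the stock (a single-queue, in-process simplification; the class name is hypothetical and the Redis decrement and order creation are elided):

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch: a bounded in-memory queue whose capacity equals the stock total.
// Requests that cannot enter the queue fail fast ("sold out"); a worker later
// drains the queue to decrement stock and hand entries to the order queue.
class SpikeQueue {
    private final ArrayBlockingQueue<String> requests;

    SpikeQueue(int stock) { requests = new ArrayBlockingQueue<>(stock); }

    // Non-blocking admission: returns false once the queue (i.e. stock) is full.
    boolean submit(String uid) { return requests.offer(uid); }

    // Drain the queue and return how many orders would be created.
    int processAll() {
        int orders = 0;
        while (requests.poll() != null) orders++;
        return orders;
    }
}
```

Because `offer` on a full `ArrayBlockingQueue` returns immediately, the vast majority of requests are rejected cheaply without ever touching the stock logic.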

3. Program counter:

In front of the queue and the cache, a program counter ensures that the number of requests forwarded to Redis never exceeds the total stock. The counter can be implemented with the atomic classes in the java.util.concurrent (JUC) package.
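A minimal sketch of such a counter using `AtomicInteger` (the class and method names are hypothetical; the CAS loop makes the check and the decrement a single atomic step, fixing the naive check-then-act race):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Program counter capping admitted requests at the stock total.
// compareAndSet makes "check remaining > 0, then decrement" atomic.
class StockCounter {
    private final AtomicInteger remaining;

    StockCounter(int stock) { remaining = new AtomicInteger(stock); }

    // Returns true if the caller won a unit of stock; false once sold out.
    boolean tryAcquire() {
        while (true) {
            int current = remaining.get();
            if (current <= 0) return false;  // sold out: reject early
            if (remaining.compareAndSet(current, current - 1)) return true;
            // CAS lost to another thread: re-read and retry
        }
    }

    int remaining() { return remaining.get(); }
}
```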

4. Distributed lock

In a distributed deployment, a distributed lock solves the problem of a task that must be executed by exactly one service at a time and must not run repeatedly.

Common implementations are ZooKeeper-based and Redis-based locks. To optimize them: first consider whether the lock can be removed altogether; then prefer optimistic locks over pessimistic locks where possible. One caveat: if every attempt hits a concurrency conflict, an optimistic lock performs worse than a pessimistic one, so it is not always true that optimistic locking is faster. Leader election should also account for high availability, for example via heartbeat detection.

5. Distributed lock removal scheme

Have the whole cluster enqueue requests concurrently, and elect a single node to process the queue. This gives the same mutual exclusion as locking without the performance cost.

Origin blog.csdn.net/qq_37469055/article/details/105330524