Solving flash-sale (spike) performance and overselling problems: a discussion

By billy Peng

Recently our business tested the waters of e-commerce and took on its first flash sale (also called a spike or seckill). I had often seen peers at Taobao discuss flash sales and e-commerce; this time it was finally our turn to combine theory with practice.

PS: Before getting into the main text, a few personal impressions. When I read Taobao's slides, I felt I understood them. When it came to designing a solution of our own, I found there were still many unexpected points I did not understand, which once again confirms that "the devil is in the details". One person's ability is limited; only by discussing together can we think things through comprehensively and carefully. Enough preamble; on to the main text.

1. What does a flash sale bring?


Flash sales and snap-up events generally go through three main stages: [reservation], [order grabbing], and [payment]. Of these, [order grabbing] is the stage that most tests the provider's ability to withstand load.

The order-grabbing stage generally brings two problems:

1. High concurrency

The most popular flash sales start at 100,000 (10w) users online at once, and that many concurrent users tests the site architecture from front end to back end.

2. Overselling

Every product has a limited quantity. How to make sure that the number of people who successfully place an order does not exceed the number of items in stock is a hard problem that every snap-up event must face.

2. How to solve it?


First, we will not discuss product-level solutions here; we only discuss technical ones.

1. Front end

For high-concurrency snap-up events, the three standard moves on the front end are [scaling out], [static content], and [rate limiting].

A: Scaling out

Add machines. Increasing the overall capacity of the front-end pool is the simplest way to absorb the peak.

B: Static content

Make every element on the event page that can be static fully static, keep dynamic elements to a minimum, and let the CDN absorb the peak.

C: Rate limiting

Usually the limit is applied per IP: the number of requests a given IP may issue per unit of time is capped.

Alternatively, add a game or a question-answering step at the event entrance to flatten the peak.
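As a rough illustration of the per-IP limit described above, here is a minimal fixed-window sketch in Python. The class name, the `limit` and `window` parameters, and the injectable `clock` are all assumptions made for this example; a production setup would more likely enforce the limit at the load balancer or in a shared store.

```python
import time


class IpRateLimiter:
    """Fixed-window limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the behaviour can be tested
        self.windows = {}    # ip -> (window_start, request_count)

    def allow(self, ip):
        now = self.clock()
        start, count = self.windows.get(ip, (now, 0))
        if now - start >= self.window:
            # The previous window has expired for this IP: start a new one.
            start, count = now, 0
        if count >= self.limit:
            self.windows[ip] = (start, count)
            return False
        self.windows[ip] = (start, count + 1)
        return True
```

This sketch is single-process, not thread-safe, and never evicts old IPs, so it only shows the counting logic, not a deployable limiter.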

D: Degraded service

As a last resort, when load approaches the capacity limit of the front-end pool, randomly reject a portion of requests to protect the overall availability of the event.
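One possible way to express "randomly reject some requests near the limit" is sketched below. The linear ramp, the `start_ratio` of 0.8, and the function name are assumptions for illustration only; the original discussion does not specify how the rejection probability is chosen.

```python
import random


def should_reject(current_load, capacity, start_ratio=0.8, rng=random.random):
    """Return True if this request should be shed.

    Below start_ratio * capacity nothing is rejected. Between that point and
    full capacity the rejection probability rises linearly from 0 to 1.
    """
    threshold = capacity * start_ratio
    if current_load <= threshold:
        return False
    if current_load >= capacity:
        return True
    reject_probability = (current_load - threshold) / (capacity - threshold)
    return rng() < reject_probability
```

Here `current_load` could be in-flight requests or any other load metric the front-end pool already tracks.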

2. Backend

What problems does the back-end database run into under high concurrency and overselling? There are three main ones. (We mainly discuss writes; read problems are easily solved by adding a cache.)

I: First, MySQL itself has performance problems under high concurrency. In general, MySQL throughput rises as the number of concurrent threads increases, but beyond a certain level of concurrency there is a clear inflection point, after which it declines steadily and can end up even worse than single-threaded performance.

II: Second, the root cause of overselling is that reducing stock is a transactional operation: first a select, then an insert, and finally an update that decrements the stock by 1. That final decrement must never produce a negative number, but when many users operate on the stock concurrently, negative values can hardly be avoided.

III: Finally, when stock reduction meets high concurrency, all the operations hit the same stock row, so they compete for the InnoDB row lock. This leads to mutual waiting or even deadlock, which greatly reduces MySQL's processing performance and eventually causes timeout errors on the front-end page.

How can these problems be solved? Let's first look at Taobao's high-end solution:

I: Disable deadlock detection to improve concurrent processing performance.

II: Modify the source code to move queuing forward, before requests enter the engine layer, to reduce concurrency inside the engine.

III: Group commit: reduce the number of interactions between the server layer and the engine, and reduce I/O consumption.

For details, see the article "The Inefficiency of MySQL in the Second Kill Scenario" that Ding Qi shared at DTCC2013. With all the optimizations from that article applied, TPS under high concurrency soars from the original 150 to 85,000 (8.5w), roughly a 566-fold increase, which is startling!

However, given our actual situation, a high-end solution that requires changing the source code is clearly not very practical. So the team needed to discuss solutions that fit our circumstances. Here is what we came up with:

First, a premise. To prevent overselling, every stock-reduction operation must check the result, ensuring the stock cannot become negative after the decrement. (Because of the nature of MySQL transactions, this approach can only reduce the amount of overselling; it cannot eliminate it completely.)

update number set x = x - 1 where (x - 1) >= 0;
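To show how an application might use this guarded update, the sketch below runs the same statement against an in-memory SQLite database and uses the affected-row count to decide whether the buyer got stock. SQLite is only a stand-in for MySQL here, the `id` column is an assumption added so that one row can be addressed, and the example says nothing about how MySQL behaves under real concurrent load.

```python
import sqlite3


def try_decrement(conn, item_id):
    """Take one unit of stock; return True only if a row was actually updated."""
    cur = conn.execute(
        "UPDATE number SET x = x - 1 WHERE id = ? AND (x - 1) >= 0",
        (item_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def demo(stock=3, buyers=5):
    """Run `buyers` sequential purchase attempts against `stock` units."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE number (id INTEGER PRIMARY KEY, x INTEGER NOT NULL)")
    conn.execute("INSERT INTO number (id, x) VALUES (1, ?)", (stock,))
    conn.commit()
    successes = sum(try_decrement(conn, 1) for _ in range(buyers))
    remaining = conn.execute("SELECT x FROM number WHERE id = 1").fetchone()[0]
    conn.close()
    return successes, remaining
```

With the defaults, `demo()` returns `(3, 0)`: three attempts succeed and the remaining two update no rows, so the stock never goes below zero.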

Solution 1:

Move the inventory from MySQL to Redis so that all write operations happen in memory. Redis has no locks, so requests do not wait on one another, and because Redis's write and read performance are both far higher than MySQL's, this solves the performance problem under high concurrency. The changed data is then written to the DB asynchronously, for example through a queue.

Advantages: Solves the performance problem.

Disadvantages: Does not solve overselling. Also, because the DB is written asynchronously, the data in the DB and in Redis may be inconsistent at a given moment.
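The shape of Solution 1 can be sketched with in-memory stand-ins: a dict plays the role of Redis, another dict plays the role of the MySQL row, and a background thread drains a queue of pending writes. All names here are hypothetical, and no real Redis or MySQL client is involved.

```python
import queue
import threading


class WriteBehindStock:
    """Stock kept in memory, with changes copied to a slower store asynchronously."""

    def __init__(self, stock):
        self.cache = {"stock": stock}  # stand-in for Redis (hypothetical key name)
        self.db = {"stock": stock}     # stand-in for the MySQL row
        self.pending = queue.Queue()   # changes waiting to be written to the DB
        self.worker = threading.Thread(target=self._flush, daemon=True)
        self.worker.start()

    def buy(self):
        # The check and the decrement are two separate steps, so concurrent
        # callers could both pass the check: overselling is NOT prevented here.
        if self.cache["stock"] <= 0:
            return False
        self.cache["stock"] -= 1
        self.pending.put(self.cache["stock"])
        return True

    def _flush(self):
        while True:
            value = self.pending.get()
            if value is not None:
                self.db["stock"] = value  # the slow write happens off the request path
            self.pending.task_done()
            if value is None:             # sentinel: stop the worker
                return

    def drain(self):
        """Block until every queued change has reached the DB stand-in."""
        self.pending.join()

    def close(self):
        self.pending.put(None)
        self.worker.join()
```

Between a `buy()` call and the worker catching up, `cache` and `db` can disagree, which is the inconsistency window mentioned in the disadvantages.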

Solution 2:

Introduce a queue, put all DB write operations into a single queue, and process them strictly serially. Once the stock threshold is reached, stop consuming the queue and close the purchase function. This solves the overselling problem.

Advantages: Solves overselling and improves performance slightly.

Disadvantages: Performance is capped by the slower of the queue consumer's processing speed and the DB's write speed. In addition, multiple queues must be prepared when several products are on sale at the same time.
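A minimal sketch of the single-queue idea in Solution 2, using Python's standard `queue` and `threading` modules in place of a real message queue. The function name and the `None` sentinel are assumptions for this example; the point is only that one consumer handles requests strictly one at a time and stops once the stock is gone.

```python
import queue
import threading


def run_serial_sale(stock, buyer_ids):
    """Feed purchase requests through one queue; return the winning buyer ids."""
    pending = queue.Queue()
    winners = []

    def consumer():
        remaining = stock
        # Stop consuming as soon as the stock is exhausted.
        while remaining > 0:
            buyer = pending.get()
            if buyer is None:      # sentinel: no more requests will arrive
                break
            remaining -= 1
            winners.append(buyer)  # a real system would write the order to the DB here

    worker = threading.Thread(target=consumer)
    worker.start()
    # Many producers enqueue concurrently; only the single consumer touches the stock.
    producers = [threading.Thread(target=pending.put, args=(b,)) for b in buyer_ids]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    pending.put(None)
    worker.join()
    return winners
```

Because only the consumer thread ever changes `remaining`, the number of winners can never exceed `stock`, at the cost of all writes passing through one serial bottleneck.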

Solution 3:

Move the write operation forward to Memcached (MC) and use MC's lightweight locking mechanism, CAS, to perform the stock reduction.

Advantages: Reads and writes happen in memory, so operations are fast. With the lightweight lock, only one write can succeed at a time, which solves the stock-reduction problem.

Disadvantages: Not actually measured. Given how CAS works, it is unclear whether high concurrency would cause a large number of failed updates. And any locking is bound to affect concurrent performance.
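The CAS retry loop behind Solution 3 can be sketched as follows. `CasStore` is a purely in-memory stand-in that imitates a cache with gets/cas semantics (a value plus a version token); it is not a Memcached client, and `buy_one` and `max_retries` are names invented for this example. Counting failed attempts gives a feel for the open question about update failures under contention.

```python
import threading


class CasStore:
    """In-memory stand-in for a cache offering gets/cas (value + version token)."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()  # models the server applying one command at a time

    def gets(self):
        with self._lock:
            return self._value, self._version

    def cas(self, new_value, version):
        with self._lock:
            if version != self._version:
                return False           # someone else wrote since our read
            self._value = new_value
            self._version += 1
            return True


def buy_one(store, max_retries=10):
    """Try to take one unit of stock; return (success, failed_cas_attempts)."""
    failures = 0
    for _ in range(max_retries):
        stock, version = store.gets()
        if stock <= 0:
            return False, failures     # sold out
        if store.cas(stock - 1, version):
            return True, failures
        failures += 1                  # lost the race; re-read and retry
    return False, failures
```

Every failed `cas` means another buyer's write landed in between, so the failure count grows with contention; whether that is acceptable at flash-sale concurrency would still need to be measured.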

Solution 4:

Split the submission into two stages: apply first, then confirm. Use Redis's atomic increment (in contrast to MySQL's auto-increment), together with Redis's transaction feature, to issue numbers, which guarantees that anyone holding a number less than or equal to the stock threshold can successfully submit an order. The data is then written to the DB asynchronously.

Advantages: Solves overselling. Stock reads and writes all happen in memory, so the performance problem is solved as well.

Disadvantages: Because the DB is written asynchronously, data may be inconsistent. Underselling is also possible: if someone who obtained a number never actually places the order, the stock counter can reach 0 while the number of orders is still below the stock threshold.
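The number-issuing idea in Solution 4 can be sketched with an in-memory counter standing in for an atomic increment such as Redis INCR. `TicketCounter`, `apply_for_purchase`, and `run_sale` are hypothetical names for this example, and the asynchronous DB write is left out.

```python
import itertools
import threading


class TicketCounter:
    """In-memory stand-in for an atomic increment such as Redis INCR."""

    def __init__(self):
        self._next = itertools.count(1)
        self._lock = threading.Lock()

    def incr(self):
        with self._lock:
            return next(self._next)


def apply_for_purchase(counter, stock):
    """Stage 1 (apply): take a number; only numbers <= stock may go on to confirm."""
    ticket = counter.incr()
    return ticket if ticket <= stock else None


def run_sale(stock, applicants, confirms):
    """applicants: buyer ids in arrival order; confirms: buyers who really place the order."""
    counter = TicketCounter()
    orders = []
    for buyer in applicants:
        ticket = apply_for_purchase(counter, stock)
        if ticket is not None and buyer in confirms:
            orders.append((ticket, buyer))  # stage 2 (confirm)
    return orders
```

For example, `run_sale(3, ["a", "b", "c", "d", "e"], {"a", "c", "d", "e"})` returns `[(1, "a"), (3, "c")]`: buyer "b" holds number 2 but never confirms, so only two of the three items are sold, which is the underselling case described above.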

3. Summary


1. Three front-end moves: [scaling out] [rate limiting] [static content]

2. Two back-end paths: [memory] + [queuing]

4. Non-technical impressions


1. The power of a team is boundless. All kinds of solutions (feasibility aside) came out of discussion among teammates. We need to give everyone a voice and not rush to shoot ideas down.

2. Optimization has to be considered at the level of the whole system. Don't focus only on the part you are responsible for; if you think about a single point in isolation, you may end up in a dead end.

3. Many things seem understood once you have read about them, but they are not. You still have to practice; otherwise other people's knowledge never becomes your own.

4. Ask more often why something is so and what will happen next, and don't take things for granted. Only then can you go deep instead of staying on the surface.

PS: The above are only some of the solutions we discussed. Everyone is welcome to join in and discuss other feasible approaches.

 

