Refactoring and stress-testing plan for a tens-of-millions-scale passenger queuing system: summary

1. Introduction

After I published my previous article, "Case Sharing of Online Real Queuing System Refactoring: Practice", some readers asked how the refactoring was going. The refactored passenger queuing system is now live and has been in gray (canary) release for one month. It has been running stably, and the results so far have far exceeded expectations. This article covers the stress-test plan for the passenger queuing scenario, followed by some personal takeaways.

2. How to evaluate the performance of a queuing system

I debated the stress test of the queuing system with colleagues from operations and QA for a long time, and opinions differed, because the performance of the queuing system had never been properly evaluated and there was no standard to measure against. Based on the current online scenario (currently the top 10 cities), my analysis is as follows:

  1. Peak periods in which passenger queues form: 8:00 to 10:00, 18:00 to 19:00, and 21:00 to 23:00

  2. Average waiting time in the queue (dequeue time minus enqueue time): 1 to 5 minutes

  3. Share of orders queued in major cities during peak periods (queued orders / total orders of the day): 10% to 38%

From this, the performance metric for the queuing system is the maximum number of orders it can hold in the queue within a 5-minute window (5 minutes being the upper bound of the average waiting time).
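The three indicators above, and the 5-minute window metric derived from them, can be computed from queue logs. The following is a minimal sketch, assuming hypothetical inputs (a list of enqueue timestamps and a list of enqueue/dequeue pairs); the function names are illustrative and are not taken from the original system.

```python
from datetime import datetime, timedelta

# The 5-minute evaluation window described in the text.
WINDOW = timedelta(minutes=5)


def max_enqueued_in_window(enqueue_times, window=WINDOW):
    """Return the largest number of enqueue events inside any window of length `window`.

    Each window is half-open, (t - window, t], ending at an enqueue event t.
    Uses a two-pointer sliding window over the sorted timestamps.
    """
    times = sorted(enqueue_times)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Drop events that are `window` or more older than the current event.
        while t - times[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


def average_wait_seconds(records):
    """Mean of (dequeue time - enqueue time) over (enqueue, dequeue) pairs, in seconds."""
    waits = [(dequeued - enqueued).total_seconds() for enqueued, dequeued in records]
    return sum(waits) / len(waits) if waits else 0.0


def queue_ratio(queued_orders, total_orders):
    """Share of the day's orders that entered the queue."""
    return queued_orders / total_orders if total_orders else 0.0


if __name__ == "__main__":
    base = datetime(2022, 8, 1, 8, 0)  # hypothetical morning-peak start
    enqueues = [base + timedelta(minutes=m) for m in (0, 1, 2, 4, 6, 7, 12)]
    print(max_enqueued_in_window(enqueues))  # 4 (events at 0, 1, 2 and 4 min)
    pairs = [(base, base + timedelta(minutes=2)), (base, base + timedelta(minutes=4))]
    print(average_wait_seconds(pairs))  # 180.0
    print(queue_ratio(38, 100))  # 0.38
```

The two-pointer scan keeps the window calculation linear after sorting, which matters when replaying a full day of queue events.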

3. Stress-test targets

Current: nationwide, 10% to 38% of orders enter the queue; we round this up to 50%. The current peak is 30,000 orders per minute (QPM), so the queue must hold 30,000 × 5 × 0.5 = 75,000 orders within a 5-minute window.

Goal: stress-test against five times the current order volume, i.e., 375,000 orders queued simultaneously within 5 minutes.
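The capacity arithmetic above can be written out as a small script, which makes it easy to rerun the estimate when the peak QPM, queue share, or growth factor changes. This is a minimal sketch; the constant and function names are illustrative, and the figures are the ones stated in the text.

```python
def queued_orders_in_window(peak_orders_per_minute, window_minutes, queue_share):
    """Orders that must be held in the queue during one window at peak load."""
    return round(peak_orders_per_minute * window_minutes * queue_share)


# Figures from the text: 30,000 orders/minute at peak, a 5-minute window,
# and a 50% queue share (rounded up from the observed 10% to 38%).
PEAK_QPM = 30_000
WINDOW_MINUTES = 5
QUEUE_SHARE = 0.5
GROWTH_FACTOR = 5  # target: five times the current order volume

current_capacity = queued_orders_in_window(PEAK_QPM, WINDOW_MINUTES, QUEUE_SHARE)
target_capacity = current_capacity * GROWTH_FACTOR

print(current_capacity)  # 75000
print(target_capacity)  # 375000
```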

4. Stress-test steps

No. | Step | Metrics observed | Operation
--- | --- | --- | ---
01 | Order placed and dispatched, historical process | Maximum number of orders queued within 5 minutes supported by the historical process; interface QPS | Switch off; test orders are canceled within 5 minutes
02 | Order placed and dispatched, new process | Maximum number of orders queued within 5 minutes supported by the new process; interface QPS | Switch on; test orders are canceled within 5 minutes

With the historical process, once 100,000 orders had been enqueued within 5 minutes, the interfaces threw a large number of timeout exceptions: the performance bottleneck had been reached.

With the new process, 500,000 orders were queued within 5 minutes without any errors. The key interfaces performed as follows:

Interface | Current QPS | Target QPS | Stress-test QPS | Average RT
--- | --- | --- | --- | ---
Enqueue | 300 | 1,500 | 3,000 | 12 ms
Dequeue | 300 | 1,500 | 3,000 | 40 ms
Is in queue | 3,000 | 15,000 | 15,000+ | 4 ms
Query queue position | - | - | 8,500 | 8 ms
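As a sanity check, the window-based volumes can be converted into average per-second rates and set against the measured QPS. For example, 500,000 orders in 5 minutes averages about 1,667 enqueues per second, below the 3,000 QPS the enqueue interface sustained. The sketch below assumes uniform arrivals across the window; the helper names and dictionary keys are hypothetical.

```python
WINDOW_SECONDS = 5 * 60  # the 5-minute evaluation window


def average_rate_per_second(orders_in_window, window_seconds=WINDOW_SECONDS):
    """Average enqueue rate needed to admit `orders_in_window` orders in one window.

    This is a uniform-arrival average; real traffic is burstier, so the
    instantaneous peak inside the window will be higher.
    """
    return orders_in_window / window_seconds


def headroom(measured_qps, target_qps):
    """Ratio of measured QPS to target QPS (>= 1.0 means the target was met)."""
    return measured_qps / target_qps


# (target QPS, measured QPS) from the interface table; "15000+" is recorded
# as its lower bound, 15000.
RESULTS = {
    "enqueue": (1500, 3000),
    "dequeue": (1500, 3000),
    "is_in_queue": (15000, 15000),
}

if __name__ == "__main__":
    print(average_rate_per_second(375_000))  # 1250.0 orders/s for the 5x target
    print(round(average_rate_per_second(500_000)))  # 1667 orders/s for the tested volume
    for name, (target, measured) in RESULTS.items():
        print(name, headroom(measured, target))
```

Because the average hides bursts, a headroom comfortably above 1.0 (as with enqueue and dequeue here) is more reassuring than one that merely meets the target.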

5. Passenger queuing refactoring: old vs. new


 | Orders queued simultaneously within 5 minutes | Maximum queues supported per hive (geographic grid cell)
--- | --- | ---
Before refactoring | <100,000 | <1,000
After refactoring | >500,000 | Unlimited

After refactoring, the average RT of the query interfaces dropped by 65% overall, and that of the update interfaces by 40%. No performance bottleneck was reached, and the system can be scaled horizontally later on.

6. Summary

Only two developers worked on this refactoring (manpower was limited), and development took just 7 days, covering more than 20 interface changes, 3 scheduled-task scripts, and back-office configuration management. Despite the tight schedule and heavy workload, the work proceeded in an orderly way, and QA reported few bugs in the later testing phase.

To date I have led 6-7 refactoring projects and done a great deal of refactoring work, and along the way I have developed my own routines and methods. The approach is mature and avoids many pitfalls in the details. Readers whose systems have hit bottlenecks, or who have refactoring needs or difficulties, are welcome to get in touch.

Follow the "Talking about Architecture" public account for original technical articles, shared from time to time.

Source: blog.csdn.net/weixin_38130500/article/details/126093194