1. Introduction
After we published the last article, "Case Sharing of Online Real Queuing System Refactoring: Practice", some friends asked how the refactoring was progressing. The refactored passenger queuing system has now been launched and has been in gray release (gradual rollout) for one month. It has been running stably, and the results so far have far exceeded expectations. This article covers the stress test scheme for passenger queuing scenarios, along with some personal takeaways.
2. How to evaluate the performance of a queuing system
I debated the stress test of the queuing system with our operations and QA colleagues for a long time, and we could not reach agreement. The reason is that the performance of the queuing system had never really been evaluated before, so there was no standard to measure against. Based on the current online traffic (the top 10 cities at present), my analysis is as follows:
Peak periods for passenger queuing: 8:00~10:00, 18:00~19:00, and 21:00~23:00
Average waiting time in the queue (dequeue time - enqueue time): 1 min ~ 5 min
Queuing ratio in major cities during peak periods (number of queued orders / total orders of the day): 10% ~ 38%
From this we can derive the performance metric for the queuing system: the maximum number of queued orders supported within a 5-minute time window (5 minutes being the upper limit of the average waiting time).
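The three figures above can be computed from order records along the following lines. This is only a minimal sketch: the `Order` class, its field names, and the sample data are hypothetical and are not taken from the real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Order:
    # Hypothetical record layout; the real order schema is not given in the article.
    order_id: str
    enqueued_at: Optional[datetime] = None  # None if the order never entered the queue
    dequeued_at: Optional[datetime] = None  # None if the order has not left the queue


def queuing_ratio(orders: Sequence[Order]) -> float:
    """Number of queued orders divided by total orders."""
    if not orders:
        return 0.0
    return sum(o.enqueued_at is not None for o in orders) / len(orders)


def average_wait_seconds(orders: Sequence[Order]) -> float:
    """Mean of (dequeue time - enqueue time) over orders that have left the queue."""
    waits = [
        (o.dequeued_at - o.enqueued_at).total_seconds()
        for o in orders
        if o.enqueued_at is not None and o.dequeued_at is not None
    ]
    return sum(waits) / len(waits) if waits else 0.0


def enqueued_in_window(
    orders: Sequence[Order],
    start: datetime,
    window: timedelta = timedelta(minutes=5),
) -> int:
    """Count orders that entered the queue in [start, start + window)."""
    end = start + window
    return sum(
        1 for o in orders
        if o.enqueued_at is not None and start <= o.enqueued_at < end
    )


# Made-up sample data, for illustration only.
t0 = datetime(2023, 1, 1, 8, 0)
sample = [
    Order("a", t0, t0 + timedelta(minutes=2)),
    Order("b", t0 + timedelta(minutes=1), t0 + timedelta(minutes=5)),
    Order("c"),
    Order("d"),
]
```

With this sample, two of the four orders were queued, so the ratio is 0.5, and the two waits of 2 and 4 minutes average out to 180 seconds.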
3. Stress test target
Current state: Passenger queuing is live nationwide, and 10% ~ 38% of orders enter the queue. To be conservative, we assume that 50% of orders enter the queue. The current peak order volume is 30,000 orders per minute (QPM), so the number of orders entering the queue within 5 minutes is: 30,000 * 5 * 0.5 = 75,000
Goal: Stress test against 5 times the current order volume, that is, 375,000 orders queued at the same time within 5 minutes
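The arithmetic above can be written as a small helper, which makes it easy to recompute the target when the peak volume or the assumed queuing ratio changes. The function and parameter names are illustrative only.

```python
def queue_capacity_target(
    peak_orders_per_minute: int,
    window_minutes: int,
    queue_ratio: float,
    growth_factor: int = 1,
) -> int:
    """Orders expected to be in the queue within one time window.

    peak_orders_per_minute * window_minutes is the total order volume in the
    window; queue_ratio is the share of those orders that enters the queue;
    growth_factor scales the result for the stress test target.
    """
    return round(peak_orders_per_minute * window_minutes * queue_ratio * growth_factor)


# Figures from the article: 30,000 orders/minute, 5-minute window, 50% queuing ratio.
current_load = queue_capacity_target(30_000, 5, 0.5)
stress_target = queue_capacity_target(30_000, 5, 0.5, growth_factor=5)

print(current_load)   # 75000
print(stress_target)  # 375000
```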
4. Stress test steps
No. | Step | Metrics observed | Operation |
---|---|---|---|
01 | Dispatch after order placement (legacy process) | Maximum number of queued orders the legacy process supports within 5 minutes, and the QPS of each interface | Turn the switch off; orders are canceled within 5 minutes |
02 | Dispatch after order placement (new process) | Maximum number of queued orders the new process supports within 5 minutes, and the QPS of each interface | Turn the switch on; orders are canceled within 5 minutes |
In the legacy process, once the number of orders queued at the same time reached 100,000 within 5 minutes, the interfaces threw a large number of timeout exceptions, which means the process had hit its performance bottleneck.
In the new process, 500,000 orders were queued within 5 minutes without any exceptions. The key interfaces performed as follows:
Interface | Current QPS | Target QPS | Stress test QPS | Average RT |
---|---|---|---|---|
enqueue | 300 | 1500 | 3000 | 12ms |
dequeue | 300 | 1500 | 3000 | 40ms |
Check whether in queue | 3000 | 15000 | 15000+ | 4ms |
Query queue position | - | - | 8500 | 8ms |
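As a quick sanity check on the table above, the sketch below compares the measured QPS with the target for each interface that has one. The dictionary keys are illustrative names, and the "15000+" entry is treated as a lower bound of 15000, which is an assumption.

```python
# QPS figures copied from the table above; "15000+" is entered as 15000 (lower bound).
# The position-query interface has no stated target, so it is left out.
results = {
    "enqueue": {"current": 300, "target": 1500, "measured": 3000},
    "dequeue": {"current": 300, "target": 1500, "measured": 3000},
    "is_in_queue": {"current": 3000, "target": 15000, "measured": 15000},
}


def headroom(measured: float, target: float) -> float:
    """Ratio of measured QPS to target QPS; a value >= 1.0 means the target is met."""
    return measured / target


for name, r in results.items():
    ratio = headroom(r["measured"], r["target"])
    status = "meets target" if ratio >= 1.0 else "below target"
    print(f"{name}: {ratio:.1f}x target ({status})")
```

Each target in the table is 5 times the current QPS, which matches the 5x growth goal, and the enqueue and dequeue interfaces were measured at twice their targets.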
5. Passenger queuing refactoring: comparison of old and new
Version | Orders queued at the same time within 5 minutes | Maximum queue size supported by a single hive |
---|---|---|
Before refactoring | <100,000 | <1000 |
After refactoring | >500,000 | Unlimited |
After refactoring, the average RT of the query interfaces dropped by 65% overall, and the average RT of the update interfaces dropped by 40%. No performance bottleneck was observed, and the system can be scaled horizontally later on.
6. Summary
Only 2 developers were assigned to this refactoring (manpower was limited), and development took only 7 days. The work covered more than 20 interface changes, 3 scheduled task scripts, and the back-office configuration management. Despite the tight schedule and heavy workload, the project proceeded in an orderly way, and few bugs were reported during the later testing stage.
So far I have led 6-7 refactoring projects, and in the process I have developed my own routines and methods. The approach is now quite mature, and many pitfalls in the details can be avoided. Friends whose systems have hit a bottleneck, or who have refactoring needs, are welcome to get in touch and exchange ideas.
You are welcome to follow the "Talking about Architecture" public account, which shares original technical articles from time to time.