Preface
The previous article covered the theory: how to design a feasible technical solution for a refactoring. This article walks through a complete refactoring technical solution, using a recent production project, the refactoring of the passenger queuing system, as the example.
The technical solution in detail
1. Background
1. Current state:
* The online passenger queue has an obvious performance bottleneck. It is stored mainly in a Redis List, so as the number of orders in the queue grows, the RT of operations such as querying, inserting, and checking whether an order is in the queue rises sharply.
* The current passenger queuing structure cannot meet the needs of business expansion. To support rapid business iteration in the future, refactoring the passenger queue is urgent.
2. Items to investigate
* Feasibility analysis of using MySQL to store queuing information (stress test in the offline environment)
* Inventory of the external interfaces and their scope of impact (analysis of the roughly 20 external interfaces currently provided).
The inventory table has the following form:
interface name | interface path | caller | SWC | RT(995) | Average RT | Remark |
---|---|---|---|---|---|---|
enqueue | /queue/enter | XXX | XXX | XXX | XXX |
2. Goals
1. Keep the external interfaces unchanged and refactor from the underlying storage up, stay compatible with the current online display scenarios, and decouple the passenger ranking display from dequeuing.
The ranking display keeps the ordinary queue, channel queue, and priority queue (including absolute priority), each sorted by enqueue time.
The queue's sort factor is calculated by fixed rules at enqueue time, and a more flexible strategy algorithm is used to calculate the queue priority.
2. Use a Redis sorted set to store the ranking, and add MySQL storage for the queue information, split into 128 tables.
3. Solve the current performance bottleneck, and support rapid iteration and expansion for subsequent business requirements.
3. The overall plan
1. Comparison of old and new solutions
Storage architecture before refactoring: Redis list data structure, key: honeycomb center point + car model + queue type
Storage architecture after refactoring:
Ranking queue: Redis sorted set, key: honeycomb center point + car model + queue type (kept for compatibility with the old key)
Queue information table: queue_info_xxx, stored in MySQL, split into tables by the hash of the honeycomb center point, with a composite unique index on order number + car model
Old vs. new comparison of selected interfaces
interface | view ranking | is in the queue | enqueue | dequeue | jump in line |
---|---|---|---|---|---|
before refactoring | 1. Iterate over all elements in all queues to compute the position. 2. Call the algorithm team to get the estimated time | Fetch all elements of the queue and loop over them to check whether the order is among them | First check whether the order is already in the queue, then decide, based on which queue type it hits, whether to write it to the Redis queue (list) | Loop over car models and queue types to dequeue, and write a log record | Queue jumping with a benefit card |
after refactoring | Look up the queuing record in the "queue information table" by order number. If a record exists, check whether the order has a ranking. If it has none, display M+ (the ranking queue has an upper limit); otherwise query the "ranking queue" and return the position directly. Call the algorithm team to get the estimated time | Query the "queue information table" directly to check whether a record exists | Write to the "queue information table" first; if the ranking threshold is not exceeded, also write to the corresponding "ranking queue" | Update the status in the "queue information table". If the order has a ranking, remove it from the ranking queue, notify the next candidate asynchronously, and write a log record | Change the queue order directly by updating the "order_by" field of the queue information table |
Bottleneck analysis before refactoring: every request fetches all elements of the queue and loops over them, so as the number of queued orders grows, RT rises sharply, which is a serious problem. (Can you work out why?)
Advantages of the refactored storage architecture: the original O(n) scans become keyed lookups that are effectively constant time. (Strictly speaking, a rank lookup in a Redis sorted set, such as ZRANK, is O(log n).)
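The difference between scanning and keyed lookup can be illustrated with a small in-memory sketch. It is not the production code: a dict stands in for the "queue information table", a sorted Python list stands in for the Redis sorted set, and the class name, method names, and the convention that a lower score ranks first are all assumptions made for illustration.

```python
import bisect


class QueueSketch:
    """In-memory stand-in for the refactored storage (illustrative only).

    queue_info plays the "queue information table" (MySQL in the article),
    ranking plays the "ranking queue" (a Redis sorted set in the article).
    """

    def __init__(self, rank_threshold):
        self.rank_threshold = rank_threshold  # max orders held in the ranking queue
        self.queue_info = {}                  # order_id -> score (hypothetical schema)
        self.ranking = []                     # sorted list of (score, order_id)

    def enqueue(self, order_id, score):
        if order_id in self.queue_info:       # keyed lookup, no full scan
            return False
        self.queue_info[order_id] = score     # write the information table first
        if len(self.ranking) < self.rank_threshold:
            # A real sorted set inserts in O(log n); list insertion is only a stand-in.
            bisect.insort(self.ranking, (score, order_id))
        return True

    def is_in_queue(self, order_id):
        return order_id in self.queue_info

    def _rank_index(self, order_id, score):
        i = bisect.bisect_left(self.ranking, (score, order_id))
        if i < len(self.ranking) and self.ranking[i] == (score, order_id):
            return i
        return None

    def view_rank(self, order_id):
        """Return the 1-based rank, "M+" if queued but unranked, None if not queued."""
        score = self.queue_info.get(order_id)
        if score is None:
            return None
        i = self._rank_index(order_id, score)
        return "M+" if i is None else i + 1

    def dequeue(self, order_id):
        score = self.queue_info.pop(order_id, None)
        if score is None:
            return False
        i = self._rank_index(order_id, score)
        if i is not None:
            del self.ranking[i]
        # Promoting a waiting order into the freed slot is left out of this sketch.
        return True
```

With a threshold of 2, the third order to enqueue is recorded in the information table but shows "M+" until it is ranked, and no operation has to iterate over the whole queue.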
2. Architecture diagram after refactoring
1) Regarding queue size statistics:
Ranking queue without rate limiting: read the size directly with ZCARD (O(1) time complexity)
Rate-limited ranking queue: read the total length from a counter (O(1)), and fall back to ZCARD when degraded
2) Regarding matching of new capacity: the list query (the orange part of the diagram) may become a bottleneck. There are two optimization directions for a later stage: fetching only the top N by ranking, or pulling from a buffer queue.
Other flowcharts: enqueue and dequeue flowcharts (omitted here)
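The queue size rules described above can be sketched as a small function. This is a minimal illustration, not the actual implementation: `read_counter` and `zcard` are hypothetical callables standing in for the counter lookup and the Redis ZCARD call, and treating any exception from the counter as the "degraded" condition is an assumption.

```python
from typing import Callable


def queue_size(rate_limited: bool,
               read_counter: Callable[[], int],
               zcard: Callable[[], int]) -> int:
    """Return the queue length following the rules in the text.

    read_counter and zcard are hypothetical stand-ins for the real lookups.
    """
    if not rate_limited:
        # Without rate limiting, the sorted set holds every queued order.
        return zcard()
    try:
        # With rate limiting, the total length comes from a separate counter.
        return read_counter()
    except Exception:
        # Degraded path: ZCARD only counts members of the ranking queue,
        # so it may undercount if some queued orders are not ranked.
        return zcard()
```

For example, `queue_size(True, counter_fn, zcard_fn)` returns the counter value when the counter is readable and the ZCARD value otherwise.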
3. Table structure design
queue_info_[001 ~ 128]: the queue information table, split into tables by the rule honeycomb center point hash % 128; data is archived by day
queue_manager: the ranking queue management table; it mainly controls whether a queue is in the rate-limited state and holds the honeycomb queue information
queue_log_[001 ~ 128]: the enqueue & dequeue record table, split into tables by the rule honeycomb center point hash % 128; archiving will be considered later.
Detailed table structure - omitted
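As an illustration of the hash % 128 table-splitting rule, the sketch below maps a honeycomb center point to its table names. The article does not say which hash function is used or how the center point is encoded, so the CRC32 hash, the string key format, and the function names are assumptions.

```python
import zlib

SHARD_COUNT = 128


def shard_suffix(center_point: str) -> str:
    """Map a honeycomb center point to a table suffix in 001..128.

    CRC32 is a stand-in: any hash that is stable across processes would do.
    """
    index = zlib.crc32(center_point.encode("utf-8")) % SHARD_COUNT  # 0..127
    return f"{index + 1:03d}"


def queue_info_table(center_point: str) -> str:
    return "queue_info_" + shard_suffix(center_point)


def queue_log_table(center_point: str) -> str:
    return "queue_log_" + shard_suffix(center_point)
```

A stable hash matters here: Python's built-in `hash()` for strings is randomized per process by default, so it would route the same center point to different tables after a restart.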
4. Design of sort field (order_by)
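In a queuing scenario, the smaller the enqueue time, the nearer the front. The time difference can therefore be inverted, using the following formula: ~(-1L << 39L) & (~(millisecond time difference)). The left operand is a mask of the low 39 bits, so the result is the bitwise complement of the millisecond time difference truncated to 39 bits, and a smaller time difference yields a larger value.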
Other rules are omitted here.
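The inverted time difference formula can be checked with a few lines of Python. The function name and the range check are additions for illustration. The article also does not say what the time difference is measured against, so the sketch simply takes the difference as input.

```python
# Mask with the low 39 bits set; same value as ~(-1L << 39L) in Java.
MASK_39_BITS = ~(-1 << 39)  # equals 2**39 - 1


def inverted_time_score(diff_ms: int) -> int:
    """Python equivalent of ~(-1L << 39L) & (~diff_ms).

    For 0 <= diff_ms < 2**39 this equals (2**39 - 1) - diff_ms,
    so a smaller time difference gives a larger score.
    """
    if not 0 <= diff_ms <= MASK_39_BITS:
        raise ValueError("time difference does not fit in 39 bits")
    return MASK_39_BITS & ~diff_ms
```

39 bits of milliseconds cover roughly 17 years, so the inverted value stays monotonic over any realistic queuing window, and the remaining high bits stay free for other sort factors.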
5. Compatibility issues with historical queue scenarios
Ranking display: ordinary queue, channel queue, priority queue
Dequeue order: different weight coefficient configurations produce the different final orderings
6. Grayscale rollout plan
Roll out city by city, starting with a low-traffic city.
7. Rollback plan
Turn off the city's grayscale switch. Data already in the queue will be affected, so a migration tool is needed to refresh it.
8. Data archiving plan & fallback plan
Data archiving: passenger queuing information is archived by day
Fallback strategy: if an order's queuing status has not changed for a long (configurable) time, which may indicate an anomaly, the order is forced out of the queue
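The forced-exit rule above can be sketched as a periodic check. This is a minimal illustration, assuming each queuing record carries the time of its last status change. The function name and the dict-based input are hypothetical.

```python
from datetime import datetime, timedelta


def find_stale_orders(last_change, now, max_idle):
    """Return the IDs of orders whose status has not changed for longer than max_idle.

    last_change: dict mapping order_id -> datetime of the last status change
                 (a hypothetical stand-in for the queue information table).
    max_idle:    the configurable timeout, as a timedelta.
    """
    return sorted(
        order_id
        for order_id, changed_at in last_change.items()
        if now - changed_at > max_idle
    )
```

A scheduled job could call this with the configured timeout and then force each returned order out of the queue through the normal dequeue path, so that the information table and the ranking queue stay consistent.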
9. Data monitoring & alarm
Passenger queuing Grafana monitoring:
Monitoring metrics: city, honeycomb, car model, number of orders in the ordinary queue, number in the channel queue, number in the priority queue
Alarm: DingTalk alert when the number of queued orders exceeds the threshold
10. Schedule
For the interfaces from the investigation (20 interfaces), add columns for the refactoring plan, owner, and schedule
interface name | interface path | caller | SWC | RT(995) | Average RT | Remark | Refactoring plan | Owner | Schedule |
---|---|---|---|---|---|---|---|---|---|
enqueue | /queue/enter | XXX | XXX | XXX | XXX |
Note: interface self-testing and code review (CR) are completed in the development phase. Monitoring and alarms do not block testing, so they can be developed during the testing phase.
11. Related teams
Omitted
12. Required resources
Omitted
Summary
A refactoring has to account for many details: every potential bottleneck, as well as later optimization and expansion.
Every change must have a named owner (to avoid omissions), and all self-tests (unit tests) must pass before handing over to testing.
Code development for this solution is now basically complete. The next article will continue with the queuing system refactoring as its scenario and discuss how to design the stress test plan for the grayscale stage. Stay tuned.
Follow the official account "Talking about Architecture" for original technical articles published from time to time, including future posts on the technical details of system refactoring.