12306 technology upgrade

The core of the upgrade is the remaining-ticket query. The old remaining-ticket query was built on stored procedures and a Sybase database, with disastrous results: under high business concurrency, it could not be scaled horizontally.

Inside the 12306 Technical Transformation (3): Migrating a Traditional Framework to an In-Memory Data Platform

Architecture of the remaining-ticket calculation/query subsystem before the 2012 renovation

1. The railway bureaus provide ticket sales records in "real time"

Seat sales for trains are planned and regulated by the individual railway bureaus (the railway company has 18 of them), and 12306 shares the same "ticket pool" with every bureau. In other words, "online" 12306 sells from the same inventory as the "offline" channels: station ticket windows, telephone booking, and the various ticket agencies. Each bureau transmits every ticket sale record in "real time" to the "main database server" of the China Railway data center, where the records are aggregated.

2. Database replication to the remaining-ticket cluster:

In the railway company's data center, the remaining-ticket calculation cluster consists of the "main database server" and 72 Unix minicomputers; a database replication mechanism "copies" the data to the 72 Unix minicomputers in real time.

3. Parallel remaining-ticket calculation: of the 72 minicomputers, 8 perform ticket preprocessing and send their results to the other 64, which compute the remaining tickets simultaneously in parallel.
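The preprocess-then-fan-out flow of item 3 can be sketched in Python. This is a minimal illustration, not the original system: the `(train, seats_sold)` record format, the capacity table and all function names are hypothetical, and a thread pool stands in for the 64 separate minicomputers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 64  # stands in for the 64 calculation minicomputers


def preprocess(sales_records, num_buckets=NUM_WORKERS):
    """Preprocessing stage: group (train, seats_sold) records into buckets
    so that every record for one train lands in the same bucket."""
    buckets = defaultdict(list)
    for train, seats_sold in sales_records:
        buckets[hash(train) % num_buckets].append((train, seats_sold))
    return buckets


def compute_bucket(records, capacity):
    """Calculation stage: remaining tickets for each train in one bucket."""
    sold = defaultdict(int)
    for train, seats_sold in records:
        sold[train] += seats_sold
    return {train: capacity[train] - n for train, n in sold.items()}


def remaining_tickets(sales_records, capacity):
    """Fan the buckets out to parallel workers and merge their results.
    Trains with no sales records do not appear in the result."""
    buckets = preprocess(sales_records)
    result = {}
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        futures = [pool.submit(compute_bucket, recs, capacity)
                   for recs in buckets.values()]
        for future in futures:
            # Buckets hold disjoint sets of trains, so merging never overwrites.
            result.update(future.result())
    return result


# Hypothetical capacities and sales records.
capacity = {"G1": 100, "D2": 50}
sales = [("G1", 3), ("D2", 1), ("G1", 2)]
print(remaining_tickets(sales, capacity))
```

Bucketing by train number in the preprocessing stage is what lets the workers run independently: each train's records end up in exactly one bucket, so the merge step never has to reconcile partial counts.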

4. Application cache servers: the 64 minicomputers aggregate their remaining-ticket results into a remaining-ticket table, which is placed on the front-end application cache server cluster; under high concurrency, the cache servers can then answer remaining-ticket queries faster.

5. CDN (Content Delivery Network) servers: the CDN is deployed at the outermost edge of 12306. It lets users obtain the information they need from a nearby node: requests are redirected to the service node closest to the user, which relieves Internet congestion, and its load-balancing function improves the response speed of user access.

6. Remaining-ticket update mechanism: remaining-ticket information is updated per train number, every 10 minutes. When a user queries remaining tickets between two stations, the CDN servers are queried first. If the CDN data is more than 10 minutes old, the latest data is fetched from the application cache servers. Likewise, when the data on the application cache servers has expired, the remaining-ticket calculation is triggered: remaining tickets are recalculated into a new remaining-ticket table, which is submitted to the application cache servers. The 10-minute interval is an adjustable parameter.
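The refresh rule in item 6 behaves like two stacked time-to-live (TTL) caches keyed by train number: the CDN tier reloads from the application cache tier, and the application cache tier reloads by recalculating. The sketch below assumes that model; `TTLCache`, `recalculate` and the fake clock are hypothetical names used only for illustration.

```python
import time

REFRESH_SECONDS = 600  # the 10-minute interval; adjustable, as the text notes


class TTLCache:
    """One cache tier: serves an entry until it is older than `ttl` seconds,
    then reloads it from the next tier via `loader`."""

    def __init__(self, ttl, loader, clock=time.monotonic):
        self.ttl = ttl
        self.loader = loader
        self.clock = clock
        self._entries = {}  # train number -> (value, time loaded)

    def get(self, train):
        now = self.clock()
        entry = self._entries.get(train)
        if entry is None or now - entry[1] > self.ttl:
            value = self.loader(train)  # miss or expired: ask the next tier
            self._entries[train] = (value, now)
            return value
        return entry[0]


# Demo with a fake clock so expiry can be shown without waiting.
fake_now = [0.0]
recalculations = []


def recalculate(train):
    """Hypothetical stand-in for the remaining-ticket calculation."""
    recalculations.append(train)
    return {"train": train, "version": len(recalculations)}


app_cache = TTLCache(REFRESH_SECONDS, recalculate, clock=lambda: fake_now[0])
cdn = TTLCache(REFRESH_SECONDS, app_cache.get, clock=lambda: fake_now[0])

first = cdn.get("G1")    # both tiers empty: triggers a recalculation
fake_now[0] = 300
second = cdn.get("G1")   # still fresh: served from the CDN tier
fake_now[0] = 601
third = cdn.get("G1")    # both tiers expired: recalculated again
```

One consequence of this simple model: because the CDN tier may refresh from an application-cache entry that is itself almost 10 minutes old, a user can in the worst case see data close to 20 minutes old.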



12306 Transformation principles and goals

(1) Setting performance targets:

Taking the remaining-ticket calculation subsystem as an example, the Gemfire cluster must exceed 10,000 TPS for remaining-ticket calculation and be able to expand flexibly as the business grows. Combined with the application cache server cluster and the front-end CDN load-balancing cluster, 12306 is estimated to support up to one million remaining-ticket queries per second. The performance figures in this section depend on the server CPU type, memory size, number of Gemfire nodes and number of x86 servers.

(2) Remaining-ticket calculation capacity must increase linearly as virtual machines are added.

A remaining-ticket query must reflect the latest availability, since users rely on it when buying tickets. As discussed in the previous article, the remaining-ticket calculation is complex: computing the remaining tickets for every station-to-station segment requires a great deal of CPU power, and the results are placed on the cache data servers and CDN servers. The remaining-ticket calculation uses Gemfire's data grid and MapReduce techniques to supply that CPU power.
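Why the per-segment calculation is CPU-heavy can be illustrated with a common seat-inventory model (an assumption here; the article does not describe 12306's actual data layout). A train with n stops has n - 1 legs; a ticket from stop a to stop b occupies legs a to b - 1, so a seat is available for a segment only if every leg in it is free. That gives n(n - 1)/2 segments per train, each checked against every seat. The names `leg_mask`, `sell` and `remaining_by_segment` are hypothetical.

```python
def leg_mask(start, end):
    """Bitmask of the legs a trip from stop `start` to stop `end` occupies
    (leg i runs from stop i to stop i + 1)."""
    return ((1 << (end - start)) - 1) << start


def sell(seat_occupancy, seat, start, end):
    """Mark a seat as sold on every leg between two stops."""
    mask = leg_mask(start, end)
    if seat_occupancy[seat] & mask:
        raise ValueError("seat already sold on part of this segment")
    seat_occupancy[seat] |= mask


def remaining_by_segment(num_stops, seat_occupancy):
    """Free seats for every (start, end) stop pair: a seat counts only if
    none of the legs in the segment are sold."""
    result = {}
    for start in range(num_stops - 1):
        for end in range(start + 1, num_stops):
            mask = leg_mask(start, end)
            result[(start, end)] = sum(
                1 for occupied in seat_occupancy if (occupied & mask) == 0)
    return result


# Hypothetical train: 4 stops (legs 0-2) and 3 seats.
seats = [0, 0, 0]
sell(seats, 0, 0, 2)   # seat 0 sold from stop 0 to stop 2
sell(seats, 1, 2, 3)   # seat 1 sold from stop 2 to stop 3
table = remaining_by_segment(4, seats)
```

Note how one sale on seat 0 reduces availability for every segment that overlaps legs 0 and 1, which is why results must be recomputed per segment rather than per train.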

(3) Add a Gemfire cluster without changing the original system framework. The new and old systems operate in parallel.

To keep the 12306 production environment stable and make the transition smooth, the new and old systems must run at the same time, with the front-end CDN servers controlling access traffic. Requests that require remaining-ticket calculation are shifted to the Gemfire cluster gradually, to test Gemfire's load capacity and verify that it scales.
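The gradual shift of traffic described in (3) can be sketched as percentage-based routing. The article says only that the CDN front end controls access traffic; the hash-based rule, the `route` function and the ramp-up schedule below are hypothetical.

```python
import zlib


def route(request_key, gemfire_percent):
    """Send a stable `gemfire_percent`% of requests to the new Gemfire path.

    Hashing the key (rather than drawing a random number) keeps a given
    user or train on the same backend, and raising the percentage only
    moves keys from legacy to Gemfire, never back."""
    bucket = zlib.crc32(request_key.encode("utf-8")) % 100
    return "gemfire" if bucket < gemfire_percent else "legacy"


# Hypothetical ramp-up schedule, in percent of remaining-ticket requests.
RAMP = [1, 5, 20, 50, 100]

keys = [f"user-{i}" for i in range(10_000)]
for percent in RAMP:
    share = sum(route(k, percent) == "gemfire" for k in keys) / len(keys)
    print(f"{percent:3d}% target -> {share:.1%} routed to Gemfire")
```

Each step of the ramp would be held until the Gemfire cluster's throughput and latency are confirmed before moving to the next percentage.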

(4) System high availability (HA)

Each Gemfire data node holds backup copies of data from other nodes, and in-memory data can also be persisted to disk or written to the database synchronously or asynchronously.
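A minimal sketch of the backup idea in (4), assuming a simplified model: data is split into buckets, each bucket has one primary and one or more backups on different nodes, and a backup takes over when its primary's node fails. In Gemfire the number of backups is configured with the partitioned region's redundant-copies attribute; the functions and node names here are hypothetical, and real Gemfire also re-creates lost copies, which this sketch omits.

```python
def assign_buckets(num_buckets, nodes, redundant_copies=1):
    """Place each bucket's primary and backup copies on distinct nodes.
    owners[0] is the primary; the rest are backups."""
    if redundant_copies + 1 > len(nodes):
        raise ValueError("need at least redundant_copies + 1 nodes")
    return {
        bucket: [nodes[(bucket + i) % len(nodes)]
                 for i in range(redundant_copies + 1)]
        for bucket in range(num_buckets)
    }


def fail_node(placement, dead_node):
    """Drop a failed node; for buckets it was primary for, the first
    surviving backup becomes the new primary."""
    return {
        bucket: [node for node in owners if node != dead_node]
        for bucket, owners in placement.items()
    }


nodes = ["node-1", "node-2", "node-3"]   # hypothetical node names
placement = assign_buckets(6, nodes)
after_failure = fail_node(placement, "node-1")
```

With one redundant copy, any single node can fail and every bucket still has an owner.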

(5) x86 servers

With multiple data centers and hybrid clouds planned for the future, the production machines must be x86 servers.

Following the 12306 design principles, the Gemfire cluster is added without changing the original system framework; during the transition period the new and old systems run in parallel, and afterwards the Gemfire cluster becomes the primary system. This step therefore analyzes and transforms items 2 and 3 of the pre-renovation remaining-ticket calculation/query architecture described above (database replication and parallel remaining-ticket calculation).

12306 transformation steps:

(1) System architecture transformation and data upload

To keep the 12306 production system stable and the transition smooth, the Gemfire cluster is added without changing the original system framework, and the new and old systems run in parallel. The question then is how to synchronize data from the database to the Gemfire cluster in real time for the remaining-ticket calculation.

a. Database replication server: an additional database server is added, and the data aggregated by the "main database server" of the China Railway data center is replicated to it in real time.

b. Real-time synchronization and SQL parsing server: this server extracts log data from the database and parses the SQL statements out of it, because the original code was developed as stored procedures.

c. RabbitMQ server: the synchronization server sends the parsed SQL statements through RabbitMQ to the Gemfire cluster, which uses them to calculate remaining tickets.

d. After aggregating the remaining-ticket results, the Gemfire cluster submits them to the front-end application cache servers, which provide fast remaining-ticket queries.
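Steps b to d form a change-data-capture pipeline. The sketch below models it under loud assumptions: the log is a list of SQL strings, only INSERTs into a made-up `ticket_sale` table are understood, `queue.Queue` stands in for RabbitMQ, and a dict keyed by train number stands in for a Gemfire region. None of these names come from the article.

```python
import queue
import re

# Hypothetical log format: one SQL statement per entry.
INSERT_RE = re.compile(
    r"INSERT INTO ticket_sale \(train, from_stop, to_stop, seat\) "
    r"VALUES \('(\w+)', (\d+), (\d+), (\d+)\)"
)


def parse(sql):
    """SQL parsing step: turn a recognized INSERT into a change event."""
    match = INSERT_RE.fullmatch(sql.strip())
    if match is None:
        return None  # statements this sketch does not understand are skipped
    train, start, end, seat = match.groups()
    return {"train": train, "from": int(start), "to": int(end), "seat": int(seat)}


def sync_server(log_entries, mq):
    """Synchronization server: parse log entries and publish events."""
    for sql in log_entries:
        event = parse(sql)
        if event is not None:
            mq.put(event)
    mq.put(None)  # end-of-stream marker for this demo


def gemfire_consumer(mq, region):
    """Consumer: apply events to the in-memory region, grouped by train."""
    while True:
        event = mq.get()
        if event is None:
            return
        region.setdefault(event["train"], []).append(event)


log = [
    "INSERT INTO ticket_sale (train, from_stop, to_stop, seat) VALUES ('G1', 0, 2, 5)",
    "UPDATE audit SET seen = 1",
    "INSERT INTO ticket_sale (train, from_stop, to_stop, seat) VALUES ('D2', 1, 3, 7)",
]
mq = queue.Queue()
region = {}
sync_server(log, mq)
gemfire_consumer(mq, region)
```

A real deployment would run the producer and consumer as separate services and acknowledge messages only after they are applied; the single-process version here just shows the data flow.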

(2) System data structure analysis

Data structure analysis and design is the most critical step of the transformation, since it determines the later performance gains. Following the "shared nothing" principle of a data grid, the data is partitioned so that strongly correlated data sits on the same Gemfire data node; combined with the design of the business logic, this reduces the latency of data exchange between nodes and increases the system's processing capacity.
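Colocation by a shared key can be sketched as routing every entry by its train number alone, so a train's seat inventory and its sales records always share a node. This simplified sketch hashes directly onto a node list; Gemfire instead hashes keys into a fixed set of buckets and, for this purpose, lets you supply a custom PartitionResolver and declare related regions colocated-with each other. The key layout and node names are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key, nodes):
    """Route a key to a node using only its train number (key[0]).

    Seat inventory and sales records for the same train therefore always
    land on the same node, so a remaining-ticket calculation for one train
    needs no data from other nodes."""
    train = key[0]
    return nodes[zlib.crc32(train.encode("utf-8")) % len(nodes)]


nodes = ["node-1", "node-2", "node-3", "node-4"]   # hypothetical
entries = [
    ("G1", "seat", 5), ("G1", "sale", 1001),
    ("D2", "seat", 7), ("D2", "sale", 1002),
]
placement = defaultdict(list)
for key in entries:
    placement[node_for(key, nodes)].append(key)
```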

(3) Benchmark test

    Benchmark tests evaluate the TPS and remaining-ticket calculation response time of each Gemfire server,
    to verify that the performance of the Gemfire cluster increases linearly as x86 virtual machines are added.
    Test the "real-time synchronization" mechanism: measure the stability, latency and throughput of data synchronization to the Gemfire cluster.
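The linearity check in the benchmark step, and the capacity planning it enables against the 10,000 TPS target from (1), can be expressed as simple arithmetic. The measurements and the per-VM throughput in the example are hypothetical, not 12306 results, and the 10% tolerance is an arbitrary choice.

```python
import math


def scaling_efficiency(measurements):
    """measurements: (num_vms, measured_tps) pairs sorted by num_vms.
    Efficiency 1.0 means perfectly linear scaling from the smallest run."""
    base_vms, base_tps = measurements[0]
    tps_per_vm = base_tps / base_vms
    return [(vms, tps / (tps_per_vm * vms)) for vms, tps in measurements]


def is_linear(measurements, tolerance=0.1):
    """Treat scaling as linear if every run keeps >= (1 - tolerance) efficiency."""
    return all(eff >= 1 - tolerance for _, eff in scaling_efficiency(measurements))


def vms_needed(target_tps, tps_per_vm, efficiency):
    """VMs required to reach a TPS target at a measured scaling efficiency."""
    return math.ceil(target_tps / (tps_per_vm * efficiency))


# Hypothetical benchmark results, not measurements from 12306.
runs = [(4, 4_000), (8, 7_800), (16, 15_200)]
print(scaling_efficiency(runs))
print(vms_needed(10_000, 1_000, 0.95))
```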

(4) System stability, security and HA design

Each Gemfire data node holds backup copies of other nodes' data to provide HA, and in-memory data can also be persisted to disk or to the database.

Implementation process of the Gemfire-based transformation:

1. Requirement analysis stage:

    Requirements research and choice of the transformation plan;
    analysis of the table structures, sample data, data volumes and application access patterns.

2. Architecture design:

    Design the Region structure, determine the partitioning method, and design secondary indexes.
    Data partitions are placed on Gemfire nodes, and the data size of each node is determined by the physical machine's memory. For example, nodes on Alibaba Cloud typically hold around 30 GB, while nodes in the China Railway data center hold around 60-120 GB. Project experience shows that Gemfire data nodes perform best at no more than 128 GB.
    Use the distributed Data Grid and MapReduce techniques to provide powerful CPU processing power.
    The Gemfire features used are described below.
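The node-sizing rule above can be turned into a rough estimate: total memory needed is the data size multiplied by (1 + number of backup copies), divided by the memory per node available for data. The `usable_fraction` headroom and the 300 GB dataset are hypothetical assumptions; only the 128 GB ceiling comes from the text.

```python
import math

MAX_NODE_MEMORY_GB = 128  # the article's guideline for Gemfire data nodes


def data_nodes_needed(data_gb, redundant_copies, node_memory_gb, usable_fraction):
    """Estimate the data nodes needed to hold a dataset plus its backups.

    usable_fraction is the share of each node's memory assumed to be
    available for data (the rest covers JVM and processing overhead)."""
    if node_memory_gb > MAX_NODE_MEMORY_GB:
        raise ValueError("node memory exceeds the 128 GB guideline")
    total_gb = data_gb * (1 + redundant_copies)
    return math.ceil(total_gb / (node_memory_gb * usable_fraction))


# Hypothetical: 300 GB of data, one backup copy, half of each node's memory usable.
print(data_nodes_needed(300, 1, 120, 0.5))   # prints 10 (data-center-sized nodes)
print(data_nodes_needed(300, 1, 30, 0.5))    # prints 40 (~30 GB cloud nodes)
```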

3. Coding and unit testing:

    12306 was developed with stored procedures, which must be analyzed and re-implemented in Java; the business logic code and the data are placed together on the Gemfire nodes to achieve the highest system performance.
    Unit testing

4. Redesign of the detailed architecture

5. HA and hot deployment testing

6. Online operation and maintenance:

    Create a release version, bring it under configuration management, deploy it to the production environment and monitor it continuously.

Main Gemfire features:

During the 12306 renovation, the following Gemfire features were used:

    Rich Objects: data is expressed and coded as objects, with user-defined data structures.
    Elastic Growth w/o pausing: elastic expansion and hot deployment.
    Partitioned Active Data: data is partitioned sensibly by attributes such as customer, order, train number or station name, and placed on different data nodes.
    Redundancy for instant FT: HA settings.
    Colocated Active Data: following the shared-nothing principle, highly correlated data, such as data for the same train number or station, is placed on the same Gemfire data node.
    Replicated Master Data: small, commonly used reference data, such as the station-name dictionary and seat categories, has a copy on every Gemfire node to reduce data exchange.
    Server-side Event Listeners: when data changes on the server side, the corresponding operation is triggered.
    Client-side Durable Subscriptions: when the server satisfies conditions of a client's subscription, it pushes information to the client.
    Parallel Map-Reduce Function Execution: remaining tickets for each train number are calculated in parallel on each Gemfire node, and the results are aggregated.
    Parallel OQL Queries
    Continuous Queries
    LRU Overflow to disk in native format for fast retrieval
    Parallel, Shared Nothing Persistence to disk w/ online backup
    Asynchronous Write Behind, Write Through or Read Through
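The last item distinguishes write-through, write-behind and read-through. A toy sketch of the three behaviors, with dicts standing in for the Gemfire region and the database (the `Region` class here is hypothetical and unrelated to Gemfire's own Region interface):

```python
import queue


class Region:
    """Toy cache region in front of a database (a dict stands in for both)."""

    def __init__(self, db, write_behind=False):
        self.cache = {}
        self.db = db
        self.write_behind = write_behind
        self._pending = queue.Queue()

    def put(self, key, value):
        self.cache[key] = value
        if self.write_behind:
            self._pending.put((key, value))  # persisted later, off the request path
        else:
            self.db[key] = value             # write-through: persisted before put returns

    def flush(self):
        """Drain queued writes to the database; in practice a background
        task would call this periodically."""
        while True:
            try:
                key, value = self._pending.get_nowait()
            except queue.Empty:
                return
            self.db[key] = value

    def get(self, key):
        """Read-through: on a cache miss, load the value from the database."""
        if key not in self.cache and key in self.db:
            self.cache[key] = self.db[key]
        return self.cache.get(key)


through_db, behind_db = {}, {}
Region(through_db).put("G1", 95)
behind = Region(behind_db, write_behind=True)
behind.put("G1", 95)
before_flush = dict(behind_db)
behind.flush()
```

Write-behind keeps database latency off the request path at the cost of a window in which a crash can lose queued writes; write-through trades latency for that durability.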

Apache Geode: the open-source version of Gemfire
