Inside LeTV Group's payment system: an architecture that processes 100,000 orders per second

As LeTV hardware flash sales kept escalating, the request load on LeTV Group's payment system grew by a factor of hundreds or even thousands. As the last step of a purchase, it is especially important that users can complete payment quickly and reliably. In November 2015 we therefore carried out a comprehensive architecture upgrade of the payment system, enabling it to stably process 100,000 orders per second, which has provided strong support for the various flash-sale and seckill events across the LeEco ecosystem.

1. Sharding the database and tables

In an era when cache systems such as Redis and memcached are everywhere, building a system that supports 100,000 read-only requests per second is not complicated: it is mostly a matter of scaling cache nodes with consistent hashing and horizontally scaling the web servers. A payment system that processes 100,000 orders per second, however, requires hundreds of thousands of database update operations (inserts plus updates) per second. That is an impossible task for any single database, so the first step is to shard the order table (hereafter simply "order") across databases and tables.

Database operations generally carry a user ID (uid for short) field, so we chose uid as the sharding key for both databases and tables.

We chose the "binary tree sub-database" as the sub-database strategy. The so-called "binary tree sub-database" means that when we expand the database, we always expand the capacity in multiples of 2. For example: 1 to 2, 2 to 4, 4 to 8, and so on. The advantage of this sub-database method is that when we expand capacity, we only need the DBA to perform table-level data synchronization, and we do not need to write scripts for row-level data synchronization.

Splitting across databases alone is not enough. Through continued stress testing we found that, within the same database, concurrent updates spread over several tables are far more efficient than updates to a single table, so we further split the order table into 10 tables: order_0, order_1, ..., order_9.

In the end the order table is spread over 8 databases (numbered 1 to 8, corresponding to DB1 to DB8), each of which contains 10 tables (numbered 0 to 9, corresponding to order_0 to order_9). The deployment structure is shown below:

[Figure: sharded deployment structure, 8 databases with 10 order tables each]

Calculate database number based on uid:

database number = (uid / 10) % 8 + 1

Calculate the table number according to the uid:

table number = uid % 10

For uid = 9527, the algorithm effectively splits the uid into two parts, 952 and 7: 952 modulo 8 plus 1 gives database number 1, and 7 is the table number. The order information for uid = 9527 therefore lives in table order_7 of database DB1. The algorithm flow is also shown in the following figure:

[Figure: routing a uid to its database and table]
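For illustration, here is a minimal Java sketch of the routing rules above (the class and method names are ours for this example, not part of any framework):

```java
/** Computes the sub-database and sub-table for a given uid, per the rules above. */
public final class OrderShardRouter {

    /** Database number in the range 1..8. */
    public static int databaseNumber(long uid) {
        return (int) ((uid / 10) % 8) + 1;
    }

    /** Table number in the range 0..9, i.e. the suffix of order_0 .. order_9. */
    public static int tableNumber(long uid) {
        return (int) (uid % 10);
    }

    public static void main(String[] args) {
        long uid = 9527L;
        // Prints "DB1, order_7" for uid = 9527, matching the worked example above.
        System.out.printf("DB%d, order_%d%n", databaseNumber(uid), tableNumber(uid));
    }
}
```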

Once the sharding structure and algorithm were settled, the last step was to pick a tool to implement the sharding. There are currently two kinds of sharding tools on the market:

  1. Client-side sharding: the sharding logic runs in the client, which connects directly to the databases
  2. Sharding middleware: the client connects to the middleware, and the middleware performs the sharding operations

Both kinds of tools are available on the market and I will not list them all here; each has its pros and cons. Client-side sharding connects directly to the database, so its performance is 15% to 20% higher than going through middleware. Middleware, on the other hand, centralizes management, isolates the sharding logic from the clients, and gives a cleaner module split, which makes unified management by the DBAs easier.

We chose client-side sharding, because we had already developed and open-sourced a data access framework, code-named "Mango". Mango supports sharding natively and is very simple to configure.

  • Mango Homepage: mango.jfaster.org
  • Mango source code: github.com/jfaster/mango

2. Order ID

Order IDs must be globally unique. The simplest approach is to use a database sequence, fetching a globally unique auto-increment ID on every operation. But to support 100,000 orders per second we would need to generate at least 100,000 order IDs per second, and generating auto-increment IDs through the database clearly cannot meet that requirement, so we had to compute globally unique order IDs in memory.

The best-known unique ID in the Java world is the UUID, but a UUID is too long and contains letters, so it is not suitable as an order ID. After repeated comparison and screening, we borrowed Twitter's Snowflake algorithm to generate globally unique IDs. Below is a simplified diagram of the order ID:

[Figure: simplified order ID layout: timestamp + machine code + sequence number]

The picture above is divided into 3 parts:

  1. Timestamp

The granularity of the timestamp here is milliseconds. When generating the order ID, System.currentTimeMillis() is used as the timestamp.

  2. Machine code

Each order server is assigned a unique number; when generating an order ID, this unique number is used directly as the machine code.

  3. Self-incrementing sequence number

When multiple requests to generate an order ID arrive on the same server within the same millisecond, the sequence number is incremented within that millisecond and starts again from 0 in the next millisecond. For example, if 3 order IDs are generated on the same server within the same millisecond, their sequence numbers will be 0, 1 and 2 respectively.
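A minimal, thread-safe sketch of such a generator is shown below. It only covers the three-part core described above; the bit widths (10-bit machine code, 12-bit sequence) follow Twitter's original Snowflake layout and are an assumption here, since the article does not specify them.

```java
/** Simplified Snowflake-style ID generator: timestamp + machine code + sequence. */
public class OrderIdGenerator {

    // Bit widths borrowed from Twitter's Snowflake; the article does not fix them.
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long machineId;      // unique number assigned to each order server
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public OrderIdGenerator(long machineId) {
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            // Same millisecond on the same server: increment the sequence (0, 1, 2, ...).
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted within this millisecond: busy-wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;             // new millisecond: the sequence restarts from 0
        }
        lastTimestamp = now;
        // Production Snowflake implementations usually subtract a custom epoch from
        // the timestamp to keep the ID shorter; omitted here for simplicity.
        return (now << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```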

Combining these three parts, we can quickly generate globally unique order IDs. Global uniqueness, however, is not enough: in many cases we need to look up an order by its order ID alone. With no uid available, we would not know which shard to query, and scanning every table of every database is obviously not an option. We therefore need to embed the sharding information in the order ID itself. Below is a simplified layout of the order ID with sharding information added:

[Figure: order ID with sub-database/sub-table information prepended]

We prepend the sub-database and sub-table information to the globally unique order ID, so that the corresponding order can be located quickly from the order ID alone.

What exactly does this sharding information contain? As discussed in part 1, we split the order table into 8 databases by uid, each with 10 tables. The simplest sharding information can be stored in a string of length 2: the first character holds the database number (1 to 8) and the second holds the table number (0 to 9).

Using the algorithm from part 1 for computing the database and table numbers, for uid = 9527 the sub-database information is 1 and the sub-table information is 7; combined, the two-character sharding information is "17". The algorithm flow is shown in the following figure:

[Figure: computing the two-character sharding information from the uid]

Using the table number as the sub-table information is fine, but using the database number as the sub-database information carries a hidden danger. Consider a future expansion from 8 databases to 16: sub-database information in the range 1 to 8 cannot represent databases 1 to 16, so routing would no longer work correctly. We call this problem the loss of sub-database information precision.

To avoid losing precision, we store the sub-database information with redundant precision, meaning the value we save today must already support future expansion. Assuming we will eventually expand to at most 64 databases, the new sub-database information algorithm is:

sub-database information = (uid / 10) % 64 + 1

For uid = 9527 the new algorithm gives sub-database information = 57. This 57 is not the number of a real database; it carries the redundant precision needed for an eventual expansion to 64 databases. Since we currently have only 8 databases, the actual database number is computed as follows:

Actual database number = (sub-database information - 1) % 8 + 1

For uid = 9527: sub-database information = 57, actual database number = 1, and the combined sub-database/sub-table information is "577".

Because we take the modulus 64 to store the precision-redundant sub-database information, its length grows from 1 character to 2, and the full sub-database/sub-table information becomes 3 characters long. The algorithm flow is also shown in the following figure:

[Figure: computing the three-character sharding information with precision redundancy]
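A small Java sketch of this precision-redundant computation (again, the names are ours for illustration):

```java
/** Builds the 3-character sub-database/sub-table prefix with precision redundancy. */
public final class ShardInfo {

    /** Sub-database information, modulo 64, so future expansion up to 64 DBs still routes correctly. */
    public static int subDatabaseInfo(long uid) {
        return (int) ((uid / 10) % 64) + 1;     // range 1..64
    }

    /** Actual database number while we still run only 8 databases. */
    public static int actualDatabaseNumber(int subDatabaseInfo) {
        return (subDatabaseInfo - 1) % 8 + 1;   // range 1..8
    }

    /** Sub-table information, i.e. the table number 0..9. */
    public static int subTableInfo(long uid) {
        return (int) (uid % 10);
    }

    public static void main(String[] args) {
        long uid = 9527L;
        int dbInfo = subDatabaseInfo(uid);      // 57 for uid = 9527
        String prefix = String.format("%02d%d", dbInfo, subTableInfo(uid));
        // Prints "prefix=577, actual DB=1", matching the worked example above.
        System.out.printf("prefix=%s, actual DB=%d%n", prefix, actualDatabaseNumber(dbInfo));
    }
}
```

In a real order ID this 3-character prefix sits in front of the timestamp/machine/sequence part (plus, as described next, a leading version digit).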

As shown above, taking the modulus 64 when computing the sub-database information gives it redundant precision, so the system can later expand to 16, 32 or 64 databases without any problem.

The order ID structure above already satisfies our current and future expansion needs, but given business uncertainty we add one more digit at the front to identify the version of the order ID layout. This version number is redundant data and is not currently used. Below is a simplified diagram of the final order ID:

[Figure: final order ID: version + sharding information + timestamp + machine code + sequence number]

Snowflake algorithm: github.com/twitter/snowflake

3. Eventual Consistency

So far, by sharding the order table in the uid dimension, we can handle extremely high concurrent writes and updates and can query order information by uid or by order ID. As an open payment platform for the whole group, however, we also need to query orders by business-line ID (also known as merchant ID, bid for short). We therefore introduced an order table cluster in the bid dimension, which redundantly stores the data of the uid-dimension cluster. To query order information by bid, we only have to query the bid-dimension cluster.

Although this scheme is simple, keeping the two order table clusters consistent is troublesome. The two clusters obviously live in different database clusters, and introducing strongly consistent distributed transactions on writes and updates would undoubtedly hurt system throughput and increase response times, which is unacceptable. We therefore introduced a message queue for asynchronous data synchronization to achieve eventual consistency. Of course, various message-queue failures can still lead to data inconsistency, so we also run a real-time monitoring service that continuously computes the difference between the two clusters and reconciles it.

Below is a simplified consistent synchronization diagram:

[Figure: asynchronous synchronization between the uid-dimension and bid-dimension clusters]
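To make the flow concrete, here is a rough sketch of the write path and the consumer side. The article does not name the message-queue product or the data access interfaces, so MessageQueue and OrderDao below are placeholders, not a real API:

```java
/** Hypothetical sketch of asynchronous synchronization from the uid-dimension
 *  cluster to the bid-dimension cluster; MessageQueue and OrderDao are placeholders. */
public class OrderSyncService {

    private final OrderDao uidClusterDao;     // writes to the uid-dimension cluster
    private final OrderDao bidClusterDao;     // writes to the bid-dimension cluster
    private final MessageQueue queue;         // any MQ product; not specified in the article

    public OrderSyncService(OrderDao uidClusterDao, OrderDao bidClusterDao, MessageQueue queue) {
        this.uidClusterDao = uidClusterDao;
        this.bidClusterDao = bidClusterDao;
        this.queue = queue;
    }

    /** Synchronous write path: only the uid-dimension cluster is updated in-line. */
    public void createOrder(Order order) {
        uidClusterDao.insert(order);
        queue.publish("order-sync", order);   // asynchronous: eventual consistency, no distributed transaction
    }

    /** Consumer side: replays the message into the bid-dimension cluster. */
    public void onMessage(Order order) {
        bidClusterDao.upsert(order);          // idempotent write, so message redelivery is harmless
    }

    // Placeholder types so the sketch is self-contained.
    interface OrderDao { void insert(Order o); void upsert(Order o); }
    interface MessageQueue { void publish(String topic, Order o); }
    static class Order { long uid; long bid; long orderId; }
}
```

The monitoring service mentioned above would then periodically compare the two clusters and replay any rows that the queue failed to deliver.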

4. Database high availability

No machine or service can be guaranteed to run online without ever failing. Suppose, for example, that at some moment the master of a database goes down: we can no longer read or write that database, and the online service is affected.

Database high availability means that, when the database fails for whatever reason, the database service can be restored in real time or very quickly and the data repaired, so that from the perspective of the whole cluster it looks as if nothing happened. Note that restoring the database service does not necessarily mean repairing the original database; it may also mean switching the service over to a standby database.

The main work of database high availability is service recovery and data repair, and we usually measure its quality by how long these two tasks take. There is a vicious circle here: the longer database recovery takes, the more data becomes inconsistent, so data repair takes longer and the total recovery time grows further. Fast recovery of the database service is therefore the top priority. Imagine completing the recovery within one second of the failure: the amount of inconsistent data and the cost of repairing it would both drop dramatically.

The following figure shows the classic master-slave structure:

[Figure: classic master-slave structure: one web server, master DB1, slaves DB2 and DB3]

In the figure above there is one web server and three databases: DB1 is the master, and DB2 and DB3 are slaves. We assume here that the web server is maintained by the project team and the database servers are maintained by the DBAs.

When the slave DB2 has a problem, the DBA notifies the project team, the project team removes DB2 from the web service's configuration list and restarts the web server so that the faulty node is no longer accessed, and the database service is restored. Once the DBA has repaired DB2, the project team adds it back to the web service.

When the master DB1 has a problem, the DBA promotes DB2 to master and notifies the project team, the project team replaces the original master DB1 with DB2 and restarts the web server, so the web service uses the new master DB2 while DB1 is no longer accessed, and the database service is restored. Once the DBA has repaired DB1, it can serve as a slave of DB2.

This classic structure has a major drawback: whether the problem is in the master or in a slave, the DBAs and the project team must cooperate to restore the database service, which is hard to automate and makes recovery far too slow.

We believe database operations should be decoupled from the project team: when the database has a problem, the DBAs should be able to restore it uniformly without any action from the project team, which makes automation easier and shortens the service recovery time.

First, the high-availability structure for the slave libraries:

[Figure: slave high availability via LVS load balancing]

As shown above, the web server no longer connects directly to the slaves DB2 and DB3; instead it connects to an LVS load balancer, which in turn connects to the slaves. The benefit is that LVS automatically senses whether a slave is available: after slave DB2 goes down, LVS simply stops sending read requests to it. And when the DBA needs to add or remove slave nodes, only LVS has to be reconfigured; the project team no longer needs to update configuration files and restart servers to cooperate.

Next, the high-availability structure for the master library:

[Figure: master high availability via a KeepAlived virtual IP and a DB_bak standby]

As shown above, the web server no longer connects directly to the master DB1; instead it connects to a virtual IP provided by KeepAlived, which is mapped to DB1, and a standby slave DB_bak is added that replicates DB1's data in real time. Under normal circumstances the web servers still read and write DB1; when DB1 goes down, a script automatically promotes DB_bak to master and remaps the virtual IP to it, so the web service reads and writes the healthy DB_bak instead. Recovering the master database service in this way takes only a few seconds.

Combining the two structures gives the overall master-slave high-availability structure:

[Figure: combined master-slave high-availability structure]

Database high availability also includes data repair. Because we write a log before updating any core data, and because the database service is recovered almost in real time, the amount of data to repair is small and a simple recovery script can complete the repair quickly.

5. Data classification

Besides the core payment order table and payment flow table, the payment system also has configuration tables and user-related tables. If every read operation went to the database, system performance would suffer badly, so we introduced a data classification mechanism.

We simply divide the data of the payment system into three levels:

Level 1: order data and payment flow data. These data demand high real-time accuracy, so no cache is added; reads and writes go directly to the database.

Level 2: user-related data. These data are tied to individual users and are read far more often than they are written, so we cache them in Redis (a brief sketch follows this list).

Level 3: payment configuration information. These data are unrelated to any user, are small in volume, read frequently and almost never modified, so we cache them in local memory.
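As a rough illustration of the level-2 cache, here is a read-through pattern with Jedis; the key layout and the 600-second TTL are assumptions made for this example, not taken from the article:

```java
import redis.clients.jedis.Jedis;

/** Level-2 cache sketch: user-related data is read-mostly, so we read through Redis.
 *  The key layout "user:profile:<uid>" and the TTL are illustrative assumptions. */
public class UserDataCache {

    private final Jedis jedis;

    public UserDataCache(Jedis jedis) {
        this.jedis = jedis;
    }

    public String getUserProfile(long uid) {
        String key = "user:profile:" + uid;
        String cached = jedis.get(key);        // hit: serve directly from Redis
        if (cached != null) {
            return cached;
        }
        String fromDb = loadFromDatabase(uid); // miss: fall back to the database
        jedis.setex(key, 600, fromDb);         // cache for subsequent reads
        return fromDb;
    }

    private String loadFromDatabase(long uid) {
        // Placeholder for the real database read of user-related data.
        return "profile-of-" + uid;
    }
}
```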

Using a local memory cache raises a data synchronization problem: because the configuration is cached in memory, the local memory cannot perceive modifications to the configuration in the database, which would leave the data in the database and the data in local memory inconsistent.

To solve this, we developed a highly available message push platform. When configuration information is modified, the push platform pushes a configuration-update message to every server of the payment system; each server updates its configuration on receiving the message and reports success.
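A hypothetical handler on each payment server might look like the sketch below. The push platform's client API is not documented in the article, so the message shape (a simple name/value pair) and the boolean acknowledgement are placeholders:

```java
import java.util.concurrent.ConcurrentHashMap;

/** Level-3 cache sketch: payment configuration lives in local memory and is refreshed
 *  when the push platform broadcasts a configuration-update message. */
public class LocalConfigCache {

    private final ConcurrentHashMap<String, String> configs = new ConcurrentHashMap<>();

    /** Read path: a pure in-memory lookup, no database or Redis involved. */
    public String get(String name) {
        return configs.get(name);
    }

    /** Called when a configuration-update push message arrives; the return value
     *  stands in for the success feedback sent back to the push platform. */
    public boolean onConfigUpdate(String name, String newValue) {
        configs.put(name, newValue);   // keep the local copy in sync with the database
        return true;
    }
}
```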

6. Thick and thin pipes

Hacker attacks, front-end retries and other causes can make the request volume skyrocket. If a surge of requests brings our service down, restoring it is a very painful and tedious process.

A simple example: our current order processing capacity averages 100,000 orders per second, with a peak of 140,000 per second. If a million order requests hit the payment system within the same second, the whole payment system would undoubtedly collapse, and the continuing flood of requests would prevent the service cluster from even starting up. The only way out would be to cut off all traffic, restart the entire cluster, and then slowly let traffic back in.

We solve this problem by adding a layer of "thick and thin pipes" in front of the external web servers.

Below is a simple structure diagram of the thick and thin pipes:

[Figure: thick and thin pipes in front of the web cluster]

As the diagram shows, before an HTTP request reaches the web cluster it passes through a layer of thick and thin pipes. The entry end is the thick pipe: we set it to accept at most 1 million requests per second, and anything beyond that is dropped outright. The exit end is the thin pipe: we set it to let at most 100,000 requests per second through to the web cluster, and the remaining 900,000 requests queue up inside the pipe. As the web cluster finishes old requests, new requests flow out of the pipe and get processed. In this way the web cluster never handles more than 100,000 requests per second; under that load every service in the cluster runs efficiently, and the cluster will not stop serving because of a request surge.

How are the thick and thin pipes implemented? The commercial version of nginx already supports this via max_conns. Note that max_conns limits the number of active connections, so the concrete value has to be derived from the average response time as well as the maximum TPS.

nginx related: http://nginx.org/en/docs/http/ngx_http_upstream_module.html
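For reference, a minimal sketch of what such a configuration might look like using the upstream module linked above. The host names and numbers are purely illustrative; max_conns was commercial-only when this article was written and is available in open-source nginx since 1.11.5, while the queue directive that buffers excess requests remains part of the commercial NGINX Plus. Dropping traffic above the "thick pipe" entry limit would need a separate mechanism such as limit_req, which is not shown here.

```nginx
# Illustrative sketch only: host names and limits are made up.
upstream pay_web_cluster {
    # max_conns caps the number of active connections to each backend server.
    server web1.pay.example.com:8080 max_conns=300;
    server web2.pay.example.com:8080 max_conns=300;

    # queue (NGINX Plus only) holds requests that exceed max_conns instead of
    # rejecting them immediately -- the buffering behaviour described above.
    queue 100000 timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://pay_web_cluster;
    }
}
```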
