E-commerce project part 07: order system design and massive data processing

The problem of duplicate orders (idempotency)

When a user accidentally clicks the "Submit Order" button twice, the browser sends two consecutive create-order requests to the server. This must not result in two orders.
The solution is to make the order service idempotent. What is idempotence? An operation is idempotent if executing it any number of times has the same effect as executing it once. In other words, calling an idempotent method with the same parameters once or many times has the same impact on the system.
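The source does not show how the order service achieves idempotence. One common approach, sketched below as an assumption, is to have the client obtain an order ID before submitting and to treat that ID as a unique key, so a repeated request with the same ID returns the existing order. The class `OrderService`, its methods, and the in-memory dict standing in for a table with a unique key are all hypothetical.

```python
class OrderService:
    """In-memory stand-in for an order service whose table has a unique order ID."""

    def __init__(self):
        # order_id -> order record; plays the role of a primary-key index
        self._orders = {}

    def create_order(self, order_id, user_id, items):
        # A repeated request with the same order ID returns the stored order
        # instead of creating a second one.
        existing = self._orders.get(order_id)
        if existing is not None:
            return existing
        order = {"order_id": order_id, "user_id": user_id, "items": list(items)}
        self._orders[order_id] = order
        return order

    def order_count(self):
        return len(self._orders)


service = OrderService()
first = service.create_order("ORD-1001", user_id=42, items=["book"])
second = service.create_order("ORD-1001", user_id=42, items=["book"])  # double click
```

Both calls return the same record and only one order is stored. In a real database the same effect would typically rely on a unique constraint rather than a check in application code.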

Read-write separation and database and table sharding

Using Redis as a front cache for MySQL can shield MySQL from most query requests. This approach is particularly effective for systems whose data does not depend on the individual user, such as the product system and the search system in e-commerce. In these systems everyone sees the same content: for the back-end server, the same query returns the same data no matter who sends it. The hit rate of the Redis cache is therefore very high, and almost all requests hit the cache.
For user-related systems the cache is much less effective, for example the order system, the account system, and the shopping cart system. (This does not include the user system itself: user information and related data are cached when the user logs in, and that cache is very valuable.) In these systems the information each user queries is tied to that user, so different users see different data even on the same page. In the "My Orders" function, for example, each user sees only their own orders. The cache hit rate is then relatively low, and a considerable share of query requests miss the cache and fall through to MySQL. As the number of users grows, more and more read and write requests reach the MySQL database. What should we do when a single MySQL instance cannot support so many concurrent requests?
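The difference in hit rate can be illustrated with a toy cache-aside sketch. It is not Redis code: `CountingCache`, `fake_db_load`, and the key names are hypothetical, and each of the 100 simulated users makes exactly one request.

```python
class CountingCache:
    """Cache-aside wrapper around a dict that counts hits and misses."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, load_from_db):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        # Cache miss: fall through to the database, then populate the cache.
        self.misses += 1
        value = load_from_db(key)
        self._data[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_db_load(key):
    # Stand-in for a MySQL query.
    return f"row for {key}"


# 100 users all view the same product page: one key shared by everyone.
product_cache = CountingCache()
for user_id in range(100):
    product_cache.get("product:1", fake_db_load)

# The same 100 users each open "My Orders" once: one key per user.
order_cache = CountingCache()
for user_id in range(100):
    order_cache.get(f"orders:user:{user_id}", fake_db_load)
```

In this setup the shared product key misses only on the first request, while every per-user order key misses, so all of those requests reach the database.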

Read-write separation

Read-write separation is the preferred way to improve MySQL's concurrency capacity. When a single MySQL instance cannot meet the requirements, the only option is to use multiple MySQL instances to carry the read and write load. MySQL, like most commonly used relational databases, is a typical stand-alone database and does not support distributed deployment. It is very difficult to combine multiple instances of a stand-alone database into a cluster that provides a distributed database service. A simple and very effective alternative is to use multiple MySQL instances holding the same data to share the query load, that is, "read-write separation". In many systems, especially Internet systems, the read-write ratio is heavily skewed toward reads. It generally ranges from 9:1 to several dozen to one, meaning that on average there is only one update request for every few dozen query requests. In other words, most of the requests the database handles are read-only queries.
Distributed writes are very hard for a storage system to support because data consistency is hard to guarantee. Distributed reads are relatively simple: the data only needs to be synchronized to read-only instances as close to real time as possible, and those instances can then share the query load. Another benefit of read-write separation is that it is relatively simple to implement. Upgrading a system from stand-alone MySQL to a multi-instance architecture with read-write separation is easy and generally requires no change to the business logic. It only needs small changes to the DAO layer (Data Access Object, the abstraction layer in the application responsible for accessing the database), so that read and write requests are separated and sent to different MySQL instances. With a simple storage architecture upgrade such as read-write separation, the concurrency the database can support increases by several times to more than ten times. Therefore, when the number of users grows, read-write separation should be the first scaling option to consider.
The master database executes the data update requests sent by the application and then synchronizes the changes to all slave databases. In this way the data in the master and all slave databases stays consistent, and the slave databases can share the application's query requests.
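The routing rule described above can be sketched as follows. This is a minimal illustration, not a real DAO layer: `ReadWriteRouter` and the instance names are hypothetical, SQL is classified only by its first keyword, and reads are spread round-robin across the slave databases.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to slave databases and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle through the slaves so reads are spread evenly.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._slaves)
        # INSERT, UPDATE, DELETE and anything unrecognized go to the master.
        return self.master


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
statements = [
    "SELECT * FROM orders WHERE user_id = 42",
    "INSERT INTO orders (id) VALUES (1)",
    "select * from orders where id = 1",
    "UPDATE orders SET status = 'PAID' WHERE id = 1",
]
targets = [router.route(sql) for sql in statements]
```

Here `targets` is `["slave-1", "master", "slave-2", "master"]`. A production router would also have to keep reads that belong to a transaction on the master, which this sketch does not attempt.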

Database and table sharding

Besides the concurrency problem of accessing MySQL, we also need to solve the problem of massive data. Massive data often calls for a distributed storage cluster, because MySQL is essentially a stand-alone database and in many scenarios is not well suited to storing data above the TB level.

How to plan database and table sharding

Which situations call for table sharding, and which call for database sharding? Sharding databases or tables is meant to solve the following two problems.
The first is slow queries caused by too much data. The "queries" here are mainly the query and update operations inside transactions, because read-only queries can be handled through caching and master-slave separation. Table sharding mainly addresses slow queries caused by large data volumes.
The second is high concurrency. If one database instance cannot carry the load, concurrent requests are spread across multiple instances, so database sharding addresses high concurrency.
Simply put: if the data volume is too large, shard tables; if the volume of concurrent requests is high, shard databases. In practice most solutions need to shard both databases and tables. The number of databases and tables can be calculated from the estimated concurrency and data volume.

Implementation plan

Data volume

In the system being designed, order volume is estimated at 20 million per month, so a year can reach 240 million orders. Each order is roughly 1 KB. As a MySQL rule of thumb, to keep the height of the B+ tree within a certain range and ensure query performance, a single table should hold no more than about 20 million rows. To store 240 million orders, the order table should therefore be split into 16 tables (12, rounded up to the nearest power of 2).
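The arithmetic behind the figure of 16 tables can be written out as a short calculation. The numbers come from the estimate above; the helper `next_power_of_two` and the variable names are only illustrative.

```python
import math


def next_power_of_two(n):
    """Return the smallest power of 2 that is greater than or equal to n."""
    power = 1
    while power < n:
        power *= 2
    return power


orders_per_month = 20_000_000       # estimated order volume
orders_per_year = orders_per_month * 12
max_rows_per_table = 20_000_000     # rule-of-thumb limit per table

# Minimum number of tables needed to stay under the per-table limit.
min_tables = math.ceil(orders_per_year / max_rows_per_table)

# Round up to a power of 2.
table_count = next_power_of_two(min_tables)
```

This gives `min_tables == 12` and `table_count == 16`.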

Choose a shard key

Now that we have decided to shard the order system's databases and tables, another important question is which column to shard on. This column is generally called the sharding key. Choosing an appropriate sharding key and sharding algorithm is very important, because it directly determines how well the sharding works.
If the tables are sharded by user ID, a query such as "My Orders" goes to a single shard, but a query by order ID alone cannot tell which shard holds the order. The solution to this problem is to use the last few digits of the user ID as part of the order ID when generating it. When querying by order ID, the shard can then be found from the user ID digits embedded in the order ID. So after the system obtains an ID from the unique ID service, it appends the last two digits of the user ID to form the final order ID.
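A minimal sketch of this ID scheme follows. It assumes numeric IDs, 16 tables, and a shard computed from the last two digits of the user ID; the function names and the sample IDs are hypothetical.

```python
TABLE_COUNT = 16  # assumed number of order tables


def build_order_id(sequence_id, user_id):
    """Append the last two digits of the user ID to an ID from the unique ID service."""
    return f"{sequence_id}{user_id % 100:02d}"


def shard_for_user(user_id):
    # Shard on the last two digits of the user ID, the part that is
    # also embedded in every order ID.
    return (user_id % 100) % TABLE_COUNT


def shard_for_order(order_id):
    # Recover the same two digits from the tail of the order ID.
    return int(order_id[-2:]) % TABLE_COUNT


order_id = build_order_id(9000001, user_id=123456)   # "900000156"
padded_id = build_order_id(9000002, user_id=705)     # "900000205"
```

Both lookups agree: `shard_for_user(123456)` and `shard_for_order(order_id)` each return 8, so a query by user ID and a query by order ID reach the same table. With 100 possible suffixes spread over 16 tables the distribution is slightly uneven.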

Commonly used sharding algorithms include range sharding (for example by time range), hash sharding, and lookup-table sharding. The most commonly used is hash sharding, which takes the sharding key modulo the number of tables.
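The three algorithms can be sketched side by side. This is an illustration under assumptions: 16 tables, one table per calendar month for range sharding, and a small hand-written lookup table. The function names, table names, and sample keys are hypothetical.

```python
from datetime import date

TABLE_COUNT = 16  # assumed number of tables


def hash_shard(sharding_key):
    """Hash sharding: take the sharding key modulo the number of tables."""
    return sharding_key % TABLE_COUNT


def range_shard(order_date):
    """Range sharding by time: one table per calendar month."""
    return f"orders_{order_date.year}{order_date.month:02d}"


# Lookup-table sharding: an explicit mapping from key to shard.
SHARD_LOOKUP = {1001: 3, 1002: 7}


def lookup_shard(sharding_key, default=0):
    """Return the shard recorded for the key, or a default shard if none is recorded."""
    return SHARD_LOOKUP.get(sharding_key, default)


hash_result = hash_shard(56)
range_result = range_shard(date(2023, 8, 27))
lookup_result = lookup_shard(1002)
```

Here `hash_result` is 8, `range_result` is `"orders_202308"`, and `lookup_result` is 7.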

Once the databases and tables are sharded, the query capability of the database is greatly limited. A query that used to be simple may no longer be possible. Database and table sharding should be the last resort, used only when the data volume and concurrent requests are so large that all other measures fall short.

Implementation

How can read-write separation and database and table sharding be implemented in code? Generally speaking, there are three methods.
1) Purely manual method: modify the application's DAO layer, define multiple data sources, and specify the data source for each request at every place in the code that accesses the database.
2) Component method: integrate a component such as Sharding-JDBC into the application. It proxies all of the application's database requests and automatically routes each one to the corresponding database instance.
3) Proxy method: deploy a set of database proxy instances, such as Atlas or Sharding-Proxy, between the application and the database instances. To the application, the proxy looks like a single-node MySQL instance. All of the application's database requests go to the proxy, which separates them and forwards each one to the corresponding database instance.

Among these three methods, the second (the component method) is generally recommended. It intrudes very little on the code while still offering good performance and stability. If the application is a microservice with very simple logic, as simple as a handful of SQL statements, or if its programming language has no suitable read-write separation component, the purely manual method is also worth considering. The proxy method (the third) is not recommended: it lengthens the call path of every database request at run time, which costs some performance, and the proxy service itself can fail or become a performance bottleneck. Its advantage is that it is completely transparent to the application.
Therefore, our order service uses the second method and introduces Sharding-JDBC, supporting read-write separation and database and table sharding at the same time. The configuration is as follows:
[Image: Sharding-JDBC configuration]
For details, refer to the sharding-jdbc configuration file.

Origin blog.csdn.net/Forbidden_City/article/details/132512040