Orders table optimization

1 Background

As users keep placing orders, the order table and its subsidiary tables accumulate so many records that the single DB tables become too large, which slows down the order-list queries of both the front end and the management system. The appropriate maximum number of rows for a single table depends on the specific business, so it is hard to give a general figure, but more than 1 million rows is generally not recommended; beyond that, the performance degradation of a single table becomes more obvious.
This document collects some common ideas and principles for optimizing large database tables, and ends with an optimization plan for our Orders table.

2 Common ideas

  • Partitioning a single table
  • Splitting a large table into multiple tables
  • Splitting databases by business
  • Read/write separation and clustering
  • Hot-data cache
  • Using ES instead of the DB

2.1 Partitioning a single table

What is partitioning?
Partitioning splits the single large file behind a table into several physical files according to some logic. To the application it is still one table; underneath, it actually consists of several physical blocks. Mainstream DBs such as Oracle and MySQL have mature partitioning support.

What partition types does MySQL support?

  • Range partitioning: partition by ranges of the key; for example, a log table partitioned by day or by month
  • List partitioning: partition by enumerated key values; for example, using the order status as the key and creating separate partitions for pending payment, pending shipment, pending receipt, and so on
  • Hash partitioning: given the number of partitions, the DB distributes records across them according to the hash of the key; for example, using the user ID as the key to spread the Orders table over the partitions
  • Key partitioning: similar to hash partitioning
  • Composite partitioning: Oracle supports a rich set of composite partitioning schemes, while MySQL is comparatively simple: only range and list partitions can have sub-partitions, and the sub-partitions must be hash or key.
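
To make the first three partition types concrete, here is a minimal Python sketch of how a row's partition could be chosen under each of them. It is an illustration, not MySQL's implementation: the function names and sample values are hypothetical, range bounds are treated as exclusive upper limits (in the spirit of VALUES LESS THAN), and hash placement is shown as a plain modulo on a non-negative integer key.

```python
from bisect import bisect_right


def range_partition(value, upper_bounds):
    """Index of the first partition whose exclusive upper bound is above value."""
    idx = bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError("value is above the last range bound")
    return idx


def list_partition(value, value_lists):
    """Index of the partition whose enumerated value list contains value."""
    for idx, values in enumerate(value_lists):
        if value in values:
            return idx
    raise ValueError("value is not listed in any partition")


def hash_partition(value, num_partitions):
    """Modulo placement for a non-negative integer key."""
    return value % num_partitions


# Hypothetical sample data: day-of-year bounds, order statuses, a user ID.
print(range_partition(100, [100, 200, 300]))
print(list_partition("pending_shipment", [{"pending_payment"}, {"pending_shipment", "pending_receipt"}]))
print(hash_partition(10, 4))
```

A value equal to a range bound falls into the next partition, because each bound is exclusive.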

What are MySQL's partitioning restrictions?

  • To keep unique indexes efficient, MySQL (other DBs are similar) requires the partition field to be part of every unique index. For example, suppose an order table has an auto-increment ID as its primary key, a unique index on the order ID, and a general index on the user ID. To partition by user ID, both the primary key and the unique index must be changed into composite indexes that include the user ID. One thing to watch out for with composite indexes: when the WHERE conditions contain only some of the indexed fields, those fields must match the index from front to back, otherwise the query will not use the index. For example, with a composite index on user ID + order ID, a query by order ID alone cannot use the index, while a query by user ID can; this follows from the index's data structure.
  • Before version 5.6.7 the maximum number of partitions for a single table is 1024; from 5.6.7 on it is 8192. Note that with too few partitions the optimization is not noticeable, while too many partitions add extra system IO and reduce performance instead. More than 1024 partitions is probably only suitable for log-style tables with a clear hot/cold split, where usually only the most recent few partitions are queried; with thousands of partitions being read and written concurrently, IO would suffer badly.
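
The two index rules above can be sketched as small checks. This is a simplified model under stated assumptions, not a MySQL API: both function names are hypothetical, indexes are modeled as tuples of column names, and the leftmost-prefix rule is reduced to "the leading index column must appear in the conditions".

```python
def unique_keys_allow_partitioning(unique_keys, partition_columns):
    """True if every unique key (primary key included) contains all partition columns."""
    required = set(partition_columns)
    return all(required <= set(key) for key in unique_keys)


def can_use_index(index_columns, where_columns):
    """Simplified leftmost-prefix rule: the leading index column must be constrained."""
    return index_columns[0] in where_columns


# Original keys: primary key (id), unique index (order_id).
print(unique_keys_allow_partitioning([("id",), ("order_id",)], ["user_id"]))
# Keys extended with user_id, as described in the text.
print(unique_keys_allow_partitioning([("id", "user_id"), ("order_id", "user_id")], ["user_id"]))
# Composite index (user_id, order_id) queried by order_id only, then by user_id only.
print(can_use_index(("user_id", "order_id"), {"order_id"}))
print(can_use_index(("user_id", "order_id"), {"user_id"}))
```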

How do you partition an existing MySQL table or change its partition parameters?

  • MySQL supports creating or modifying a table's partitions dynamically with ALTER TABLE, but this puts pressure on the online business.
  • An alternative is to create a temporary table, import the master table's data into it, and then swap the temporary table in as the master table. Because online data changes in real time, this approach has to deal with the consistency of the final data.
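
As a rough sketch of the temporary-table alternative, the Python helper below only builds the SQL statements one might run; it does not connect to a database. The helper name and the _new/_old table suffixes are hypothetical, and it assumes the table's primary key and unique indexes already include the partition field (otherwise the ALTER on the copy would be rejected). Rows written to the original table while the copy is running are not handled, which is exactly the consistency problem mentioned above.

```python
def copy_and_swap_statements(table, partition_clause):
    """Return the SQL statements for a copy-then-rename repartitioning of one table."""
    new, old = f"{table}_new", f"{table}_old"
    return [
        # Empty copy with the same columns and indexes.
        f"CREATE TABLE {new} LIKE {table}",
        # Partition the empty copy, which is cheap because it holds no rows yet.
        f"ALTER TABLE {new} {partition_clause}",
        # Bulk copy; writes arriving during this step are NOT captured.
        f"INSERT INTO {new} SELECT * FROM {table}",
        # Swap both names in a single RENAME statement.
        f"RENAME TABLE {table} TO {old}, {new} TO {table}",
    ]


for statement in copy_and_swap_statements("order_info", "PARTITION BY HASH(user_id) PARTITIONS 4"):
    print(statement + ";")
```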

What are the advantages of partitioning?

  • It is completely transparent to the application and does not significantly increase the load on the DB.
  • When the partition field is used as a query condition, the DB first determines the possible target partitions and then runs the query only inside them, which can greatly reduce the amount of data queried.
  • For a log table partitioned by month or by day, deleting old logs only requires dropping the corresponding partitions, which is simple and efficient.
  • Partition files can be stored on different disks, which supports concurrent operations such as count

Now let's consider how to optimize a concrete order table:

Example 1: HASH partition on user ID
Users' queries against the Orders table are all based on the user ID: each user can only pull their own order information. So the user ID can serve as the partition field to spread the Orders table across partitions. This requires changing the original primary key (auto-increment ID) and unique index (order ID) into composite indexes: auto-increment ID + user ID and order ID + user ID respectively. The original general index on user ID is kept. Suppose we create four partitions.

  • A query that pulls all of a user's orders then only needs to locate a single partition, so the amount of data to process is 1/4 of the original.
  • A query that pulls an order record by order ID needs to add the user ID as a condition, so that it can first be located to a single partition and then use the order ID + user ID unique composite index; the amount of data is likewise reduced to 1/4 of the original.
  • For queries that span several partitions or scan the whole table, the one large file has been split into several small files, which is friendlier to disk IO, so in theory these improve as well.
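
A small in-memory simulation of Example 1 may help. It is only a sketch under assumptions: orders are plain dicts, the helper names are hypothetical, and hash placement is modeled as user ID modulo the number of partitions.

```python
from collections import defaultdict

NUM_PARTITIONS = 4


def build_partitions(orders):
    """Place each order in the partition chosen by its user ID."""
    partitions = defaultdict(list)
    for order in orders:
        partitions[order["user_id"] % NUM_PARTITIONS].append(order)
    return partitions


def orders_of_user(partitions, user_id):
    """Look only inside the single partition that can hold this user's orders."""
    candidates = partitions[user_id % NUM_PARTITIONS]
    return [order for order in candidates if order["user_id"] == user_id]


# Hypothetical sample: 16 orders spread over 8 users.
orders = [{"order_id": i, "user_id": i % 8} for i in range(16)]
partitions = build_partitions(orders)
print({p: len(rows) for p, rows in sorted(partitions.items())})
print(orders_of_user(partitions, 5))
```

With evenly distributed user IDs each partition holds a quarter of the rows, and a per-user lookup inspects only one of the four.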

Example 2: composite partitioning, range on auto-increment ID, hash on user ID
This scheme spreads the data further; for example, four range partitions, each with four hash sub-partitions.

  • Pulling the order list by user ID is handled in four sub-partitions (one per range partition). Assuming each sub-partition holds roughly 1/16 of the data, one query touches about 1/4 (4 * 1/16) of it
  • Pulling an order record by order ID can be optimized to first locate the range partition and then the hash sub-partition, provided the query also carries conditions on both partition fields; in the end only one sub-partition has to be searched, so the data volume drops to 1/16 of the original

With this scheme, the primary key and the unique index must each include both partition fields, so the unique index is extended to three fields.
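
The 1/4 and 1/16 figures of Example 2 can be reproduced with a short sketch that lists which sub-partitions a query has to visit. The range bounds and the function name are hypothetical; row_id stands for the auto-increment ID, and hash placement is again modeled as a modulo.

```python
from bisect import bisect_right

RANGE_BOUNDS = [1000, 2000, 3000, 4000]  # hypothetical exclusive upper bounds on the auto-increment ID
HASH_PARTS = 4


def subpartitions_to_scan(row_id=None, user_id=None):
    """Return the (range, hash) sub-partitions a query must visit for the given conditions."""
    if row_id is None:
        ranges = range(len(RANGE_BOUNDS))
    else:
        idx = bisect_right(RANGE_BOUNDS, row_id)
        if idx == len(RANGE_BOUNDS):
            raise ValueError("row_id is above the last range bound")
        ranges = [idx]
    hashes = range(HASH_PARTS) if user_id is None else [user_id % HASH_PARTS]
    return [(r, h) for r in ranges for h in hashes]


print(len(subpartitions_to_scan()))             # no partition field in the conditions
print(len(subpartitions_to_scan(user_id=7)))    # user ID only
print(subpartitions_to_scan(row_id=1500, user_id=7))
```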

Example 3: composite partitioning, list on order status, hash on user ID
E-commerce order display has a characteristic: orders are shown as pending payment, pending shipment, pending receipt, and all orders, and users often pull the pending-shipment or pending-receipt page on its own. So we can list-partition the order table by status and then hash sub-partition by user ID.

  • Pulling the order list by user ID behaves as in the previous two schemes.
  • Pulling order records by order status only searches the data set of the matching status. Since most orders are in the completed state, pulling pending-shipment or pending-receipt orders touches only a small data set

As with Example 2, this scheme also requires extending the primary key and the unique index, here to three fields each.

2.2 Splitting a large table into multiple tables

Splitting tables at the application level is an alternative to partitioning a single table: several tables with the same structure but different, numbered names are created in advance, and before each read or write the application derives the table number, for example by taking an ID modulo the number of tables, which achieves the effect of a partitioned single table.
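
A minimal sketch of the modulo routing described above. The base table name, the number of tables, and the two-digit suffix format are assumptions made for illustration.

```python
NUM_TABLES = 16  # hypothetical number of pre-created tables


def table_for_user(user_id, base_name="order_info"):
    """Derive the physical table name from the user ID before each read or write."""
    return f"{base_name}_{user_id % NUM_TABLES:02d}"


print(table_for_user(35))
print(table_for_user(16))
```

Unlike a DB-level partition, the application has to apply this routing on every access, and changing NUM_TABLES later means moving existing rows.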

In this chapter we focus on ways of splitting tables that go beyond what partitioning offers. Consider a real case from a financial trading platform, where the order table is split into several small tables to spread different kinds of business requests:

  • Temporary orders table: stores orders whose tickets have not yet been issued; mainly read and written concurrently by the ordering and ticketing systems
  • Order summary table: summary information for displaying order lists
  • Order details table: pulled when displaying order details
  • Order summary history table and order details history table: given the business characteristics, orders older than a certain period are rarely requested and can be moved to the history tables, which keeps the number of records in the main tables from growing too large

For an e-commerce order table we can think along similar lines. Users usually pull order data by stage: pending payment, pending shipment, pending receipt, and so on. Apart from the all-orders list, the front end pulls and displays orders by stage, so orders at different stages can be kept in different tables, reducing the number of records per table and improving query efficiency.
History tables are another example: the front end rarely pulls orders completed a month or even a year ago, so records older than some cutoff can be moved to a history table.

2.3 Splitting databases by business

As traffic grows further and a single DB instance can no longer support the volume of user requests, consider splitting databases by service, or even splitting a single service across finer-grained databases, so that different requests are handled by different databases, trading hardware cost for performance. Currently the DBs of our platform's management back end and of the front end's back-end business systems are separate, which is one case of splitting databases.

With partitioning and table splitting understood, splitting databases should be easy to grasp. For now I do not see a need for us to split databases further: our online business runs in a public cloud, and ordinary increases in performance requirements can first be met by quickly scaling up the single database. If it becomes necessary later, consider separating subsystems with strong business cohesion, such as the product center and the user center; as business volume keeps growing, finer granularity is possible too, for example moving each sub-table of the order table to a different DB. Of course, these operations place higher adaptation demands on our application development; the industry has mature middleware solutions such as mycat.

2.4 Read/write separation and clustering

Read/write separation and DB clustering are common in Internet architectures, and we have already implemented part of this. They mainly address workloads where reads far outnumber writes, and come as one master with one slave, one master with multiple slaves, multiple masters with multiple slaves, and so on. In essence the data is copied so that multiple DB instances share the application load at the same time, trading hardware cost for performance. This approach clearly helps read performance, HA, and so on, but it does not really solve the problem of the order table being too large, so we will not go into it here.

2.5 Hot-data cache

Caching is cost-effective for data that is read often and updated rarely. Take products: a large number of user requests pull product information while far fewer place actual orders, so caching product information can intercept a large number of repeated requests. For order information, each user can only pull their own orders and each order is visited very few times, so building a cache for orders gives little return. We will not discuss it further here.

2.6 Replacing the DB with ES

Reference, a blog post recommended by Fylos: http://www.sohu.com/a/327627159_315839
The Jingdong home order system relies mainly on an ES cluster to bear the load of order queries; it currently holds 1 billion documents and supports 500 million queries per day.

3 Orders table optimization

3.1 Business Analysis

The figure shows the main tables involved in the order business. In it I compiled the main fields used in the WHERE conditions of select / update statements (that is, the indexes), where Ux denotes a unique index and Ix a general index:

  • Apart from order_info and order_sku, the other tables only need an index on one field; for example, all queries on order_product_attr filter by the order_id field.
  • order_sku: only when the management system filters by product_id / sku_id are the two indexes I2 / I3 used; all other queries filter by order_id.
  • order_info: queries filter on many fields, and the figure does not list them all. Requests coming from the front end on behalf of a user are user-dimension queries and always carry user_id.

Currently our order table has fewer than 5 million records, far from the point where splitting databases or building a cluster is needed to sustain performance. The query frequency for orders is not high either; the performance problem is mainly that single tables hold too much data, so building a cache would not help much. Replacing DB queries with ES would introduce a new data node and require data synchronization; when partitioning and table splitting can solve the problem, ES would be over-optimization and is unnecessary.
So let's go straight to partitioning and table splitting. Since partitioning is invisible to business code whereas table splitting requires changes to business logic, partitioning is the recommended first choice.

No.  Table Name              Records
1    order_info              4507745
2    order_sku               4885235
3    order_product_attr      25450856
4    order_sku_epay          5772024
5    order_product_ext_info  2927387
6    order_product_set_info  14677
7    order_package_info      1139441
8    package_sku_info        213238

As the table shows, the first four tables hold comparatively many records and are also where requests are most concentrated, so they are the ones that need optimizing.

3.2 Partitioning scheme (order_sku / order_product_attr / order_sku_epay)

Partitioning idea: use order_id as the partition key; based on the current data volume, keep the number of records in each partition within 500,000.

Table Name          Partitions  Average records per partition
order_sku           16          305,000
order_product_attr  64          398,000
order_sku_epay      16          361,000

Under this partition design, even after our business grows 10 times, each partition holds up to about 4 million records, which can still be supported well. Of course, by then we may also have to consider table splitting and database splitting.

Problem: when the management system filters by product_id / sku_id alone, the query cannot be pinned to a partition. Since such requests are not frequent, they can go through the ordinary index.
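
The per-partition figures can be rechecked with a few lines of Python. The record counts and partition numbers are the ones given in this section; the even distribution across partitions and the helper name are assumptions.

```python
# table name -> (current records, planned partitions), taken from the tables in this section
PLAN = {
    "order_sku": (4885235, 16),
    "order_product_attr": (25450856, 64),
    "order_sku_epay": (5772024, 16),
}


def rows_per_partition(records, partitions, growth=1):
    """Average rows per partition after the data grows by the given factor."""
    return records * growth // partitions


for table, (records, partitions) in PLAN.items():
    now = rows_per_partition(records, partitions)
    later = rows_per_partition(records, partitions, growth=10)
    print(f"{table}: {now} rows per partition now, {later} at 10x growth")
```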

3.3 History table + partitioning scheme (order_info)

Business characteristics of the order_info table:

  • User requests: the request parameters always carry user_id, so the table can be partitioned by user_id to support queries on historical data
  • Scheduled tasks: payment-timeout rollback, automatic confirmation of receipt, payment timer (confirming payment status), shipping polling, daily settlement, automatic refund on delivery failure, automatic marking of orders whose shipment timed out, best-seller list, and so on. These scheduled tasks basically target unfinished or recently completed orders, but their filters are order status, payment time, and the like, which do not suit partitioning. A history table can be created to move non-hot data out.
  • Management system order pulls: some operations involve the whole table's data; apart from splitting databases and stacking hardware, there is no good solution.

Given these business characteristics, order_info can be split into a main table order_info and a history table order_info_his, with order_info_his partitioned by user_id to keep pulls from the user side fast.

Main table order_info:

  • Keeps the last three months of orders
  • The data volume is small, so there is no need to partition. Partitioning can be considered after the business grows
  • Hot data: orders that have not finished live in this table, which serves hot operations such as users pulling unfinished orders, back-office statistics, and settlement

History table order_info_his:

  • Mainly serves users pulling historical orders and the management system pulling order lists; user pulls dominate, so it is partitioned by user_id
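
How rows might be divided between the two tables can be sketched as a purely age-based split. This is an assumption for illustration, not the actual migration job: the created_at field, the function name, and the 90-day window (standing in for "three months") are hypothetical.

```python
from datetime import datetime, timedelta


def split_main_and_history(orders, now, keep_days=90):
    """Split orders by age: recent ones stay in order_info, older ones go to order_info_his."""
    cutoff = now - timedelta(days=keep_days)
    main = [order for order in orders if order["created_at"] >= cutoff]
    history = [order for order in orders if order["created_at"] < cutoff]
    return main, history


# Hypothetical sample rows.
orders = [
    {"order_id": 1, "created_at": datetime(2019, 12, 1)},
    {"order_id": 2, "created_at": datetime(2019, 6, 1)},
]
main, history = split_main_and_history(orders, now=datetime(2020, 1, 1))
print([o["order_id"] for o in main], [o["order_id"] for o in history])
```

Because the split looks only at age, an old order that is still unfinished would also land in the history table, so pages that show such orders need to read both tables.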

Code logic changes:

  • User pulls the pending-payment or pending-receipt page: query the main table only
  • User pulls the pending-shipment page: the delivery timeout is configured manually, some test products have no timeout configured, and we cannot rule out that some official products lack one too, so this page has to pull from both the main table and the history table
  • User pulls other order list pages: pull from both tables
  • Scheduled tasks: need to analyze whether the main table contains all the data they require
  • Management system: order management pages involve a large amount of data and a full pull takes a long time, so they can be special-cased with two pulls: the first pulls and displays data from the main table without showing the total order count in the page area, and the second then pulls the history data and the totals
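
The user-facing routing rules above can be written down as a small lookup. The page identifiers and the function name are hypothetical; only the two table names come from this document.

```python
MAIN, HISTORY = "order_info", "order_info_his"

# Which tables each user-facing page has to query, following the rules listed above.
TABLES_BY_PAGE = {
    "pending_payment": [MAIN],
    "pending_receipt": [MAIN],
    "pending_shipment": [MAIN, HISTORY],
}


def tables_for_page(page):
    """Pages without a specific rule (other order lists) query both tables."""
    return TABLES_BY_PAGE.get(page, [MAIN, HISTORY])


print(tables_for_page("pending_payment"))
print(tables_for_page("all_orders"))
```

Defaulting to both tables keeps any page that was not analyzed correct, at the cost of one extra query.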

Appendix

The analysis found the following problems in the current system. There is no time to optimize them for now, so they are recorded here:
1, order_product_set_info: has only two indexes, on id and master_sku_id, but the code only ever pulls by order_id.
2, order_product_attr, order_product_ext_info: both tables have a composite index on three fields, while only order_id is actually used; since order_id is the first field of the composite index, queries can still use the index
3, order_sku_epay: I could not find anywhere that uses this table's (epay_type, epay_id) index, so it appears to be redundant
4, RetrieveBillDayOrderInfo, the scheduled query for order details within a time range: in the production DB it scans the order_sku table first (two million rows). Once order_info has a history table and the main table is small, the query should go through the order_info index first

Origin www.cnblogs.com/JoZSM/p/11784078.html