Using MySQL partition tables to optimize the database

Due to insufficient design early in the project, the record table has grown to about 20 million rows (2000w) with no effective indexes, so paging and fuzzy queries are slow.

The current business is write-heavy, and from time to time slow SQL ties up too many database connections, blocking writes. Since the record table has a clear split between hot and cold data, we considered a partitioned table to solve the slow reads.

The following is a record of how the problem was solved.

1 Separate hot data

Partition the record table to narrow the range of data that queries must scan.
I chose the TIMESTAMP column create_time as the partitioning key.

ALTER TABLE record PARTITION BY RANGE(UNIX_TIMESTAMP(create_time))
(
PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-01-01 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-02-01 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-03-01 00:00:00') ),
PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-04-01 00:00:00') ),
PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-05-01 00:00:00') ),
PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-06-01 00:00:00') ),
PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-07-01 00:00:00') ),
PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-08-01 00:00:00') ),
PARTITION p9 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-09-01 00:00:00') ),
PARTITION p10 VALUES LESS THAN ( UNIX_TIMESTAMP('2020-10-01 00:00:00') )
);
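One caveat with these bounds: once create_time reaches the last boundary (2020-10-01), inserts fail with a "no partition for value" error until new partitions are added (or a MAXVALUE catch-all partition is defined). Since monthly partitions must keep being added, a small helper that generates the DDL can help. Here is a sketch in Python; monthly_partition_ddl is a hypothetical helper, not part of the original post:

```python
from datetime import date

def monthly_partition_ddl(table, column, start, months):
    """Generate an ALTER TABLE ... PARTITION BY RANGE statement with one
    partition per month. `start` is the first boundary (the 1st of some
    month); rows earlier than boundary i fall into partition p{i}."""
    year, month = start.year, start.month
    clauses = []
    for i in range(1, months + 1):
        boundary = f"{year:04d}-{month:02d}-01 00:00:00"
        clauses.append(
            f"PARTITION p{i} VALUES LESS THAN ( UNIX_TIMESTAMP('{boundary}') )"
        )
        month += 1                      # advance to the next month boundary,
        if month > 12:                  # rolling over the year when needed
            month, year = 1, year + 1
    return (
        f"ALTER TABLE {table} PARTITION BY RANGE(UNIX_TIMESTAMP({column}))\n(\n"
        + ",\n".join(clauses)
        + "\n);"
    )

print(monthly_partition_ddl("record", "create_time", date(2020, 1, 1), 10))
```

Running the generator with the parameters above reproduces the DDL shown earlier; a scheduled job could use the same logic to append future partitions.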

Two common errors you may hit at this step:

  • A PRIMARY KEY must include all columns in the table's partitioning function
  • A UNIQUE INDEX must include all columns in the table's partitioning function

This means every unique key on the table (including the PRIMARY KEY) must include all columns used in the partitioning expression. Since create_time is the partitioning column, it has to be part of the primary key.
So drop the original primary key (on id alone) and create a composite primary key:

ALTER TABLE record DROP PRIMARY KEY, ADD PRIMARY KEY(id,create_time);

Use the following statement to check the number of rows in each partition (note that TABLE_ROWS is an estimate for InnoDB tables):

SELECT PARTITION_NAME,TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'record';

Analyze the SQL to confirm the query prunes partitions (EXPLAIN PARTITIONS works on MySQL 5.6/5.7; on 8.0 plain EXPLAIN already includes a partitions column):

EXPLAIN PARTITIONS SELECT id, create_time FROM table_name WHERE create_time > '2020-03-01 00:00:00' AND create_time < NOW();

The plan shows the query hits only the relevant partitions instead of scanning the whole table, so the initial optimization goal is met.

2 Optimize query efficiency

The business involves paging, and the most common paging pattern uses two SQL statements:

  • Get the total record count: SELECT COUNT(*) FROM table_name
  • Fetch one page: SELECT * FROM table_name WHERE xxxxxxx LIMIT n, m
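As a runnable illustration of this two-statement pattern, here is a sketch using SQLite from Python's standard library as a stand-in for MySQL (table and row contents are invented):

```python
import sqlite3

# In-memory stand-in for the record table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO record (name) VALUES (?)",
                 [(f"row-{i}",) for i in range(1, 96)])   # 95 rows, ids 1..95

page_no, page_size = 3, 10

# Query 1: total count, used to render the pager
total = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]

# Query 2: one page of rows via OFFSET paging; the offset grows with the
# page number, which is exactly what becomes slow on very large tables
rows = conn.execute(
    "SELECT id, name FROM record ORDER BY id LIMIT ? OFFSET ?",
    (page_size, (page_no - 1) * page_size),
).fetchall()

print(total)       # 95
print(rows[0])     # (21, 'row-21')
```

The deeper the page, the more rows the engine must skip before returning anything, which is why the later sections replace this pattern.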

Partitioning only narrows the filtering range [the 20M-row table now touches just the last two months' partitions, about 3M rows], cutting query time by about 70% [45s -> 15s]. Queries still take more than 10s, so the problem is not fully solved.

2.1 Choose the right storage engine

Under InnoDB, COUNT(*) and large-offset LIMIT queries become extremely slow as the table grows.
MyISAM answers an unfiltered COUNT(*) almost instantly because it stores the exact row count, but it only supports table-level locking: reads take a shared lock and writes take an exclusive lock (concurrent inserts are supported only in limited cases). Under heavy write pressure this leads to table-lock contention.
All things considered, we stayed with InnoDB.

2.2 SQL and business adjustment

I made some trade-offs on the business side: I removed "jump to last page" and the custom page-number input, keeping only previous/next page and jumping a few pages ahead or back. [Similar to 58.com's pagination]

This works like Elasticsearch's scroll (cursor) query and needs front-end and back-end cooperation: each page request carries the current cursor (the boundary primary key id), the paging direction, and the page size.

SELECT * FROM table_name WHERE id > scroll ORDER BY id LIMIT pageSize;

(Ordering by id and limiting the row count is safer than WHERE id > scroll AND id < scroll + pageSize, which returns short pages whenever the id sequence has gaps.)
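A minimal runnable sketch of this cursor-style paging, again with SQLite standing in for MySQL. The test data deliberately has gaps in the id sequence to show that ordering by id and limiting the count keeps pages full, which an `id < scroll + pageSize` upper bound would not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
# ids 1..100 with every multiple of 7 missing, simulating deleted rows
conn.executemany("INSERT INTO record (id, name) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 101) if i % 7 != 0])

def next_page(conn, cursor_id, page_size):
    """Fetch the page after cursor_id (keyset pagination): order by id
    and limit the row count instead of computing an id upper bound."""
    return conn.execute(
        "SELECT id, name FROM record WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, page_size),
    ).fetchall()

page = next_page(conn, 0, 5)
print([r[0] for r in page])   # [1, 2, 3, 4, 5]
```

The front end passes the last id of the current page back as the next cursor, so each query is a cheap range scan on the primary key regardless of how deep the user has paged.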

I have also seen another SQL optimization that needs only back-end changes; it is less efficient, because the subquery still has to skip over every row before the target offset:

SELECT * FROM table_name
WHERE id >= (SELECT id FROM table_name ORDER BY id LIMIT offset, 1)
ORDER BY id LIMIT pageSize;

(MySQL's LIMIT does not accept expressions, so the application must compute offset = (pageNo - 1) * pageSize; the ORDER BY id makes the page boundaries deterministic.)
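A runnable sketch of this deferred-lookup paging (SQLite standing in for MySQL; table contents invented). The subquery walks only the primary key to find the first id of the page, and the outer query fetches full rows from that id onward:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO record (id, name) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 101)])

page_no, page_size = 5, 10
offset = (page_no - 1) * page_size   # computed in the application

# Subquery finds the page's first id via the primary-key index;
# the outer query then reads page_size full rows from that id on.
rows = conn.execute(
    "SELECT id, name FROM record "
    "WHERE id >= (SELECT id FROM record ORDER BY id LIMIT 1 OFFSET ?) "
    "ORDER BY id LIMIT ?",
    (offset, page_size),
).fetchall()

print(rows[0][0], rows[-1][0])   # 41 50
```

It avoids materializing full rows for the skipped pages, but the index scan over the first `offset` entries remains, so very deep pages are still slow.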

2.3 Index adjustment

Each partition of a partitioned table stores its data and indexes independently, per partition. The record table's queries filter on the name field, so that field gets an index.

Add an index on the name field: CREATE INDEX index_name ON table_name (table_field)
Final query SQL: SELECT id, name, create_time FROM table_name WHERE table_field LIKE 'xxxx%' AND create_time > '2020-03-01 00:00:00' AND create_time < NOW()
Analyzing with EXPLAIN shows the indexed query plus paging now takes 0.01-0.04s, which basically meets the requirement.
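To illustrate the final query's shape, a small SQLite sketch with invented rows; the MySQL-relevant point is that only a prefix pattern such as 'alpha%' can walk a B-tree index on name, while a leading wildcard like '%pha' cannot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO record (name, create_time) VALUES (?, ?)",
    [("alpha-1", "2020-03-15 00:00:00"), ("alpha-2", "2020-04-01 00:00:00"),
     ("beta-1", "2020-03-20 00:00:00"), ("alpha-3", "2019-12-31 00:00:00")])

# Index the column used in the fuzzy query (prefix-matchable patterns only)
conn.execute("CREATE INDEX idx_record_name ON record (name)")

rows = conn.execute(
    "SELECT name FROM record "
    "WHERE name LIKE 'alpha%' AND create_time > '2020-03-01 00:00:00'"
).fetchall()
print(sorted(r[0] for r in rows))   # ['alpha-1', 'alpha-2']
```

The create_time predicate mirrors the partition-pruning filter from section 1, so in MySQL the query touches only the hot partitions and resolves the name filter through the index.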

The above shows how partitioned tables can optimize a large database table. It comes with some business compromises and limitations: for the LIKE query to hit the index, the pattern must match from the start of the string (a prefix match), and pagination cannot jump to an arbitrary page.

If you don't want to compromise the business, you can use Elasticsearch for paging and the database for basic queries, or use Sphinx for full-text search.
Of development complexity, data accuracy, and data freshness, you can generally have only two; different businesses call for different choices, and reasonable people will weigh them differently.


Origin: www.cnblogs.com/threecha/p/12744080.html