[reprint] MySQL development specification

1. Use the InnoDB engine by default
[Lao Ye's point of view] This has been advocated many times: InnoDB suits almost 99% of MySQL application scenarios, and even the system tables in MySQL 5.7 have been switched to InnoDB. What reason is there left to stick with MyISAM?


In addition, frequently read and written InnoDB tables must use an integer type with auto-increment/sequential characteristics as an explicit primary key.
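For example, a minimal sketch of such a table definition (table and column names are illustrative):

CREATE TABLE user_message (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- explicit, sequential integer primary key
  user_id INT UNSIGNED NOT NULL,
  msg VARCHAR(200) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_user_id (user_id)
) ENGINE = InnoDB;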

Of course, this does not mean MyISAM is useless. For example, Lao Ye has used MyISAM for temporary data imports (load the data into a MyISAM table first, then move it into an InnoDB table after some processing), and for a few special data-statistics scenarios (under large data volumes, MyISAM's full-table sequential reads hold a clear advantage over InnoDB), where it may be more appropriate. The premise is that you must be very clear about where the MyISAM engine's advantages lie.

[Reference]: [MySQL FAQ] series - Why is it recommended to use auto-increment columns as primary keys for InnoDB tables?

2. Keep the physical length of InnoDB row records within 8KB


[Lao Ye's point of view] InnoDB's default data page size is 16KB. Given the characteristics of the B+Tree, a data page must hold at least 2 records. Therefore, when the actual stored length of a row exceeds 8KB (especially with large TEXT/BLOB columns), the large columns trigger "page-overflow storage", similar to "row migration" in Oracle.

Therefore, if you must use large columns (especially TEXT/BLOB types) and the table is read and written frequently, it is better to split these columns into a sub-table rather than store them together with the main table, as sketched below. If access is infrequent, consider keeping them in the main table.
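A rough sketch of such a split (table and column names are illustrative): the frequently accessed columns stay in the main table, while the large column moves to a sub-table linked by primary key.

CREATE TABLE article (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(200) NOT NULL,
  author_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE TABLE article_content (
  article_id BIGINT UNSIGNED NOT NULL,  -- same value as article.id
  content MEDIUMTEXT NOT NULL,          -- the large column, read only when actually needed
  PRIMARY KEY (article_id)
) ENGINE = InnoDB;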

Of course, if the innodb_page_size option is changed to 8KB, then it is recommended that the physical row length not exceed 4KB.

[Reference]: [MySQL optimization case] series - Optimize the storage efficiency of BLOB columns in InnoDB tables.

3. Whether to use table partitioning (partition)
[Lao Ye's point of view] In scenarios where table partitioning clearly improves performance or makes operations and maintenance more convenient, it is still recommended to use it.

For example, Lao Ye partitions by the time dimension, on the premise that the zabbix database uses the TokuDB engine. The benefit is that zabbix's daily operation is unaffected, and it is convenient for administrators to routinely purge historical data: just drop the corresponding partition, with no need to run a very slow DELETE that drags down overall performance. A simplified sketch follows.
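A simplified sketch of this approach (partition names and boundaries are illustrative; the real zabbix tables differ):

ALTER TABLE history PARTITION BY RANGE (clock) (
  PARTITION p201601 VALUES LESS THAN (UNIX_TIMESTAMP('2016-02-01 00:00:00')),
  PARTITION p201602 VALUES LESS THAN (UNIX_TIMESTAMP('2016-03-01 00:00:00')),
  PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

-- dropping an expired month is a cheap metadata operation, far faster than DELETE
ALTER TABLE history DROP PARTITION p201601;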

Of course, table partitioning may not be conducive to physical scale-out, for example when you want to do hash-based horizontal sharding under large data volumes; this is a matter of weighing trade-offs. If table partitioning is more beneficial in your business scenario, use it with confidence; when the time comes to shard, adopt a sharding scheme rather than clinging to table partitioning.

[Reference]: Migrating Zabbix database to TokuDB.

4. Whether to use stored procedures and triggers
[Lao Ye's point of view] In suitable scenarios, it is completely fine to use stored procedures and triggers.

We once used stored procedures to handle game business logic. Performance was not a problem, and when requirements changed we only had to modify the stored procedure, so the cost of change was very low. We have also used triggers to maintain a frequently updated table, where every change synchronously updates certain fields in another table (a disguised implementation of a materialized view), again with no performance problem. A hedged sketch of this trigger pattern follows.
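A sketch of the trigger pattern (all table and column names are hypothetical):

-- keep a per-user summary table in sync with a frequently updated orders table
CREATE TRIGGER trg_orders_ai AFTER INSERT ON orders
FOR EACH ROW
  INSERT INTO user_order_stat (user_id, order_cnt, total_amount)
  VALUES (NEW.user_id, 1, NEW.amount)
  ON DUPLICATE KEY UPDATE
    order_cnt = order_cnt + 1,
    total_amount = total_amount + NEW.amount;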

Some peers believe that using stored procedures and triggers can make troubleshooting confusing when lock waits and deadlocks occur. That is indeed possible, but if it does happen, it should not be hard to trace the corresponding stored procedure or trigger from the SQL recorded at the scene; it just requires the DBA to have a good understanding of the production business environment.

In general, don't treat MySQL's stored procedures and triggers as monsters to be feared. Used well, they pose no problem, and if you really run into issues, it is not too late to optimize then. In addition, MySQL has no materialized views, and its handling of views is not ideal, so views should be used as little as possible.

5. Choose the right type
[Lao Ye's point of view] In addition to the common suggestions, there are several other points:

5.1. Use INT UNSIGNED to store IPv4 addresses, converting with INET_ATON() and INET_NTOA(); there is basically no need to store them as CHAR(15).
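A quick illustration (the ip column is hypothetical):

SELECT INET_ATON('192.168.1.10'); -- returns 3232235786, which fits in INT UNSIGNED
SELECT INET_NTOA(3232235786);     -- returns '192.168.1.10'
-- filtering: ... WHERE ip = INET_ATON('192.168.1.10')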

5.2. For enumerated values, use ENUM. ENUM is stored internally as a 1- or 2-byte integer (akin to TINYINT or SMALLINT, not CHAR/VARCHAR), so its performance is not bad at all; remember not to use CHAR/VARCHAR to store enumeration data.
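An illustrative definition (names are hypothetical):

CREATE TABLE t_order (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  status ENUM('created', 'paid', 'shipped', 'done') NOT NULL DEFAULT 'created', -- stored internally as a 1-byte integer
  PRIMARY KEY (id)
) ENGINE = InnoDB;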

5.3. Another long-circulating piece of "common-sense misleading" is the recommendation to replace DATETIME with TIMESTAMP. In fact, starting from 5.6, DATETIME is the recommended choice for storing date and time: its usable range is far larger than TIMESTAMP's, its physical storage is only 1 byte more, and the overall performance loss is small.

5.4. Add a NOT NULL constraint to every field definition by default, unless a column truly must store NULL (though I can't think of a scenario where NULL must be stored in the database; it can usually be represented by 0). This keeps COUNT() statistics on the column accurate (NULL values are not counted by COUNT(column)), and a WHERE column IS NULL lookup can also return results quickly.
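A quick demonstration of the COUNT difference (t1 and c1 are hypothetical):

SELECT COUNT(*), COUNT(c1) FROM t1; -- the two results differ if c1 contains NULLs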

5.5. Avoid SELECT * reading all fields; when the table contains large TEXT/BLOB columns, it is a disaster. Those columns may not be needed at all, but because of a lazily written SELECT *, this "garbage" data flushes the data that really needs to be cached out of the buffer pool.
Correspondingly, when writing an INSERT, also write out the column list.
Spelling out every field in SQL also matters when the business requires a DDL change on the table: if the fields are not written explicitly, old business code may break, which is a big hassle. A short illustration follows.
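A short illustration (table and column names are hypothetical):

SELECT id, title FROM article WHERE id = ?;            -- instead of SELECT *

INSERT INTO article (title, author_id) VALUES (?, ?);  -- column list written out explicitly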

6. Regarding indexes
[Lao Ye's point of view] In addition to the common suggestions, a few more points:

6.1. For string columns longer than 20 characters, it is better to create a prefix index rather than index the whole column (for example: ALTER TABLE t1 ADD INDEX (user(20))); this can effectively improve index efficiency. The drawback is that a prefix index cannot be used when sorting on that column. The prefix length can be chosen based on statistics for the column, generally a little longer than the average length; one common approach is sketched below.
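A sketch of estimating the prefix length by comparing selectivity (against the t1.user column above):

SELECT COUNT(DISTINCT LEFT(user, 20)) / COUNT(*) AS sel_20,
       COUNT(DISTINCT user) / COUNT(*) AS sel_full
FROM t1;

-- pick the shortest prefix whose selectivity comes close to sel_full, then:
ALTER TABLE t1 ADD INDEX (user(20));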

6.2. Regularly check for and drop redundant indexes with the pt-duplicate-key-checker tool. For example, if index idx1(a, b) already covers index idx2(a), idx2 can be dropped.

6.3. With a multi-column composite index, the order of the filter conditions in WHERE does not need to match the index column order, but for sorting and grouping to use the index, the order must match.

For example, given a composite index idx1(a, b, c), all of the following SQL statements can use the index fully:
SELECT ... WHERE b = ? AND c = ? AND a = ?; -- note: the field order in WHERE need not match the index field order

SELECT ... WHERE b = ? AND a = ? AND c = ?;

SELECT ... WHERE a = ? AND b IN (?, ?) AND c = ?;

SELECT ... WHERE a = ? AND b = ? ORDER BY c;

SELECT ... WHERE a = ? AND b IN (?, ?) ORDER BY c;

SELECT ... WHERE a = ? ORDER BY b, c;

SELECT ... ORDER BY a, b, c; -- Sorting can be done using a joint index

while the following SQL statements can use only part of the index, or rely on the ICP feature:
SELECT ... WHERE b = ? AND a = ?; -- only the (a, b) part of the index is used

SELECT ... WHERE a IN (?, ?) AND b = ?; -- EXPLAIN shows only the (a, b) part of the index is used, with ICP

SELECT ... WHERE (a BETWEEN ? AND ?) AND b = ?; -- EXPLAIN shows only the (a, b) part of the index is used, with ICP

SELECT ... WHERE a = ? AND b IN (?, ?); -- EXPLAIN shows only the (a, b) part of the index is used, with ICP

SELECT ... WHERE a = ? AND (b BETWEEN ? AND ?) AND c = ?; -- EXPLAIN shows the entire (a, b, c) index is used, with ICP

SELECT ... WHERE a = ? AND c = ?; -- EXPLAIN shows only the (a) part of the index is used, with ICP

SELECT ... WHERE a = ? AND c >= ?; -- EXPLAIN shows only the (a) part of the index is used, with ICP

ICP (Index Condition Pushdown) is a new feature in MySQL 5.6. It allows the remaining parts of a composite index to participate in filtering down at the storage engine layer, reducing data transfer and table-lookup requests between the engine layer and the server layer, which in general can greatly improve query efficiency.
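When ICP takes effect, EXPLAIN shows "Using index condition" in the Extra column; for example (against the idx1(a, b, c) index above):

EXPLAIN SELECT ... WHERE a = ? AND c >= ?; -- Extra: Using index condition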

The following SQL statements cannot use this index at all:

SELECT ... WHERE b = ?;

SELECT ... WHERE b = ? AND c = ?;

SELECT ... ORDER BY b;

SELECT ... ORDER BY b, a;

As the examples above show, the old "common-sense misleading" that the field order in the WHERE condition must match the index order for the index to be used need not be strictly followed.

In addition, the index or execution plan chosen by the query optimizer is sometimes not optimal. You can manually specify the best index, or modify the session-level optimizer_switch option to turn off a feature that leads to a worse plan. For example, index merge is usually a good thing, but I have also seen queries get worse after an index merge; in such cases, either force one of the indexes or temporarily turn off the index merge feature, for example:
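(idx1 here stands for one of the candidate indexes:)

SELECT ... FROM t1 FORCE INDEX (idx1) WHERE a = ? AND b = ?; -- force one specific index

SET SESSION optimizer_switch = 'index_merge=off'; -- or temporarily disable index merge for this session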


7. Other
7.1. Even with index-based condition filtering, if the optimizer estimates that the amount of data to be scanned exceeds about 30% of the table (it seems to be 20% in Oracle; MySQL currently uses roughly 30%, which may be adjusted in the future), it will switch the execution plan directly to a full table scan and stop using the index.

7.2. In a multi-table JOIN, the table with the strongest filterability (not necessarily the one with the least data, but the one filtered most heavily once the WHERE conditions are applied) should be chosen as the driving table. In addition, if there is sorting after the JOIN, the sort columns must belong to the driving table so that an index on the driving table can complete the sort; for example:
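(A sketch; all table and column names are hypothetical, with u as the driving table carrying the strongest filtering:)

SELECT ...
FROM u JOIN o ON o.user_id = u.id
WHERE u.city = ?
ORDER BY u.created_at; -- the sort column belongs to the driving table, so its index can complete the sort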

7.3. In most cases, the cost of sorting is relatively high, so if you see Using filesort in the execution plan, consider first creating an index that can satisfy the sort.

7.4. Use pt-query-digest to analyze the slow query log regularly, and combine it with Box Anemometer to build a slow query log analysis and optimization system.

[Reference]: [MySQL FAQ] series - What information should be paid attention to in EXPLAIN results.

 
