Summary of basic MySQL knowledge points and query optimization

This article summarizes common operations (and common mistakes) from day-to-day work, along with useful material collected while optimizing slow queries. It is aimed at developers who already have a MySQL foundation.

Table of contents

  • 1. Index related

  • 2. Useful information in EXPLAIN

  • 3. Field types and encoding

  • 4. SQL statement summary

  • 5. Pitfalls

  • 6. Online modification of tables with tens of millions of rows

  • 7. Slow query log

  • 8. Viewing the SQL process list and killing processes

  • 9. Some thoughts on database performance

1. Index related

1. Index cardinality:

Cardinality is the number of distinct values a column contains. For example, if a column contains the values 1, 3, 7, 4, 7, 3, its cardinality is 4. An index works best when its cardinality is high relative to the number of rows in the table, that is, when the column contains many distinct values and few duplicates. If a column records many different ages, an index will narrow down the matching rows quickly. If a column records gender (only "M" and "F" values), an index is of little use: whichever value you search for, you will get roughly half the rows. In such cases it is better not to index the column at all, because when the query optimizer finds that a value occurs in a high percentage of the table's rows, it will generally ignore the index and perform a full table scan. A customary percentage cut-off is about 30%.
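As a sketch of how to inspect cardinality and selectivity in practice (the table and column names here are hypothetical):

```sql
-- Show the cardinality MySQL has estimated for each index on the table
SHOW INDEX FROM user_info;

-- Refresh the statistics if they look stale
ANALYZE TABLE user_info;

-- Compute a column's selectivity by hand: close to 1 is index-friendly,
-- close to 0 (e.g. a gender column) is not
SELECT COUNT(DISTINCT age) / COUNT(*) AS selectivity FROM user_info;
```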

2. Reasons for index failure:

1. Operations on the indexed column, including +, -, *, /, !, <>, %, and LIKE with a leading wildcard ('%...'), prevent the index from being used.

2. Type mismatch: for example, the column type is varchar but the where condition compares it to a number, causing an implicit conversion.

3. Applying a function to the indexed column, e.g. select * from template t where ROUND(t.logicdb_id) = 1. In this case a function-based index on ROUND(t.logicdb_id) is needed. MySQL 8.0 supports functional indexes; 5.7 can achieve the same effect through generated (virtual) columns; before that, the only option was to add and maintain a separate column holding ROUND(t.logicdb_id).

4. With an OR condition, the index is not used even if one side of the OR is indexed (this is one reason OR is discouraged). If you want the index to be effective with OR, every column appearing in the OR condition must be indexed.

5. If the column type is a string, the value in the condition must be quoted; otherwise the implicit conversion prevents index use.

6. A B-tree index is generally not used for IS NULL conditions but can be used for IS NOT NULL; a bitmap index can be used for both (note that bitmap indexes exist in Oracle, not in MySQL).

7. Composite indexes follow the leftmost-prefix principle.
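The failure cases above can be sketched with a hypothetical user_info table that has single-column indexes on age and name:

```sql
-- Index NOT used: arithmetic on the indexed column
SELECT * FROM user_info WHERE age + 1 = 30;
-- Index used: keep the column bare and move the arithmetic to the constant
SELECT * FROM user_info WHERE age = 29;

-- Index NOT used: leading-wildcard LIKE
SELECT * FROM user_info WHERE name LIKE '%son';
-- Index used: prefix LIKE
SELECT * FROM user_info WHERE name LIKE 'jack%';

-- Index NOT used: the varchar column is compared to a number (implicit cast)
SELECT * FROM user_info WHERE name = 123;
-- Index used: quote the value
SELECT * FROM user_info WHERE name = '123';
```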

3. Index establishment

1. Above all, build indexes for the statements the business queries most frequently.

2. Prefer columns with high selectivity. The selectivity formula is COUNT(DISTINCT col) / COUNT(*), the proportion of distinct values in the column; the closer the ratio is to 1, the fewer rows need to be scanned.

3. If a column is unique in business terms, create a unique key on it. This both guarantees data correctness and greatly improves index efficiency.
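A minimal sketch of these guidelines, with hypothetical table and column names:

```sql
-- Index the hot query: WHERE room_id = ? ORDER BY create_time
ALTER TABLE user_info ADD INDEX idx_room_time (room_id, create_time);
-- By the leftmost-prefix rule this also serves queries on room_id alone,
-- but not queries on create_time alone

-- A column that is unique in business terms gets a unique key
ALTER TABLE user_info ADD UNIQUE KEY uk_mobile (mobile);
```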

2. Useful information in EXPLAIN

basic usage

1. Prefix your SQL with DESC or EXPLAIN.

2. Prefix your SQL with EXPLAIN EXTENDED, then run SHOW WARNINGS to view the statement that is actually executed. This is very useful: in many cases, different ways of writing a query turn out to be identical after the optimizer rewrites them.
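For example (EXPLAIN EXTENDED is deprecated in 5.7, where plain EXPLAIN already includes this output):

```sql
EXPLAIN EXTENDED SELECT * FROM user_info WHERE id = 1;
-- Immediately afterwards, view the statement as rewritten by the optimizer
SHOW WARNINGS;
```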

Features to Improve Performance

1. Covering index:  all the data the query needs can be found in the index itself, so no table lookup is required; the EXTRA column displays Using index.

2. ICP (Index Condition Pushdown):  originally the index was just an access path; rows fetched through the index were passed back to the MySQL server layer, which applied the WHERE filtering. Starting from version 5.6, with ICP enabled, if part of the WHERE condition can use indexed columns, the server pushes that part down to the engine layer, and the data is filtered by those indexed conditions inside the storage engine. EXTRA shows Using index condition. To understand this, recall that MySQL's architecture is split into a server layer and a storage engine layer.

3. Index merge:  scans several indexes separately for their respective conditions, then merges the results (intersection/union). It generally appears with OR; for AND conditions, consider building a composite index instead. EXPLAIN displays the type index_merge, and EXTRA shows the specific merge algorithm and the indexes used.
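A sketch, assuming a table t with separate single-column indexes idx_a(a) and idx_b(b):

```sql
-- OR over two separately indexed columns can trigger an index merge:
-- type: index_merge, Extra: Using union(idx_a,idx_b); Using where
EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;

-- For AND conditions, a composite index is usually the better choice
ALTER TABLE t ADD INDEX idx_a_b (a, b);
```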

extra field

1. Using filesort:  MySQL sorts the data with an external sort rather than reading rows in index order. Any sort that cannot be satisfied by an index is called a "filesort", although despite the name it is not necessarily done in files; a quicksort in memory is used when the data fits.

2. Using temporary:  a temporary table is used to hold intermediate results, typically when sorting (order by) or grouping (group by) the query result.

3. Using index:  the SELECT uses a covering index, avoiding access to the table's data rows; this is efficient.

4. Impossible WHERE:  the WHERE clause is always false, so no rows can be returned.

5. Select tables optimized away:  without a GROUP BY clause, MIN/MAX is resolved from the index (or COUNT(*) from the MyISAM row count), so the calculation is completed while generating the query plan rather than during execution.

6. Distinct:  the distinct operation is optimized to stop searching for a value as soon as the first matching row is found.

Pay attention when Using filesort and Using temporary both appear; they are very expensive. With group by, even without an order by, both may show up if there is no suitable index, because group by sorts before grouping. If the order does not matter, append ORDER BY NULL to suppress the sort; this removes Using filesort and recovers a little performance.
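For example (in MySQL 5.7 and earlier; 8.0 no longer sorts GROUP BY implicitly):

```sql
-- Implicitly sorted by room_id, which may show Using filesort
SELECT room_id, COUNT(*) FROM user_info GROUP BY room_id;

-- Suppress the implicit sort when the order does not matter
SELECT room_id, COUNT(*) FROM user_info GROUP BY room_id ORDER BY NULL;
```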

type field

1. system:  the table has only one row of records (equal to the system table), which is a special case of the const type and does not usually appear.

2. const:  the row is found through a primary key or unique index comparison, so at most one row matches and the lookup is very fast. With the primary key in the WHERE list, MySQL can convert the query to a constant.

3. eq_ref:  unique index scan, for each index key, only one record in the table matches it. Common for primary key or unique index scans.

4. ref:  Non-unique index scan, returning all rows matching a single value. It is essentially an index access that returns all rows that match a single value, however it may find multiple rows that meet the criteria, so it should be a hybrid of search and scan.

5. range:  retrieves only rows in a given range, using an index to select them. The key column shows which index is used. It generally appears with between, <, >, in, etc. in the where clause. A range scan is better than a full table scan because it only needs to begin at one point in the index and end at another, without scanning the whole index.

6. index:  Full Index Scan, the difference between index and ALL is that the index type only traverses the index tree, which is usually faster than ALL, because the index file is usually smaller than the data file. (That is to say, although both ALL and index read the entire table, index is read from the index, and ALL is read from the hard disk).

7. all:  Full Table Scan, traverse the entire table to obtain matching rows.

3. Field type and encoding

1. String length in MySQL:  CHARACTER_LENGTH (same as CHAR_LENGTH) returns the number of characters, while LENGTH returns the number of bytes; a Chinese character is three bytes in utf8.

2. To choose a prefix length for an index on a varchar column, compute select count(distinct left(test,5))/count(*) from table; the closer the result is to 1, the better the prefix length.

3. MySQL's utf8 stores at most 3 bytes per character and cannot hold emoji, so utf8mb4 must be used.  Configure the client character set to utf8mb4 in the MySQL configuration file. The JDBC connection string does not support characterEncoding=utf8mb4, so the best approach is to specify an initialization SQL in the connection pool, e.g. for the Hikari pool: spring.datasource.hikari.connection-init-sql=SET NAMES utf8mb4 (other pools are similar). Otherwise you would have to execute set names utf8mb4 before every statement.
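The relevant check and the init statement, as a sketch:

```sql
-- Verify which character sets are in effect for this connection
SHOW VARIABLES LIKE 'character_set%';

-- What the connection pool's init SQL should execute: sets the
-- client/connection/results character sets to utf8mb4 in one go
SET NAMES utf8mb4;
```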

4. MySQL collations (generally use _bin or _general_ci):

  • utf8_general_ci is case-insensitive; ci is short for case insensitive.

  • utf8_general_cs is case-sensitive; cs is short for case sensitive. However, current MySQL versions do not support collations like *_general_cs, so utf8_bin is used instead.

  • utf8_bin compares each character of the string by its binary code, so it is case-sensitive.

So if both are case-sensitive, what is the difference between utf8_general_cs and utf8_bin?

cs is the abbreviation of case sensitive, that is, case sensitive; bin means binary, that is, binary code comparison.

Under the utf8_general_cs collation, even though case is distinguished, some Western European and Latin characters are not, e.g. ä = a. When ä = a is not acceptable, utf8_bin is needed.

utf8_bin compares the binary codes of characters; any difference in binary code makes them unequal, so under the utf8_bin collation: ä <> a.
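The difference can be seen directly (results assume a MySQL version that still ships the utf8 collations):

```sql
SELECT 'a' = 'A' COLLATE utf8_general_ci;  -- 1: case folds together
SELECT 'a' = 'A' COLLATE utf8_bin;         -- 0: binary comparison
SELECT 'ä' = 'a' COLLATE utf8_general_ci;  -- 1: accents fold together
SELECT 'ä' = 'a' COLLATE utf8_bin;         -- 0: different byte sequences
```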

5. In SQLyog, specify the encoding for the initial connection via the connection's initialization command.

4. SQL statement summary

Common but easy to forget:

1. If there is a primary key or unique key conflict, do not insert:  insert ignore into;

2. Update on a primary key or unique key conflict (note that this still consumes an auto-increment value):  INSERT INTO room_remarks(room_id,room_remarks) VALUE(1,"sdf") ON DUPLICATE KEY UPDATE room_remarks="234";

3. Replace the row if it exists. If VALUES does not include the auto-increment column, the auto-increment value will change:  REPLACE INTO room_remarks(room_id,room_remarks) VALUE(1,"sdf");

4. Back up a table:  CREATE TABLE user_info_backup SELECT * FROM user_info;

5. Copy table structure:  CREATE TABLE user_v2 LIKE user;

6. Import from query statement:  INSERT INTO user_v2 SELECT * FROM user or INSERT INTO user_v2(id,num) SELECT id,num FROM user;

7. Multi-table update:  UPDATE user a, room b SET a.num=a.num+1 WHERE a.room_id=b.id;

8. Multi-table delete:  DELETE user FROM user,black WHERE user.id=black.id;

Lock related (for awareness; rarely used directly)

1. Shared lock:  select id from tb_test where id = 1 lock in share mode;

2. Exclusive lock:  select id from tb_test where id = 1 for update;

Used for optimization:

1. Force the use of an index:  select * from table force index(idx_user) limit 2;

2. Prohibit the use of an index: select * from table ignore index(idx_user) limit 2;

3. Disable the query cache (to remove its influence when testing):  select SQL_NO_CACHE * from table limit 2;

check status

1. View the character set  SHOW VARIABLES LIKE 'character_set%';

2. Check the collation  SHOW VARIABLES LIKE 'collation%';

SQL writing attention

1. Put filter conditions in WHERE rather than HAVING whenever possible, so rows are filtered before grouping and aggregation.

2. Use the deferred join technique to optimize deep pagination such as limit 10000, 10; the deferred join avoids table lookups for the skipped rows.
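A sketch of the deferred join, assuming a user_info table with primary key id:

```sql
-- Naive deep pagination: MySQL fetches and discards 10000 full rows
SELECT * FROM user_info ORDER BY id LIMIT 10000, 10;

-- Deferred join: page over the primary-key index first (a covering scan),
-- then look up only the 10 rows that are actually returned
SELECT u.*
FROM user_info u
JOIN (SELECT id FROM user_info ORDER BY id LIMIT 10000, 10) t ON u.id = t.id;
```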

3. The distinct statement consumes a lot of performance and can be optimized through group by.

4. Try not to connect more than three tables.

5. Pitfalls

1. TRUNCATE resets a table's AUTO_INCREMENT counter back to its initial value. Be careful in scenarios where the business relies on the auto-increment column as an id.

2. Aggregate functions ignore NULL values. For example, if column a is of type int and every value is NULL, SUM(a) returns NULL rather than 0.

3. MySQL cannot test for NULL with "a = null"; that comparison always evaluates to UNKNOWN. In WHERE and HAVING, UNKNOWN is treated as false; in CHECK constraints it is treated as true. Use "a is null" instead.
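The NULL comparison rules can be verified directly:

```sql
SELECT 1 = NULL;       -- NULL (unknown), never true
SELECT NULL = NULL;    -- NULL as well
SELECT 1 IS NULL;      -- 0
SELECT NULL IS NULL;   -- 1
SELECT NULL <=> NULL;  -- 1: the NULL-safe equality operator
```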

6. Online modification of tables with tens of millions of rows

When a table holds a large amount of data, modifying its structure locks the table and blocks business requests. MySQL introduced online DDL in 5.6, but it still locks the table in some cases, so the pt tool (Percona Toolkit) is generally used.

To add an index to a table:

as follows:

pt-online-schema-change --user='root' --host='localhost' --ask-pass --alter "add index idx_user_id(room_id,create_time)" 
D=fission_show_room_v2,t=room_favorite_info --execute

7. Slow query log

If online requests time out, check the slow query log. Analyzing it is simple: first find the location of the slow query log file, then analyze it with mysqldumpslow. The slow query configuration can be inspected directly with SQL; the commonly used statements are:

-- View the slow query configuration
-- slow_query_log: whether the slow query log is enabled
-- slow_query_log_file: the file the slow query log is written to
-- long_query_time: the threshold beyond which a query counts as slow
-- log_queries_not_using_indexes: whether to log every query that uses no index
SHOW VARIABLES LIKE '%quer%';

-- Check whether slow queries are written to a log file or a table
SHOW VARIABLES LIKE 'log_output';

-- View the number of slow queries
SHOW GLOBAL STATUS LIKE 'Slow_queries';

mysqldumpslow is very simple to use. I mainly use the following parameters:

  • -t: limit the number of output lines; the top ten is usually enough.

  • -s: what to sort by. The default is at (average query time). I often use c (query count), because a query that runs very frequently needs optimizing even if each execution is not slow, and t (query time) to find which statements are particularly slow.

  • -v: output detailed information

Example: mysqldumpslow -v -s t -t 10 mysql_slow.log.2018-11-20-0500

8. Viewing the SQL process list and killing processes

If a SQL statement has been running for a long time without returning, you can check its actual status in the process list. If it is very expensive, kill it to avoid affecting production. The process list also gives a visual picture of current SQL activity: when the database load is high, it may show a large number of processes stuck with very long execution times. The commands are:

-- View the process list
SHOW PROCESSLIST;
-- Kill a process
KILL 183665;

SQLyog also has a graphical page for this: menu bar → Tools → Show → Process List. On the process list page you can right-click to kill a process.

9. Some thoughts on database performance

While optimizing the company's slow query log, I found that many problems were simply missing indexes, which are easy to fix: just add the index. But some problems cannot be solved merely by adding an index:

1. Business code reads the database in a loop:  consider fetching a user's fan list with a page size of ten. The SQL itself is simple and performs well, but developers often take out a list of ids and then query each id's details in a loop. With many ids this puts great pressure on the database and performs very poorly.
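A sketch of the fix, with hypothetical table and column names: replace the per-id loop with one batched query.

```sql
-- Anti-pattern: executed once per follower id inside an application loop
-- SELECT id, nickname, avatar FROM user_info WHERE id = ?;

-- Better: one round trip for the whole page of ids
SELECT id, nickname, avatar
FROM user_info
WHERE id IN (101, 102, 103, 104, 105, 106, 107, 108, 109, 110);
```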

2. Statistical SQL:  business often needs rankings, and in many places the company computed them directly in the database. Aggregations over large tables frequently took more than five seconds, and such SQL is generally long and hard to optimize. If the business permits (e.g. consistency requirements are low, or the statistics only need to be produced periodically), run the statistics on a replica instead, and consider caching the results in Redis for this kind of workload.

3. Very deep pagination:  the slow query log showed queries like limit 40000, 1000. Because MySQL pagination is done at the server layer, a deferred join can reduce table lookups. But after reading the relevant business code, no normal flow would issue such requests, so they were most likely a malicious user scraping the interface. It is best to add validation to the interface to intercept such requests.

This article concludes here, I hope it can be helpful to you!

Origin blog.csdn.net/veratata/article/details/128793248