MySQL Performance Optimization

Table of contents

Why perform database optimization?

mysql database optimization

SQL and index optimization

mysql installation and uninstallation (linux online installation and uninstallation)

Database version selection

Prepare data

table structure relationship

How to spot problematic SQL

Check whether the slow query log is enabled:

View variable information of all logs

MySQL slow query log storage format

MySQL slow query log analysis tool (mysqldumpslow)

Introduction

Usage

MySQL slow query log analysis tool (pt-query-digest)

Introduction and function

Install pt-query-digest tool

Quick installation (note: wget must be installed first)

Check whether the installation is complete:

Introduction to using the tool:

How to find problematic SQL from the slow query log

SQL that is queried often, with each query taking a long time

SQL with heavy IO

SQL that misses indexes

Analyze SQL execution plan through explain query

Use explain to query SQL execution plan

Description of each field:

Optimization cases for specific slow queries

Optimization of function Max()

Optimization of function Count()

Subquery optimization

Optimization of group by

Optimization of Limit query

Index optimization


Why perform database optimization?

1. Avoid access errors on website pages

        Page 5xx error due to database connection timeout

        Page cannot load due to slow query

        Data cannot be submitted due to blocking

2. Increase the stability of the database

        Many database problems are caused by inefficient queries

3. Optimize user experience

        Smooth page access speed

        Good website functionality experience

mysql database optimization

From what aspects can the database be optimized? As shown below:

1.SQL and index optimization

        Write efficient SQL for the requirements and create effective indexes. A given requirement can usually be expressed in several ways; we have to choose the most efficient one, which requires an understanding of SQL optimization.

2. Database table structure optimization

        Design the table structure according to database normal forms. A well-designed table structure directly affects how SQL statements can be written.

3. System configuration optimization

        Most databases run on Linux machines, where settings such as the TCP connection limit, the open-file limit, and security restrictions apply, so these configurations need to be tuned accordingly.

4. Hardware configuration optimization

        Choose a CPU suited to database workloads, faster IO, and more memory. More CPUs are not always better: some database versions have an upper limit on usable cores, and faster IO reduces but does not eliminate blocking.

Note: As the pyramid in the figure above shows, the cost of optimization increases from bottom to top, while the benefit decreases.

SQL and index optimization

mysql installation and uninstallation (linux online installation and uninstallation)

Database version selection

1. Check the database version

select @@version;

Prepare data

URL: https://dev.mysql.com/doc/sakila/en/sakila-installation.html

The files contained in the sakila-db.zip compressed package are explained below

Download Data

The steps are as shown below

 

table structure relationship

Note: This table structure relationship is generated using tools.

How to spot problematic SQL

How to enable the MySQL slow query log, and its storage format

Check whether the slow query log is enabled:

show variables like 'slow_query_log';

// check whether the slow query log is enabled

set global slow_query_log=on;

// enable the slow query log

set global slow_query_log_file='/usr/share/mysql/sql_log/mysql-slow.log';

// location of the slow query log file

set global log_queries_not_using_indexes=on;

// also log queries that do not use an index

set global long_query_time=1;

// queries that take longer than 1 second are written to the slow log; if this is set to 0, every query is logged and the disk can fill up quickly
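Settings changed with set global are lost when the server restarts. To make them persistent, the equivalent options can be placed in the [mysqld] section of my.cnf. This is a sketch; the file path and the 1-second threshold are examples to adapt:

```ini
[mysqld]
slow_query_log                = ON
slow_query_log_file           = /usr/share/mysql/sql_log/mysql-slow.log
log_queries_not_using_indexes = ON
long_query_time               = 1
```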

View variable information of all logs

mysql> show variables like '%log%';
+-----------------------------------------+------------------------------------+
| Variable_name                           | Value                              |
+-----------------------------------------+------------------------------------+
| back_log                                | 80                                 |
| binlog_cache_size                       | 32768                              |
| binlog_checksum                         | CRC32                              |
| binlog_direct_non_transactional_updates | OFF                                |
| binlog_error_action                     | IGNORE_ERROR                       |
| binlog_format                           | STATEMENT                          |
| binlog_gtid_simple_recovery             | OFF                                |
| binlog_max_flush_queue_time             | 0                                  |
| binlog_order_commits                    | ON                                 |
| binlog_row_image                        | FULL                               |
| binlog_rows_query_log_events            | OFF                                |
| binlog_stmt_cache_size                  | 32768                              |
| binlogging_impossible_mode              | IGNORE_ERROR                       |
| expire_logs_days                        | 0                                  |
| general_log                             | OFF                                |
| general_log_file                        | /var/lib/mysql/mysql-host.log      |
| innodb_api_enable_binlog                | OFF                                |
| innodb_flush_log_at_timeout             | 1                                  |
| innodb_flush_log_at_trx_commit          | 1                                  |
| innodb_locks_unsafe_for_binlog          | OFF                                |
| innodb_log_buffer_size                  | 8388608                            |
| innodb_log_compressed_pages             | ON                                 |
| innodb_log_file_size                    | 50331648                           |
| innodb_log_files_in_group               | 2                                  |
| innodb_log_group_home_dir               | ./                                 |
| innodb_mirrored_log_groups              | 1                                  |
| innodb_online_alter_log_max_size        | 134217728                          |
| innodb_undo_logs                        | 128                                |
| log_bin                                 | OFF                                |
| log_bin_basename                        |                                    |
| log_bin_index                           |                                    |
| log_bin_trust_function_creators         | OFF                                |
| log_bin_use_v1_row_events               | OFF                                |
| log_error                               | /var/log/mysqld.log                |
| log_output                              | FILE                               |
| log_queries_not_using_indexes           | ON                                 |
| log_slave_updates                       | OFF                                |
| log_slow_admin_statements               | OFF                                |
| log_slow_slave_statements               | OFF                                |
| log_throttle_queries_not_using_indexes  | 0                                  |
| log_warnings                            | 1                                  |
| max_binlog_cache_size                   | 18446744073709547520               |
| max_binlog_size                         | 1073741824                         |
| max_binlog_stmt_cache_size              | 18446744073709547520               |
| max_relay_log_size                      | 0                                  |
| relay_log                               |                                    |
| relay_log_basename                      |                                    |
| relay_log_index                         |                                    |
| relay_log_info_file                     | relay-log.info                     |
| relay_log_info_repository               | FILE                               |
| relay_log_purge                         | ON                                 |
| relay_log_recovery                      | OFF                                |
| relay_log_space_limit                   | 0                                  |
| simplified_binlog_gtid_recovery         | OFF                                |
| slow_query_log                          | OFF                                |
| slow_query_log_file                     | /var/lib/mysql/mysql-host-slow.log |
| sql_log_bin                             | ON                                 |
| sql_log_off                             | OFF                                |
| sync_binlog                             | 0                                  |
| sync_relay_log                          | 10000                              |
| sync_relay_log_info                     | 10000                              |
+-----------------------------------------+------------------------------------+
61 rows in set (0.01 sec)

Enable the slow query log:

show variables like 'slow_query_log';
// check whether the slow query log is enabled
set global slow_query_log=on;
// enable the slow query log
set global slow_query_log_file='/var/lib/mysql/mysql-host-slow.log';
// location of the slow query log file
set global log_queries_not_using_indexes=on;
// also log queries that do not use an index
set global long_query_time=1;
// queries that take longer than 1 second are written to the slow log; if this is set to 0, every query is logged and the disk can fill up quickly

Verify whether the slow query log is enabled:

In the mysql client, run:

show databases;
use sakila;
select * from store;
select * from staff;

Then monitor the log file to see whether these queries are written to it:

tail -50f /var/lib/mysql/mysql-host-slow.log

MySQL slow query log storage format

As shown below:

Explanation:

1. # Time: 180526 1:06:54 ------->Query execution time

2. # User@Host: root[root] @ localhost [] Id: 4 ------->Host information for executing sql

3. # Query_time: 0.000401 Lock_time: 0.000105 Rows_sent: 2 Rows_examined: 2------->SQL execution information:

Query_time: SQL query time

Lock_time: lock time

Rows_sent: Number of rows sent

Rows_examined: the number of rows scanned

4. SET timestamp=1527268014; ------->SQL execution time

5. select * from staff; ------->SQL execution content
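As a quick illustration, the fields of such an entry can be pulled out with a short script. This is a sketch; the log text below is a hypothetical sample in the storage format shown above:

```python
import re

# A hypothetical entry in the storage format shown above.
entry = """# Time: 180526  1:06:54
# User@Host: root[root] @ localhost []  Id:     4
# Query_time: 0.000401  Lock_time: 0.000105 Rows_sent: 2  Rows_examined: 2
SET timestamp=1527268014;
select * from staff;"""

m = re.search(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)",
    entry)
stats = {k: float(v) for k, v in m.groupdict().items()}
print(stats["query_time"], int(stats["rows_examined"]))  # 0.000401 2
```

Tools like mysqldumpslow and pt-query-digest (covered next) do this parsing and aggregation for you across the whole log.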

MySQL slow query log analysis tool (mysqldumpslow)

Introduction

        How do we view the slow query log? Once it is enabled, a lot of data accumulates. We can analyze the log, generate a report, and then optimize based on that report.

Usage

Next let's check out the usage of this tool:

Note: run this on the server where the mysql database is located, not at the mysql> command line.

How to use this tool:

mysqldumpslow --help

View verbose information

mysqldumpslow -v

View the top 10 entries of the slow query log. The results of the mysqldumpslow analysis are as follows:

mysqldumpslow -t 10 /var/lib/mysql/mysql-slow.log

The two figures above show the analysis results. Each entry shows the execution time, lock time, number of rows sent, and number of rows scanned.

This is the most commonly used tool and is installed along with MySQL. However, its statistics are limited, and it gives us relatively little data for optimizing lock performance.

MySQL slow query log analysis tool (pt-query-digest)

Introduction and function

        An excellent MySQL DBA also needs to master several useful management tools, so I have been collecting tools that make managing MySQL easier; a large part of my time in the coming period will be spent looking for such tools.

        Performance management has always been the top priority. The value of much DBA work is invisible and hard to measure, but when a system is as slow as a snail, a DBA who pulls it back from the edge of collapse through monitoring and tuning delivers enormous, tangible value. (Many company leaders believe that when a system can no longer keep up, the fix is a faster CPU, more memory, and faster storage, and they are not a minority. As a result the DBA's value goes unrecognized, and the salary naturally is not very high.)

        The mysql log is the fastest and most direct way to track down performance bottlenecks. When a performance bottleneck appears, the first step is to enable the slow query log and trace it. I have organized notes on managing and viewing the slow query log twice during this period, and afterwards stumbled on another tool for analyzing it: mk-query-digest, the predecessor of pt-query-digest, reputed online as one of the top ten tools a MySQL DBA must master.

Install pt-query-digest tool

Quick installation (note: wget must be installed first)

wget https://www.percona.com/downloads/percona-toolkit/2.2.16/RPM/percona-toolkit-2.2.16-1.noarch.rpm && yum localinstall -y percona-toolkit-2.2.16-1.noarch.rpm

Check whether the installation is complete:

Enter pt-summary on the command line.

If it produces output as shown below, the installation succeeded. You can also verify with [root@node03 mysql]# pt-query-digest --help

Introduction to using the tool:

pt-summary --help

wget http://percona.com/get/pt-summary

View server information

Command: pt-summary

View disk overhead usage information

Command: pt-diskstats

View mysql database information

Command: pt-mysql-summary --user=root --password=123456

Analyze slow query logs

Command: pt-query-digest /data/mysql/data/db-3-12-slow.lo

Find the slave database and synchronization status of mysql

Command: pt-slave-find --host=localhost --user=root --password=123456

View mysql deadlock information

pt-deadlock-logger --user=root --password=123456 localhost

Analyze index usage from slow query logs

pt-index-usage slow_20131009.log

Find duplicate indexes in database tables

pt-duplicate-key-checker --host=localhost --user=root --password=123456

View current active IO overhead for mysql tables and files

pt-ioprofile

View the differences between different mysql configuration files

pt-config-diff /etc/my.cnf /etc/my_master.cnf

pt-find finds mysql tables matching given conditions and can execute commands on them. Examples follow:

Find tables larger than 2G in the database:

pt-find --user=root --password=123456 --tablesize +2G

Find the table created 10 days ago in the MyISAM engine:

pt-find --user=root --password=123456 --ctime +10 --engine MyISAM

View table and index sizes and sort

pt-find --user=root --password=123456 --printf "%T\t%D.%N\n" | sort -rn

pt-kill kills mysql queries that match given criteria

Display queries that take more than 60 seconds

pt-kill --user=root --password=123456 --busy-time 60 --print

Kill queries longer than 60 seconds

 pt-kill --user=root --password=123456 --busy-time 60 --kill

View mysql authorization

pt-show-grants --user=root --password=123456

pt-show-grants --user=root --password=123456 --separate --revoke

Verify the integrity of database replication

pt-table-checksum --user=root --password=123456

Appendix:

How to find problematic SQL from the slow query log

SQL that is queried often, with each query taking a long time

        These are usually the top queries in the pt-query-digest report, which clearly shows how many times each SQL statement was executed and what proportion of total time it consumed. Focus on statements that execute often and account for a large share.

SQL with heavy IO

        Pay attention to the Rows examine item in the pt-query-digest report: the more rows scanned, the heavier the IO.

SQL that misses indexes

        Compare Rows examine with Rows sent in the pt-query-digest report. If far more rows are examined than sent, the SQL's index hit rate is low; such SQL deserves special attention.

Analyze SQL execution plan through explain query

Use explain to query SQL execution plan

The execution plan of SQL reflects the execution efficiency of SQL. The specific execution method is as follows:

Just add the explain keyword in front of the executed SQL;

Description of each field:

1) id column: the larger the number, the earlier that part of the query is executed. If the numbers are the same, execution proceeds from top to bottom. If the id column is null, it denotes a result set that does not need to be queried separately.

2) Common values of the select_type column:

A: simple: a simple select query that involves no union and contains no subquery. In a join query the outer query is simple, and there is only one.

B: primary: a select that involves a union or contains a subquery; the outermost query's select_type is primary, and there is only one.

C: union: in selects connected by union, the second and subsequent selects have select_type union; the first select is primary (or derived when the union appears in a from clause).

D: dependent union: like union, it appears in a union or union all statement, but this query is affected by the outer query.

E: union result: the result set of a union. In union and union all statements its id field is null, because it does not itself participate in the query.

F: subquery: apart from subqueries contained in the from clause, a subquery appearing anywhere else is marked subquery.

G: dependent subquery: similar to dependent union; the subquery is affected by the outer table's query.

H: derived: a subquery appearing in the from clause, also called a derived table; other databases may call this an inline view or nested select.

3)table

The name of the table being queried. If the query uses an alias, the alias is shown here; if no data table is involved, null is shown. A value in angle brackets such as <derived N> denotes a temporary table, where N is the id of the execution-plan step whose result it comes from. Similarly, <union M,N> denotes a temporary table holding the result of the union of the steps with id M and N.

4)type

From best to worst: system, const, eq_ref, ref, fulltext, ref_or_null, unique_subquery, index_subquery, range, index_merge, index, ALL. Except for ALL, every type can use an index; and except for index_merge, every type uses at most one index.

A: system: the table has only one row or is empty; applies only to myisam and memory tables. For an Innodb table the type in this case is usually all or index.

B: const: an equality where condition on a unique index or primary key that returns at most one row; other databases also call this a unique index scan.

C: eq_ref: appears in a join between two tables where, for each row of the driving table, the driven table returns exactly one row matched through its primary key or a not-null unique index. If the unique index or primary key has multiple columns, eq_ref only appears when all columns are used in the comparison.

D: ref: unlike eq_ref, it imposes no requirement on the join order, the primary key, or uniqueness. It can occur for any equality lookup: an equality search on a secondary index is common, as is an equality search on columns other than the first column of a multi-column primary key or unique index. In short, it is an equality lookup whose result may not be unique.

E: fulltext: full-text index retrieval. Note that full-text indexes have very high priority: if a full-text index and an ordinary index both apply, mysql prefers the full-text index regardless of cost.

F: ref_or_null: like ref, but with an additional comparison against null values. Not actually used much.

G: unique_subquery: used for an in-form subquery in the where clause where the subquery returns unique values with no duplicates.

H: index_subquery: used for an in-form subquery over a secondary index or a constant list; the subquery may return duplicate values, and the index is used to deduplicate it.

I: range: an index range scan, common in queries using operators such as >, <, is null, between, in, and like.

J: index_merge: the query uses two or more indexes and finally takes their intersection or union; common with and/or conditions on different indexes. Officially it ranks after ref_or_null, but because all the indexes involved must be read, its performance is often worse than range.

K: index: a full index scan, scanning the index from beginning to end. Common for queries that can be answered from index columns without reading the data file, and for queries that sort or group by an index.

L: all: a full scan of the table's data file, filtered at the server layer to return the matching records.

5)possible_keys

The indexes that may be used by the query will be listed here.

6)key

The index actually used by the query. When type is index_merge, two or more indexes may appear here; for other types only one appears.

7) key_len

The length of the index used to process the query. For a single-column index, the whole index length is counted. For a multi-column index the query may not use every column, and only the columns actually used are counted here; compare this value with the total length of your multi-column index to tell whether all columns were used. Note that indexes used via mysql's ICP feature are not counted, and key_len only covers index use in the where condition: an index used for sorting or grouping is not included.

8)ref

For a constant equality query, const is shown here. In a join query, the execution-plan row of the driven table shows the associated field of the driving table. If the condition uses an expression or function, or the condition column undergoes an implicit conversion, func may appear here.

9)rows

The estimated number of rows scanned in the execution plan; not an exact value.

10)extra

This column can display a lot of information; there are dozens of possible values, the common ones being:

A: distinct: The distinct keyword is used in the select part

B: no tables used: query without from clause or From dual query

C: not exists: a join query using a not in() subquery or the not exists operator; this is called an anti-join. A normal join queries the inner table first and then the outer table, while an anti-join queries the outer table first and then the inner table.

D: using filesort: This occurs when the index cannot be used during sorting. Commonly seen in order by and group by statements

E: using index: There is no need to query back to the table when querying. The queried data can be obtained directly through the index.

F: using join buffer (block nested loop), using join buffer (batched key access): versions after 5.6.x optimize join queries with the BNL and BKA features, mainly to reduce the number of inner-table scans and to make the scans sequential.

G: using sort_union, using union, using intersect, using sort_intersection:

using intersect: When expressing the conditions of each index using and, this information indicates that the intersection is obtained from the processing results.

using union: Indicates that when using or to connect the conditions of each index, this information indicates that the union is obtained from the processing results.

using sort_union and using sort_intersection: similar to the previous two, but appear when the and/or conditions cover a large amount of data; the primary keys are queried first, then sorted and merged before the records are read and returned.

H: using temporary: a temporary table is used to store intermediate results. It may be an in-memory or an on-disk temporary table; the execution plan does not tell you which, you must check the status variables Created_tmp_tables and Created_tmp_disk_tables.

I: using where: not all records returned by the storage engine satisfy the query conditions, so filtering happens at the server layer. Query conditions divide into index-limiting conditions and checking conditions. Before 5.6, the storage engine could only scan and return data according to the limiting conditions, after which the server layer filtered by the checking conditions. Since 5.6.x the ICP feature pushes the checking conditions down to the storage-engine layer, so rows that fail them are never read, greatly reducing the number of records the engine scans; in that case the extra column shows using index condition.

J: firstmatch(tb_name): One of the new features of optimizing subqueries introduced in 5.6.x. It is common in subqueries containing in() type in where clauses. If the amount of data in the internal table is relatively large, this may occur.

K: loosescan(m..n): One of the new features of optimizing subqueries introduced after 5.6.x. In the in() type subquery, this may occur when the subquery returns duplicate records.

Besides these, there are other values, for example ones seen when querying data dictionary tables, or hints produced when the optimizer determines during planning that the result set cannot exist.

11)filtered

This column appears when using explain extended. In versions 5.7 and later it is present by default, so explain extended is no longer needed. It indicates the percentage of records returned by the storage engine that remain after server-layer filtering; note that it is a percentage, not a row count.

Attached pictures:

Optimization cases for specific slow queries

Optimization of function Max()

Purpose: Query the last payment time - optimize max() function

Statement:

select max(payment_date) from payment;

Execution plan:

explain select max(payment_date) from payment;


You can see that this execution plan is not efficient and can slow the server down. How can it be optimized?

Create index

create index inx_paydate on payment(payment_date);

An index is stored in order, so max() can be answered by reading the end of the index without scanning the table, and the execution cost stays roughly constant.
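The effect can be sketched with SQLite's in-memory database (MySQL behaves analogously for the min/max optimization); the payment table here is a small stand-in for sakila.payment:

```python
import sqlite3

# A small stand-in for sakila.payment.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table payment(payment_id integer primary key, payment_date text)")
cur.executemany("insert into payment(payment_date) values (?)",
                [("2006-02-%02d" % d,) for d in range(1, 29)])

# Once payment_date is indexed, max() is answered from the index,
# not from a full table scan.
cur.execute("create index inx_paydate on payment(payment_date)")
plan = cur.execute(
    "explain query plan select max(payment_date) from payment").fetchall()
# The plan text names inx_paydate, showing the index answers the query.
assert any("inx_paydate" in row[-1] for row in plan)
print(cur.execute("select max(payment_date) from payment").fetchone()[0])
# 2006-02-28
```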

Optimization of function Count()

Requirement: count the films from 2006 and from 2007 separately, in a single SQL statement

Wrong way:

Statement:

select count(release_year='2006' or release_year='2007') from film;

This cannot tell the 2006 and 2007 counts apart; in fact it counts every row, because the expression yields 0 or 1 and is never null.

select count(*) from film where release_year='2006' or release_year='2007';

This returns only the combined total, still not the separate counts.

Correct way to write:

select count(release_year='2006' or null) as '06films',count(release_year='2007' or null) as '07films' from film;

Difference: count(*) and count(id)

Create table and insert statement

 create table t(id int);

 insert into t values(1),(2),(null);

Count(*):select count(*)from t;

Count(id):select count(id)from t;

Explanation:

count(id) counts only the rows where id is not null

count(*) counts all rows, including those where id is null
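The difference, and the "expr or null" trick from the previous requirement, can be checked with a minimal sketch using SQLite's in-memory database instead of MySQL; the counting semantics are the same:

```python
import sqlite3

# Recreate the t table from the text.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table t(id int)")
cur.executemany("insert into t values (?)", [(1,), (2,), (None,)])

# count(*) counts every row, including the one whose id is null.
assert cur.execute("select count(*) from t").fetchone()[0] == 3
# count(id) skips null values.
assert cur.execute("select count(id) from t").fetchone()[0] == 2

# The "expr or null" trick: a false condition becomes null, which count()
# skips, so two different conditions can be counted in a single pass.
ones, twos = cur.execute(
    "select count(id = 1 or null), count(id = 2 or null) from t").fetchone()
print(ones, twos)  # 1 1
```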

Subquery optimization

        Subqueries are something we use often during development. Usually a subquery should be optimized into a join query, but when doing so, check whether the join key has a one-to-many relationship and watch out for duplicate data.

View the t-table we created

show create table t;

Next we create a t1 table

create table t1(tid int);

and insert a piece of data

We now write a subquery. The requirement: query all rows in table t whose id appears as tid in table t1:

select * from t where t.id in (select t1.tid from t1);

Next we use the join operation to perform the operation

select id from t join t1 on t.id =t1.tid;

 Judging from the results above, the two queries return the same data, so the subquery can be optimized into a join.

Next, we insert another piece of data into the t1 table

insert into t1 values (1);

select * from t1;

In this case, if we use subquery to query, the returned result is as shown below:

If you use the join method to search, as shown in the figure below:

Here there is a one-to-many relationship, so the join produces duplicate rows. To avoid the duplicates, we add the distinct keyword to deduplicate:

select distinct id from t join t1 on t.id =t1.tid;

Note: this one-to-many duplication is a pitfall we ran into during development; everyone should watch out for it.
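The whole t / t1 experiment above can be replayed as a minimal sketch in SQLite's in-memory database (the join semantics match MySQL here):

```python
import sqlite3

# Rebuild the t / t1 example from the text.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table t(id int)")
cur.execute("create table t1(tid int)")
cur.executemany("insert into t values (?)", [(1,), (2,)])
# tid = 1 appears twice: a one-to-many relationship with t.id = 1.
cur.executemany("insert into t1 values (?)", [(1,), (1,)])

sub = cur.execute(
    "select id from t where id in (select tid from t1)").fetchall()
join = cur.execute(
    "select id from t join t1 on t.id = t1.tid").fetchall()
dedup = cur.execute(
    "select distinct id from t join t1 on t.id = t1.tid").fetchall()

print(sub)    # [(1,)]       -- in() never duplicates rows of t
print(join)   # [(1,), (1,)] -- the plain join repeats the matched row
print(dedup)  # [(1,)]       -- distinct removes the duplicates again
```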

Example: Query all movies starring Sandra:

explain select title,release_year,length

 from film

 where film_id in (

 select film_id from film_actor where actor_id in (

 select actor_id from actor where first_name='sandra'));

Optimization of group by

It is best to group by columns from the same table.

Requirements: The number of films each actor has participated in - (film list and cast list) 

explain select actor.first_name,actor.last_name,count(*)

from sakila.film_actor

inner join sakila.actor using(actor_id)

group by film_actor.actor_id;

Optimized SQL:

explain select actor.first_name,actor.last_name,c.cnt

from sakila.actor inner join (

select actor_id,count(*) as cnt from sakila.film_actor group by actor_id

)as c using(actor_id);

Note: judging from the execution plan above, the optimized version uses no temporary tables and no filesort, relying on indexes instead, so the query is very efficient.

When the table holds a lot of data, the unoptimized query causes heavy IO; optimizing the SQL improves execution efficiency and saves server resources, which is why this optimization matters.

Notice:

1. The role of the using keyword in mysql: to use using, table a and table b must have a column with the same name.

2. When using Join to perform joint queries on multiple tables, we usually use On to establish the relationship between the two tables. In fact, there is a more convenient keyword, which is Using.

3. If the associated field names of the two tables are the same, you can use Using to establish the relationship, which is concise and clear.

Optimization of Limit query

Limit is often used for paging and usually appears together with an order by clause, so most of the time a filesort is performed, which causes heavy IO.

example:

Requirement: query the movie id and description, sort by title, and fetch 5 rows starting at row 50.

select film_id,description from sakila.film order by title limit 50,5;

Execution result:

Check out its execution plan:

 What optimization method should we use for this operation?

Optimization step 1:

Use an indexed column or the primary key for the order by, because innodb stores rows in primary-key logical order, which avoids a lot of IO.

select film_id,description from sakila.film order by film_id limit 50,5;

 Check the execution plan

 So if we get 5 records starting from row 500, what will the execution plan look like?

explain select film_id,description from sakila.film order by film_id limit 500,5\G

As we turn the page further, the IO operation will become larger and larger. If a table has tens of millions of rows of data, the page turning will become slower and slower, so we need to further optimize it.

Optimization step 2: Record the primary key returned last time and use primary key filtering in the next query. ( Note: This avoids scanning too many records when the amount of data is large )

The last limit was 50,5, so we need to use the last index record value in this optimization process.

select film_id,description from sakila.film where film_id > 55 and film_id <= 60 order by film_id limit 5;

View the execution plan:

Conclusion: the number of scanned rows stays constant, so the execution plan is stable and the query efficiency stays predictable no matter how far we page.

Precautions:

The primary key must be sequential and continuous. If rows have been deleted so that there are gaps in the primary key, the range above may return fewer than 5 rows. If the key is not continuous, add an extra auto-increment column (for example index_id), make sure it is indexed, and paginate on that column instead.
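A minimal sketch of that workaround, assuming a hypothetical table t whose primary key has gaps (note that MySQL allows only one AUTO_INCREMENT column per table, so this only applies if t does not already have one):

```sql
-- Add a gap-free auto-increment column and index it
ALTER TABLE t
    ADD COLUMN index_id INT NOT NULL AUTO_INCREMENT,
    ADD UNIQUE KEY idx_index_id (index_id);

-- Page on the continuous column: rows 51 through 55
SELECT *
FROM t
WHERE index_id > 50 AND index_id <= 55
ORDER BY index_id;
```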

Index optimization

1. What is an index?

The index is equivalent to the table of contents of a book. You can quickly find the required content based on the page numbers in the table of contents.

The database uses an index to locate a value quickly and then follows it to the rows containing that value. After an index is created on a table, the server first finds the index entries that satisfy the query condition, then uses the row pointer stored in each entry (the ROWID, equivalent to a page number) to fetch the matching rows. An index is built on a reasonably distinctive field of the table and acts like a directory. For example, with an administrative area code column, all rows from the same region share the same code; an index on that column lets the server jump straight to the matching rows instead of scanning the whole table repeatedly. That is the purpose of the optimization!

2. How to create an index

Indexes can be created when executing the CREATE TABLE statement, or CREATE INDEX or ALTER TABLE can be used alone to add indexes to the table.

① ALTER TABLE

ALTER TABLE is used to create a normal index, UNIQUE index or PRIMARY KEY index.

ALTER TABLE table_name ADD INDEX index_name (column_list)

ALTER TABLE table_name ADD UNIQUE (column_list)

ALTER TABLE table_name ADD PRIMARY KEY (column_list)

Note: table_name is the name of the table to be indexed, and column_list indicates which columns to index; if there are several, separate them with commas. The index name index_name is optional; by default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows several changes to be listed in one statement, so multiple indexes can be created at the same time.

② CREATE INDEX

CREATE INDEX can add ordinary indexes or UNIQUE indexes to the table.

CREATE INDEX index_name ON table_name (column_list)

CREATE UNIQUE INDEX index_name ON table_name (column_list)

Note: table_name, index_name and column_list have the same meaning as in the ALTER TABLE statement, and the index name is not optional. In addition, you cannot use the CREATE INDEX statement to create a PRIMARY KEY index.

3. Index type

When you create an index, you can specify whether it may contain duplicate values. If it must not, create it as a PRIMARY KEY or UNIQUE index. For a single-column unique index this guarantees that the column contains no duplicate values; for a multi-column unique index it guarantees that the combination of values is not repeated.

PRIMARY KEY indexes are very similar to UNIQUE indexes.

In fact, a PRIMARY KEY index is just a UNIQUE index with the name PRIMARY. This means that a table can only contain one PRIMARY KEY, because it is impossible to have two indexes with the same name in a table.

The following SQL statement adds a PRIMARY KEY index on sid to the students table.

ALTER TABLE students ADD PRIMARY KEY (sid)

4. Delete index

Indexes can be deleted using the ALTER TABLE or DROP INDEX statement. Similar to the CREATE INDEX statement, DROP INDEX can be processed as a statement inside ALTER TABLE. The syntax is as follows.
 

DROP INDEX index_name ON table_name

ALTER TABLE table_name DROP INDEX index_name

ALTER TABLE table_name DROP PRIMARY KEY

Among them, the first two statements are equivalent, and the index index_name in table_name is deleted.

The third statement is only used when deleting the PRIMARY KEY index, because a table can only have one PRIMARY KEY index, so there is no need to specify the index name. If no PRIMARY KEY index is created but the table has one or more UNIQUE indexes, MySQL drops the first UNIQUE index.

If a column is deleted from a table, the index will be affected. For a multi-column index, if one of the columns is deleted, the column will also be deleted from the index. If you delete all the columns that make up the index, the entire index will be deleted.

   5. View index

show index from tblname;

show keys from tblname;

6. Under what circumstances should indexes be created, and when are they counter-productive? (Items 1-7 below are cases where an index helps; items 8-10 describe cases where an index adds little value or even hurts.)

1. The primary key of the table

2. Automatically create a unique index

3. Unique constraints on table fields

4. Fields for direct conditional query (fields used for conditional constraints in SQL)

5. Fields associated with other tables in the query

6. The sorted fields in the query (if the sorted fields are accessed through the index, the sorting speed will be greatly improved)

7. Fields for statistics or group statistics in queries

8. The table has too few records (if a table has only 5 rows and is accessed through an index, the index must be read first and the data then fetched through it; since the index and the data are generally not in the same data block, the index only adds work)

9. Tables that are frequently inserted, deleted, or modified (for some frequently processed business tables, indexes should be reduced as much as possible if queries allow)

10. Table fields with heavily repeated, evenly distributed data (if a table has 100,000 rows and a field A takes only two values, T and F, each with roughly 50% probability, building an index on A generally does not improve query speed)

11. Table fields that are often queried together with the main field but have many main field index values


3. How to select appropriate columns to create indexes

1.  Add indexes to the columns in the where clause, group by clause, order by clause, and on clause 

2. The smaller the indexed field, the better (the database stores data in units of "pages"; smaller fields mean more index entries fit per page, and therefore less IO)

3. Columns with large dispersion are placed in front of the joint index.

example:

select * from payment where staff_id =2 and customer_id =584;

Notice:

Is index (staff_id, customer_id) better, or index (customer_id, staff_id)?

So how do we verify the dispersion?

A. Let’s first look at the table structure

desc payment;

B. Count the number of distinct values in each of the two fields; the larger the count, the higher the dispersion. The result shows that customer_id has the higher dispersion.
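The check described above can be done with a single query; a sketch against the sakila payment table (in the stock sakila data, staff_id has only 2 distinct values while customer_id has several hundred):

```sql
-- Higher distinct counts mean higher dispersion (selectivity)
SELECT COUNT(DISTINCT staff_id)    AS staff_id_cnt,
       COUNT(DISTINCT customer_id) AS customer_id_cnt
FROM payment;
```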

Conclusion: since customer_id has the higher dispersion, index (customer_id, staff_id) is the better choice.

C. mysql joint index

①Naming rules: table name_field name

1. The fields that need to be indexed must be in the where condition

2. Fields with a small amount of data do not need to be indexed

3. If the where condition joins predicates with OR, the index will generally not work unless every branch of the OR is indexed.

4. Comply with the leftmost principle

②What is a joint index?

  1. An index on two or more columns is called a joint index, also known as a composite index.
  2. Using additional columns in an index lets you narrow the search, but an index on two columns is different from two separate single-column indexes. A composite index is structured like a phone book: entries are sorted first by last name, and entries with the same last name are then sorted by first name. The phone book is very useful if you know the last name, even more useful if you know both the first and last name, but useless if you only know the first name.

Therefore, when creating a composite index, the order of columns should be carefully considered. A composite index is useful when the search covers all of its columns or a leftmost prefix of them; it is not useful when the search only touches later columns.
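A sketch of the leftmost-prefix rule, assuming a hypothetical phonebook table that mirrors the analogy above:

```sql
CREATE TABLE phonebook (
    last_name  VARCHAR(30) NOT NULL,
    first_name VARCHAR(30) NOT NULL,
    phone      VARCHAR(20) NOT NULL,
    KEY idx_last_first (last_name, first_name)
) ENGINE=InnoDB;

-- Can use the index (leftmost column):
SELECT phone FROM phonebook WHERE last_name = 'Smith';

-- Can use the index (both columns):
SELECT phone FROM phonebook WHERE last_name = 'Smith' AND first_name = 'Anna';

-- Cannot use the index (skips the leftmost column):
SELECT phone FROM phonebook WHERE first_name = 'Anna';
```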

4. Methods for index optimization SQL

1. Index maintenance and optimization (duplicate and redundant indexes)

Adding indexes helps query efficiency but lowers the efficiency of insert, update, and delete. In practice it is not that simple: too many indexes not only slow down writes but can also hurt query performance, because before executing a query the optimizer must choose which index to use, and the more indexes there are, the longer that analysis takes. So besides knowing how to add indexes, we must also maintain them and delete the unnecessary ones.

2. How to find duplicate and redundant indexes

Duplicate index:

Duplicate indexes refer to indexes of the same type built on the same columns in the same order. The indexes on the primary key and ID columns in the following table are duplicate indexes.

create table test(

id int not null primary key,

name varchar(10) not null,

title varchar(50) not null,

unique(id)

)engine=innodb;

Redundant index:

A redundant index refers to an index with the same prefix column in multiple indexes, or an index that contains the primary key in a joint index. In the following example, key (name, id) is a redundant index.

create table test(

id int not null primary key,

name varchar(10) not null,

title varchar(50) not null,

key(name,id)

)engine=innodb;

Note: For innodb, the primary key will actually be included behind each index. At this time, the joint index we established artificially includes the primary key, so it is a redundant index at this time.

3. How to find duplicate indexes

Tools: Use the pt-duplicate-key-checker tool to check for duplicate and redundant indexes

pt-duplicate-key-checker -uroot -padmin -h 127.0.0.1

4. Methods of index maintenance

Due to business changes, some indexes are no longer needed and must be deleted.

In mysql, index usage can only be analyzed through slow query logs and the pt-index-usage tool;

pt-index-usage -uroot -padmin /var/lib/mysql/mysql-host-slow.lo

 Attached: https://www.percona.com/downloads/

5. Things to note

Well-designed MySql indexes can make your database fly and greatly improve database efficiency. There are a few points to note when designing MySql indexes:

1. Create an index

For applications where queries dominate, indexes are especially important. Many performance problems are simply caused by forgetting to add an index, or by not adding a more effective one. Without an index, finding even a single specific row requires a full table scan; if the table is large and few rows qualify, the missing index causes a fatal performance drop.

But it is not necessary to build an index in every situation. For example, gender may only have two values. Building an index not only has no advantage, but also affects the update speed. This is called over-indexing.

2. Composite index

For example, there is a statement like this: select * from users where area='beijing' and age=22;

If we create separate single-column indexes on area and age, then since a MySQL query typically uses only one index per table, both queries would improve somewhat, but creating a composite index on the two columns brings higher efficiency. A composite index on (area, age, salary) is effectively equivalent to having the three indexes (area, age, salary), (area, age), and (area); this is called the best left prefix property.

Therefore, when we create a composite index, we should put the columns most often used as constraints on the far left, in decreasing order of use.
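A sketch of that advice, assuming a hypothetical users table matching the query above:

```sql
-- One composite index instead of three single-column indexes
ALTER TABLE users ADD INDEX idx_area_age_salary (area, age, salary);

-- All of these can use the composite index (leftmost prefixes):
SELECT * FROM users WHERE area = 'beijing';
SELECT * FROM users WHERE area = 'beijing' AND age = 22;
SELECT * FROM users WHERE area = 'beijing' AND age = 22 AND salary > 5000;

-- This one cannot (area, the leftmost column, is missing):
SELECT * FROM users WHERE age = 22;
```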

3. The index will not contain columns with NULL values.

Traditionally, a column containing NULL values was excluded from the index, and a composite index was considered ineffective for any column containing NULLs. (Current InnoDB B-tree indexes can in fact store NULLs, but NULL comparisons still complicate index use.) Either way, when designing the database, avoid making NULL the default value of a field.

4. Use short indexes

To index string columns, specify a prefix length where possible. For example, for a CHAR(255) column, if most values are unique within the first 10 or 20 characters, do not index the whole column. Short indexes improve query speed and also save disk space and I/O.
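A sketch of a prefix index, assuming a hypothetical users table with a long string column named email:

```sql
-- Check how selective a 10-character prefix is before choosing its length
-- (the closer prefix_selectivity is to full_selectivity, the better)
SELECT COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT email) / COUNT(*)           AS full_selectivity
FROM users;

-- Index only the first 10 characters of the column
ALTER TABLE users ADD INDEX idx_email_prefix (email(10));
```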

5. Sorting index problem

A MySQL query usually uses only one index per table, so if an index has already been used in the where clause, the columns in order by will not use one. Therefore, do not add sorting operations when the database's default ordering already meets the requirement; try not to sort on multiple columns, and if that is necessary, it is best to create a composite index covering those columns.

6. Like statement operation

Under normal circumstances, use of the like operator is discouraged; if it must be used, how it is used matters. like '%aaa%' will not use the index, but like 'aaa%' will.
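A sketch of the difference, assuming an index exists on the name column of a hypothetical users table:

```sql
-- Leading wildcard: the index on name cannot be used, full table scan
EXPLAIN SELECT * FROM users WHERE name LIKE '%aaa%';

-- Prefix match: the index on name can be used as a range scan
EXPLAIN SELECT * FROM users WHERE name LIKE 'aaa%';
```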

7. Don’t perform operations on columns

select * from users where YEAR(adddate) < 2007;

Applying the YEAR() function to the column prevents any index on adddate from being used, so the whole table is scanned. Rewrite the condition so the column stands alone:

select * from users where adddate < '2007-01-01';

8. Do not use NOT IN operation

A NOT IN operation will not use the index and performs a full table scan; NOT IN can often be replaced by NOT EXISTS.
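A sketch of the rewrite, assuming hypothetical users and orders tables related by user id:

```sql
-- NOT IN version (may scan the full table):
SELECT * FROM users u
WHERE u.id NOT IN (SELECT o.user_id FROM orders o);

-- Equivalent NOT EXISTS version:
-- note: the two differ in result if orders.user_id can be NULL
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```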


Origin blog.csdn.net/qq_40322236/article/details/129674310