MySQL Tuning Encyclopedia (covering 90% of the scenarios that need tuning)

Foreword

To study this article, you need some theoretical foundation. You might as well take a look at my earlier article first.

1. JOIN statement optimization

Algorithms related to Join statement

Algorithm 1: Nested-Loop Join (NLJ)

This algorithm is called "nested loop join", and it roughly works like this:

Table  Join Type
t1     range
t2     ref
t3     ALL
for each row in t1 matching range {
	for each row in t2 matching reference key {
		for each row in t3 {
			if row satisfies join conditions, send to client
		}
	}
}

As above, suppose we join three tables. The general flow is: first fetch the rows in t1 that satisfy the range condition, then for each of those rows, loop over t2 and look up the matching rows using the reference key, that is, the ON column of the join. Finally, query the qualifying rows in t3: since the join type for t3 is ALL (a full table scan), every row of t3 is looped over and checked against the join conditions.

It is not difficult to see that this algorithm is fairly crude: the more rows the outer loop produces, the more times the inner loop is executed.

Algorithm 2: Block Nested-Loop Join (BNLJ)

This algorithm is called "Block Nested-Loop Join", and it works as follows:

Let's still take the table in Algorithm 1 above as an example, and its pseudocode is roughly like this:

for each row in t1 matching range {
	for each row in t2 matching reference key {
		store used columns from t1, t2 in join buffer
		if buffer is full {
			for each row in t3 {
				for each t1, t2 combination in join buffer {
					if row satisfies join conditions, send to client
				}
			}
			empty join buffer
		}
	}
}
if buffer is not empty {
	for each row in t3 {
		for each t1, t2 combination in join buffer {
			if row satisfies join conditions, send to client
		}
	}
}

Querying t1 and t2 works the same as in the previous algorithm, but the third table is handled differently. The columns needed from t1 and t2 are stored in a structure called the join buffer (the connection cache). Rows are stuffed into the join buffer one by one; when the join buffer is full, t3 is scanned, each t3 row is compared against the combinations in the join buffer, and matching rows are returned to the client.

Let's compare the two algorithms. Looking at the first one: if for each row in t2 produces 100 rows, then for each row in t3 has to be executed 100 times. With BNLJ, MySQL caches those 100 rows in the join buffer; if the join buffer is large enough to hold all 100 rows, for each row in t3 only needs to run once, which greatly reduces the number of scans of the inner table. So if the 100 rows cannot all fit into the join buffer at once, how many times does for each row in t3 have to run? There is a formula for the number of scans:

(S * C) / join_buffer_size + 1

where S is the size of one cached row from t1 or t2, C is the number of rows to cache (so S * C is the space needed to cache all the data), and join_buffer_size is the size of the join buffer.
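
A worked example with illustrative numbers: suppose S = 200 bytes per cached row, C = 10,000 rows, and join_buffer_size = 262,144 bytes (256 KB). Then (200 * 10,000) / 262,144 rounds down to 7, plus 1 gives 8 scans of t3.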

So BNLJ introduces the join buffer to reduce the number of inner-table scans and improve performance. However, the join buffer is only used when certain conditions are met:

  • The join type is ALL, index or range

  • The first nonconst table will not allocate a join buffer, even if the type is ALL or index

  • The join buffer only caches the required fields, not the entire row of data, that is, store used columns are used instead of all fields of t1 and t2.

  • Each join that can be cached will be allocated a join buffer, and a query may have multiple join buffers

  • The join buffer is allocated before the join is performed and released after the query is complete.

You can set the join buffer size with the join_buffer_size variable:

show variables like 'join_buffer_size'; -- check the join buffer size
set join_buffer_size = 1024 * 1024 * 50; -- set the current session's join buffer to 50M
set global join_buffer_size = 1024 * 1024 * 50; -- set the global join buffer to 50M (generally not recommended to set it too large)

How do you know whether a query used the join buffer? Look at the picture below:

[image: EXPLAIN output]
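
As a sketch of what the screenshot shows (t1/t2/t3 and the join columns are the placeholder tables from Algorithm 1):

EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.a = t2.a JOIN t3 ON t2.b = t3.b;
-- If the join buffer is used, the Extra column of the driven table's row shows:
--   Using join buffer (Block Nested Loop)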

Algorithm 3: Batched Key Access Join (BKA)

This algorithm is called "Batched Key Access".

  • Introduced in MySQL 5.6

  • The cornerstone of BKA: Multi Range Read (MRR)
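
A minimal sketch of turning BKA on. BKA is built on MRR, so the related optimizer_switch flags must be enabled (this follows the MySQL reference manual):

SET optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';
-- When BKA is chosen, EXPLAIN shows the following in the Extra column:
--   Using join buffer (Batched Key Access)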

Algorithm 4: HASH JOIN

Note:

  • The MySQL version must be >= 8.0.18, and there are many restrictions; for example, it cannot be applied to outer joins such as left join/right join. Starting from 8.0.20 there are far fewer restrictions, so 8.0.20 or later is recommended.

  • Starting from MySQL 8.0.18, the join buffer for hash join is allocated incrementally, which means you can set join_buffer_size larger. In MySQL 8.0.18, however, an outer join cannot use hash join, in which case join_buffer_size is allocated in full at the value you set. So join_buffer_size still has to be set carefully.

  • Since 8.0.20, BNLJ has been removed, and hash join is used wherever BNLJ used to apply

Regarding hash join, it is strongly recommended that you read this page: https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html which lists the differences between hash join across MySQL versions.
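
A quick way to check whether a query uses hash join is EXPLAIN FORMAT=TREE (available since MySQL 8.0.16); the tables and columns below are illustrative:

EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
-- The output contains a line like:
--   -> Inner hash join (t2.c1 = t1.c1)  (cost=... rows=...)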

Driver table and driven table

Let's understand these two concepts before formally discussing join optimization.

  • The table of the outer loop is the driving table, and the table of the inner loop is the driven table

We still look at the tables in Algorithm 1 above: t1 is the driving table of t2, t2 is the driven table of t1, t2 is the driving table of t3, and t3 is the driven table of t2.

JOIN tuning principles

After understanding the above concepts, let's take a look at the JOIN tuning principles:

Tuning principle 1: Use small tables to drive large tables

That is to say, a table with a small amount of data is used as a driving table, and a table with a large amount of data is used as a driven table.

How do we tell which table is the driving table and which is the driven table?

We still need to use explain:

[image: EXPLAIN output of a two-table join on book (b) and userbookscore (c)]

About id: the larger the id, the earlier it is executed; rows with the same id are executed from top to bottom. So for the SQL above, the book (b) table is operated on first, and then the second row, the userbookscore (c) table, is executed. The book table is therefore the driving table, and userbookscore is the driven table.

Let's take a look at the operation of associating three tables:

[image: EXPLAIN output of a three-table join]

There are three rows in the result: table b is operated on first, then table c, and finally table d. So table b is the driving table of table c, and table c is the driving table of table d.

The above is the first principle: use small tables to drive large tables. But:

Generally speaking, without manual intervention, MySQL's join optimizer automatically selects the optimal execution order.

If the optimizer makes a bad call, you can use STRAIGHT_JOIN to force the join order, as sketched below.
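
A sketch of forcing the join order, using the book and userbookscore tables from the EXPLAIN example above (the bookId join column is an assumption):

-- Force book to be the driving table, regardless of what the optimizer prefers
SELECT *
FROM book b
STRAIGHT_JOIN userbookscore c ON b.bookId = c.bookId;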

Tuning principle 2: If there is a WHERE condition, it should be able to use an index, and it should minimize the amount of data in the outer loop

Because the larger the outer result set, the more times the inner table is scanned.

Tuning principle 3: try to create indexes for join fields

Small pit: when the types of the join fields are different, the index cannot be used, as the sketch below illustrates.
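
An illustration of this pitfall with hypothetical tables:

-- users.id is INT while orders.user_id is VARCHAR(20); the implicit type
-- conversion on the join condition prevents the idx_user_id index from being used
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE orders (id INT PRIMARY KEY, user_id VARCHAR(20), KEY idx_user_id (user_id));

SELECT * FROM users u JOIN orders o ON u.id = o.user_id; -- idx_user_id cannot be used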

Tuning principle 4: Minimize the number of rows scanned (the rows column in EXPLAIN)

Try to keep it within a million rows (a rule of thumb, for reference only). Within a million rows the performance is acceptable; beyond that, it gets noticeably slower.

Tuning principle 5: Do not have too many tables involved in join

Alibaba's development guidelines recommend joining no more than 3 tables.

If the business really must join many tables to produce the result, it can be split into multiple SQL statements executed separately.

Split this way, each SQL becomes simpler, easier to analyze, and more efficient to execute; a complex query is much more troublesome to optimize.

So don't be proud of writing complex SQL; complex SQL is no longer something to boast about in this era.

Tuning principle 6: If the join field of the driven table cannot use the index, and the memory is sufficient, you can consider setting the join buffer to be larger

2. Limit query optimization

Take a look at the pictures below:
[image: LIMIT query with a small offset, returning quickly]

[image: the same LIMIT query with a large offset, noticeably slower]

From the above figure, it is not difficult to find that the larger the offset value, the slower the query efficiency will be.

Let's analyze it:

[image: EXPLAIN output showing a full table scan]

You can see that it uses a full table scan. In this case, we need to consider optimization. Let's discuss the optimization method of limit.

Optimization method 1: Covering index

[image: covering-index version of the LIMIT query, with much better timing]

Here we use a covering index on the primary key, and you can see the performance improves a lot. Let's analyze it:

[image: EXPLAIN output showing a full index scan]

It can be seen that it uses a full index scan, and a full index scan is much faster than a full table scan.

Optimization method 2: Covering index + Join

Building on optimization 1, if we really do need all the columns of the chapter table, we can do this:
[image: covering index + JOIN version of the query]

This way, it first finds the relevant chapterId values through the index, and then joins back to chapter to fetch the full rows for just those ids, roughly as follows:
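
A sketch of what the screenshot does, assuming chapter's primary key is chapterId and we are paging deep into the table:

SELECT c.*
FROM chapter c
JOIN (SELECT chapterId FROM chapter ORDER BY chapterId LIMIT 90000, 10) t
  ON c.chapterId = t.chapterId;
-- The derived table touches only the primary key index; the outer join then
-- fetches the full rows for just those 10 ids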

Optimization method 3: covering index + subquery

[image: covering index + subquery version of the query]

The idea of this method is to first find the smallest chapterId of the target page via a covering-index subquery, and then take 10 rows starting from that value, roughly as follows:
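
A sketch under the same assumptions (chapterId is the primary key):

SELECT *
FROM chapter
WHERE chapterId >= (SELECT chapterId FROM chapter ORDER BY chapterId LIMIT 90000, 1)
ORDER BY chapterId
LIMIT 10;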

Optimization method 4: If the starting primary key value and the ending primary key value can be obtained

[image: BETWEEN version of the query]

This way, you can replace LIMIT with BETWEEN directly, for example:
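
A sketch, assuming the page boundaries are known and chapterId values are continuous:

SELECT * FROM chapter WHERE chapterId BETWEEN 90001 AND 90010;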

Optimization method 5: Prohibit passing in too large page numbers

You know what to do: cap the maximum page number at the business level.

3. COUNT statement optimization

COUNT statement summary

  • The performance of count(*) and count(1) is the same, there is no performance difference.

  • count(*) will choose the smallest secondary (non-primary-key) index; if there is no secondary index, the primary key is used

  • count(*) does not exclude rows that are NULL, while count(field) does (see the example after this list)

  • For the count(*) statement without any query conditions, MyISAM and InnoDB (MySQL>=8.0.13) have been optimized

  • If there is no special requirement, try to use COUNT(*)
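
A small illustration of the NULL behavior (hypothetical table):

CREATE TABLE t (a INT);
INSERT INTO t VALUES (1), (NULL), (3);

SELECT COUNT(*) FROM t; -- 3: rows with NULL are counted
SELECT COUNT(a) FROM t; -- 2: rows where a IS NULL are excluded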

Optimization

Let's start with a query:

Let me explain: the chapter table is used here because it has a lot of data, which makes the effect easier to see. You can see that the query above took 701 milliseconds; the performance is not impressive.

Let's take a look at the storage engine of the chapter table:

[image: table information showing the storage engine]

As you can see, it uses InnoDB.

Let's look at the database version again:

[image: SELECT VERSION() result]

We are using 5.6.16, which is lower than 8.0.13, so the unconditional count statement cannot benefit from that optimization.

If our version were higher than 8.0.13, the SQL above would likely take much less time. Upgrading the MySQL version can therefore greatly improve query efficiency.

So, let's discuss how to optimize this count statement.

First use explain to analyze the execution plan of this sql:

[image: EXPLAIN output for the COUNT query]

It can be seen that the type is index, indicating that a full index scan has occurred, the index used is idx_authorId, and the length of the key is 4.

For this sql, we have the following solutions to optimize:

Solution 1: Change the database engine to MyISAM

The actual project is rarely used, and the database engine is generally not modified.

Solution 2: Create a summary table table[table_name, count]

Whenever the chapter data changes, update the summary table: insert a row into chapter, and increment count in the summary table; delete a row, and decrement it. You can also maintain the summary table automatically with triggers, as sketched below. The advantage of this method is that the result is accurate and flexible: you can design the summary table according to your needs. The disadvantage is also obvious: increased maintenance cost.
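
A minimal sketch of trigger-maintained counts, assuming a summary table table_count(table_name, cnt):

-- Keep table_count.cnt in sync with inserts and deletes on chapter
CREATE TRIGGER trg_chapter_insert AFTER INSERT ON chapter
FOR EACH ROW UPDATE table_count SET cnt = cnt + 1 WHERE table_name = 'chapter';

CREATE TRIGGER trg_chapter_delete AFTER DELETE ON chapter
FOR EACH ROW UPDATE table_count SET cnt = cnt - 1 WHERE table_name = 'chapter';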

Solution 3: via sql_calc_found_rows

With this, MySQL counts the total number of rows while executing the query, and FOUND_ROWS() then returns that count:

select sql_calc_found_rows * from salaries limit 0,10;
select found_rows() as salary_count;

Disadvantages: this usage is deprecated as of MySQL 8.0.17 and will be removed in a future version.

Solution 4: Use caching

You can store the result of select count(*) from chapter; in the cache.

Advantages: Fast performance and more accurate results. Disadvantages: The introduction of additional components increases the complexity of the architecture.

Solution 5: information_schema.tables

You can do it like this:

[image: query against information_schema.TABLES, returning in 56 milliseconds]
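
The query in the screenshot is roughly this (assuming the database is named bookdb):

SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'bookdb' AND TABLE_NAME = 'chapter';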

The TABLE_ROWS column in the result gives the total number of rows in the chapter table, and you can see it takes only 56 milliseconds; the performance is very fast.

Benefits: No need to operate the chapter table, no matter how much data the chapter table has, the results can be returned quickly.

Disadvantages: this is an estimate, not an exact value. In other words, use this SQL only if the business can tolerate an imprecise count.

Solution 6: show table status where name = 'chapter';

[image: show table status output]

This is similar to Solution 5: it does not need to touch your chapter table, but the row count is likewise an estimate, not an exact value.

Solution 7: explain

[image: EXPLAIN output for the COUNT query]
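
The idea, sketched:

EXPLAIN SELECT COUNT(*) FROM chapter;
-- the rows column gives an estimated row count without scanning the table data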

The advantages and disadvantages are the same as for Solutions 5 and 6.

4. ORDER BY optimization

If you want to optimize the ORDER BY statement, the best way is: use the index to avoid sorting. This is the first principle of ORDER BY tuning.

Let's take a look at the bookindex table first:

[image: bookindex table structure showing a composite index]

There is a composite index on this table.

Let's execute this sql:

[image: EXPLAIN output — type is ALL]

You can see that a full table scan occurred, so the index was not used to avoid sorting. Let's add a LIMIT and see:

[image: EXPLAIN output — type is index]

It can be found that the type has changed to index, that is to say, this sql can use the index, so why is it sometimes ALL and sometimes index?

This is because the first SQL effectively sorts the whole table, and the MySQL optimizer is cost-based: when it finds that a full table scan is cheaper than using the index, it simply performs the full table scan.

And we can judge whether the index can be used to avoid sorting through the Extra field of the result returned by the execution plan.

If it is Using filesort, it means that the index cannot be used to avoid sorting.

Sorting mode 1: rowid sorting (regular sorting)

For now, MySQL implements 3 sorting modes. The first is rowid sorting, also called regular sorting. The general process is as follows:

  1. Get records from the table that satisfy the WHERE condition

  2. For each record, take out the primary key and sort key (id, order_column) of the record and put it into the sort buffer (controlled by sort_buffer_size)

  3. If the sort buffer can store all the (id, order_column) that meet the conditions, then sort; otherwise, when the sort buffer is full, sort and write to a temporary file

    • Sorting Algorithm: Quick Sort Algorithm
  4. If temporary files are generated during sorting, you need to use the merge sort algorithm to ensure that the records are in order.

  5. The above process is executed in a loop until all records satisfying the conditions are all involved in sorting.

  6. Scan the sorted (id, order_column) pairs, and use the id to fetch the other columns that the SELECT needs to return.

  7. Return the result set

rowid sorting characteristics

  • See if the sort buffer can store all (id, order_column) in the result set. If not, a temporary file will be generated.

  • A sort requires two IOs: the first is step 2 above, and the second is step 6.

Regarding step 6: since the pairs are now ordered by order_column, fetching rows by primary key id happens in random order, which causes random IO. MySQL has an optimization for this situation: before using the primary keys to fetch data, it sorts the ids and buffers them (the buffer size is controlled by read_rnd_buffer_size), and only then fetches the records, thereby turning random IO into sequential IO.

Sorting mode 2: full field sorting (optimized sorting)

Full-field sorting is an optimization of rowid sorting. It is roughly the same as rowid sorting; the difference is that full-field sorting:

  • Directly puts all the columns needed by the SQL into the sort buffer, instead of only the primary key and sort key (id, order_column) as rowid sorting does.

  • Since the sort buffer contains all the fields required by the query, it can be returned directly after sorting in the sort buffer.

Full field sorting VS rowid sorting

  • Benefits: better performance, since the second IO is not needed

  • Disadvantages: a row of data generally occupies more space than in rowid sorting, so if the sort buffer is relatively small, temporary files are produced more easily.

How to choose the algorithm?

In what scenario should I choose rowid sorting, and in what scenario should I choose full field sorting?

mysql provides max_length_for_sort_data: when the total length of the fields appearing in ORDER BY SQL is less than this value, use full field sorting, otherwise use rowid sorting.

Sorting Mode 3: Packed Field Sorting

  • Introduced in MySQL 5.7

  • An optimization of the full-field mode that works the same way, but packs the fields tightly together instead of using fixed-length space.

For example, take a VARCHAR(255) column whose value is "yes". Without packing (plain full-field sorting), it occupies the full 255 bytes in the sort buffer. With packing, it needs only a 2-byte length prefix plus the 3 bytes of "yes", 5 bytes in total. This saves a lot of space and lets the sort buffer hold much more content.

Summary of parameters

Above we discussed three sorting modes, here we summarize them for easy review:

variable                  effect
sort_buffer_size          the size of the sort buffer
max_length_for_sort_data  when the total length of the ORDER BY columns is less than this value, use full field sorting, otherwise rowid sorting
read_rnd_buffer_size      the buffer where ids sorted by primary key are cached before fetching rows
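
You can inspect the current values like this:

show variables like 'sort_buffer_size';
show variables like 'max_length_for_sort_data';
show variables like 'read_rnd_buffer_size';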

How to tune ORDER BY

  • Use indexes to prevent filesort from happening

  • If filesort occurs and there is no way to avoid it, find a way to optimize filesort

filesort tuning

  • Increase sort_buffer_size to reduce/avoid temporary files and merge operations.

How do we know when to adjust sort_buffer_size? You only need to pay attention to two aspects:

- the value of num_initial_chunks_spilled_to_disk in the optimizer trace

If this value is very large, a large number of merge operations must be happening, and it is time to increase sort_buffer_size, as sketched below.
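
A sketch of reading this value from the optimizer trace (MySQL 8.0; the ORDER BY query is a placeholder):

SET optimizer_trace = 'enabled=on';
SELECT * FROM chapter ORDER BY chapterName LIMIT 10; -- the query you want to inspect
SELECT * FROM information_schema.OPTIMIZER_TRACE\G
-- look for num_initial_chunks_spilled_to_disk under "filesort_summary"
SET optimizer_trace = 'enabled=off';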

- the value of the sort_merge_passes status variable

It indicates how many merge passes were performed. Check it with show status like '%sort_merge_passes%'; if the returned value is large, it is time to increase sort_buffer_size.

  • Increase the read_rnd_buffer_size to allow more results to be returned in a sequential IO.

This variable controls the size of the buffer where ids sorted by primary key are cached; increasing it lets a single sequential IO return more results.

  • Set a reasonable value of max_length_for_sort_data

If this parameter is set too large, all kinds of sorting SQL will use full-field sorting, which may use a lot of memory, and a lot of disk if temporary files have to be written. If it is set too small, all kinds of sorting SQL will use rowid sorting, causing two IOs and possibly worse performance.

- Generally, it is not recommended to adjust this casually
  • Reduce max_sort_length (how many bytes are taken at most when sorting)

5. GROUP BY optimization

For now, MySQL has three ways to handle the GROUP BY statement, namely:

  • Loose Index Scan

  • Tight Index Scan

  • Temporary table

loose index scan

  • Return results without scanning all index keys that satisfy the condition

[image: EXPLAIN output with Extra showing Using index for group-by]

If the Extra column returned by explain shows Using index for group-by, a loose index scan was used.

Loose index scan usage conditions

  • Queries act on a single table

  • All fields specified by GROUP BY must conform to the leftmost prefix principle, and there are no other fields.

    • For example, with an index (c1, c2, c3): GROUP BY c1, c2 can use loose index scanning, but GROUP BY c2, c3 and GROUP BY c1, c2, c4 cannot.
  • If there is an aggregate function, only MIN()/MAX() is supported, and if MIN() and MAX() are used at the same time, they must act on the same field. The field on which the aggregation function works must be in the index and must follow the field specified by GROUP BY.

    • For example, if there is an index index(c1,c2,c3), SELECT c1,c2,MIN(c3),MAX(c3) FROM t1 GROUP BY c1,c2 can use loose index scanning.
  • If there are other parts in the query other than the columns specified by GROUP BY, they must appear in the form of constants.

    • Such as SELECT c1, c3 FROM t1 GROUP BY c1, c2: cannot be used

    • If you want to use it, you can add a WHERE condition: SELECT c1,c3 FROM t1 WHERE c3 = 3 GROUP BY c1,c2

  • The index must index the value of the entire field, not a prefix index.

    • For example, there is a field c1 VARCHAR(20), but if the field uses the prefix index index(c1(10)) instead of index(c1), loose index scanning cannot be used.

Certain aggregate function usages can also use loose index scan

  • AVG(DISTINCT), SUM(DISTINCT), COUNT(DISTINCT), where AVG(DISTINCT), SUM(DISTINCT) can accept a single parameter; while COUNT(DISTINCT) can accept multiple parameters

  • There must be no GROUP BY or DISTINCT statements in the query.

  • All previous conditions for using a loose index scan are met.

Assuming index(c1,c2,c3) acts on table t1(c1,c2,c3,c4), the following SQL can use loose index scan:

SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;
SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;

Tight index scan

A tight index scan is defined relative to a loose index scan:

  • All index keys that meet the condition need to be scanned to return results.

  • Performance is generally worse than loose index scans, but generally acceptable.

Temporary tables

  • If there is no way to use a loose or tight index scan, MySQL reads the required data, creates a temporary table, and uses the temporary table to implement the GROUP BY operation.

[image: EXPLAIN output — full table scan, Extra shows Using temporary]

Look at this SQL: since mergeId has no index, explain shows that a full table scan is used, and Extra shows Using temporary, indicating that a temporary table is used.

GROUP BY tuning ideas

  • If GROUP BY uses a temporary table, the optimal solution is to create an index for your SQL and get it to use a loose or tight index scan, for example:
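
For the mergeId example above, a sketch of the fix (t stands for the table in the screenshot):

ALTER TABLE t ADD INDEX idx_mergeId (mergeId);
-- Re-run EXPLAIN afterwards: Extra should no longer show "Using temporary"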

6. DISTINCT optimization

DISTINCT is very similar to GROUP BY, you can think of it as:

  • DISTINCT is like a GROUP BY operation in which each group keeps only one row

  • Therefore, optimizing DISTINCT is the same as optimizing GROUP BY.

Try to avoid temporary tables; use a loose or tight index scan.

Percona Toolkit

Percona Toolkit is a very useful tool suite in the MySQL world.

tool list

  • pt-align : Align the output of other tools

  • pt-archiver : archive data to other tables or files

  • pt-config-diff : compare configuration files and variables

  • pt-deadlock-logger : log MySQL deadlocks

  • pt-diskstats : Interactive IO monitoring tool

  • pt-duplicate-key-checker : find duplicate indexes or foreign keys

  • pt-fifo-split : Simulate split files and output

  • pt-find : find tables and execute commands

  • pt-fingerprint : Convert query to fingerprint

  • pt-fk-error-logger : log foreign key error messages

  • pt-heartbeat : Monitor MySQL replication latency

  • pt-index-usage : analyze queries through logs, and analyze how queries use indexes

  • pt-ioprofile: monitor process IO and print IO activity table

  • pt-kill : kill the qualified query

  • pt-mext : display samples of SHOW GLOBAL STATUS side by side

  • pt-mongodb-query-digest : Reports query usage statistics by summarizing queries from the MongoDB query profiler

  • pt-mongodb-summary: collects information about the MongoDB cluster, it collects information from multiple sources to provide a summary of the cluster

  • pt-mysql-summary : Display MySQL-related summary information

  • pt-online-schema-change : Modify the table structure online. ALTER table structure without locking table

  • pt-pg-summary : Gather information about a PostgreSQL cluster

  • pt-pmp : Aggregate stack traces of GDB for a specified program

  • pt-query-digest : Analyze MySQL queries from logs, processlist and tcpdump

  • pt-secure-collect: collect, clean, package, encrypt data

  • pt-show-grants : Canonicalized printing of MySQL grants

  • pt-sift : Browse files created by pt-stalk

  • pt-slave-delay : Make a MySQL slave server lag behind its Master

  • pt-slave-find: Find and print the replication hierarchy tree of MySQL slave

  • pt-slave-restart : monitors the MySQL slave and restarts it after an error

  • pt-stalk : collects diagnostic data about MySQL when problems occur

  • pt-summary : display system summary information

  • pt-table-checksum: verify the consistency of MySQL master-slave replication

  • pt-table-sync : efficiently synchronize table data

  • pt-table-usage : analyze how the query uses the table

  • pt-upgrade : Verify that query results are the same on different servers

  • pt-variable-advisor: Analyze MySQL variables and make suggestions for possible problems

  • pt-visual-explain : format the result of explain into a tree display

Install

The way to install Percona Toolkit is different for different operating systems. If you are using Windows, you can give it up because it does not support Windows.

If you are using a Mac system, it is sufficient to use this command directly:

brew install percona-toolkit

Percona Toolkit uses

pt-query-digest

This analysis tool can analyze queries from logs, the processlist, and tcpdump to help us optimize SQL. Many companies currently use it to analyze slow query logs. The tool is very useful, widely used in the industry, and very popular.

Official documentation: https://www.percona.com/doc/percona-toolkit/3.0/pt-query-digest.html

  • Purpose

Analyze queries in logs (including binlog, general log, slowlog), processlist, and tcpdump

  • Syntax
pt-query-digest [OPTIONS] [FILES] [DSN]
  • Common OPTIONS
--create-review-table  When using --review to write analysis results to a table, create the table automatically if it does not exist.
--create-history-table  When using --history to write analysis results to a table, create the table automatically if it does not exist.
--filter  Filter the input slow queries by matching the specified string before analysis
--limit  Limit the output to a percentage or number of results. The default is 20, i.e., output the 20 slowest statements. With 50%, results are ordered by share of total response time, descending, and output stops once the cumulative share reaches 50%.
--host  MySQL server address
--user  MySQL username
--password  MySQL user password
--history  Save the analysis results to a table; the analysis is fairly detailed. The next time --history is used, if the same statement exists but its time range differs from the one in the history table, it is recorded again, so you can compare the historical changes of a query type via the same CHECKSUM.
--review  Save the analysis results to a table; this analysis only parameterizes the query conditions, one record per query type, which is simpler. The next time --review is used, an identical statement analysis is not recorded again.
--output  Output type of the analysis results: report (standard analysis report), slowlog (MySQL slow log), json, or json-anon. report is generally used, for readability.
--since  Time to start the analysis from. The value is a string: either a time point in "yyyy-mm-dd [hh:mm:ss]" format, or a simple duration: s (seconds), h (hours), m (minutes), d (days); e.g., 12h means statistics start from 12 hours ago.
--until  End time; together with --since you can analyze slow queries within a time range.
  • Usage examples
# Show a report of the slowest queries in slow.log
pt-query-digest slow.log

# Analyze queries from the last 12 hours
pt-query-digest --since=12h slow.log

# Analyze queries within a specified range
pt-query-digest slow.log --since '2020-06-20 00:00:00' --until '2020-06-25 00:00:00'

# Save the queries in slow.log to the query_history table
pt-query-digest --user=root --password=root123 --review h=localhost,D=test,t=query_history --create-review-table slow.log

# Connect to localhost, read the processlist, and output in slowlog format
pt-query-digest --processlist h=localhost --user=root --password=root123 --interval=0.01 --output slowlog

# Capture MySQL protocol data with tcpdump, then produce a report of the slowest queries
# tcpdump usage notes: https://blog.csdn.net/chinaltx/article/details/87469933
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt

# Analyze a binlog
mysqlbinlog mysql-bin.000093 > mysql-bin000093.sql
pt-query-digest  --type=binlog mysql-bin000093.sql

# Analyze a general log
pt-query-digest  --type=genlog  localhost.log
  • Result visualization:

From the official Percona blog: https://www.percona.com/blog/2012/08/31/visualization-tools-for-pt-query-digest-tables/

  • Query Digest UI

  • Box Anemometer

  • These two tools have not been maintained for many years. If you are interested, you can still set them up and play with them, but they are not recommended for production.

pt-index-usage

Official documentation: https://www.percona.com/doc/percona-toolkit/3.0/pt-index-usage.html

  • Purpose

Analyze queries through log files and analyze how queries use indexes

  • Principle

    • Inventory all tables and indexes in the database, and compare the existing indexes in the database with the indexes used by the queries in the log

    • Run EXPLAIN for each query in the log (this step uses a separate database connection to inventory tables and execute EXPLAIN)

    • For unused indexes, print the suggested DROP statements.

  • Syntax

pt-index-usage [OPTIONS] [FILES]
  • Common OPTIONS
--drop      Print indexes suggested for deletion. Values: primary, unique, non-unique, all. The default is non-unique, which only prints unused secondary indexes.
--database   Only analyze indexes in the specified databases; separate multiple databases with commas
--tables     Only analyze indexes of the specified tables; separate multiple tables with commas
--progress    Print execution progress
--host        Specify the MySQL address; -h also works
--port        Specify the MySQL port
--user        Specify the MySQL username; -u also works
--password    Specify the MySQL password; -p also works
  • Usage examples
# Read slow.log, connect to localhost, and analyze which indexes can be deleted
pt-index-usage slow.log --user=root --password=root123 --host=localhost --port=3306

# Read slow.log, connect to localhost, and only analyze which indexes in the bookdb database can be deleted
pt-index-usage slow.log --user=root  --password=root123 --host=localhost  --databases=bookdb
  • Notes:

    • This tool uses a lot of MySQL resources, so when using this tool:

      • If conditions permit, try not to execute directly in the production environment, but in a database environment with the same table structure;

      • If it must be executed in a production environment, please avoid the peak period, such as executing in the early morning trough period

    • This tool is relatively slow at analyzing large files; keep this in mind and do some preprocessing (for example, delete the leftover oversized slow query log first, create a fresh slow query log, and analyze it with pt-index-usage)

    • Since pt-index-usage only scans slow queries, not all queries, an index may appear unused in the slow query log while still being used (just not by any slow query). Therefore:

      • Before formal deletion, you should review it first to ensure that the index can be deleted before operating to avoid problems.

pt-variable-advisor

Official documentation: https://www.percona.com/doc/percona-toolkit/3.0/pt-variable-advisor.html

  • Purpose

Analyzes MySQL variables and makes recommendations on possible problems.

  • Principle

Execute SHOW VARIABLES, analyze which variables have unreasonable value settings, and give suggestions.

  • Syntax
pt-variable-advisor [OPTIONS] [DSN]
  • Common OPTIONS
--source-of-variables    Specify the source of the variables: mysql, none, or a file
--user                  Specify the MySQL username; -u also works
--password              Specify the MySQL password; -p also works
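
  • Usage example (a sketch; credentials are illustrative)
# Analyze the variables of the local MySQL instance
pt-variable-advisor --user=root --password=root123 localhost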

pt-online-schema-change

Official documentation: https://www.percona.com/doc/percona-toolkit/3.0/pt-online-schema-change.html

Since MySQL 5.6, online DDL has been supported natively, so pt-online-schema-change has become less and less essential.

For Online DDL, see Online DDL Operations: https://dev.mysql.com/doc/refman/5.7/en/innodb-online-ddl-operations.html
For a comparison between online DDL and pt-online-schema-change, see "MySQL ONLINE DDL vs PT-ONLINE-SCHEMA-CHANGE": http://blog.itpub.net/27067062/viewspace-2147452/
  • Purpose

Modify the table structure online without locking the ALTER table structure

  • Principle

    • Create an identical new table; its name generally carries a _new suffix

    • Alter table operations are performed on the new table.

    • Add three triggers to the original table, corresponding to DELETE/UPDATE/INSERT operations, so that statements executed against the original table are also executed against the new table.

    • Copy the data from the original table to the new table

    • Use an atomic RENAME TABLE operation to simultaneously rename the original table and the new table, and when this is done, drop the original table.

  • Syntax

pt-online-schema-change [OPTIONS] DSN
  • Common OPTIONS
--dry-run	  Create and alter the new table, but do not create triggers, copy data, or replace the original table. Does not actually execute; combine with --print to inspect execution details
--execute	If this option is specified, the table structure is actually modified; otherwise only safety checks are performed
--charset   Specify the character set
--alter     The table-modification statement (your ALTER TABLE statement with the "ALTER TABLE <table>" part removed); separate multiple statements with commas. This option has some limitations, see: https://www.percona.com/doc/percona-toolkit/3.0/pt-online-schema-change.html#cmdoption-pt-online-schema-change-alter
--no-version-check    Skip the version check
--alter-foreign-keys-method Handle tables with foreign key constraints so that they keep referencing the correct table. Values: auto (automatically choose the best strategy), rebuild_constraints (drop and re-add the foreign key constraints so they reference the new table), drop_swap (disable foreign key checks, then drop the original table before renaming the new one), none
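
  • Usage example (a sketch; bookdb/chapter are the sample database and table used earlier, and the added column is illustrative)
# Dry run first: create and alter the new table, but don't copy data or swap
pt-online-schema-change --alter "ADD COLUMN intro VARCHAR(50)" D=bookdb,t=chapter --user=root --password=root123 --dry-run

# Actually perform the change
pt-online-schema-change --alter "ADD COLUMN intro VARCHAR(50)" D=bookdb,t=chapter --user=root --password=root123 --execute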
