MySQL performance optimization ideas and tools


1. Optimization ideas

As an architect or developer, how do you approach database performance optimization?
Or, more specifically, if an interviewer asks from which dimensions you would optimize a database, how would you answer?

When we talk about performance tuning, most of the time the goal is to make queries faster. A query passes through many stages, and each stage takes time.
To reduce the time a query takes, we have to look at every stage.

2. Connection - configuration optimization

The first stage is the client connecting to the server. The problem that can occur here is that the server runs out of connections, so the application cannot obtain one. For example, MySQL reports ERROR 1040: Too many connections.

We can address a shortage of connections from two sides:
1. On the server side, we can increase the number of available connections.
If multiple applications or many requests access the database at the same time and there are not enough connections, we can:
(1) Increase the number of available connections by raising max_connections:

show variables like 'max_connections'; -- check the maximum number of connections; raise it when multiple applications connect

(2) Alternatively, release inactive connections promptly. The default timeout for both interactive and non-interactive clients is 28800 seconds (8 hours), and we can reduce this value.

-- Release inactive connections promptly; take care not to drop connections that the connection pool is still using.
show global variables like 'wait_timeout';
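The two server-side adjustments above might look like the following. This is a minimal sketch: the values 500 and 600 are illustrative assumptions, not recommendations, and settings changed with SET GLOBAL are lost on restart unless they are also written to my.cnf.

```sql
-- How close is the server to its connection limit?
SHOW VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Threads_connected';     -- connections open right now
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- peak since the server started

-- (1) Raise the limit (500 is an assumed value; size it to the memory available).
SET GLOBAL max_connections = 500;

-- (2) Close idle connections sooner (600 seconds is an assumed value).
-- wait_timeout applies to non-interactive clients,
-- interactive_timeout to interactive ones.
SET GLOBAL wait_timeout = 600;
SET GLOBAL interactive_timeout = 600;
```

The new timeouts only apply to connections opened after the change. If the application uses a connection pool, it is safer to keep the pool's own idle timeout shorter than wait_timeout, so that the pool retires a connection before the server drops it.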


2. On the client side, we can reduce the number of connections taken from the server. If we do not want to create a new connection every time we execute SQL, we can introduce a connection pool to reuse connections.

A connection pool can be used at the ORM level (MyBatis comes with its own connection pool),
or as a dedicated connection pool tool (Alibaba's Druid; Hikari, the default connection pool of Spring Boot 2.x; the older DBCP and C3P0).

So far we have discussed optimizing the database at the configuration level. Whether it is the configuration of the database itself or of the operating system on which the database service is installed, the ultimate goal of configuration tuning is to make better use of the hardware, including CPU, memory, disk, and network.

Different hardware environments call for different operating system and MySQL parameter settings; there is no standard configuration.

MySQL has many configuration parameters, including various switches and numeric values. Most parameters have a default value, such as the default buffer pool size, the default page size, and the number of concurrent InnoDB threads.

These defaults meet the needs of most situations. Unless there is a special requirement, change a parameter only after you understand what it means. Configuration changes are generally made by a professional DBA.

In addition to setting a reasonable number of connections on the server side and a reasonable connection pool size on the client side, we can introduce caching.

3. Cache - architecture optimization

3.1 Caching

When the application has very high concurrency, the lack of a cache causes two problems: it puts a lot of pressure on the database, and at the application level it slows down data operations.

We can solve this problem with a third-party caching service, such as Redis.
Running an independent cache service is an optimization at the architecture level.

To reduce the read and write pressure on a single database server, we can take other measures at the architecture level as well.

3.2 Master-slave replication

If a single database service cannot meet the access requirements, we can build a database cluster.
A cluster inevitably faces the problem of data consistency between nodes. If multiple database nodes are read and written at the same time, how do we keep the data on all nodes consistent?

This is where replication comes in. The node being replicated is called the master, and the node that replicates from it is called the slave.

How is master-slave replication implemented? As discussed in the article on MySQL architecture and internal modules, update statements are recorded in the binlog, which is a logical log.

Using the binlog, the slave server obtains the binlog of the master server, parses the SQL statements in it, and executes them on the slave to keep the master and slave data consistent.

Three threads are involved. On the slave, the thread that connects to the master, obtains the binlog, and writes it to the relay log is called the I/O thread.

On the master there is a log dump thread, which sends the binlog to the slave.
The SQL thread on the slave reads the relay log and writes the data to the database.

These are the three threads involved in master-slave replication.
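To make the three threads concrete, the sketch below points a slave at a master and then checks that each thread is running. It assumes MySQL 5.7 syntax (the version the links in this article refer to); the host, account, password, binlog file name, and position are placeholders, not values from this article.

```sql
-- On the master: note the current binlog file and position.
SHOW MASTER STATUS;

-- On the slave: tell it where the master is (all values are placeholders).
CHANGE MASTER TO
  MASTER_HOST = '192.0.2.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;
START SLAVE;

-- On the slave: Slave_IO_Running (I/O thread) and Slave_SQL_Running
-- (SQL thread) should both show Yes.
SHOW SLAVE STATUS;

-- On the master: the log dump thread appears with Command = 'Binlog Dump'.
SHOW PROCESSLIST;
```

The Seconds_Behind_Master column in the SHOW SLAVE STATUS output gives a rough measure of how far the slave lags behind the master.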

After setting up master-slave replication, we write data only to the master node, and read requests can be distributed to the slave nodes. We call this scheme read-write separation.

Read-write separation reduces the access pressure on the database server to some extent, but special attention must be paid to the consistency of master and slave data.

Even with master-slave replication, if a single master node or a single table stores too much data, for example a table with hundreds of millions of rows, the query performance of that table will still decline. We then need to split the data of the single database node or single table further, which is called database and table sharding (sub-database and sub-table).

3.3 Sub-database and sub-table

Split databases vertically to reduce concurrency pressure; split tables horizontally to solve storage bottlenecks.
Vertical database splitting divides one database into several databases by business:

Horizontal database and table splitting distributes the data of a single table across multiple databases according to certain rules.
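One common rule of this kind is taking the primary key modulo the number of shards. The sketch below assumes the user_innodb table is split into four tables named user_innodb_0 to user_innodb_3 (hypothetical names), kept in one database for simplicity; in practice the routing is done by the application or by sharding middleware rather than by hand.

```sql
-- Create four shard tables with the same structure as the original table.
CREATE TABLE user_innodb_0 LIKE user_innodb;
CREATE TABLE user_innodb_1 LIKE user_innodb;
CREATE TABLE user_innodb_2 LIKE user_innodb;
CREATE TABLE user_innodb_3 LIKE user_innodb;

-- Route a row by its id: shard number = id mod 4.
SELECT 1234567 % 4 AS shard_no;  -- returns 3, so the row belongs to user_innodb_3

-- The application then reads and writes that row only in its shard.
SELECT * FROM user_innodb_3 WHERE id = 1234567;
```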


The above are the optimizations at the architecture level: caching, master-slave replication, and database and table sharding.

The third stage is the parser. It performs lexical and syntax analysis, mainly to make sure the statement is correct. As long as the statement contains no errors there is nothing to tune here; it is handled by the server itself.
The fourth stage is the optimizer.

4. Optimizer - SQL statement analysis and optimization

The optimizer analyzes our SQL statement and generates an execution plan.
Question: when working on a project, we sometimes receive an email from the DBA listing several time-consuming queries in our project and asking us to optimize them. Where do these statements come from?

Our service layer executes a huge number of SQL statements every day. How does the DBA know which ones are slow?
The first step is to record SQL execution.

4.1 Slow query log

https://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html

4.1.1 Turn on the slow log switch

Because turning on the slow query log has a cost (as with the binlog and optimizer trace), it is off by default:

show variables like 'slow_query%';

In addition to this switch, there is a parameter that controls how long a statement must run before it is recorded in the slow log. The default is 10 seconds.

show variables like '%long_query%';

The parameters can be modified dynamically (the change is lost after a restart).

set @@global.slow_query_log=1; -- 1 = on, 0 = off; lost after a restart
set @@global.long_query_time=3; -- the MySQL default is 10 seconds; open a new session to see the new value
show variables like '%long_query%';
show variables like '%slow_query%';

Or modify the configuration file my.cnf.
The following configuration defines the switch of the slow query log, the time of the slow query, and the storage path of the log file.

slow_query_log=ON
long_query_time=2
slow_query_log_file=/var/lib/mysql/localhost-slow.log

Simulate slow query:

select sleep(10);

Query the user_innodb table, which holds 5 million rows (with no index on the queried column).

SELECT * FROM `user_innodb` where phone = '136';

4.1.2 Slow log analysis

1. Log content

show global status like 'slow_queries'; -- how many slow queries there have been
show variables like '%slow_query%'; -- find the location of the slow log
cat /var/lib/mysql/localhost-slow.log
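Instead of reading the file, the slow log can also be written to a table and queried with SQL. A sketch, assuming you are allowed to change log_output on this server; the table and columns are those of the built-in mysql.slow_log table.

```sql
-- Write slow-query entries to the mysql.slow_log table as well as to the file.
SET GLOBAL log_output = 'FILE,TABLE';

-- The ten slowest statements recorded so far.
SELECT start_time, query_time, lock_time, rows_sent, rows_examined, db,
       CONVERT(sql_text USING utf8mb4) AS sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;
```

Comparing rows_examined with rows_sent is a quick way to spot statements that scan far more rows than they return.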


2. mysqldumpslow
https://dev.mysql.com/doc/refman/5.7/en/mysqldumpslow.html
MySQL ships the mysqldumpslow tool in its bin directory.

mysqldumpslow --help

For example, to list the 10 slow SQL statements with the longest query time:

mysqldumpslow -s t -t 10 -g 'select' /var/lib/mysql/localhost-slow.log

Count is the number of times the SQL statement was executed;
Time is the average execution time, with the cumulative time in parentheses;
Lock is the average lock time, with the cumulative time in parentheses;
Rows is the average number of rows returned, with the cumulative number in parentheses.
In addition to the slow query log, there is also the SHOW PROFILE tool.

4.2 SHOW PROFILE

https://dev.mysql.com/doc/refman/5.7/en/show-profile.html
SHOW PROFILE was contributed to the MySQL community by Jeremy Cole, a senior architect at Google. It shows the resources used while executing SQL statements, such as CPU and I/O consumption.
Enter help profile in the mysql client for detailed help.

4.2.1 Check if it is enabled

select @@profiling;
set @@profiling=1;

4.2.2 View profile statistics

(note that this command ends with an s)

show profiles;

View the execution details of the last SQL statement to find the stage that takes the most time (this command has no s).

show profile;

6.2E-5 means the decimal point moves 5 places to the left, that is, 0.000062 seconds.
You can also view the execution details of a specific statement by its ID, by appending for query + ID.

show profile for query 1;
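To see the CPU and I/O consumption mentioned earlier rather than only the durations, SHOW PROFILE accepts a list of resource types. The sketch below reuses query ID 1 from the example above; the second statement reads the same data from information_schema.profiling so that the stages can be aggregated and sorted.

```sql
-- Restrict the output to CPU and block I/O figures for statement 1.
SHOW PROFILE CPU, BLOCK IO FOR QUERY 1;

-- The same data as a table: total time per stage, slowest stage first.
SELECT state, SUM(duration) AS total_seconds
FROM information_schema.profiling
WHERE query_id = 1
GROUP BY state
ORDER BY total_seconds DESC;
```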

Besides the slow log and show profile, if you want to analyze the slow SQL currently running in the database, you can also look at the running thread status, the server running information, and the storage engine information.

4.2.3 Other system commands

show processlist: running threads
https://dev.mysql.com/doc/refman/5.7/en/show-processlist.html

show processlist;

This is a very important command that shows the running threads of users. A thread can be killed by its id.
The same information can be read from a table, with the added benefit that you can use GROUP BY and ORDER BY:

select * from information_schema.processlist;
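A few queries of this kind, as a sketch. The 10-second threshold and the thread id 42 are placeholders chosen for illustration.

```sql
-- Statements that have been running for more than 10 seconds, longest first.
SELECT id, user, host, db, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep' AND time > 10
ORDER BY time DESC;

-- Number of connections per user.
SELECT user, COUNT(*) AS connections
FROM information_schema.processlist
GROUP BY user
ORDER BY connections DESC;

-- Kill a thread using an id taken from the output above (42 is a placeholder).
KILL 42;
```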


show status: server running status
Description: https://dev.mysql.com/doc/refman/5.7/en/show-status.html
Detailed parameters: https://dev.mysql.com/doc/refman/5.7/en/server-status-variables.html
SHOW STATUS shows the running status of the MySQL server (the counters are reset after a restart). It has session and global scopes, and the output format is name-value.
You can filter with LIKE and wildcards.

SHOW GLOBAL STATUS LIKE 'com_select'; -- number of SELECT statements executed

show engine: storage engine running information
https://dev.mysql.com/doc/refman/5.7/en/show-engine.html
https://dev.mysql.com/doc/refman/5.7/en/innodb-standard-monitor.html
show engine displays the current running information of a storage engine, including the table locks and row locks held by transactions, the lock wait status of transactions, thread semaphore waits, file I/O requests, and buffer pool statistics.
For example:

show engine innodb status;

If you need the monitoring information written to the error log (once every 15 seconds), you can enable the output.

show variables like 'innodb_status_output%';
-- enable the output:
SET GLOBAL innodb_status_output=ON;
SET GLOBAL innodb_status_output_locks=ON;

We now know many commands for analyzing server status, storage engine status, and running threads. If you were asked to write a database monitoring system, how would you do it?

In fact, many open source monitoring tools work on exactly this principle: they read system variables and status values.

Now that we know which SQL statements are slow, the next questions are why they are slow and where the time goes.

MySQL provides an execution plan tool (as mentioned in the architecture article, the optimizer ultimately generates an execution plan). Other databases, such as Oracle, have similar features.

With EXPLAIN we can see how the optimizer plans to execute a query, and therefore how MySQL handles an SQL statement. This lets us analyze the performance bottlenecks of a statement or table.

Before MySQL 5.6.3, only SELECT could be analyzed; from MySQL 5.6.3 on, UPDATE, DELETE, and INSERT can be analyzed as well.

4.3 EXPLAIN execution plan

https://dev.mysql.com/doc/refman/5.7/en/explain-output.html
We first create three tables: a course table, a teacher table, and a teacher contact table (without any indexes).

DROP TABLE IF EXISTS course;
CREATE TABLE `course` (
  `cid` INT(3) DEFAULT NULL,
  `cname` VARCHAR(20) DEFAULT NULL,
  `tid` INT(3) DEFAULT NULL
) ENGINE = INNODB DEFAULT CHARSET = utf8mb4;

DROP TABLE IF EXISTS teacher;
CREATE TABLE `teacher` (
  `tid` INT(3) DEFAULT NULL,
  `tname` VARCHAR(20) DEFAULT NULL,
  `tcid` INT(3) DEFAULT NULL
) ENGINE = INNODB DEFAULT CHARSET = utf8mb4;

DROP TABLE IF EXISTS teacher_contact;
CREATE TABLE `teacher_contact` (
  `tcid` INT(3) DEFAULT NULL,
  `phone` VARCHAR(200) DEFAULT NULL
) ENGINE = INNODB DEFAULT CHARSET = utf8mb4;

INSERT INTO `course` VALUES ('1', 'mysql', '1');
INSERT INTO `course` VALUES ('2', 'jvm', '1');
INSERT INTO `course` VALUES ('3', 'juc', '2');
INSERT INTO `course` VALUES ('4', 'spring', '3');

INSERT INTO `teacher` VALUES ('1', 'bobo', '1');
INSERT INTO `teacher` VALUES ('2', 'jim', '2');
INSERT INTO `teacher` VALUES ('3', 'dahai', '3');

INSERT INTO `teacher_contact` VALUES ('1', '13688888888');
INSERT INTO `teacher_contact` VALUES ('2', '18166669999');
INSERT INTO `teacher_contact` VALUES ('3', '17722225555');

The result of EXPLAIN has many columns; let's go through them in detail.
First confirm the environment:

select version();
show variables like '%engine%';

4.3.1 id

id is the sequence number of the query.

When the id values are different, the one with the larger id is executed first (larger first, then smaller).

-- Find the phone number of the teacher of the mysql course
EXPLAIN SELECT
tc.phone
FROM
teacher_contact tc
WHERE
tcid = ( SELECT tcid FROM teacher t WHERE t.tid = ( SELECT c.tid FROM course c WHERE c.cname = 'mysql' ) );

Query order: course c, then teacher t, then teacher_contact tc.

The course table is queried first, then the teacher table, and finally the teacher contact table. Subqueries can only be executed this way: the outer query can run only after the inner result is available.

When the id values are the same, the tables are read from top to bottom.

-- Find teachers whose course ID is 2 or whose contact table ID is 3
EXPLAIN SELECT
t.tname,
c.cname,
tc.phone
FROM
teacher t,
course c,
teacher_contact tc
WHERE
t.tid = c.tid
AND t.tcid = tc.tcid
AND ( c.cid = 2 OR tc.tcid = 3 );

For example, every id in this query is 1, and the query order is teacher t (3 rows), course c (4 rows), teacher_contact tc (3 rows).

When some ids are the same and some are different, rows with different ids are executed from the larger id to the smaller one, and rows with the same id are executed from top to bottom.

4.3.2 select_type: query type

Not all types are listed here (others: DEPENDENT UNION, DEPENDENT SUBQUERY, MATERIALIZED, UNCACHEABLE SUBQUERY, UNCACHEABLE UNION).
Some common query types are listed below:
SIMPLE
A simple query that contains no subquery and no UNION.

EXPLAIN SELECT * FROM teacher;

Let's look at another example that includes a subquery:

-- Find the phone number of the teacher of the mysql course
EXPLAIN SELECT
tc.phone
FROM
teacher_contact tc
WHERE
tcid = ( SELECT tcid FROM teacher t WHERE t.tid = ( SELECT c.tid FROM course c WHERE c.cname = 'mysql' ) );

PRIMARY
The main query in an SQL statement that contains subqueries, that is, the outermost query.
SUBQUERY
All inner queries in a subquery are of type SUBQUERY.
DERIVED
A derived query, meaning that a temporary table is used before the final result is obtained. For example:

-- Find the courses taught by the teachers with ID 1 or 2
EXPLAIN SELECT
cr.cname
FROM
( SELECT * FROM course WHERE tid = 1 UNION SELECT * FROM course WHERE tid = 2 ) cr;

For this query, the table on the right (the UNION) is executed first, and then the table on the left. Its type is DERIVED.
UNION
A UNION query is used. See the example above.
UNION RESULT
Shows which queries are combined by the UNION. <union2,3> means that the queries with id=2 and id=3 are combined by a UNION. See the example above.

4.3.3 type: join type

https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-join-types
Of all the join types, those listed first are the best and those listed later are worse.
The commonly seen types, from best to worst: system > const > eq_ref > ref > range > index > ALL
(others not listed here: fulltext, ref_or_null, index_merge, unique_subquery, index_subquery).
All of these access types except ALL can use an index.

const
The query uses a primary key or unique index and matches only one row.

DROP TABLE IF EXISTS single_data;
CREATE TABLE single_data ( id INT ( 3 ) PRIMARY KEY, content VARCHAR ( 20 ) );
INSERT INTO single_data VALUES ( 1, 'a' );
EXPLAIN SELECT * FROM single_data a WHERE id = 1;

system
system is a special case of const in which the table has only one row. For example, a system table with only one row.

EXPLAIN SELECT * FROM mysql.proxies_priv;

eq_ref
Usually appears in joins of multiple tables. It means that for each row from the previous table, only one row in this table can match. It generally indicates a lookup on a unique index (UNIQUE or PRIMARY KEY).
eq_ref is the best access type apart from system and const.

First delete the redundant rows in the teacher table, so that teacher_contact has 3 rows and teacher has 3 rows.

DELETE
FROM
teacher
WHERE
tid IN ( 4, 5, 6 );
COMMIT;
-- Backup: statements to restore the deleted rows later
INSERT INTO `teacher`
VALUES
( 4, 'jim', 4 );
INSERT INTO `teacher`
VALUES
( 5, 'bobo', 5 );
INSERT INTO `teacher`
VALUES
( 6, 'seven', 6 );
COMMIT;

Create a primary key index for tcid (the first field) of the teacher_contact table.

-- ALTER TABLE teacher_contact DROP PRIMARY KEY;
ALTER TABLE teacher_contact ADD PRIMARY KEY(tcid);

Create a normal index for the tcid (third field) of the teacher table.

-- ALTER TABLE teacher DROP INDEX idx_tcid;
ALTER TABLE teacher ADD INDEX idx_tcid (tcid);

Execute the following SQL statement:

select t.tcid from teacher t,teacher_contact tc where t.tcid = tc.tcid;

The execution plan at this time (the teacher_contact table is eq_ref).

Summary:
The three types above, system, const, and eq_ref, are ones you come across rather than ones you can aim for; in practice it is hard to optimize a query to this level.

ref
The query uses a non-unique index, or the join uses only the leftmost prefix of an index.
For example, a query using the normal index on tcid:

explain SELECT * FROM teacher where tcid = 3;

range
An index range scan.
If WHERE is followed by BETWEEN ... AND, <, >, >=, <=, or IN, the type is range.
Without an index the query must be a full table scan (ALL), so add a normal index first.

-- ALTER TABLE teacher DROP INDEX idx_tid;
ALTER TABLE teacher ADD INDEX idx_tid (tid);

Executing a range query (with a normal index on the field):

EXPLAIN SELECT * FROM teacher t WHERE t.tid <3;
-- or
EXPLAIN SELECT * FROM teacher t WHERE tid BETWEEN 1 AND 2;

An IN query is also range (the column has a primary key index):

EXPLAIN SELECT * FROM teacher_contact t WHERE tcid in (1,2,3);

index
A full index scan, which reads all the data in the index (faster than a full table scan).

EXPLAIN SELECT tid FROM teacher;

ALL
A full table scan. If there is no index, or no index is used, the type is ALL.
Summary:
Generally speaking, a query should reach at least the range level, and preferably ref.
Both ALL (full table scan) and index (full index scan) need to be optimized.

4.3.4 possible_keys, key

The indexes that could be used and the index actually used. If key is NULL, no index is used.
possible_keys can list one or more indexes, and an index that could be used is not necessarily used.
Conversely, if possible_keys is empty, can key still have a value?
Create a composite index on the table:

ALTER TABLE user_innodb DROP INDEX comidx_name_phone;
ALTER TABLE user_innodb add INDEX comidx_name_phone (name,phone);

Execution plan (changing the query to select name also uses the index):

explain select phone from user_innodb where phone='126';

Conclusion: it is possible (this is the covering index case).
If the analysis shows that no index is used, check the SQL or create an index.

4.3.5 key_len

The length of the index used, in bytes. It depends on the type and length of the indexed columns.
The table has a composite index: KEY comidx_name_phone (name, phone)

explain select * from user_innodb where name ='jim';
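As a worked example of how key_len is derived: for a utf8mb4 VARCHAR(n) column the index needs up to 4 bytes per character, plus 2 bytes for the length, plus 1 byte if the column allows NULL. The column definitions used below are assumptions, since the structure of user_innodb is not shown here: name VARCHAR(255) and phone VARCHAR(11), both nullable.

```sql
-- Only name is used. Under the assumed definitions: 255 * 4 + 2 + 1 = 1023
EXPLAIN SELECT * FROM user_innodb WHERE name = 'jim';

-- Both columns are used: 1023 + (11 * 4 + 2 + 1) = 1070
EXPLAIN SELECT * FROM user_innodb WHERE name = 'jim' AND phone = '13866667777';

-- The arithmetic itself:
SELECT 255 * 4 + 2 + 1 AS key_len_name,
       (255 * 4 + 2 + 1) + (11 * 4 + 2 + 1) AS key_len_name_phone;
```

Comparing the key_len reported by EXPLAIN with numbers worked out this way shows how many columns of the composite index a query actually used.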

4.3.6 rows

The number of rows MySQL estimates it must scan to return the requested data. In general, the fewer rows the better.

4.3.7 filtered

The percentage of the rows returned by the storage engine that remain after filtering at the server layer, that is, the share of rows that satisfy the query conditions.

4.3.8 ref

Which column or constant is compared against the index to select rows from the table.

4.3.9 Extra

Additional information provided by the execution plan.
Using index
A covering index is used, so there is no need to go back to the table.

EXPLAIN SELECT tid FROM teacher ;

Using where
A WHERE filter is applied: not all rows returned by the storage engine satisfy the query conditions, so they must be filtered at the server layer (this is unrelated to whether an index is used).

EXPLAIN select * from user_innodb where phone ='13866667777';

Using filesort
The index cannot be used for sorting, so an extra sort is performed (this has nothing to do with disks or files). Needs optimization. (Precondition: the composite index exists.)

ALTER TABLE user_innodb DROP INDEX comidx_name_phone;
ALTER TABLE user_innodb add INDEX comidx_name_phone (name,phone);
EXPLAIN select * from user_innodb where name ='jim' order by id;

(caused by order by id)
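By contrast, sorting by the second column of the composite index can use the index order, because within one name value the index entries are already ordered by phone. A sketch using the same table and index; whether the optimizer actually chooses comidx_name_phone still depends on the data.

```sql
-- name is fixed by the WHERE clause and phone is the next index column,
-- so rows can be read from comidx_name_phone already sorted.
-- When that index is used, Extra should not show "Using filesort".
EXPLAIN SELECT * FROM user_innodb WHERE name = 'jim' ORDER BY phone;
```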

Using temporary
A temporary table is used. For example (not an exhaustive list):
1. DISTINCT on a non-indexed column

EXPLAIN select DISTINCT(tid) from teacher t;

2. GROUP BY on a non-indexed column

EXPLAIN select tname from teacher group by tname;

3. GROUP BY on any column when using a join

EXPLAIN select t.tid from teacher t join course c on t.tid = c.tid group by t.tid;

Needs optimization, for example by creating a composite index.

To sum up:
EXPLAIN shows how the optimizer plans to execute a query, so we know how MySQL handles an SQL statement and can analyze the performance bottleneck of the statement or table.
Once the problem has been analyzed, the next step is the specific optimization of the SQL statement.

4.4 SQL and index optimization

The goal of optimizing SQL statements is, most of the time, to make them use indexes.
The article on MySQL index principles and usage covers the principles of index creation, and the circumstances under which an index is or is not used.

5. Storage engine

5.1 Selection of storage engine

Choose different storage engines for different business tables. For example, use MyISAM for business tables with many query and insert operations, Memory for temporary data, and InnoDB for regular tables with many concurrent updates.
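A short sketch of how the engine is inspected and chosen per table. The table names tmp_session_cache and order_log are hypothetical and only serve as illustration.

```sql
-- Which engine does each table in the current database use?
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE();

-- A scratch table kept in memory (its contents are lost on restart).
CREATE TABLE tmp_session_cache (
  session_id CHAR(32) NOT NULL PRIMARY KEY,
  user_id INT NOT NULL
) ENGINE = MEMORY;

-- Convert an existing table to InnoDB (this rebuilds the table).
ALTER TABLE order_log ENGINE = InnoDB;
```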

5.2 Field definition

Principle: Use the smallest data type that can store data correctly.
Select the appropriate field type for each column.

5.2.1 Integer types

TINYINT 1 byte
SMALLINT 2 bytes
MEDIUMINT 3 bytes
INT, INTEGER 4 bytes
BIGINT 8 bytes

The integer types listed above have different maximum storage ranges.
For gender, use TINYINT; ENUM is also stored as an integer.

5.2.2 Character type

For variable-length values, varchar saves space, but a varchar field needs one or two extra bytes to record the length.
For fixed-length values, use char rather than varchar.
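Applied to a table definition, these rules might look like the sketch below. The table customer_profile and its columns are hypothetical and chosen only to show one column of each kind.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE customer_profile (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  gender   TINYINT UNSIGNED NOT NULL DEFAULT 0,  -- 1 byte; e.g. 0 = unknown, 1 = male, 2 = female
  id_card  CHAR(18) NOT NULL,                    -- fixed length, so CHAR
  nickname VARCHAR(50) NOT NULL DEFAULT ''       -- variable length, so VARCHAR
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```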

5.2.3 Do not use foreign keys, triggers, views

They reduce readability;
they affect database performance: computation should be left to the application so that the database can concentrate on storage;
data integrity should be checked in the application.

5.2.4 Large file storage

Do not store pictures (for example, base64 encoded) or large files in the database;
put the files on a NAS, store only the URI (relative path) in the database, and configure the NAS server address in the application.

5.2.5 Table Split or Field Redundancy

Split out rarely used fields to avoid too many columns and too much data in one table.
For example, a business system needs to record all received and sent messages. The messages are in XML format and are stored as blob or text, for tracing and for detecting duplicates. A separate table can be created to store the messages.

6. Summary: optimizing the system

In addition to optimizing code, SQL statements, table definitions, architecture, and configuration, optimization at the business level cannot be ignored. Here are a few examples:
1) On Double Eleven one year, why was there a bonus for topping up Yu'e Bao and the account balance, such as getting 50 for topping up 300?
Because a payment from the balance or Yu'e Bao is recorded in the local, internal database, whereas a bank card payment requires calling an external interface, and operating on the internal database is definitely faster.

2) On Double Eleven last year, why was it not possible in the early hours of the morning to query bills from before that day?
This is a degradation measure to protect the core business at that moment.

3) In recent years, why have the Double Eleven prices already been available a week in advance?
Pre-sales spread out the traffic.

4) The Public Security Bureau's same-name query does not return the result in real time (it does not query the database in real time); the result is pushed through the official account.

At the application level there are many other ways to reduce the pressure on the database as much as possible, such as rate limiting or introducing MQ for peak shaving.

Why is it that, with the same MySQL, some companies can withstand tens of millions of concurrent requests while others cannot handle a few hundred?
The key lies in how it is used. A slow database-backed system does not mean that the database itself is slow; sometimes the optimization has to happen in the layers above it.

Of course, if a relational database cannot solve the problem, we may need a search engine or a big data solution. Not all data has to be stored in a relational database.

Origin blog.csdn.net/lx9876lx/article/details/129134904