MySQL Advanced Learning 2: Query Interception Analysis

Query interception analysis

1. Query optimization

1. The small table drives the big table

1. Optimization analysis steps for MySQL

(1) Observe: let the system run for at least one day to see which slow SQL statements are produced.

(2) Enable the slow query log and set a threshold; for example, treat any statement that runs longer than 5 seconds as slow SQL and capture it.

(3) Analyze the captured slow SQL with EXPLAIN.

(4) Use show profile.

(5) Have the operations and maintenance manager or DBA tune the parameters of the SQL database server.

Summary

(1) Query optimization

(2) Enable and capture slow queries

(3) Analyze the slow SQL with EXPLAIN

(4) Use show profile to examine the execution details and life cycle of SQL inside the MySQL server

(5) Tune the parameters of the SQL database server

2. IN or EXISTS

(1) Optimization principle: Small tables drive large tables, that is, small data sets drive large data sets, similar to the following nested loop:

// Small set drives large set: the outer loop runs only 5 times, so only 5
// connections are needed, and each connection executes 1000 inner iterations.
for(int i = 0; i < 5; i++){
	for(int j = 0; j < 1000; j++){
		....
	}
}

// Large set drives small set: 1000 connections are needed, which wastes time.
for(int i = 0; i < 1000; i++){
	for(int j = 0; j < 5; j++){
		....
	}
}

(2) IN

When the data set of table B is smaller than the data set of table A, using IN performs better than EXISTS.

select * from A where id in(select id from B);
-- equivalent to the following pseudocode
for select id from B  -- outer loop over the small table B
for select * from A where A.id = B.id -- inner loop over the large table A

(3) EXISTS

When the data set of table A is smaller than the data set of table B, using EXISTS performs better than IN.

-- select 1: any constant works here (3, 'x', etc.); MySQL ignores the select list of an EXISTS subquery.
select * from A where exists(select 1 from B where A.id = B.id);
-- equivalent to the following pseudocode
for select * from A  -- outer loop over the small table A
for select * from B where A.id = B.id -- inner loop over the large table B

The EXISTS form can be analyzed as follows:

select * from table where exists(subquery);

Each row of the main (outer) query is passed into the subquery for condition checking, and the result of that check determines whether that row of the main query is kept.

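In practice the two forms can be compared directly with EXPLAIN. A minimal sketch, assuming tables A and B from the examples above both have an indexed id column:

-- compare the plans of the two equivalent queries and keep the cheaper one
-- prefer IN when B is the smaller table, EXISTS when A is the smaller table
EXPLAIN SELECT * FROM A WHERE id IN (SELECT id FROM B);
EXPLAIN SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE A.id = B.id);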

2. Order by keyword optimization


1. Create table SQL

create table tb1A(
age int,
birth timestamp not null
);

-- insert data
insert into tb1A(age,birth) values(22,now());
insert into tb1A(age,birth) values(23,now());
insert into tb1A(age,birth) values(24,now());

-- create a composite index on (age, birth)
create index idx_A_ageBirth on tb1A(age,birth);

2. Optimization

1. Execute the following statement

(1) The index is used and the sort avoids filesort

EXPLAIN SELECT * FROM tb1A WHERE age>20 ORDER BY age;
EXPLAIN SELECT * FROM tb1A WHERE age>20 ORDER BY age,birth;

(2) The index is used, but filesort occurs

EXPLAIN SELECT * FROM tb1A WHERE age>20 ORDER BY birth;
EXPLAIN SELECT * FROM tb1A WHERE age>20 ORDER BY birth,age;

(3) MySQL supports two sorting methods

FileSort and Index; Index is more efficient, meaning MySQL scans the index itself to complete the sorting, while FileSort is less efficient. ORDER BY uses Index sorting when either of the following two conditions is met:

The ORDER BY clause uses the leftmost prefix of the index.
The columns of the WHERE clause combined with those of the ORDER BY clause satisfy the leftmost prefix of the index.
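Two more illustrative cases against the idx_A_ageBirth(age, birth) index created above (a sketch; the actual plan also depends on the data and the optimizer version):

-- index sort: the equality condition on age plus ORDER BY birth still matches the leftmost prefix (age, birth)
EXPLAIN SELECT * FROM tb1A WHERE age = 22 ORDER BY birth;

-- Using filesort: the leftmost column age is skipped in the ORDER BY
EXPLAIN SELECT * FROM tb1A ORDER BY birth;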

2. Single/dual way sorting

(1) Single-way sorting

Single-way sorting (introduced after MySQL 4.1) reads all the columns required by the query from disk, sorts them in the sort buffer by the ORDER BY columns, and then scans the sorted list to produce the output. It is faster because it avoids reading the data a second time, and it turns random I/O into sequential I/O, but it uses more sort-buffer space because every column of each row is kept in memory.

Since single-way sorting is the newer algorithm, it is generally better than dual-way sorting, but it has problems of its own:

image-20200829204131189

Optimization Strategy:

Increase the sort_buffer_size parameter.
Increase the max_length_for_sort_data parameter.
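A minimal sketch of checking and raising the two variables for the current session (the values below are only placeholders; size them according to available memory and workload):

-- check the current values
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- raise them for the current session (illustrative values)
SET SESSION sort_buffer_size = 2 * 1024 * 1024;  -- 2 MB
SET SESSION max_length_for_sort_data = 4096;     -- bytes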

(2) Dual-way sorting

Dual-way sorting is the algorithm MySQL used before version 4.1: it reads the ORDER BY columns and the row pointers from disk, sorts them in the sort buffer, and then reads the table a second time by the sorted pointers to fetch the remaining columns. Because the data is read from disk twice, it costs more I/O than single-way sorting.

3. Improve the speed of order by

(1) Using SELECT * with ORDER BY is taboo; select only the fields the query actually needs. This matters because:

(A) When the total size of the selected fields is less than max_length_for_sort_data and none of the sort fields is of TEXT/BLOB type, the improved algorithm (single-way sorting) is used; otherwise the old algorithm (dual-way sorting) is used.

(B) With either algorithm the data may exceed the capacity of the sort buffer, in which case MySQL creates temporary files for a merge sort, causing extra I/O. This risk is greater with single-way sorting, so increase sort_buffer_size.

(2) Try to increase sort_buffer_size. Whichever algorithm is used, raising this parameter improves efficiency, but raise it according to what the system can afford, because the buffer is allocated per connection (process).

(3) Try to increase max_length_for_sort_data. Raising this parameter makes the improved algorithm more likely to be used, but if it is set too high, the total data is more likely to exceed sort_buffer_size; the telltale symptoms are high disk I/O and low CPU usage.

4. Small summary

image-20200829204525863

3. Group by keyword optimization

Group by optimization is similar to order by: group by essentially sorts first and then groups, so it follows the same best-left-prefix rule for indexes. When index columns cannot be used, increase the max_length_for_sort_data and sort_buffer_size settings, and prefer putting conditions in where rather than having whenever possible.

image-20200829204601369
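For example, against the same tb1A table and the idx_A_ageBirth index (a sketch; the plan depends on the data):

-- grouping on the leftmost index column can use the index and avoid a temporary table
EXPLAIN SELECT age, COUNT(*) FROM tb1A GROUP BY age;

-- grouping on a non-leftmost column typically shows Using temporary; Using filesort
EXPLAIN SELECT birth, COUNT(*) FROM tb1A GROUP BY birth;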

2. Slow query log

1 Introduction

1 Overview

MySQL's slow query log is a log provided by MySQL that records the statements whose response time exceeds a threshold: any SQL whose running time exceeds the value of long_query_time is written to the slow query log. The default value of long_query_time is 10, meaning statements that run longer than 10 seconds are recorded.

The slow query log lets us find the SQL that exceeds our tolerance threshold. For example, if a statement runs for more than 5 seconds, we consider it slow SQL, collect all statements that take longer than 5 seconds, and analyze them thoroughly together with EXPLAIN as described earlier.

2. Description

By default, the MySQL database does not enable the slow query log, and we need to manually set this parameter.

If it is not needed for tuning, it is generally not recommended to enable this parameter, because the slow query log brings some performance overhead. The slow query log supports writing its records to a file.

2. Operation

1. Check whether the slow query log is enabled

show VARIABLES LIKE '%slow_query_log%';

-- Enable it. Enabling the slow query log this way only takes effect for the currently running instance and is lost after MySQL restarts.
set global slow_query_log=1;

It is generally not recommended to enable the slow query log permanently. If you must do this, you can only modify the configuration file my.cnf:

image-20200829205400637

2. Check the slow-time threshold

show VARIABLES LIKE '%long_query_time%';

The default long_query_time is 10 seconds; SQL that runs longer than 10 seconds is recorded in the slow query log. Note that it is strictly greater than, not greater than or equal to: a statement is logged only when its running time is greater than the value set by long_query_time.

For testing, we set the execution-time threshold to 3 seconds:

set global long_query_time=3;

At this point, although long_query_time has been modified, querying the variable in the current session still shows 10.


At this point, we need to reopen a session, or use the following command to query:

show GLOBAL VARIABLES LIKE '%long_query_time%';

Querying the slow log file, you can see something like the following:

image-20200829211304163

Note:

To produce the log entry shown above, first execute the following SQL statement,

set global long_query_time=3;

then disconnect and reconnect the current FinalShell session, and run the following statement:

select sleep(4);

3. Set the threshold time to take effect permanently

# configure under the [mysqld] section of my.cnf
[mysqld]
slow_query_log=1
slow_query_log_file=/usr/local/mysql/data/hadoop101-slow.log
long_query_time=3
log_output=FILE

4. Log analysis tool-mysqldumpslow

In a production environment, manually combing through the log to find and analyze SQL is tedious, so MySQL provides the log analysis tool mysqldumpslow.

(1) View help information

mysqldumpslow --help

(2) Parameters

image-20200829211958271

(3) Commonly used commands

image-20200829212020983
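A few typical invocations, assuming the slow-log path from the my.cnf example above (adjust the path to your own file):

# top 10 statements by number of rows returned
mysqldumpslow -s r -t 10 /usr/local/mysql/data/hadoop101-slow.log

# top 10 statements by number of executions
mysqldumpslow -s c -t 10 /usr/local/mysql/data/hadoop101-slow.log

# top 10 statements containing "left join", sorted by query time
mysqldumpslow -s t -t 10 -g "left join" /usr/local/mysql/data/hadoop101-slow.log

# pipe through more so the output does not flood the screen
mysqldumpslow -s t -t 10 /usr/local/mysql/data/hadoop101-slow.log | more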

3. Batch data script

1 Introduction

The difference between functions and stored procedures: functions have return values, stored procedures do not.

image-20200829212117018

2. Operation: insert 10 million (1000w) rows into the tables

image-20200829212156934

1. Create table SQL

-- create a new database
create database bigData;
use bigData;


-- department table
create table dept(
id int unsigned primary key auto_increment,
deptno mediumint unsigned not null default 0,
dname varchar(20) not null default "",
loc varchar(13) not null default ""
)engine=innodb default charset=utf8;

-- employee table
create table emp(
id int unsigned primary key auto_increment, 
empno mediumint unsigned not null default 0, /* employee number */
ename varchar(20) not null default "", /* name */
job varchar(9) not null default "",  /* job */
mgr mediumint unsigned not null default 0, /* supervisor's number */
hiredate date not null, /* hire date */
sal decimal(7,2) not null, /* salary */
comm decimal(7,2) not null, /* bonus */
deptno  mediumint unsigned not null default 0 /* department number */
)engine=innodb default charset=utf8;

2. Set the parameter log_bin_trust_function_creators

When binary logging is enabled, MySQL by default only allows stored functions that are declared DETERMINISTIC, NO SQL, or READS SQL DATA to be created; setting log_bin_trust_function_creators to 1 relaxes this restriction so that the functions below can be created.
show variables like 'log_bin_trust_function_creators';
set global log_bin_trust_function_creators = 1;

3. Create a function to ensure that each piece of data is different

(1) Randomly generate a string

-- delimiter: change the statement delimiter so the function body can contain semicolons
delimiter $$
 -- rand_string: function name; return type varchar(255)
CREATE FUNCTION `rand_string`(n int) RETURNS varchar(255) 
BEGIN
	-- define the variable chars_str
  DECLARE chars_str VARCHAR(100) DEFAULT
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
  -- declare the variable that holds the return value
  DECLARE return_str VARCHAR(255) DEFAULT '';
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
  -- take one random character from chars_str; CONCAT joins multiple strings into one
  SET return_str = CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
  SET i = i + 1;
	END WHILE;
RETURN return_str;
END  $$

(2) Randomly generate department number

delimiter $$
CREATE FUNCTION `rand_num`() RETURNS int(5)
BEGIN
  DECLARE i INT DEFAULT 0;
  -- department number between 100 and 110
  SET i=FLOOR(100+RAND()*10);
RETURN i;
END  $$

-- if you need to drop the function, use the following command
drop function rand_num;
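A quick way to sanity-check the two functions before using them in the stored procedures below (assumes both were created without errors):

-- should return a random 6-character string and a department number between 100 and 110
SELECT rand_string(6);
SELECT rand_num();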

4. Create a stored procedure

(1) The procedure that inserts data into the employee table emp

delimiter $$
  CREATE  PROCEDURE `insert_emp`(IN START INT(10),IN max_num INT(10))
  BEGIN
    DECLARE i INT DEFAULT 0;
    -- turn off autocommit so each insert is not committed individually, saving time
    SET autocommit = 0;
    REPEAT
    SET i = i + 1;
    INSERT INTO emp(empno,ename,job,mgr,hiredate,sal,comm,deptno)
    VALUES((START+i),rand_string(6),'SALESMAN',0001,CURDATE(),2000,400,rand_num());
    UNTIL i = max_num
  END REPEAT;
  -- commit the 500,000 inserted rows in a single batch
  COMMIT; 
END $$

(2) The procedure that inserts data into the department table dept

delimiter $$
  CREATE PROCEDURE `insert_dept`(IN START int,IN max_num int)
  BEGIN
    DECLARE i INT DEFAULT 0;
    SET autocommit = 0;
    REPEAT
    SET i = i + 1;
    INSERT INTO dept(deptno,dname,loc) VALUES((START+i),rand_string(10),rand_string(8));
    UNTIL i = max_num
  END REPEAT;
  COMMIT;
END $$

5. Call the stored procedure

(1) Insert 10 rows into the department table dept

-- switch the delimiter back to the default
delimiter ;

-- CALL invokes the stored procedure
CALL insert_dept(100,10);

(2) Insert 500,000 rows into the employee table emp

A total of 10 million (1000w) rows are inserted; to relieve pressure on the database, this is done in 20 batches of 500,000.

-- switch the delimiter back to the default
delimiter ;

-- CALL invokes the stored procedure
-- CALL insert_emp(1,500000);
-- for simplicity, insert only 1000 rows for testing
CALL insert_emp(1,1000);
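A simple sanity check after the calls (not part of the original script):

-- verify how many rows were actually inserted
SELECT COUNT(*) FROM dept;
SELECT COUNT(*) FROM emp;
SELECT * FROM emp LIMIT 10;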

4. show profile

1 Introduction


show profile is a MySQL facility for analyzing the resource consumption of statement execution in the current session.

1. View the current show profile status

-- profiling is off by default and must be enabled before use
show variables like 'profiling';

-- enable it
set profiling=on;

2. Run SQL

select * from emp;
select * from emp limit 10;

3. View the results

show profiles;
-- 3 corresponds to the Query_ID column in the output of show profiles
show profile cpu,block io for query 3;

Among them, show profile can be followed by other parameters:

image-20200829223601779
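According to the MySQL manual, the profile types that can follow show profile include ALL, BLOCK IO, CONTEXT SWITCHES, CPU, IPC, MEMORY, PAGE FAULTS, SOURCE and SWAPS, for example:

-- show every available column for query 3
SHOW PROFILE ALL FOR QUERY 3;
-- show only CPU, memory and block I/O statistics
SHOW PROFILE CPU, MEMORY, BLOCK IO FOR QUERY 3;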

For the results of the above queries, the indicators to pay attention to are:

  • converting HEAP to MYISAM: the query result is too large for memory, so MySQL starts moving it to disk.
  • Creating tmp table: a temporary table is created (data is copied into it and the table is dropped after use).
  • Copying to tmp table on disk: the in-memory temporary table is being copied to disk, which is very dangerous.
  • locked
image-20200829224030561

5. Global query log

Do not enable this feature in a production environment; it is mainly used in test environments.

1. Configure and enable

In the my.cnf file, the settings are as follows:

# enable the general query log
general_log=1

# path of the log file; /path/logfile can be set to any location you choose
general_log_file = /path/logfile

# output format
log_output=FILE

2. Enable it with SQL commands

set global general_log=1;
set global log_output = 'TABLE';

After that, every SQL statement you execute will be recorded in the general_log table of the mysql system database, which you can view with the following command:

-- mysql is a system database
select * from mysql.general_log;
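When testing is finished, the log can be switched back off (a minimal sketch):

-- disable the general query log and restore file output
set global general_log=0;
set global log_output='FILE';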

Compared with the global query log, using show profile is recommended.


Origin blog.csdn.net/weixin_43334389/article/details/113928849