Lessons from writing SQL, many of them paid for in overtime

The following are things to watch out for in SQL. Some are pits I stepped in myself, and some were stepped in by colleagues. You may not feel much while reading, but you'll remember them the moment you accidentally cause a problem, hahaha

select clause

(1) GROUP_CONCAT

Use it with caution

GROUP_CONCAT aggregates the rows of each group into a single string, separated by commas by default, and this can be quite time-consuming. Before using it, think about how many rows will be aggregated; if there are too many, use it with caution. A colleague of mine once had a SQL query time out because of exactly this

As for what it's useful for, for example:

Listing all the students in the school by class:

select class, GROUP_CONCAT(studentName) as student_list from student group by class

class      student_list
Class 1    Xiao Ming, Xiao Bai, Xiao Huang
Class 2    Little Pig, Ergouzi

(2) Subqueries in the select list

I'm not sure what to call this pattern, and I had never seen anyone write it this way before. It looks like this:

select studentName as student_name, (select grade from student_grade where student_grade.studentName = student_info.studentName) as grade from student_info where class = 'Class 1'

Suppose class 1 has 50 students.

According to SQL's execution order (first from -> where -> select): every time the outer query fetches one student's name, the inner query is executed once to look up that student's grade (that is, the select grade subquery runs once per row);

In a word, however many rows the outer query returns (50 studentNames means 50 rows), that is how many times the inner query runs: 50 extra queries in total. Now imagine the outer query returning 500,000 rows

It is recommended to avoid subqueries in the select list where possible
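
For comparison, a sketch of the same lookup rewritten as a join (assuming student_grade also has a studentName column to join on): the join is resolved as a whole instead of running one query per outer row.

select i.studentName as student_name, g.grade
from student_info i
join student_grade g on g.studentName = i.studentName
where i.class = 'Class 1'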


from clause

(1) Try not to join tables with a large amount of data

A join generates a temporary table and consumes temporary-table memory. For large tables with millions of rows, you have to be cautious with left join:

Error demonstration:

select .. from table_with_6_million_rows left join table_with_1000_rows on ...

With a left join, no matter what conditions follow the on, a temporary table based on all 6 million rows is generated, and your database's temporary-table memory will burst.

And because the left join keeps every row of the left table, the large table on the left is almost fully scanned, so its indexes will most likely not take effect.

left join + where means the temporary table is generated first, and only afterwards is it filtered down by the where conditions
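
If the join really cannot be avoided, one common mitigation (a sketch with hypothetical table and column names) is to shrink the large table with a filtered derived table first, so the left join only has to materialize the rows you actually need:

select t.*, s.name
from (select * from big_table where create_time >= '2021-02-01') t
left join small_table s on s.id = t.small_id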


The where clause must not invalidate the index

(1) Operations on index columns

Error demonstration:

# (1) DATE_FORMAT invalidates the date index; assume create_time is indexed
select record_info as log_content from record_log where DATE_FORMAT(create_time,'%Y-%m-%d')='2021-02-23'

# (2) age is an int and an index column; taking part in arithmetic invalidates the index
`age` int(11) NOT NULL DEFAULT 0 COMMENT 'age'
select studentName as student_name from student_info where age+1 = 18

The first SQL's DATE_FORMAT invalidates the date index, because the database has to compute DATE_FORMAT on create_time row by row before it can compare, which is basically a full table scan

The second SQL performs arithmetic on the index column age, which likewise invalidates the index
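
A sketch of index-friendly rewrites for both statements: compare against the raw column, and move any arithmetic to the constant side:

# (1) a range on the raw create_time column keeps the date index usable
select record_info as log_content from record_log where create_time >= '2021-02-23' and create_time < '2021-02-24'

# (2) move the +1 over to the constant side
select studentName as student_name from student_info where age = 17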

(2) Avoid implicit data type conversion

Whatever type a field has in the table, pass in a parameter of the same type

Error demonstration:

# age is a character type, and is an index
`age` varchar(6) NOT NULL DEFAULT '0' COMMENT 'age'

alter table student_info add index index_age(`age`)

select studentName as student_name from student_info where age = 18

age is the index column:

  • The table defines age as a character type, but the query passes the int 18, so the index fails: because the stored values may be '18.0' as well as '18', MySQL converts age to a number row by row and only then compares it with 18
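
The fix is simply to pass the parameter with the same type as the column, so no conversion happens and the index stays usable:

select studentName as student_name from student_info where age = '18'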

(3) Include the sharding key

If the project's database is split into multiple databases and tables, try to include the sharding key when querying. It lets the SQL statement be routed to the specific database and table for execution. Our DRDS works this way

Without the sharding key (split key), the SQL statement is scanned and executed across the entire set of databases, which is very slow
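
A sketch with a hypothetical orders table whose sharding key is user_id:

# routed to a single shard: the where clause contains the sharding key
select * from orders where user_id = 10086 and status = 1

# broadcast to every shard: no sharding key, so the whole cluster is scanned
select * from orders where status = 1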


(4) The order of the composite index

# Create a composite index
alter table student_info add index index_collection(`age`,`studentName`,`sex`)

Only where conditions covering the following three (leftmost-prefix) column combinations will use the composite index:

  • age,studentName,sex
  • age,studentName
  • age

In addition, once the first index column age appears in a range condition (>, <, between ... and), the columns after it in the index stop taking effect.

That is because with a range query, the B+ tree lookup "traverses the linked list of leaf nodes from left to right" across the range, instead of descending from the root node top to bottom, so the ordering of the later columns can no longer be used
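
A few sketches against index_collection showing the rule:

# uses the index: matches the leftmost prefix (age, studentName)
select sex from student_info where age = 18 and studentName = 'Xiao Ming'

# only the age part of the index helps: the range stops studentName from using it
select sex from student_info where age > 18 and studentName = 'Xiao Ming'

# cannot use the index: the leftmost column age is missing
select age from student_info where studentName = 'Xiao Ming'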


group by and order by are faster with an index

Putting indexed columns behind group by and order by reduces overhead, because rows can be read back in index order instead of being sorted separately
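
A sketch, assuming create_time is indexed as in the earlier example: the order by can return rows in index order and skip the extra sort (filesort):

select record_info from record_log order by create_time desc limit 100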


Table structure definition

(1) Use DEFAULT NULL sparingly

Defining a column as DEFAULT NULL, especially an index column, greatly hurts the index's effectiveness

It is best for every column definition to have a default value, i.e. NOT NULL DEFAULT <value>
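
A sketch of the preferred form:

`age` int(11) NOT NULL DEFAULT 0 COMMENT 'age'
`studentName` varchar(32) NOT NULL DEFAULT '' COMMENT 'student name'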


(2) Give every field definition a COMMENT

Especially for fields that represent a state or type, the comment should spell out the meaning of every possible value, for example:

`state` int(11) NOT NULL DEFAULT 0 COMMENT '0 in progress, 1 finished, 2 expired'

(3) Use the text type sparingly

text is a long-text data type. Sending text data back to the client consumes a lot of the MySQL server's network bandwidth, and loading text data from disk into memory during a query consumes a lot of IO bandwidth.

Therefore, when defining a field's data type, use text only when the length of the data genuinely cannot be bounded, and be cautious in every other situation.
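
A sketch of the trade-off: prefer a bounded varchar when a reasonable maximum length is known, and reserve text for genuinely unbounded content:

`title` varchar(128) NOT NULL DEFAULT '' COMMENT 'article title'
`content` text COMMENT 'article body, length cannot be bounded'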


Index settings

(1) Index columns should have high selectivity

What is high selectivity? The values of the index column should be as distinct as possible; the fewer the duplicates, the higher the selectivity. The primary key is the most selective of all: every row has a unique value, and the B+ tree can cleanly partition all the rows.

It is better not to put an index on a low-selectivity column, such as gender (sex): the value is nothing more than male or female, so there will be a huge number of duplicate rows. Imagine:

You want to find out which boys scored higher than 90, in a class of 50 students where 49 are boys...

# On the grade table, set sex as an index
alter table grade add index index_sex(`sex`)

select studentName from grade where sex = 'male' and grade > 90

The expected execution is to pull out the 49 boys via the index and then compare their grades one by one, which is almost the same as having no index at all...

And because sex = 'male' matches such a high proportion of rows, MySQL's execution strategy will most likely fall back to a full table scan anyway.

That said, our own company has exactly this problem, and we dare not say anything. . .


(2) Keep the data length of index fields small

A B+ tree node has to store a value of the index column for every entry. If each value is very large, MySQL consumes a lot of IO bandwidth while loading the index into memory.

In addition, the smaller the index field's data length, the less memory it occupies. As you know, the index cache space set by innodb_buffer_pool_size is limited: the shorter the index field, the more key values the cache can hold, which raises the probability of finding the target value in one go. Whenever it cannot be found, more index pages have to be loaded from disk.

Use int where it suffices; try not to use bigint

The primary key in particular should be kept short. Every secondary index (every index other than the clustered index) stores its own index fields (the non-primary-key columns it covers) plus the corresponding primary key value; if the primary key is too long, every secondary index takes up that much more memory
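
A sketch of a table where a compact auto-increment primary key keeps every secondary index small (hypothetical definition):

create table student_info (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `studentName` varchar(32) NOT NULL DEFAULT '' COMMENT 'student name',
  `age` int NOT NULL DEFAULT 0 COMMENT 'age',
  PRIMARY KEY (`id`),
  # each entry of index_age stores (age, id), so a short primary key keeps it small
  KEY index_age (`age`)
) ENGINE=InnoDB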


That's all I know; I can't write any more. . .


Origin: blog.csdn.net/qq_44384533/article/details/113941577