MySQL optimization summary (for a 400,000+ annual salary)

Disclaimer: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reproducing it, please attach the original source link and this statement.
Article link: https://blog.csdn.net/qq_39662660/article/details/97013314


1. Selecting the storage engine (MyISAM and InnoDB)

Storage engine: the component that determines how MySQL stores data, indexes, and other objects; in effect it is an implementation of a file system.

Up to MySQL 5.1 the default storage engine was MyISAM; from MySQL 5.5 onward the default is InnoDB.

Functional Differences

[Image not preserved: functional comparison of MyISAM and InnoDB]


Selection criteria

The MyISAM engine has a simple design and stores data in a compact format, so in some scenarios its read performance is very good.

If you have no special requirements, use the default, InnoDB.

MyISAM: applications dominated by reads and inserts, such as blog systems and news portals.

InnoDB: applications with frequent updates and deletes, or that must guarantee data integrity; high concurrency, with transactions and foreign keys to protect data integrity. An example is an OA (office automation) system.

Official recommendation

The official recommendation is InnoDB. The above only shows that the storage engine can be chosen; in most cases it is better not to choose anything else.

2. Field design

The three normal forms of database design

  • First normal form (every column is atomic)
  • Second normal form (every column depends on the primary key)
  • Third normal form (every column depends on the primary key directly, not indirectly)

A normalized design is generally recommended, since operations on a normalized schema are often faster. But this is not absolute. Normalization has drawbacks: it usually requires join queries, which are not only expensive but may also invalidate some indexing strategies.

So we sometimes need to mix normalization and denormalization. For example, a rarely updated field can be stored redundantly in a table to avoid a join.

Do not put too many columns in a single table

A maximum of about 30 is recommended.

More columns degrade performance and make development harder (faced with an endless list of columns, we developers are instantly dumbfounded).

Use small, simple, suitable data types

a. String types

Use char for fixed-length values and varchar for variable-length values, and allocate appropriate, sufficient space.

With char, trailing spaces are removed when the value is queried.

b. Decimal types

Generally float or double can be used; they take little space but may lose precision on storage.

decimal stores decimal values exactly; use it where precision requirements are high, such as financial data or longitude/latitude values.
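The precision difference can be seen outside the database as well. The following is a minimal sketch in Python, on the assumption that Python's binary float behaves like MySQL's float/double and that Python's Decimal is a reasonable stand-in for MySQL's decimal; it does not touch MySQL itself.

```python
from decimal import Decimal

# Binary floating point (comparable to float/double) cannot represent 0.1 exactly,
# so a small error appears after a single addition.
float_total = 0.1 + 0.2

# Decimal arithmetic (comparable to the decimal column type) keeps exact digits.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

For amounts of money, an error of this kind accumulates across many rows, which is why the exact type is the safer choice there.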

c. Date and time

datetime:

  • Range: years 1000 to 9999
  • Storage: 8 bytes, stored in the format YYYYMMDDHHMMSS
  • Time zone: independent of the time zone

timestamp:

  • Range: 1970 to 2038
  • Storage: 4 bytes, stored in UTC, the same as a UNIX timestamp
  • Time zone: converted from the current time zone on storage, and converted back to the current time zone on retrieval

1. Usually prefer timestamp, since it takes less space and time zone conversion happens automatically, so you do not need to worry about time differences between regions.

2. By default datetime and timestamp store values only to one-second granularity (fractional seconds require MySQL 5.6.4 or later); alternatively, a microsecond timestamp can be stored in a BIGINT column.
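The 2038 upper bound and the BIGINT workaround both follow from simple arithmetic. The sketch below assumes the 4-byte value is a signed count of seconds since the UNIX epoch; the helper names to_epoch_micros and from_epoch_micros are hypothetical and only illustrate what an application might store in a BIGINT column.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value of a signed 4-byte integer, counted in seconds since the epoch.
MAX_INT32 = 2**31 - 1
upper_bound = EPOCH + timedelta(seconds=MAX_INT32)
print(upper_bound.isoformat())  # 2038-01-19T03:14:07+00:00


def to_epoch_micros(moment: datetime) -> int:
    """Return whole microseconds since the epoch, suitable for a BIGINT column."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def from_epoch_micros(micros: int) -> datetime:
    """Inverse of to_epoch_micros."""
    return EPOCH + timedelta(microseconds=micros)


sample = datetime(2019, 7, 23, 12, 30, 45, 123456, tzinfo=timezone.utc)
stored = to_epoch_micros(sample)
print(from_epoch_micros(stored) == sample)  # True
```

Integer division of timedeltas is used instead of floating-point multiplication so that no microseconds are lost in the conversion.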

d. Large data: blob and text

blob and text are string data types designed to store large amounts of data, but it is generally recommended to avoid them.

MySQL treats each blob and text value as a separate object, and storage engines handle them specially. When a value is too large, InnoDB uses a dedicated external storage area: the row stores a pointer, and the actual value is stored externally. All of this causes serious performance overhead.

Prefer NOT NULL columns

a. A nullable column may take more storage space.

b. When a nullable column is indexed or used in value comparisons, MySQL needs special handling, which costs some performance.

Recommendation: it is usually best to declare columns NOT NULL, unless NULL values really need to be stored.

Prefer integer primary keys

a. Integer types are usually the best choice for identifier columns, because they are fast and can use AUTO_INCREMENT.

b. Avoid string types for identifier columns, because they consume space and are typically slower than numeric types.

c. Take extra care with completely "random" strings, for example those generated by MD5(), SHA1(), or UUID(). The new values these functions produce are spread arbitrarily over a large space, which can make INSERT and some SELECT statements very slow.
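Why random keys hurt inserts can be illustrated with a rough model. In the sketch below a sorted Python list stands in for the ordered pages of an index (an assumption made for illustration; a real B+ tree splits pages rather than shifting a list), and the function count_mid_inserts is a hypothetical helper that counts how many new keys have to be placed somewhere other than the end.

```python
import bisect
import hashlib


def count_mid_inserts(keys):
    """Insert keys into a sorted list; count inserts that do not land at the end."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position != len(ordered):
            mid_inserts += 1
        ordered.insert(position, key)
    return mid_inserts


ROWS = 1000
# Monotonically increasing keys, as AUTO_INCREMENT would produce.
sequential_keys = list(range(1, ROWS + 1))
# Hash-style keys: deterministic here, but scattered over the key space.
hashed_keys = [hashlib.md5(str(i).encode()).hexdigest() for i in range(1, ROWS + 1)]

print(count_mid_inserts(sequential_keys))   # 0
print(count_mid_inserts(hashed_keys) > 0)   # True
```

Sequential keys always append, while hashed keys keep landing in the middle of existing data, which in a real index means touching and splitting pages all over the tree.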


Indexes

Why indexes are fast

  • An index is small compared with the data itself
  • An index is ordered, so the position of the data can be located quickly
  • InnoDB tables are index-organized: the table's data is laid out in primary key order

An index is like a book's table of contents: to find some content, look it up in the contents and go straight to the corresponding page.

Index storage structures

a. B+ tree (the structure is not described here; look it up yourself)

b. Hash (a structure of key-value pairs)

In MySQL the primary key index uses a B+ tree; a non-primary-key index can use a B+ tree or a hash.

A B+ tree index is generally recommended,

because hash indexes have more disadvantages:

1. They cannot be used for sorting

2. They cannot be used for range queries

3. With a large amount of data there may be many hash collisions, which makes them inefficient
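The first two disadvantages can be shown with plain data structures. This is only an analogy, assuming a Python dict behaves like a hash index and a sorted list with binary search behaves like the ordered leaf level of a B+ tree; the sample rows are made up.

```python
import bisect

rows = {17: "q", 3: "a", 42: "z", 8: "c", 25: "m"}  # hypothetical id -> value

# Hash-style index: direct lookup by exact key, but no useful key order.
hash_index = dict(rows)
print(hash_index[8])  # c

# Ordered index: keys kept sorted, as in the leaf level of a B+ tree.
sorted_keys = sorted(rows)


def range_scan(low, high):
    """Return keys with low <= key <= high using two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    stop = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:stop]


print(range_scan(5, 30))  # [8, 17, 25]

# With only the hash index, the same range needs a check of every key
# followed by an explicit sort.
print(sorted(k for k in hash_index if 5 <= k <= 30))  # [8, 17, 25]
```

The ordered structure answers the range with two binary searches and returns the keys already sorted; the hash structure has to visit every key.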

Index types

By function:

1. Primary key index: needs no explanation

2. Ordinary index: no particular restrictions; duplicate values are allowed

3. Unique index: duplicate values are not allowed; slightly faster than an ordinary index

4. Full-text index: used for full-text search matching, but rarely used in practice; it only indexes English words and is expensive to operate

By data storage structure:

1. Clustered index

Definition: the physical order of the data rows is the same as the logical order of the column values (usually the primary key column). A table can have only one clustered index.

The primary key index is a clustered index: the data is stored in the same order as the primary key.

2. Non-clustered index

Definition: the logical order of the index differs from the physical storage order of the rows on disk. A table can have multiple non-clustered indexes.

Every index other than the clustered index is a non-clustered index. They are subdivided into ordinary, unique, and full-text indexes, and are also called secondary indexes.

[Image not preserved: figure from High Performance MySQL showing how InnoDB stores data and indexes]


In InnoDB, the leaf nodes of the primary key index hold the data rows themselves; in MyISAM they hold a "row pointer" that points directly to the row in the physical data file.

The leaf nodes of a secondary index store the primary key value.

Covering index: the result can be returned directly from the index, without going back to the table for the data.

For example:

Suppose table t has a multi-column index on (clo1, clo2)

select clo1,clo2 from t where clo1 = 1

With this SQL query, the data can be obtained directly from the (clo1, clo2) index tree, without a lookup back to the table.

So write only the necessary columns after select, to increase the chance that an index covers the query.
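The effect of the select list on index covering can be observed in a query plan. The sketch below uses Python's built-in sqlite3 module purely as a runnable stand-in: SQLite is not MySQL, its planner and plan wording differ, and the extra column clo3 and the index name idx_clo1_clo2 are assumptions added for the illustration. In MySQL the corresponding signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, clo1 int, clo2 int, clo3 text)")
conn.execute("create index idx_clo1_clo2 on t (clo1, clo2)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Only indexed columns are selected: the index alone can answer the query.
covered = plan("select clo1, clo2 from t where clo1 = 1")
# select * also needs clo3, so each match requires a lookup in the table.
not_covered = plan("select * from t where clo1 = 1")

print(covered)
print(not_covered)
```

The first plan reports a covering index and the second does not, even though both queries filter on the same column.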

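Multi-column index: an index built on several columns, for example (clo1, clo2)

Use case: when clo1 and clo2 are frequently used together as query conditions, a composite index can be used; it will be faster than single-column indexes

Note that multi-column indexes follow the leftmost-prefix principle

Suppose the multi-column index index(A,B,C) is created; this is effectively equivalent to creating the following three composite indexes:

1.index(A,B,C)

2.index(A,B)

3.index(A)

This is the leftmost-prefix principle: the usable combinations always start from the leftmost column.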
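The leftmost-prefix behaviour described above can be checked against a query planner. As a hedged sketch, the following uses Python's sqlite3 module in place of MySQL (a stand-in only; the table t, its extra column d, and the index name idx_abc are assumptions), and compares conditions that start at the leftmost column with ones that do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, a int, b int, c int, d text)")
conn.execute("create index idx_abc on t (a, b, c)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


queries = {
    "a": "select * from t where a = 1",
    "a, b": "select * from t where a = 1 and b = 2",
    "a, b, c": "select * from t where a = 1 and b = 2 and c = 3",
    "b only": "select * from t where b = 2",
    "c only": "select * from t where c = 3",
}
plans = {label: plan(sql) for label, sql in queries.items()}

for label, detail in plans.items():
    print(f"{label:8} -> {detail}")
```

In this setup the first three conditions are resolved with an index search, while conditions on b or c alone fall back to scanning, because they skip the leftmost column.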

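Index optimization

1. More indexes are not always better; indexes have a maintenance cost

2. Columns used in joins should be indexed

3. Prefer columns with high selectivity for indexes. Selectivity, count(distinct col)/count(*), is the proportion of distinct values in the column; the higher the ratio, the fewer rows are scanned. Low-selectivity columns such as status values or gender are poor candidates for an index

4. If several columns often appear together in the WHERE clause joined by AND, create a composite index; otherwise consider single-column indexes

5. Put computation in the business layer rather than the database layer

6. For order by and group by, take advantage of the ordering of the index.

  • The last order by column should be part of the composite index and come last in the index's column order; this avoids a file_sort (external sort), which hurts query performance.
  • For example, for the statement where a=? and b=? order by c, the composite index (a,b,c) can be created.
  • If the index is used for a range condition, its ordering cannot be used for sorting; for example, with WHERE a>10 ORDER BY b, the index (a,b) cannot provide the order.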
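The selectivity ratio from item 3 above is easy to compute before deciding on an index. A minimal sketch follows, using Python's sqlite3 module as a stand-in for MySQL; the users table, its columns, and the generated rows are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (id integer primary key, gender text, email text)")
conn.executemany(
    "insert into users (gender, email) values (?, ?)",
    [("M" if i % 2 else "F", f"user{i}@example.com") for i in range(1000)],
)


def selectivity(column):
    """Return count(distinct column) / count(*) for the hypothetical users table."""
    # The column name is interpolated directly, so pass only trusted names.
    distinct, total = conn.execute(
        f"select count(distinct {column}), count(*) from users"
    ).fetchone()
    return distinct / total


print(selectivity("gender"))  # 0.002
print(selectivity("email"))   # 1.0
```

A value close to 1 (email) means an index narrows the search to very few rows; a value close to 0 (gender) means an index lookup still leaves about half the table to read.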

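Cases that may prevent an index from being used

1. is null and is not null

2. != and <> (in can be used instead)

3. "Non-isolated columns": the indexed column is part of an expression or an argument to a function

For example:

Part of an expression: select id from t where id +1 = 5

Function argument: select id from t where to_days(date_clo) >= 10

4. like queries that begin with %

5. or (the index can be used if the columns on both sides of or are indexed)

6. Type mismatch

If the column is a string type, the condition value must be enclosed in quotes; otherwise the index cannot be used

select * from tb1 where email = 999;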
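Case 3 in the list above can be reproduced with a query planner. The sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption; the two planners are not identical), with a hypothetical indexed column n, and compares a condition on the bare column with the same condition written as an expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, n int, note text)")
conn.execute("create index idx_n on t (n)")


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# The column stands alone on one side of the comparison: the index is searched.
isolated = plan("select * from t where n = 4")
# The column is wrapped in an expression: every row has to be evaluated.
in_expression = plan("select * from t where n + 1 = 5")

print(isolated)
print(in_expression)
```

Both conditions select the same rows, so moving the arithmetic to the constant side (n = 5 - 1) is enough to make the index usable again.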


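3. SQL optimization suggestions

1. First, understand the execution order of SQL, so that we can optimize better

(1) FROM: data is loaded from disk into the data buffer, so that the following steps can operate on it

(2) ON: join on implements multi-table join queries; the on condition is filtered first, then the tables are joined

(3) JOIN: the tables on both sides of join are joined according to the on condition

(4) WHERE: the tuples that satisfy the condition are selected from the base table or view

(5) GROUP BY: grouping, usually used together with aggregate functions

(6) HAVING: filters the tuples further, selecting those that meet the condition (must be used together with GROUP BY)

(7) SELECT: which columns of the resulting tuples are to be listed

(8) DISTINCT: removes duplicates

(9) UNION: merges multiple query results

(10) ORDER BY: sorts accordingly

(11) LIMIT: restricts how many rows are output

  • join on implements multi-table join queries; this is the recommended way to query multiple tables, rather than subqueries (a subquery creates a temporary table, which costs performance).
  • Avoid filtering data with HAVING; use where instead
  • Index the columns after ORDER BY, so that the ordering of the index is used for sorting and external sorts are avoided
  • If you know for certain that only one result is returned, limit 1 can improve efficiency

2. Preferably do not join more than three tables

3. Avoid SELECT *; the more data is read from the database, the slower the query becomes

4. Use NOT NULL columns wherever possible; nullable columns take extra space and need special handling in value comparisons and index use, which hurts performance

5. Use exists / not exists and in / not in as substitutes for each other

The principle is to pick whichever one's subquery produces the smaller result set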

select * from t1 where x in (select y from t2)
select * from t1 where exists (select null from t2 where y =x)

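IN suits the case where the outer table is large and the inner table is small; exists suits the case where the outer table is small and the inner table is large

6. Use exists instead of distinct

When submitting a query that involves one-to-many table information (such as a department table and an employee table), avoid distinct in the select clause; exists can usually be used instead. exists makes the query faster, because the result is returned as soon as the subquery condition is satisfied.

Inefficient:

select distinct d.dept_no,d.dept_name from dept d,emp e where d.dept_no=e.dept_no

 

Efficient: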

select dept_no,dept_name from dept d where exists (select 'x' from emp e where e.dept_no=d.dept_no)

 

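Note: the 'x' here reflects that exists only checks whether the subquery returns any rows and does not care what is returned, so writing a constant is recommended for better performance!

exists can indeed replace distinct, but the approach above only applies when dept_no is a unique primary key. To remove duplicate records, refer to the following:

select * from emp where dept_no in (select max(d.dept_no) from dept d, emp e where e.dept_no=d.dept_no group by d.dept_no)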
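That the distinct form and the exists form from item 6 return the same departments can be checked on a tiny data set. The sketch below runs both through Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented, and an order by is added to both queries only so the results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dept (dept_no integer primary key, dept_name text);
create table emp (emp_no integer primary key, dept_no int);
insert into dept values (10, 'Sales'), (20, 'R&D'), (30, 'Empty');
insert into emp values (1, 10), (2, 10), (3, 20);
""")

# Join first, then remove the duplicates the join produced.
with_distinct = conn.execute(
    "select distinct d.dept_no, d.dept_name from dept d, emp e "
    "where d.dept_no = e.dept_no order by d.dept_no"
).fetchall()

# Check each department once; the subquery can stop at the first match.
with_exists = conn.execute(
    "select dept_no, dept_name from dept d "
    "where exists (select 'x' from emp e where e.dept_no = d.dept_no) "
    "order by dept_no"
).fetchall()

print(with_distinct)                 # [(10, 'Sales'), (20, 'R&D')]
print(with_exists == with_distinct)  # True
```

Department 10 has two employees, so the join produces it twice before distinct collapses it, whereas the exists form never creates the duplicate in the first place.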

 

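7. Avoid implicit data type conversion

Implicit data type conversion prevents the index from being used and causes a full table scan! The phonenumber column of table t_tablename is of type varchar

The following code does not follow this rule: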

select column1 into i_l_variable1 from t_tablename where phonenumber=18519722169;

 

It should be written as follows:

select column1 into i_l_variable1 from t_tablename where phonenumber='18519722169';

 

8. Segmented queries

On some query pages, when the user selects too large a time range, the query becomes slow, mainly because too many rows are scanned. In that case the program can split the query into segments, run them in a loop, and merge the results for display.
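One way to split a large time range in application code is sketched below. This is only an illustration under stated assumptions: the helper split_range and the segment size are choices made here, and fetch_rows is a hypothetical placeholder for the real bounded query (for example a where clause with created_at >= start and created_at < end).

```python
from datetime import date, timedelta


def split_range(start, end, days_per_segment):
    """Yield (segment_start, segment_end) pairs covering [start, end)."""
    step = timedelta(days=days_per_segment)
    cursor = start
    while cursor < end:
        segment_end = min(cursor + step, end)
        yield cursor, segment_end
        cursor = segment_end


def fetch_rows(segment_start, segment_end):
    """Hypothetical stand-in for one bounded query against the database."""
    return [f"rows for {segment_start} .. {segment_end}"]


segments = list(split_range(date(2019, 1, 1), date(2019, 1, 25), 10))

# Run one small query per segment and merge the results for display.
results = []
for segment_start, segment_end in segments:
    results.extend(fetch_rows(segment_start, segment_end))

print(len(segments))  # 3
```

Using half-open segments (start inclusive, end exclusive) means no row is counted twice at a segment boundary.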

4. Analyzing the execution plan with EXPLAIN

explain shows how MySQL uses indexes to process select statements and join tables. It can help you choose better indexes and write more optimized queries.

Example:

explain SELECT user_name from sys_user where user_id <10

 

[Image not preserved: EXPLAIN output for the query above]


The join type of this statement is range: it uses the primary key index for a range query, and the estimated number of rows scanned is 100.

For the meaning of the other fields, see the following table:

[Image not preserved: table explaining the fields of the EXPLAIN output]

