"Database Optimization" - MySQL Optimization

Foreword

  MySQL is one of the most popular relational databases, and in development you are bound to run into large volumes of data. Without enough performance headroom, queries often become slow. Below we discuss how to optimize MySQL.

First, MySQL performance

 1, the maximum amount of data

  Discussing database performance without stating the data volume and the level of concurrency is meaningless.

  MySQL itself does not limit the maximum number of records in a single table; the limit comes from the operating system's restriction on file size.

  The "Alibaba Java Development Manual" recommends sharding (splitting data across multiple databases and tables) once a single table exceeds 5 million rows or 2GB in size.

  Performance is determined by a combination of factors. Setting aside business complexity, the main ones are hardware configuration, MySQL configuration, table design, and index optimization. The 5 million figure is only a reference value, not an iron law.

  One veteran developer has worked with a single table of more than 400 million rows, where a paged query for the latest 20 records took 0.6 seconds.

  SQL is roughly:

select field_1,field_2 from table where id < #{prePageMinId} order by id desc limit 20 ;

  prePageMinId is the smallest ID among the records on the previous page.
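  The paging pattern above can be sketched end to end as follows. This is only a minimal illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `orders` table, its columns, and the `fetch_page` helper are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT)")
conn.executemany(
    "INSERT INTO orders (id, field_1, field_2) VALUES (?, ?, ?)",
    [(i, f"a{i}", f"b{i}") for i in range(1, 101)],
)


def fetch_page(conn, pre_page_min_id=None, page_size=20):
    """Return one page of rows, newest first (hypothetical helper).

    pre_page_min_id is the smallest id on the previous page;
    None means "start from the newest row".
    """
    if pre_page_min_id is None:
        sql = "SELECT id, field_1, field_2 FROM orders ORDER BY id DESC LIMIT ?"
        params = (page_size,)
    else:
        sql = ("SELECT id, field_1, field_2 FROM orders "
               "WHERE id < ? ORDER BY id DESC LIMIT ?")
        params = (pre_page_min_id, page_size)
    return conn.execute(sql, params).fetchall()


first_page = fetch_page(conn)
# The last row of a page carries the smallest id, which seeds the next page.
second_page = fetch_page(conn, pre_page_min_id=first_page[-1][0])
```

  Because every page starts from an id boundary instead of an offset, the database never has to read and discard the rows of earlier pages.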

  At the time the query speed was acceptable, but as the data keeps growing it will eventually become too slow.

  Sharding is a long, high-risk undertaking. First try to optimize within the current structure, for example by upgrading hardware or migrating historical data, and shard only when nothing else works. Readers interested in sharding can read up on its basic ideas.

  

 2, the maximum concurrency

  Concurrency is the number of requests the database can handle at the same time. It is determined by max_connections and max_user_connections.

  max_connections: the maximum number of connections to the MySQL instance. The upper limit is 16384 in older MySQL versions (recent versions allow up to 100000).

  max_user_connections: the maximum number of connections per database user.

  MySQL allocates buffers for every connection, so more connections mean more memory. Setting the limit too high can exhaust the hardware; setting it too low leaves the hardware underused.

  As a rule of thumb, the peak number of connections actually used (Max_used_connections) should exceed 10% of max_connections, calculated as follows:

max_used_connections / max_connections * 100% = 3/100 * 100% ≈ 3%
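  The same check can be written as a few lines of Python. The function name and the way the 10% guideline is applied are illustrative assumptions; the input numbers are the ones from the example above.

```python
def connection_utilization(max_used_connections: int, max_connections: int) -> float:
    """Peak connection usage as a percentage of the configured limit."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return max_used_connections / max_connections * 100


# Values from the example: a peak of 3 connections out of 100 allowed.
utilization = connection_utilization(3, 100)

# Below the 10% guideline, which suggests the limit is set higher than needed.
below_guideline = utilization < 10
```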

  View the configured connection limits and the peak number of connections actually used:

show variables like '%max_connections%';
show variables like '%max_user_connections%';
show global status like 'Max_used_connections';

  Modify the connection limits in the my.cnf configuration file:

[mysqld]
max_connections = 100
max_user_connections = 20

 

 3, keep queries under 0.5 seconds

  A single query should take less than 0.5 seconds. The 0.5 second figure is an empirical value derived from the three-second rule of user experience: if an operation gets no response within three seconds, the user becomes impatient and may even leave.

  Response time = client UI rendering time + network request time + application processing time + database query time. 0.5 seconds is the 1/6 of that budget left for the database.

 

 4, implementation principles

  Compared with NoSQL databases, MySQL is a delicate, fragile creature: it is hard to scale out, it runs out of breath after a few steps (small capacity, low concurrency), and it often calls in sick (too many SQL constraints).

  Nowadays everyone builds distributed systems, and scaling the application tier is much easier than scaling the database. The guiding principle is therefore: let the database do less work and the application do more.

  a, make full use of indexes but do not abuse them; remember that indexes also consume disk and CPU.

  b, do not use database functions to format data; leave that to the application.

  c, foreign key constraints are not recommended; let the application guarantee data correctness.

  d, in write-heavy, read-light scenarios, unique indexes are not recommended; let the application guarantee uniqueness.

  e, add redundant fields where appropriate, try creating intermediate tables, and compute intermediate results in the application, trading space for time.

  f, do not run extremely long transactions; have the application split them into smaller ones.

  g, estimate the load and data growth of important tables (such as the order table) and optimize in advance.

Second, database table design

 1, data types

  Principle for choosing data types: prefer the simpler type or the one with the smaller footprint.

    ● If the value range allows, use tinyint, smallint, or mediumint instead of int.

    ● If the string length is fixed, use the char type.

    ● If varchar is sufficient, do not use the text type.

    ● For high precision use the decimal type; BIGINT can also be used, for example by multiplying values with two decimal places by 100 before storing them.

    ● Prefer timestamp over datetime where possible.

   Compared with datetime, timestamp takes up less space and is stored in UTC, with automatic time zone conversion.
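  The "multiply by 100 and store as BIGINT" idea from the list above can be sketched in application code like this. The helper names `to_cents` and `from_cents` are made up for the example, and half-up rounding is an assumption; use whatever rounding rule your business requires.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount: str) -> int:
    """Convert a two-decimal amount such as '19.99' to an integer number of cents."""
    quantized = Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return int(quantized * 100)


def from_cents(cents: int) -> Decimal:
    """Convert stored cents back to a two-decimal Decimal for display."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


stored = to_cents("19.99")          # the value that would go into a BIGINT column
total = stored + to_cents("0.01")   # plain integer arithmetic, no rounding drift
```

  The amount is parsed from a string rather than a float so that no binary floating-point error is introduced before the value reaches the database.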

 

 2, avoid NULL values

  A NULL field in MySQL still takes up space and makes indexes and index statistics more complex. Updating a NULL value to a non-NULL value cannot be done in place, which easily causes index page splits and hurts performance.

  Therefore, replace NULL with a meaningful value wherever possible; this also avoids IS NOT NULL checks in SQL statements.

 

 3, text type optimization

  Because text fields store large amounts of data, the table grows large very early, which slows down queries on the other fields.

  It is recommended to move them out into a child table that is linked to the main table by the business primary key.

Third, index optimization

 1, index classification

    ● ordinary index: the most basic index.

    ● composite index: an index built on multiple fields, which speeds up queries that filter on several conditions.

    ● unique index: similar to an ordinary index, but the values of the indexed column must be unique; NULL values are allowed.

    ● composite unique index: the combination of column values must be unique.

    ● primary key index: a special unique index that uniquely identifies a record in the table; NULL values are not allowed, and it is usually created through the primary key constraint.

    ● full-text index: used for querying large amounts of text. Since MySQL 5.6 both InnoDB and MyISAM support full-text indexes. Because of poor query precision and scalability, more companies choose Elasticsearch.

 

 2, index optimization

    ● paged queries matter: if a query returns more than roughly 30% of the table's rows, MySQL will generally not use the index.

    ● no more than 5 indexes per table, and no more than 5 fields per index.

    ● strings can use a prefix index, with the prefix length kept to 5-8 characters.

    ● indexing a field with very low uniqueness is pointless, for example a validity flag or gender.

    ● make sensible use of covering indexes.
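  To make the covering index idea concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The `member` table, its columns, and the index name are assumptions invented for the example. SQLite labels the plan "COVERING INDEX"; in MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, login_name TEXT, nick_name TEXT, bio TEXT)"
)
# Composite index that contains every column the first query below reads.
conn.execute("CREATE INDEX idx_login_nick ON member (login_name, nick_name)")


def plan(sql, params=()):
    """Return the query plan detail strings reported by SQLite."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Both selected columns live in the index, so the table itself is never read.
covered = plan("SELECT login_name, nick_name FROM member WHERE login_name = ?", ("alice",))

# bio is not in the index, so each match needs an extra lookup in the table.
not_covered = plan("SELECT login_name, bio FROM member WHERE login_name = ?", ("alice",))
```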

Fourth, SQL optimization

 1, batch processing

  As a child I watched a fish pond being drained through a small outlet, with all kinds of debris floating on the water. Duckweed and leaves always passed through the outlet, but branches blocked other objects and sometimes got stuck, so they had to be cleared by hand.

  MySQL is the fish pond, the maximum concurrency and network bandwidth are the outlet, and the users' SQL statements are the floating debris.

  Queries without paging parameters, and update or delete operations that affect a large amount of data, are the branches. We need to break them up and process them in batches. For example:

  Business description: mark all expired user coupons as unavailable.

  SQL statement:

update coupon set status = 0 where expire_date <= #{currentDate} and status = 1;

  If a large number of coupons need to be marked unavailable, executing this SQL may block other SQL. The pseudo-code for batch processing is as follows:

int PAGE_SIZE = 100;
while (true) {
    // Updated rows no longer match status = 1, so always read the first page
    // rather than advancing an offset, which would skip rows.
    List<Long> batchIdList = queryList("select id from `coupon` where expire_date <= #{currentDate} and status = 1 limit #{PAGE_SIZE}");
    if (CollectionUtils.isEmpty(batchIdList)) {
        return;
    }
    update("update `coupon` set status = 0 where status = 1 and id in #{batchIdList}");
}
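  For readers who want something they can run, the batch loop might look like this in Python. This is a sketch under assumptions: sqlite3 stands in for MySQL, and the sample data and the `expire_coupons` helper are invented for the example.

```python
import sqlite3

PAGE_SIZE = 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon (id INTEGER PRIMARY KEY, expire_date TEXT, status INTEGER)")
# Sample data (assumption): 250 expired coupons and 50 that are still valid.
conn.executemany(
    "INSERT INTO coupon (expire_date, status) VALUES (?, 1)",
    [("2020-01-01",)] * 250 + [("2099-01-01",)] * 50,
)


def expire_coupons(conn, current_date, page_size=PAGE_SIZE):
    """Mark expired coupons unavailable in small batches; return the batch count."""
    batches = 0
    while True:
        # Always read the first page: rows updated earlier no longer match status = 1.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM coupon WHERE expire_date <= ? AND status = 1 LIMIT ?",
            (current_date, page_size),
        )]
        if not ids:
            return batches
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE coupon SET status = 0 WHERE status = 1 AND id IN ({placeholders})",
            ids,
        )
        conn.commit()  # one short transaction per batch
        batches += 1


batches = expire_coupons(conn, "2020-03-01")
remaining = conn.execute("SELECT COUNT(*) FROM coupon WHERE status = 1").fetchone()[0]
```

  Committing after each batch keeps every transaction short, so other statements get a chance to run between batches.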

 

 2, sub-query optimization

  Do not nest another SELECT inside the select list to translate a field value. That subquery is evaluated for every row of the result, consumes CPU, and its results are not cached in database memory, so every repeat costs CPU again. It must be rewritten as a join.

  Counterexample:

SELECT s.stu_name,
       s.stu_code,
       (SELECT t.sub_name
          FROM subject t
         WHERE t.sub_code = s.sub_code) sub_name
  FROM student s
 WHERE s.stu_code = 'GZ20200301001';

  Correct example:

SELECT s.stu_name,
       s.stu_code,
       t.sub_name
  FROM student s
  LEFT JOIN subject t
    ON s.sub_code = t.sub_code
 WHERE s.stu_code = 'GZ20200301001';

 3, optimizing the <> operator

  The <> operator usually cannot use an index. For example, to query orders whose amount is not 100:

select bill_no from orders where amount != 100;

select bill_no from orders where amount <> 100;

  If the data distribution is heavily skewed, so that very few orders have an amount other than 100, the index may still be used.

  Given this uncertainty, combine the results with union all instead, rewritten as follows:

(select bill_no from orders where amount > 100)
 union all
(select bill_no from orders where amount < 100 and amount > 0);

 

 4, optimizing OR

  Under the InnoDB engine, OR cannot use a composite index, for example:

select id, product_name from orders where mobile_no = '18688886666' or user_id = 100;

  The OR does not hit the mobile_no + user_id composite index. Use UNION instead, as follows:

(select id,product_name from orders where mobile_no = '18688886666')
 union
(select id,product_name from orders where user_id = 100);

  This is most efficient when mobile_no and user_id each have their own index, so that each branch of the union can use one.

 

 5, optimizing IN

  IN suits a large main table with a small sub-table, and EXISTS suits a small main table with a large sub-table. Because the query optimizer keeps improving, the two perform almost the same in many scenarios.

  Try switching to a join query instead. For example:

select id from orders where user_id in (select id from user where level = 'VIP');

  Using JOIN:

select o.id from orders o join user u on o.user_id = u.id where u.level = 'VIP';

  

 6, do not compute on columns

  Applying a computation to an indexed column in the query condition usually prevents the index from being used, as follows:

  Query the orders of a given day:

select id from `order` where date_format(create_time,'%Y-%m-%d') = '2020-03-01';

  The date_format function prevents the query from using the index. After rewriting:

select id from `order` where create_time between '2020-03-01 00:00:00' and '2020-03-01 23:59:59';
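  The effect of wrapping an indexed column in a function can be observed with a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so SQLite's date() takes the place of date_format, and the table and index names are assumptions. In MySQL you would compare the EXPLAIN output of the two statements instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT)")
conn.execute("CREATE INDEX idx_create_time ON orders (create_time)")


def plan(sql):
    """Return the first query plan detail string reported by SQLite."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]


# The function hides the column from the index, so every entry is examined (SCAN).
wrapped = plan("SELECT id FROM orders WHERE date(create_time) = '2020-03-01'")

# Comparing the bare column against a range lets the index narrow the search (SEARCH).
ranged = plan(
    "SELECT id FROM orders "
    "WHERE create_time BETWEEN '2020-03-01 00:00:00' AND '2020-03-01 23:59:59'"
)
```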

 

 7, avoid SELECT *

  If you do not need every column in the table, avoid SELECT *: it reads and transfers columns you do not use and prevents the query from being served by a covering index.

 

 8, LIKE optimization

  LIKE is used for fuzzy queries, for example (field is indexed):

SELECT column FROM table WHERE field like '%keyWord%';

  This query does not hit the index. Change it to the following:

SELECT column FROM table WHERE field like 'keyWord%';

  Without the leading % the query hits the index, but will the product manager really accept giving up fuzzy matching on both sides?

  A FULLTEXT index is worth a try, but Elasticsearch is the ultimate weapon.

 

 9, JOIN optimization

  JOIN is implemented with the Nested Loop Join algorithm: the result set of the driving table serves as the base data, each of its rows is used as a filter condition to query the next table in a loop, and the results are then merged.

  If there are multiple joins, the result set of the previous step becomes the loop data used to query the following table.

  Add as many query conditions as possible to both the driving table and the driven table, satisfy conditions in ON rather than WHERE where possible, and let the small result set drive the large one.

  Add an index on the join field of the driven table; when an index cannot be created, configure a sufficiently large join_buffer_size.

  Do not join more than three tables; try adding redundant fields instead.
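  The nested loop idea is easy to see in plain code. The following is a conceptual sketch in Python, not how MySQL is implemented internally; the sample rows and function names are invented. A dictionary plays the role of the index on the driven table's join field.

```python
from collections import defaultdict

# Hypothetical sample data: a small driving table and a larger driven table.
vip_users = [{"id": 1, "name": "ann"}, {"id": 3, "name": "cho"}]
orders = [
    {"id": 10, "user_id": 1},
    {"id": 11, "user_id": 2},
    {"id": 12, "user_id": 3},
    {"id": 13, "user_id": 1},
]


def build_index(rows, key):
    """Group rows by key, playing the role of an index on the join field."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop_join(driving_rows, driven_index, driving_key):
    """For each driving row, look up matching driven rows and merge them."""
    result = []
    for outer in driving_rows:  # outer loop over the small result set
        # Index lookup on the driven table instead of scanning all of it.
        for inner in driven_index.get(outer[driving_key], []):
            result.append((outer["name"], inner["id"]))
    return result


orders_by_user = build_index(orders, "user_id")
joined = index_nested_loop_join(vip_users, orders_by_user, "id")
```

  The outer loop runs once per driving row, which is why a small driving result set and an index on the driven table's join field matter so much.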

 

 10, LIMIT optimization

  When LIMIT is used for paging, performance gets worse the further back you page. The principle of the solution is to reduce the range scanned, as shown below:

select * from orders order by id desc limit 100000, 10;
-- takes 0.4 seconds

select * from orders order by id desc limit 1000000, 10;
-- takes 5.2 seconds

  First filter by ID to narrow the query range, written as follows:

select *
  from orders
 where id <= (select id from orders order by id desc limit 1000000, 1)
 order by id desc limit 0, 10;
-- takes 0.5 seconds
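  A quick way to convince yourself that the seek-first rewrite returns the same page as the plain deep-offset query is to run both on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table contents and the offset are assumptions. Note that with a descending sort the boundary comparison has to be `<=` for the two queries to agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO orders (id, note) VALUES (?, ?)",
    [(i, f"order {i}") for i in range(1, 1001)],
)

OFFSET, PAGE_SIZE = 900, 10

# Plain deep paging: OFFSET rows are read and discarded before the page is returned.
deep = conn.execute(
    "SELECT * FROM orders ORDER BY id DESC LIMIT ? OFFSET ?", (PAGE_SIZE, OFFSET)
).fetchall()

# Seek first: find the boundary id using only the primary key, then read the page.
seek = conn.execute(
    "SELECT * FROM orders "
    "WHERE id <= (SELECT id FROM orders ORDER BY id DESC LIMIT 1 OFFSET ?) "
    "ORDER BY id DESC LIMIT ?",
    (OFFSET, PAGE_SIZE),
).fetchall()
```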

  If the only query condition is the primary key ID, write it as follows:

select id
  from orders
 where id between 1000000 and 1000010
 order by id desc;
-- takes 0.3 seconds

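Fifth, ten SQL optimization tips (including Oracle)

(1) Choose the most efficient order of table names (only effective with the rule-based optimizer):

  Oracle's parser processes the table names in the FROM clause from right to left, so the table written last in the FROM clause (the driving table) is processed first. When the FROM clause contains multiple tables, you must choose the table with the fewest records as the driving table.

  If the query joins more than three tables, choose the intersection table as the driving table. The intersection table is the one referenced by the other tables.

(2) Order of conditions in the WHERE clause:

  Oracle parses the WHERE clause from the bottom up. By this principle, joins between tables must be written before the other WHERE conditions, and the conditions that filter out the largest number of records must be written at the end of the WHERE clause.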

 

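(3) Avoid '*' in the SELECT clause:

  During parsing, Oracle expands * into all the column names one by one. This is done by querying the data dictionary, which means more time is spent.

(4) Use the decode function to reduce processing time:

  The decode function can avoid scanning the same records repeatedly or joining the same table repeatedly.

(5) Combine simple, unrelated database accesses:

  If you have several simple database queries, you can combine them into a single query (even if they are unrelated to each other).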

 

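(6) Use TRUNCATE instead of DELETE:

  When records are deleted from a table, rollback segments are normally used to store the information needed to recover them.

  If you have not committed the transaction, Oracle restores the data to the state before the deletion (more precisely, the state before the delete command was executed). With TRUNCATE, the rollback segments no longer store any recoverable information.

  Once the command has run, the data cannot be recovered. Few resources are therefore used and the execution time is short. (TRUNCATE only applies when deleting the whole table; TRUNCATE is DDL, not DML.)

(7) Use table aliases:

  When joining multiple tables in a SQL statement, use table aliases and prefix every column with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.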

 

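(8) Use >= instead of >:

-- Efficient:
SELECT * FROM EMP WHERE DEPTNO >= 4

-- Inefficient:
SELECT * FROM EMP WHERE DEPTNO > 3

  The difference is that with the former the DBMS jumps directly to the first record whose DEPTNO equals 4, while with the latter it first locates the records with DEPTNO = 3 and then scans forward to the first record whose DEPTNO is greater than 3.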

 

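(9) Write SQL statements in uppercase:

  Oracle always parses the SQL statement first and converts lowercase letters to uppercase before executing it.

(10) Replace HAVING with WHERE:

  Avoid the HAVING clause: HAVING filters the result set only after all records have been retrieved, which requires sorting, aggregation, and other operations. If the number of records can be limited with a WHERE clause, that overhead is reduced.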

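Sixth, other databases

  A good backend developer must master MySQL or SQL Server as the core of the storage layer, and should also keep an eye on NoSQL databases. They are mature enough, widely used, and can resolve performance bottlenecks in specific scenarios.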

 

 

 

 


Origin www.cnblogs.com/qiuhaitang/p/12593727.html