Highly recommended: SQL optimization points every developer should know

Source: https://www.cnblogs.com/xiaoyangjia/p/11267191.html

The project I am responsible for mainly uses Alibaba Cloud's MySQL database. Slow-SQL alarms have been firing frequently of late, with the longest execution time reaching 5 minutes. After exporting and analyzing the logs, the main causes turned out to be queries that miss the index and queries without paging. These are very basic mistakes; it sent a chill down my spine, and the team's technical level clearly needs to improve. While reworking these SQL statements I summed up some experience to share with you. If anything is wrong, corrections are welcome.

01 MySQL Performance

1. Maximum data volume

Talking about performance without regard to data volume and concurrency is meaningless. MySQL does not limit the number of records in a single table; the limit comes from the operating system's cap on file size.

File system	Single-file size limit
FAT32	4 GB maximum
NTFS	64 GB maximum
NTFS5.0	2 TB maximum
EXT2	With a 1024-byte block size, files up to 16 GB; with a 4096-byte block size, files up to 2 TB
EXT3	With a 4 KB block size, files up to 4 TB
EXT4	Theoretically larger than 16 TB

The "Alibaba Java Development Manual" recommends sharding (splitting databases and tables) once a single table exceeds 5 million rows or 2 GB in size. Performance is determined by a combination of factors: setting business complexity aside, the influences in order are hardware configuration, MySQL configuration, table design, and index optimization. The 5 million figure is only a reference, not an iron law.

I have worked with a single table of more than 400 million rows. Paging to fetch the latest 20 records took 0.6 seconds, with an SQL statement roughly like this:

select field_1,field_2 from table where id < #{prePageMinId} order by id desc limit 20

prePageMinId is the smallest ID among the records on the previous page. Query speed was acceptable at the time, but as the data keeps growing it will one day be overwhelmed. Sharding is a long, high-risk project; first try to optimize within the current structure, for example by upgrading hardware or migrating historical data, and shard only when there is truly no other way.
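
The paging pattern above can be sketched as a small runnable example. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the table name t, the column field_1, and the helper fetch_page are hypothetical names chosen for the demo.

```python
import sqlite3


def fetch_page(conn, pre_page_min_id, page_size=20):
    """Return the next page of rows with id below pre_page_min_id, newest first."""
    return conn.execute(
        "SELECT id, field_1 FROM t WHERE id < ? ORDER BY id DESC LIMIT ?",
        (pre_page_min_id, page_size),
    ).fetchall()


# Demo data: 100 rows with ids 1..100 (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field_1 TEXT)")
conn.executemany(
    "INSERT INTO t (id, field_1) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 101)],
)

# First page: start above the largest id; next page: continue below the
# smallest id seen on the previous page.
first_page = fetch_page(conn, pre_page_min_id=101)
second_page = fetch_page(conn, pre_page_min_id=first_page[-1][0])
print(first_page[0][0], first_page[-1][0])    # 100 81
print(second_page[0][0], second_page[-1][0])  # 80 61
```

Because each page is located by a primary key comparison rather than an offset, the cost of fetching a page does not depend on how many pages came before it.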

Readers interested in sharding can read "The basic idea of splitting databases and tables":

https://www.cnblogs.com/jshen/p/7682502.html

2. Maximum concurrency

The concurrency of a database is the number of requests it can process at the same moment. It is governed by max_connections and max_user_connections.

max_connections is the maximum number of connections to the MySQL instance (the upper limit cited for it is 16384; the exact ceiling depends on the MySQL version), and max_user_connections is the maximum number of connections per database user.

MySQL allocates a buffer for each connection, so more connections mean more memory consumed. Set the limit too high and the hardware cannot cope; set it too low and the hardware is underused. As a rule of thumb, the ratio of the peak number of connections actually used (max_used_connections) to max_connections should exceed 10%, calculated as follows:

max_used_connections / max_connections * 100% = 3/100 *100% ≈ 3%
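
The same arithmetic as a minimal sketch. The function names and the default 10% threshold parameter are hypothetical helpers for illustration; the two inputs would come from the server's max_used_connections and max_connections values.

```python
def connection_usage_percent(max_used_connections: int, max_connections: int) -> float:
    """Peak connections used, as a percentage of the configured limit."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100 * max_used_connections / max_connections


def exceeds_rule_of_thumb(max_used_connections: int, max_connections: int,
                          threshold_percent: float = 10.0) -> bool:
    """True when usage exceeds the 10% rule of thumb quoted in the text."""
    return connection_usage_percent(max_used_connections, max_connections) > threshold_percent


print(connection_usage_percent(3, 100))  # 3.0
print(exceeds_rule_of_thumb(3, 100))     # False: the limit is far above actual use
print(exceeds_rule_of_thumb(20, 100))    # True
```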

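View the maximum number of connections and the per-user maximum:

show variables like '%max_connections%';
show variables like '%max_user_connections%';

The peak actually reached is reported as a status value rather than a setting:

show status like 'Max_used_connections';

Change the limits in the my.cnf configuration file:

[mysqld]
max_connections = 100
max_user_connections = 20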

3. Keep queries under 0.5 seconds

Keep a single query under 0.5 seconds. The figure is a rule of thumb derived from the three-second principle of user experience: if an operation gets no response within three seconds, users become impatient and may leave. Response time = client UI rendering time + network request time + application processing time + database query time, so 0.5 seconds is the 1/6 of the budget left to the database.

4. Guiding principles

Compared with NoSQL databases, MySQL is a delicate, fragile thing. It is like a frail student in gym class: it quarrels at the slightest friction (hard to scale out), is out of breath after a couple of steps (small capacity, low concurrency), and is often off sick (too many SQL constraints).

Nowadays everyone builds distributed systems, and scaling the application is much easier than scaling the database, so the guiding principle is: let the database do less work and the application do more.

Make full use of indexes but do not abuse them; remember that indexes also consume disk and CPU.
Avoid formatting data with database functions; leave that to the application.
Avoid foreign key constraints; let the application guarantee data accuracy.
In write-heavy, read-light scenarios, avoid unique indexes; let the application guarantee uniqueness.
Add redundant fields where appropriate and try creating intermediate tables, computing intermediate results in the application; trade space for time.
Do not run extremely long transactions; have the application split them into smaller ones.
Estimate the load and data growth of important tables (such as the order table) and optimize ahead of time.

02 Data table design

1. Data types

Principle for choosing data types: the simpler, or the smaller the footprint, the better.

If the value range fits, prefer tinyint, smallint, or mediumint over int.
If the string length is fixed, use char.
If varchar is sufficient, do not use text.
For high precision use decimal. bigint also works, for example by storing a value with two decimal places multiplied by 100.
Prefer timestamp to datetime where possible.
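
The "multiply by 100 and store as an integer" idea from the list above can be sketched as follows. The helper names to_cents and from_cents are hypothetical; the point is only that integer arithmetic avoids the rounding surprises of binary floating point.

```python
from decimal import Decimal


def to_cents(amount: str) -> int:
    """Convert a decimal string to integer cents (digits beyond two places are truncated)."""
    return int(Decimal(amount) * 100)


def from_cents(cents: int) -> Decimal:
    """Convert integer cents back to a decimal amount for display."""
    return Decimal(cents) / 100


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)                                         # False
# The same sum in integer cents is exact.
print(to_cents("0.10") + to_cents("0.20") == to_cents("0.30"))  # True
print(from_cents(to_cents("19.99")))                            # 19.99
```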

Type	Bytes	Range
datetime	8 bytes	'1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999'
timestamp	4 bytes	'1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.999999'

Compared with datetime, timestamp takes less space, and values are automatically converted to UTC for storage.
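
The upper bound of the timestamp range is consistent with a 4-byte count of seconds since the Unix epoch. The sketch below checks that arithmetic and shows how one stored UTC instant renders in another time zone; the UTC+8 zone and the sample date are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Largest value of a signed 32-bit integer, read as seconds since 1970-01-01 UTC.
MAX_INT32 = 2**31 - 1
upper_bound = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(upper_bound.isoformat())  # 2038-01-19T03:14:07+00:00

# One instant stored as epoch seconds, displayed in a UTC+8 session time zone.
instant_utc = datetime(2019, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
stored_seconds = int(instant_utc.timestamp())
utc_plus_8 = timezone(timedelta(hours=8))
shown_locally = datetime.fromtimestamp(stored_seconds, tz=utc_plus_8)
print(shown_locally.isoformat())  # 2019-07-01T08:00:00+08:00
```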

2. Avoid NULL

A NULL field in MySQL still takes up space and makes indexes and index statistics more complicated. Updating a NULL value to a non-NULL value cannot be done in place, which easily causes index splits and hurts performance. Wherever possible, replace NULL with a meaningful value; this also avoids IS NOT NULL checks in SQL statements.

3. Text type optimization

Because text fields store large amounts of data, the table grows quickly and queries on the other fields slow down. Move text fields into a child table, linked through the business primary key.

4. Index optimization

1) Index types

Ordinary index: the most basic index.
Composite index: an index built on multiple fields; it speeds up queries with compound conditions.
Unique index: like an ordinary index, but the values of the indexed column must be unique; NULL values are allowed.
Composite unique index: the combination of column values must be unique.
Primary key index: a special unique index that uniquely identifies a record in the table; NULL values are not allowed. It is normally created through the primary key constraint.
Full-text index: for querying large volumes of text. Both InnoDB and MyISAM support it since MySQL 5.6. Because its accuracy and scalability are poor, more companies choose Elasticsearch.

2) Index optimization

Paged queries matter: if a query returns more than roughly 30% of the table, MySQL will not use the index.
Keep to no more than 5 indexes per table and no more than 5 fields per index.
Strings can use a prefix index; keep the prefix length to 5-8 characters.
Indexing a field with very low uniqueness is pointless, for example a deleted flag or gender.

Make sensible use of covering indexes, for example:

select login_name, nick_name from member where login_name = ?
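A composite index on login_name and nick_name is faster here than a single-column index on login_name, because the query can be answered from the index alone.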
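
A quick way to see a covering index at work is to ask the database for its query plan. The sketch below is an assumption-laden illustration: it uses SQLite (via Python's built-in sqlite3 module) rather than MySQL, the index name idx_login_nick and the extra email column are hypothetical, and the plan wording is SQLite's, not the output of MySQL's EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member ("
    "id INTEGER PRIMARY KEY, login_name TEXT, nick_name TEXT, email TEXT)"
)
# Composite index containing every column the query below reads.
conn.execute("CREATE INDEX idx_login_nick ON member (login_name, nick_name)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT login_name, nick_name FROM member WHERE login_name = ?",
    ("alice",),
).fetchall()
# The human-readable description is the fourth column of each plan row.
plan_detail = " ".join(row[3] for row in plan_rows)
print(plan_detail)
```

The printed plan names idx_login_nick as a covering index, meaning the table rows themselves are never read. In MySQL the comparable signal is "Using index" in the Extra column of EXPLAIN.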

5. SQL optimization

1) Batch processing

As a child I watched people dig a small outlet in a fish pond to drain it, with all kinds of debris floating on the water. Duckweed and leaves always pass through the outlet, but branches block other objects and sometimes get stuck, so they have to be cleared by hand.

MySQL is the fish pond, maximum concurrency and network bandwidth are the outlet, and user SQL is the floating debris. Queries without paging parameters, and update or delete operations that affect large amounts of data, are the branches. Break them up and process them in batches. For example:

Business requirement: mark all of a user's expired coupons as unavailable.

SQL statement:

update `coupon` set status = 0 where expire_date <= #{currentDate} and status = 1;

If a large number of coupons need to be marked unavailable, executing this statement may block other SQL. Pseudo-code for processing in batches:

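int PAGE_SIZE = 100;
while (true) {
    // Always read the first page: rows updated in the previous round no longer
    // match status = 1, so advancing an offset would skip rows still to be updated.
    List<Integer> batchIdList = queryList("select id from `coupon` where expire_date <= #{currentDate} and status = 1 limit #{PAGE_SIZE}");
    if (CollectionUtils.isEmpty(batchIdList)) {
        return;
    }
    // Update only this batch, so each statement touches a small number of rows.
    update("update `coupon` set status = 0 where status = 1 and id in #{batchIdList}");
}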
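
For readers who want something executable, here is a minimal sketch of the same batching loop. It rests on assumptions: SQLite (Python's built-in sqlite3 module) stands in for MySQL, the coupon table layout and the helper expire_coupons are hypothetical, and each round re-reads the first batch of still-active rows instead of advancing an offset, since rows already updated drop out of the status = 1 filter.

```python
import sqlite3

BATCH_SIZE = 100


def expire_coupons(conn, current_date, batch_size=BATCH_SIZE):
    """Mark expired coupons unavailable in small batches; return rows updated."""
    total = 0
    while True:
        # Rows updated in earlier rounds no longer match status = 1,
        # so the first batch_size matches are always fresh work.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM coupon WHERE expire_date <= ? AND status = 1 "
            "ORDER BY id LIMIT ?",
            (current_date, batch_size),
        )]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE coupon SET status = 0 WHERE status = 1 AND id IN ({placeholders})",
            ids,
        )
        conn.commit()  # short transaction per batch
        total += len(ids)


# Demo data: 250 expired coupons and 50 still valid (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon (id INTEGER PRIMARY KEY, expire_date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO coupon VALUES (?, ?, ?)",
    [(i, "2019-06-30" if i <= 250 else "2019-12-31", 1) for i in range(1, 301)],
)
updated = expire_coupons(conn, "2019-07-01")
remaining = conn.execute("SELECT COUNT(*) FROM coupon WHERE status = 1").fetchone()[0]
print(updated, remaining)  # 250 50
```

Committing after every batch keeps each transaction short, which is the purpose of splitting the update in the first place.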

2) <> operator optimization

The <> operator usually cannot use an index. For example, to query orders whose amount is not 100:

select id from orders where amount != 100;

If orders with an amount of 100 are very rare, so that the data distribution is heavily skewed, the index may still be used. Given this uncertainty, combine the results with union all instead, rewritten as follows:

(select id from orders where amount > 100)
union all
(select id from orders where amount < 100 and amount > 0)

3) OR optimization

With the InnoDB engine, OR cannot use a composite index, for example:

select id, product_name from orders where mobile_no = '13421800407' or user_id = 100;

The OR condition cannot hit a composite index on mobile_no + user_id. Use union instead, as follows:

(select id, product_name from orders where mobile_no = '13421800407')
union
(select id, product_name from orders where user_id = 100);

The query is most efficient when the id and product_name fields are both indexed.

4) IN optimization

IN suits a large main table with a small subquery table; EXISTS suits a small main table with a large subquery table. As the query optimizer keeps improving, the two perform almost identically in many scenarios.

Try rewriting the query as a join, for example:

select id from orders where user_id in (select id from user where level = 'VIP');

Using JOIN (an inner join, since the filter on u.level discards unmatched rows anyway):

select o.id from orders o join user u on o.user_id = u.id where u.level = 'VIP';
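
A small check that the IN form and the JOIN form return the same orders. This is only a sketch: SQLite (Python's built-in sqlite3 module) stands in for MySQL and the sample rows are invented. The equivalence relies on user.id being unique; joining on a non-unique column could duplicate orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, level TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO user VALUES (1, 'VIP'), (2, 'NORMAL'), (3, 'VIP');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 1), (14, 4);
""")

# Subquery form.
in_ids = sorted(row[0] for row in conn.execute(
    "SELECT id FROM orders WHERE user_id IN (SELECT id FROM user WHERE level = 'VIP')"
))
# Join form; an inner join is used because the WHERE clause on u.level
# would discard unmatched rows from a left join anyway.
join_ids = sorted(row[0] for row in conn.execute(
    "SELECT o.id FROM orders o JOIN user u ON o.user_id = u.id WHERE u.level = 'VIP'"
))
print(in_ids, join_ids)  # [10, 12, 13] [10, 12, 13]
```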

5) Do not operate on columns

Applying an arithmetic operation or a function to a column in the query condition usually prevents the index from being used, as follows:

Query the orders for a given day:

select id from orders where date_format(create_time, '%Y-%m-%d') = '2019-07-01';

The date_format function prevents the query from using the index. After rewriting:

select id from orders where create_time between '2019-07-01 00:00:00' and '2019-07-01 23:59:59';
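
The effect can be observed in a query plan. The sketch below is an illustration under assumptions: it uses SQLite (Python's built-in sqlite3 module) instead of MySQL, SQLite's date() function plays the role of date_format, the index name idx_create_time is hypothetical, and the range is written in half-open form (>= start, < next day) as a variation on the between query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT)")
conn.execute("CREATE INDEX idx_create_time ON orders (create_time)")


def plan_for(sql):
    """Return SQLite's textual query plan for a statement."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Function applied to the column: every row must be evaluated (a scan).
wrapped_plan = plan_for(
    "SELECT id FROM orders WHERE date(create_time) = '2019-07-01'"
)
# Bare column compared to constants: the index can be searched by range.
range_plan = plan_for(
    "SELECT id FROM orders "
    "WHERE create_time >= '2019-07-01 00:00:00' AND create_time < '2019-07-02 00:00:00'"
)
print(wrapped_plan)
print(range_plan)
```

The first plan begins with SCAN and the second with SEARCH, which is SQLite's way of distinguishing a full pass from an index range lookup.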

6) Avoid SELECT *

If you do not need every column in the table, avoid SELECT *. It reads columns you do not need and cannot be served by a covering index, so indexes are not used effectively.

7) LIKE optimization

like performs fuzzy queries, for example (field is indexed):

SELECT column FROM table WHERE field like '%keyword%';

This query cannot hit the index. Replace it with the following:

SELECT column FROM table WHERE field like 'keyword%';

Dropping the leading % lets the query hit the index, but will the product manager insist on fuzzy matching at both ends? A fulltext index is worth a try, but Elasticsearch is the ultimate weapon.

8) Join optimization

Joins are implemented with the Nested Loop Join algorithm: the result set of the driving table serves as the base data, each of its rows is used as a filter condition to query the next table in a loop, and the results are then merged. With multiple joins, the preceding result set becomes the loop data for querying the following table.

Add as many query conditions as possible on both the driving and the driven table, satisfying ON conditions and relying less on WHERE; let the small result set drive the large one.
Index the join field on the driven table; when an index cannot be created, set a sufficiently large join buffer size.
Do not join more than three tables; try adding redundant fields instead.

9) Limit optimization

With limit-based paging, performance gets worse the further back you page. The principle of the fix is to reduce the scan range, as shown below:

select * from orders order by id desc limit 100000,10
Takes 0.4 seconds.
select * from orders order by id desc limit 1000000,10
Takes 5.2 seconds.

First filter by ID to narrow the query range. Because the rows are read in descending order, the page consists of the IDs at or below the one found by the subquery:

select * from orders where id <= (select id from orders order by id desc limit 1000000, 1) order by id desc limit 0,10
Takes 0.5 seconds.

If the query condition involves only the primary key ID:

select id from orders where id between 1000000 and 1000010 order by id desc
Takes 0.3 seconds.
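
The subquery rewrite can be checked against plain offset paging on a small table. This is a sketch under assumptions: SQLite (Python's built-in sqlite3 module) stands in for MySQL, the data is invented, and the comparison in the rewritten query is <= because the rows are read in descending id order. It demonstrates only that both forms return the same page, not the timings quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i % 7) for i in range(1, 1001)],
)

offset, page_size = 900, 10

# Plain deep paging: the database walks past `offset` rows before returning any.
offset_page = conn.execute(
    "SELECT id, amount FROM orders ORDER BY id DESC LIMIT ? OFFSET ?",
    (page_size, offset),
).fetchall()

# Rewrite: find the id where the page starts using the primary key alone,
# then fetch full rows from that id downward.
seek_page = conn.execute(
    "SELECT id, amount FROM orders "
    "WHERE id <= (SELECT id FROM orders ORDER BY id DESC LIMIT 1 OFFSET ?) "
    "ORDER BY id DESC LIMIT ?",
    (offset, page_size),
).fetchall()

print(offset_page[0][0], offset_page[-1][0])  # 100 91
print(seek_page == offset_page)               # True
```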

What if the approaches above are still too slow? Then a cursor is the only option left. Interested readers can see "Implementing paged queries with a JDBC cursor":

https://www.cnblogs.com/firstdream/p/7732656.html

03 Other databases

A back-end developer must be proficient in MySQL or SQL Server as the core storage, but should also keep an active interest in NoSQL databases. They are mature and widely adopted enough to solve performance bottlenecks in specific scenarios.

Category	Database	Characteristics
Key-value	Memcache	Content caching; handles heavy load on large volumes of data
Key-value	Redis	Content caching; supports more data types than Memcache and can persist data
Column store	HBase	Core database of the Hadoop ecosystem; massive structured data storage; essential for big data
Document	MongoDB	Well-known document database; can also be used for caching
Document	CouchDB	Apache open source project; focuses on ease of use; supports a REST API
Document	SequoiaDB	Well-known document database
Graph	Neo4J	Building relationship graphs for social networks, recommendation systems, and the like
