It Pays to Write Good SQL

Foreword

The project I am responsible for mainly uses Alibaba Cloud's MySQL. Recently, slow-SQL alarms have been firing frequently, and the slowest statement ran for as long as five minutes. After exporting and analyzing the logs, the main causes turned out to be queries that missed indexes and paging queries with no index support.

These are really quite elementary mistakes, and I could not help breaking into a cold sweat: the team's technical level needs to improve. In the course of reworking these SQL statements I summed up some experience to share with you; if there are errors, criticism is welcome.

MySQL Performance

Maximum data volume

Talking about performance without stating the data volume and the concurrency level is meaningless. MySQL places no limit on the number of records in a single table; the limit comes from the operating system's maximum file size.

File system   Single-file size limit
FAT32         4GB max
NTFS          64GB max
NTFS5.0       2TB max
EXT2          16GB max with a 1024-byte block size; 2TB max with a 4096-byte block size
EXT3          4TB max with a 4KB block size
EXT4          In theory larger than 16TB

The "Alibaba Java Development Manual" recommends splitting into multiple databases and tables once a single table exceeds 5 million rows or 2GB of data. Performance is determined by many factors together; setting business complexity aside, the influences are, in order, hardware configuration, MySQL configuration, table design, and index optimization. The figure of 5 million is only a reference value, not an iron law. I have operated a single table with more than 400 million rows of data; paging through the latest 20 records took about 0.6 seconds, with SQL roughly like this:

select field_1, field_2 from table where id < #{prePageMinId} order by id desc limit 20

Here prePageMinId is the smallest id on the previous page of results. Query speed was acceptable at the time, but as the data keeps growing, one day the table will be overwhelmed. Splitting databases and tables is a long-cycle, high-risk job; you should first try to optimize within the current architecture, for example by upgrading hardware or migrating historical data, and split only when nothing else works. Interested readers can look up the basic ideas behind database and table sharding.
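The keyset-pagination pattern just described (pass the smallest id of the previous page instead of an ever-growing OFFSET) can be sketched in Python. This uses SQLite rather than MySQL, and the table and column names are illustrative, not from the project:

```python
import sqlite3

# In-memory stand-in for the large table; ids ascend with insertion order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field_1 TEXT)")
conn.executemany("INSERT INTO t (id, field_1) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 101)])

def fetch_page(pre_page_min_id=None, page_size=20):
    """Return the next page of newest-first rows.

    pre_page_min_id is the smallest id seen on the previous page;
    None means "first page". The primary-key index satisfies both
    the WHERE and the ORDER BY, so no rows are skipped with OFFSET.
    """
    if pre_page_min_id is None:
        rows = conn.execute(
            "SELECT id, field_1 FROM t ORDER BY id DESC LIMIT ?",
            (page_size,))
    else:
        rows = conn.execute(
            "SELECT id, field_1 FROM t WHERE id < ? "
            "ORDER BY id DESC LIMIT ?",
            (pre_page_min_id, page_size))
    return rows.fetchall()

page1 = fetch_page()              # newest 20 rows: ids 100..81
page2 = fetch_page(page1[-1][0])  # next 20 rows: ids 80..61
```

The cost of each page stays constant no matter how deep the caller pages, which is exactly why this beat OFFSET-based paging on the 400-million-row table.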

Maximum concurrency

Database concurrency means how many requests can be handled at the same time, and it is determined by max_connections and max_user_connections. max_connections is the maximum number of connections for the MySQL instance, with an upper limit of 16,384; max_user_connections is the maximum number of connections per database user. MySQL allocates a buffer for each connection, so more connections mean more memory consumed. If the connection limit is set too high, the hardware cannot cope; set too low, the hardware cannot be fully utilized. A common guideline is that the ratio of the two values should exceed 10%, calculated as follows:

max_used_connections / max_connections * 100% = 3/100 *100% ≈ 3%

View the maximum number of connections and the peak number of connections actually used:

show variables like '%max_connections%';
show variables like '%max_user_connections%';

Modify the maximum number of connections in the configuration file my.cnf:

[mysqld]
max_connections = 100
max_user_connections = 20

Keep queries under 0.5 seconds

It is recommended to keep a single query under 0.5 seconds. That figure is an empirical value, derived from the three-second rule of user experience: if an operation gets no response within three seconds, the user becomes impatient or simply leaves. Response time = client UI rendering time + network request time + application processing time + database query time, and 0.5 seconds gives the database roughly one-sixth of the three-second budget.

Implementation principles

Compared with NoSQL databases, MySQL is a delicate, fragile fellow. It is like the frail student in PE class: a small dispute and she quarrels with her classmates (scaling out is hard), two steps of running and she is out of breath (capacity is small, concurrency is low), and she is often unwell and asks for leave (SQL carries too many constraints). These days everything goes distributed, and applications scale out far more easily than databases do, so the guiding principle is: less work for the database, more work for the application.

  • Indexes are powerful, but do not abuse them; indexes also consume disk space and CPU.
  • Do not use database functions to format data; leave that to the application.
  • Do not use foreign key constraints; guarantee data accuracy in the application instead.
  • In write-heavy, read-light scenarios, do not use unique indexes; enforce uniqueness in the application.
  • Add redundant fields where appropriate and consider creating intermediate tables, computing intermediate results in the application: trade space for time.
  • Never run extremely time-consuming transactions; split them into smaller transactions in the application.
  • Estimate the load and data growth of important tables (such as the orders table) in advance and optimize ahead of time.

Data table design

Data types

Principles for choosing data types: the smaller the footprint the better, and the simpler the better.

  • If an integer type is large enough, prefer tinyint, smallint, or mediumint to int.

  • If the string length is fixed, use the char type.

  • If varchar is sufficient, do not use the text type.

  • For high-precision values use the decimal type; bigint can also work, e.g. multiply an amount with two decimal places by 100 and store it as an integer.

  • Prefer timestamp over datetime.

    Type       Bytes     Range
    datetime   8 bytes   '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999'
    timestamp  4 bytes   '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.999999'

Compared with datetime, timestamp takes up less space and stores its value converted to UTC, translating back to the session time zone automatically.

Avoid null

In MySQL a NULL field still takes up space, and it makes indexes and index statistics more complicated. Updating a NULL value to a non-NULL value cannot be done in place, which makes index page splits more likely and hurts performance. Replace NULL with a meaningful default value wherever possible; this also spares your SQL statements from `is not null` checks.

Optimizing text columns

Because a text field stores a large amount of data, the table grows large early on and drags down queries against the other fields. It is recommended to split text columns out into a child table, associated by a natural (business) key.

Index Tuning

Index Classification

  1. Ordinary index: the basic index type.
  2. Composite index: an index over multiple columns; speeds up queries that filter on those columns together.
  3. Unique index: like an ordinary index, but the indexed column's values must be unique; NULL values are allowed.
  4. Composite unique index: the combination of the indexed columns' values must be unique.
  5. Primary key index: a special unique index that uniquely identifies one row in the table; NULL values are not allowed; generally created with a primary key constraint.
  6. Full-text index: for querying large volumes of text; supported by MyISAM and, since MySQL 5.6, by InnoDB. Because its query precision and scalability are poor, many companies choose Elasticsearch instead.

Index Tuning

  1. Selectivity matters: if a query fetches more than about 30% of the table's data, MySQL will not use the index.

  2. Keep the number of indexes on a single table to at most 5, and the number of columns in a single index to at most 5.

  3. Use prefix indexes for strings, keeping the prefix length to 5–8 characters.

  4. Do not index fields with very low uniqueness; on a column such as gender or an is-deleted flag, an index is pointless.

  5. Make good use of covering indexes, for example:

    select login_name, nick_name from member where login_name = ?

Create a composite index on the login_name and nick_name fields; it is faster than a single-column index on login_name, because the query can be answered from the index alone.
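The covering-index effect can be observed directly in a query plan. As a rough illustration (using SQLite rather than MySQL, so the plan wording differs, and the member table here is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, "
             "login_name TEXT, nick_name TEXT)")
# Composite index covering every column the query touches.
conn.execute("CREATE INDEX idx_login_nick ON member (login_name, nick_name)")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
# the detail string describes the access path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT login_name, nick_name FROM member WHERE login_name = ?",
    ("alice",)).fetchall()
detail = plan[0][3]
print(detail)  # the plan mentions a COVERING INDEX: no table lookup needed
```

Because both selected columns live in the index, the engine never has to fetch the table row itself; MySQL's EXPLAIN reports the same situation as "Using index" in the Extra column.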

SQL optimization

Batch processing

As a child I watched a pond being drained through a small outlet, with all kinds of debris floating on the water. Duckweed and leaves always made it through the outlet, while branches would block other objects and sometimes jam it, requiring manual clearing. MySQL is the pond, the maximum concurrency and the network bandwidth are the outlet, and user SQL is the floating debris. Queries without paging parameters, and updates or deletes that touch large amounts of data, are the branches: we want to break them up and process them in batches. Example business requirement: mark all of a user's expired coupons as unavailable. SQL statement:

update coupon set status = 0 where expire_date <= #{currentDate} and status = 1;

If a large number of coupons must be updated to the unavailable state, executing this SQL in one shot may block other SQL. Batched pseudo-code looks like this:

int PAGE_SIZE = 100;
while (true) {
    // Rows updated in the previous batch no longer match status = 1,
    // so always read from offset 0 instead of advancing a page number.
    List<Integer> batchIdList = queryList("select id from coupon where expire_date <= #{currentDate} and status = 1 limit #{PAGE_SIZE}");
    if (CollectionUtils.isEmpty(batchIdList)) {
        return;
    }
    update("update coupon set status = 0 where status = 1 and id in #{batchIdList}");
}
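A runnable version of the batching loop, here in Python against SQLite (the coupon table and the row counts are made up for the demo); each batch is a small transaction, and the SELECT always reads from offset 0 because updated rows drop out of the status = 1 result set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon (id INTEGER PRIMARY KEY, "
             "expire_date TEXT, status INTEGER)")
# 250 expired coupons and 50 still-valid ones, all currently usable.
conn.executemany(
    "INSERT INTO coupon (expire_date, status) VALUES (?, 1)",
    [("2019-01-01",)] * 250 + [("2099-01-01",)] * 50)

PAGE_SIZE = 100
current_date = "2019-07-01"

while True:
    # Fetch the next batch of ids still needing the update.
    batch = [row[0] for row in conn.execute(
        "SELECT id FROM coupon WHERE expire_date <= ? AND status = 1 "
        "LIMIT ?", (current_date, PAGE_SIZE))]
    if not batch:
        break
    placeholders = ",".join("?" * len(batch))
    conn.execute(f"UPDATE coupon SET status = 0 WHERE status = 1 "
                 f"AND id IN ({placeholders})", batch)
    conn.commit()  # keep each transaction small and short-lived

expired_live = conn.execute(
    "SELECT COUNT(*) FROM coupon WHERE expire_date <= ? AND status = 1",
    (current_date,)).fetchone()[0]
still_valid = conn.execute(
    "SELECT COUNT(*) FROM coupon WHERE status = 1").fetchone()[0]
```

After the loop, every expired coupon is unavailable while the valid ones are untouched, and at no point did a single statement lock more than PAGE_SIZE rows.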

Operators <> Optimization

Typically the <> (or !=) operator cannot use an index. For example, to query orders whose amount is not 100:

select id from orders where amount != 100;

If rows where amount is 100 are extremely rare, the data distribution is severely skewed and the index might still be used. Given this uncertainty, aggregate the results with a union instead (assuming amounts are positive), rewritten as follows:

(select id from orders where amount > 100)
union all
(select id from orders where amount < 100 and amount > 0);

OR optimization

In the InnoDB engine, an OR condition cannot use a composite index. For example:

select id,product_name from orders where mobile_no = '13421800407' or user_id = 100;

This OR misses the composite index on (mobile_no, user_id); use a union instead, as follows:

(select id, product_name from orders where mobile_no = '13421800407')
union
(select id, product_name from orders where user_id = 100);

The query is most efficient when the id and product_name fields are indexed as well.
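The union rewrite can be sanity-checked for equivalence (here in SQLite; the orders data is fabricated). Note that union, unlike union all, also deduplicates rows matching both conditions, which is what the OR form returns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "product_name TEXT, mobile_no TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO orders (product_name, mobile_no, user_id) VALUES (?, ?, ?)",
    [("book", "13421800407", 100),   # matches both predicates
     ("pen",  "13421800407", 7),     # matches mobile_no only
     ("cup",  "13900000000", 100),   # matches user_id only
     ("bag",  "13800000000", 8)])    # matches neither

with_or = conn.execute(
    "SELECT id, product_name FROM orders "
    "WHERE mobile_no = '13421800407' OR user_id = 100 ORDER BY id").fetchall()

# Each branch of the union can use its own index; union removes the
# duplicate row that satisfied both branches.
with_union = conn.execute(
    "SELECT id, product_name FROM orders WHERE mobile_no = '13421800407' "
    "UNION "
    "SELECT id, product_name FROM orders WHERE user_id = 100 "
    "ORDER BY id").fetchall()
```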

IN Optimization

  1. Use IN when the outer (main) table is large and the subquery table is small; use EXISTS when the outer table is small and the subquery table is large. As the query optimizer keeps improving, the two perform almost identically in many scenarios.

  2. Try to replace IN with a join, for example:

    select id from orders where user_id in (select id from user where level = 'VIP');

Rewritten with a JOIN:

select o.id from orders o left join user u on o.user_id = u.id where u.level = 'VIP';
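Under SQLite, with made-up user and orders data, the IN form and the join form return the same ids; the WHERE clause on u.level discards the unmatched left-join rows, so the left join behaves like an inner join here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, level TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO user (id, level) VALUES (?, ?)",
                 [(1, "VIP"), (2, "NORMAL"), (3, "VIP")])
conn.executemany("INSERT INTO orders (id, user_id) VALUES (?, ?)",
                 [(10, 1), (11, 2), (12, 3), (13, 3)])

with_in = conn.execute(
    "SELECT id FROM orders WHERE user_id IN "
    "(SELECT id FROM user WHERE level = 'VIP') ORDER BY id").fetchall()

# Filtering on u.level in WHERE makes this an effective inner join.
with_join = conn.execute(
    "SELECT o.id FROM orders o LEFT JOIN user u ON o.user_id = u.id "
    "WHERE u.level = 'VIP' ORDER BY o.id").fetchall()
```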

Do not do the column operation

Typically, performing an operation on an indexed column prevents the index from being used. For example, to query one day's orders:

select id from order where date_format(create_time,'%Y-%m-%d') = '2019-07-01';

The date_format function keeps the query from using the index. After rewriting:

select id from order where create_time between '2019-07-01 00:00:00' and '2019-07-01 23:59:59';
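Query plans show the same effect. As an illustration in SQLite (not MySQL, and the orders table is invented): wrapping the column in a function forces a scan, while the range predicate can search the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT)")
conn.execute("CREATE INDEX idx_create_time ON orders (create_time)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

func_plan = plan("SELECT id FROM orders "
                 "WHERE date(create_time) = '2019-07-01'")
range_plan = plan("SELECT id FROM orders WHERE create_time BETWEEN "
                  "'2019-07-01 00:00:00' AND '2019-07-01 23:59:59'")
print(func_plan)   # SCAN: every entry must be examined
print(range_plan)  # SEARCH: an index range lookup on create_time
```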

Avoid Select all

If you do not need every column of the table, avoid `select *`; it triggers a full table scan and cannot make effective use of covering indexes.

Like Optimization

like performs fuzzy matching, for example (with an index on field):

SELECT column FROM table WHERE field like '%keyword%';

This query misses the index. Replace it with the following form:

SELECT column FROM table WHERE field like 'keyword%';

Only a pattern without a leading % can hit the index. But what if the product manager insists on fuzzy matching both before and after the keyword? A fulltext index is worth a try, but Elasticsearch is the ultimate weapon.

Join Optimization

MySQL implements joins with the Nested Loop Join algorithm: the result set of the driving table serves as the base data; each of its rows is passed as a filter condition to the next (driven) table, whose matching rows are fetched and merged into the result. With multiple joins, the previous result set becomes the loop data for the next table.

  1. Put as many filter conditions as possible on both the driving and driven tables, preferring conditions in the ON clause over WHERE, and use the small result set to drive the large one.
  2. Index the join column on the driven table; when that is impossible, set a sufficiently large join_buffer_size.
  3. Never join more than three tables; add redundant fields instead where necessary.
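The nested-loop idea can be sketched in plain Python (a toy illustration, not MySQL's actual implementation): each row of the driving result set probes the driven table, which is why result-set sizes and an index on the driven table's join column matter so much:

```python
def nested_loop_join(driving_rows, driven_rows, on):
    """Toy inner nested-loop join: for every row of the driving (outer)
    result set, scan the driven (inner) table for matches. Cost is
    roughly len(driving_rows) * len(driven_rows), so the smaller result
    set should drive the larger one; a real engine replaces the inner
    scan with an index lookup when one is available."""
    result = []
    for outer in driving_rows:          # driving table
        for inner in driven_rows:       # driven table
            if on(outer, inner):
                result.append({**outer, **inner})
    return result

# Fabricated rows standing in for two tables.
orders = [{"order_id": 1, "user_id": 100},
          {"order_id": 2, "user_id": 200},
          {"order_id": 3, "user_id": 100}]
users = [{"user_id": 100, "level": "VIP"},
         {"user_id": 200, "level": "NORMAL"}]

joined = nested_loop_join(orders, users,
                          on=lambda o, u: o["user_id"] == u["user_id"])
vip_orders = [r["order_id"] for r in joined if r["level"] == "VIP"]
```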

Limit Optimization

With paging queries, performance gets worse and worse as the limit offset grows. The principle of the fix is: shrink the scan range, as shown below:

select * from orders order by id desc limit 100000,10;   -- takes 0.4 seconds
select * from orders order by id desc limit 1000000,10;  -- takes 5.2 seconds

First filter down to a boundary ID to narrow the range, written as follows:

select * from orders where id <= (select id from orders order by id desc limit 1000000, 1) order by id desc limit 0,10;  -- takes 0.5 seconds

If the query condition is the primary key id only, write it as:

select id from orders where id between 1000000 and 1000010 order by id desc;  -- takes 0.3 seconds

What if the approaches above are still too slow? Then you will have to fall back on cursors; interested readers can look up implementing paginated queries with JDBC cursors.
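A deferred-join variant of the big-offset query can be checked on SQLite (fabricated data, at a much smaller scale): find the boundary id with a cheap index-only subquery, then page from that id, and confirm both forms return the same page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO orders (id, note) VALUES (?, ?)",
                 [(i, f"o{i}") for i in range(1, 1001)])

OFFSET = 500  # stands in for the 1,000,000 in the timings above

# Naive form: walk and discard OFFSET full rows before returning 10.
plain = conn.execute(
    "SELECT * FROM orders ORDER BY id DESC LIMIT ?, 10",
    (OFFSET,)).fetchall()

# Deferred form: the subquery skips OFFSET entries using only the
# primary-key index, then the outer query reads just 10 full rows.
deferred = conn.execute(
    "SELECT * FROM orders WHERE id <= "
    "(SELECT id FROM orders ORDER BY id DESC LIMIT ?, 1) "
    "ORDER BY id DESC LIMIT 0, 10",
    (OFFSET,)).fetchall()
```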

Other databases

As a back-end developer, you must be proficient in MySQL or SQL Server as your core storage, and you should also take an active interest in NoSQL databases: they are mature enough, and widely used enough, to solve performance bottlenecks in specific scenarios.

Category       Database                     Characteristics
Key-value      Memcache                     Used for content caching; handles high loads on large volumes of data
Key-value      [Redis](https://redis.io/)   Used for content caching; supports more data types than Memcache and can persist data
Column store   HBase                        Core database of the Hadoop ecosystem; massive structured data storage; a big-data staple
Document       MongoDB                      Well-known document database; can also be used as a cache
Document       CouchDB                      Apache open-source project focused on ease of use; supports a REST API
Document       SequoiaDB                    Well-known document database
Graph          Neo4j                        Builds relationship graphs for social networks and recommendation systems


Origin www.cnblogs.com/ldsweely/p/12155153.html