SQL performance optimization, so, so, so useful

Foreword

This article focuses primarily on the MySQL relational database.

It first briefly sorts out MySQL's basic concepts, and then expands on optimization in two phases: table creation and queries.

1 Introduction to Fundamental Concepts

1.1 Logical Architecture

 

  • First layer: the client connects through the connection service and transmits the SQL statements to be executed.
  • Second layer: the server parses and optimizes the SQL, generates the final execution plan, and executes it.
  • Third layer: the storage engine, responsible for storing and retrieving the data.

1.2 Lock

Databases handle concurrent access through locking: shared locks (read locks) and exclusive locks (write locks). Read locks do not block each other, so multiple clients can read the same resource at the same time. Write locks are exclusive and block other read and write locks. Briefly, there are also optimistic and pessimistic locking strategies.

  • Optimistic locking is typically used when contention over the data is low (read-heavy, write-light) and is usually implemented with a version number or timestamp (a sketch follows this list).
  • Pessimistic locking is typically used when contention over the data is intense; every operation locks the data.
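
A minimal sketch of optimistic locking via a version column; the account table and its columns are hypothetical, not from the article. The update only takes effect if nobody has bumped the version since the row was read.

-- Read the row together with its current version.
SELECT balance, version FROM account WHERE id = 1;

-- Attempt the update; it succeeds only if the version is unchanged.
UPDATE account
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = 3;
-- If 0 rows were affected, another writer got there first: retry or abort.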

Locking data also requires choosing a locking granularity:

  • Table locks lock the entire table: minimal overhead, but lock contention is the highest.
  • Row locks lock at the row level: the most expensive, but they support the greatest degree of concurrency.

However, MySQL's storage engines do not actually implement simple row-level locking; they usually implement multi-version concurrency control (MVCC). MVCC can be seen as a variant of row-level locking that avoids locking operations in most cases and has lower overhead. MVCC works by saving a snapshot of the data as of a point in time.

1.3 Transaction

A transaction guarantees that a set of operations is atomic: either all succeed or all fail. If any operation fails, all preceding operations are rolled back. MySQL auto-commits by default: if a transaction is not started explicitly, each query runs as its own transaction.
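
A minimal sketch of an explicit transaction (the account table is hypothetical); without START TRANSACTION each statement would be auto-committed on its own.

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;       -- make both changes permanent together
-- ROLLBACK;  -- or undo everything if any step failed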

Isolation levels control which of a transaction's modifications are visible, both within the transaction and between transactions. The four common isolation levels are:

  • Read Uncommitted: a transaction's modifications are visible to other transactions even before they are committed. A transaction may read uncommitted data, which causes dirty reads.
  • Read Committed: when a transaction starts, it can only see modifications made by transactions that have already committed. Until a transaction commits, its changes are not visible to other transactions. This level is also known as non-repeatable read: reading the same record several times within one transaction may give different results.
  • Repeatable Read: reading the same record multiple times within the same transaction always returns the same result.
  • Serializable: the highest isolation level; it forces transactions to execute serially.
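
For reference, a quick sketch of inspecting and setting the isolation level per session; InnoDB's default is Repeatable Read.

-- In MySQL 5.7 the variable is tx_isolation (transaction_isolation in 8.0).
SELECT @@tx_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;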

1.4 Storage Engine

InnoDB is the most important and most widely used storage engine. It is designed to handle a large number of short-lived transactions, and it offers high performance and automatic crash recovery.

The MyISAM engine does not support transactions or row-level locking, and it cannot safely recover after a crash.
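
The engine is chosen per table; a small sketch of checking what is available and pinning a table to InnoDB (the order_log table is hypothetical).

SHOW ENGINES;                       -- lists available engines and the default
CREATE TABLE order_log (
  id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  msg VARCHAR(255)
) ENGINE = InnoDB;                  -- InnoDB is the default engine since MySQL 5.5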

2 Optimization at Table Creation

2.1 Schema and Data Type Optimization

Integer

TinyInt, SmallInt, MediumInt, Int, and BigInt use 8, 16, 24, 32, and 64 bits of storage respectively. If negative numbers are not needed, Unsigned roughly doubles the upper limit for positive values.

Real

  • Float and Double support approximate floating-point arithmetic.
  • Decimal stores exact decimal values.

String

  • VarChar stores variable-length strings. It needs 1 or 2 extra bytes to record the string's length.
  • Char is fixed length, suited to storing fixed-length strings such as MD5 values.
  • Blob and Text are designed to store large amounts of data, as binary and character data respectively.

Time Type

  • DateTime stores a large range of values and uses 8 bytes.
  • TimeStamp, equivalent to a UNIX timestamp, is recommended where its range suffices; it uses 4 bytes.
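
A small sketch pulling these types together; the student table and its columns are illustrative only, not from the article.

CREATE TABLE student (
  id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,   -- integer identity column
  age        TINYINT UNSIGNED,                          -- small range, 1 byte
  name       VARCHAR(64),                               -- variable-length string
  pwd_md5    CHAR(32),                                  -- fixed-length MD5 value
  score      DECIMAL(5, 2),                             -- exact decimal
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP        -- 4-byte UNIX timestamp
);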

Optimization tips

  • Use the most fitting data type. For example, do not store time as a string, and store IP addresses as integers.
  • Choose the smaller data type: if TinyInt is enough, do not use Int.
  • For identity (identifier) columns, integers are recommended; string types are not, since they take more space and are slower to compute with than integers.
  • Letting an ORM auto-generate the schema is not recommended: it generally pays little attention to data types, tends to use overly large VarChar columns, and uses indexes irrationally, among other problems.
  • In real scenarios, mix normalized and denormalized designs. With redundancy, queries are fast but inserts and updates are slow; with little redundancy, inserts and updates are fast but queries are slower.
  • Create fully independent summary/cache tables, generating their data on a schedule, for operations that would otherwise take the user a long time. For summary operations that require high accuracy, combine historical results with the latest records to achieve fast queries.
  • During data migrations and table upgrades you can use a shadow table: rename the original table so its historical data is preserved without affecting the use of the new table (a sketch follows this list).
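
A sketch of the shadow-table swap mentioned above (table names are illustrative); RENAME TABLE swaps both names in one atomic statement, so readers never see a missing table.

CREATE TABLE student_new LIKE student;    -- build the upgraded structure
-- ... migrate / transform data into student_new ...
RENAME TABLE student TO student_old,      -- keep the history
             student_new TO student;      -- new table takes over the old name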

2.2 Index

An index contains the values of one or more columns. MySQL can only use an index efficiently on its leftmost column prefix. Advantages of indexes:

  • Reduce the amount of data the query has to scan
  • Avoid sorting and temporary tables
  • Turn random I/O into sequential I/O (sequential I/O is more efficient than random I/O)

B-Tree

The most commonly used index type. It stores data in a B-Tree structure (each leaf node contains a pointer to the next leaf node, which makes traversing the leaf nodes easy). B-Tree indexes are suitable for full-key, key-range, and key-prefix lookups, and they support sorting.

B-Tree indexing restrictions:

  • If the query does not start from the leftmost column of the index, the index cannot be used.
  • Columns in the index cannot be skipped. If a query uses the first and third columns of an index, only the first column of the index can be used.
  • If the query contains a range condition on a column, none of the columns to its right can use the index for optimization (examples below).
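
To make the leftmost-prefix rules concrete, assume a composite index on hypothetical columns (last_name, age, city) of the student table.

ALTER TABLE student ADD INDEX idx_name_age_city (last_name, age, city);

-- Uses the index: the query starts from the leftmost column.
SELECT * FROM student WHERE last_name = 'Li' AND age = 20;

-- Only the last_name part is used: age is skipped, so city cannot use the index.
SELECT * FROM student WHERE last_name = 'Li' AND city = 'Beijing';

-- The range on age stops index use there: city, to its right, is not index-optimized.
SELECT * FROM student WHERE last_name = 'Li' AND age > 20 AND city = 'Beijing';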

Hash indexes

A hash index is only useful for exact matches on all of the indexed columns. The storage engine computes a hash code over the indexed columns; the hash index stores these hash codes along with a pointer to each data row.

Restrictions of hash indexes (a sketch follows the list):

  • They cannot be used for sorting
  • They do not support partial (prefix) matches
  • They only support equality comparisons such as = and IN(); they do not support range operators such as < and >
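
In stock MySQL, explicit HASH indexes are exposed by the MEMORY engine (InnoDB maintains an internal adaptive hash index on its own); a minimal sketch with a hypothetical lookup table.

CREATE TABLE city_lookup (
  id   INT,
  name VARCHAR(50),
  INDEX USING HASH (id)          -- exact-match lookups only
) ENGINE = MEMORY;

SELECT name FROM city_lookup WHERE id = 42;   -- can use the hash index
SELECT name FROM city_lookup WHERE id > 42;   -- range condition: cannot use it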

Optimization tips

  • Note the applicable scope and limitations of each kind of index.
  • If an indexed column is part of an expression or used as a function argument, the index fails.
  • For particularly long strings, a prefix index can be used; choose a prefix length that keeps the index selective (a sketch follows this list).
  • When using a multi-column index, conditions can be combined with AND and OR.
  • Duplicate indexes are unnecessary; for example, (A, B) and (A) duplicate each other.
  • Indexes are particularly effective for conditions in WHERE clauses and in GROUP BY queries.
  • Put range conditions last among the query conditions, so that a range condition does not stop the index columns to its right from being used.
  • Avoid indexing overly long strings, and indexed columns should preferably not be NULL.
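
Following the prefix-index tip above, a sketch of choosing a prefix length by comparing selectivity; the email column on student is hypothetical.

SELECT
  COUNT(DISTINCT LEFT(email, 5))  / COUNT(*) AS sel_5,
  COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
  COUNT(DISTINCT email)           / COUNT(*) AS sel_full
FROM student;

-- Once a prefix gets close to the full selectivity, index just that prefix:
ALTER TABLE student ADD INDEX idx_email_prefix (email(10));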

3 Query Optimization

3.1 Three important indicators of query quality

  • Response time (service time plus queue time)
  • Rows scanned
  • Rows returned

3.2 Query optimization tips

  • Avoid querying columns you do not need, for example using Select * to return all columns.
  • Avoid querying rows you do not need.
  • Split up large queries. Spread a task that puts heavy pressure on the server over a longer period and multiple executions. For example, deleting ten thousand rows can be done in 10 batches, pausing after each batch completes before continuing, so that server resources are freed for other tasks in between (a sketch follows this list).
  • Decompose join queries. Break a multi-table join query into multiple single-table queries. This reduces lock contention, and each single-table query is itself quite efficient. Because MySQL's connect and disconnect operations are lightweight, splitting one query into several does not hurt efficiency.
  • Note that COUNT(column) only counts rows where the column is not NULL, so use COUNT(*) to count the total number of rows.
  • GROUP BY groups most efficiently on identity (high-cardinality) columns, and the grouped result should avoid columns other than the grouping columns.
  • Use delayed joins: first narrow the result set according to the query conditions, then perform the join.
  • LIMIT paging optimization. Use a covering index scan to find the matching rows, then join back to the table on the index column to fetch the other columns. For example:
 
SELECT s1.id, s1.name, s1.age
FROM student s1
INNER JOIN (
    SELECT id
    FROM student
    ORDER BY age
    LIMIT 50, 5
) AS s2 ON s1.id = s2.id;
  • UNION queries deduplicate results by default; unless the business requires deduplication, the more efficient UNION ALL is recommended.
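
A sketch of the query-splitting tip above: delete old rows in bounded batches (the message table and cutoff are hypothetical), pausing between runs so the server can serve other work.

-- Repeat until the statement reports 0 affected rows, sleeping briefly in between.
DELETE FROM message
WHERE created_at < '2020-01-01'
LIMIT 1000;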

Additional notes

From the expert - Andy

  1. If the field type in the table structure and the type used in the query condition are inconsistent, MySQL automatically adds a conversion function; the indexed column then becomes a function argument and the index fails.

  2. A LIKE query without a fixed prefix, i.e. a pattern beginning with %, cannot hit the index.

  3. Two new features added in version 5.7:

Generated columns: a column whose value is computed by the database from other columns.

 
CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  area  DOUBLE AS (sidea * sideb / 2)
);
INSERT INTO triangle (sidea, sideb) VALUES (3, 4);
SELECT * FROM triangle;
 
+-------+-------+------+
| sidea | sideb | area |
+-------+-------+------+
|     3 |     4 |    6 |
+-------+-------+------+

Support for JSON data, with built-in JSON functions.

 
CREATE TABLE json_test (name JSON);
INSERT INTO json_test VALUES ('{"name1": "value1", "name2": "value2"}');
-- JSON_CONTAINS_PATH checks whether the document contains the given key.
SELECT * FROM json_test WHERE JSON_CONTAINS_PATH(name, 'one', '$.name1');

From the JVM expert - up

Pay attention to using EXPLAIN in performance analysis.

 
EXPLAIN SELECT settleId FROM Settle WHERE settleId = '3679';

 

  • select_type has several values: simple (a simple SELECT with no UNION or subquery), primary (when there is a subquery, the outermost SELECT is the primary), union (the second or later SELECT in a UNION that does not depend on the outer query), dependent union (the second or later SELECT in a UNION that depends on the outer query)
  • type has several values: system (the table has only one row, i.e. a system table; a special case of const), const (constant lookup), ref (non-unique index access using an ordinary index), eq_ref (lookup via a unique index or primary key), all (full table scan), index (full scan over the index), range (range scan)
  • possible_keys: indexes on the table that might help the query
  • key: the index actually chosen
  • key_len: the length of the index used
  • rows: the number of rows scanned; larger is worse
  • extra has several values: Using index (the information is retrieved from the index alone, which is faster than scanning the table), Using where (a WHERE clause restricts the rows), Using filesort (an extra sort is needed, possibly in memory or on disk), Using temporary (a temporary table is used to sort the query results)

(End)

