MySQL Optimization Summary

1. Choosing a storage engine

Storage engine: the component that determines how MySQL stores data, indexes, and other objects.

Up to MySQL 5.1, the default storage engine was MyISAM; starting with MySQL 5.5, the default is InnoDB.

Differences:

Difference	MyISAM	InnoDB
File format	Data and indexes are stored separately: data in .MYD, indexes in .MYI	Data and indexes are stored together in .ibd
Files can be moved	Yes: a table corresponds to three files, .frm, .MYD, and .MYI	No: other files are associated with the data
Record storage order	Records are saved in insertion order	Records are kept ordered by primary key
Fragmentation	Produced; clean up regularly with OPTIMIZE TABLE	Less prone
Transactions	Not supported	Supported
Foreign keys	Not supported	Supported
Lock granularity	Table-level locking	Row-level locking

The MyISAM engine has a simple design and stores data in a compact format, so its read performance is very good in some scenarios.

If there are no special requirements, use the default, InnoDB.

MyISAM: applications dominated by reads and inserts, such as blog systems and news portals.

InnoDB: applications with frequent updates and deletes, or that must guarantee data integrity; high concurrency, with transactions and foreign keys to ensure integrity. An example is an OA (office automation) system.

The book "High Performance MySQL" covers many storage engines, but strongly recommends InnoDB.

2. Field design

- The three normal forms of database design

First normal form (every column is atomic)
Second normal form (every non-key column depends on the whole primary key)
Third normal form (every non-key column depends directly on the primary key, not indirectly through another column)

Following the normal forms is generally recommended, because updates on a normalized schema are usually faster. But this is not absolute: normalization has drawbacks. Queries usually need joins, which are not only expensive but may also make some indexing strategies ineffective.

- Do not put too many fields in a single table

　　We recommend at most about 30.

　　More fields can degrade performance and make development harder (faced with an endless list of fields, we developers are simply stunned).

- Use small, simple, appropriate data types

　　a. String types

　　Use CHAR for fixed-length values and VARCHAR for variable-length values, and allocate an appropriate but sufficient length.

　　When CHAR values are retrieved, trailing spaces are removed.

　　b. Decimal types

　　FLOAT or DOUBLE is usually fine and takes little space, but stored values may lose precision.

　　DECIMAL stores exact decimal values; use it when precision requirements are high, such as for financial data or longitude and latitude.
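The precision trade-off can be seen with an analogy outside MySQL. In this minimal Python sketch, the built-in float plays the role of FLOAT/DOUBLE (binary floating point) and decimal.Decimal plays the role of DECIMAL (exact decimal digits). It only illustrates the idea and does not talk to MySQL.

```python
from decimal import Decimal

# Binary floating point (like FLOAT/DOUBLE) cannot represent 0.1 exactly,
# so the error accumulates.
float_total = sum([0.1] * 10)

# Exact decimal arithmetic (like DECIMAL) keeps every digit.
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

print(float_total)    # 0.9999999999999999
print(decimal_total)  # 1.00
```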

　　c. Date and time

　　　　-- DATETIME:

　　　　Range: years 1000 to 9999

　　　　Storage: 8 bytes, in the format YYYYMMDDHHMMSS (since MySQL 5.6.4, 5 bytes plus fractional-seconds storage)

　　　　Time zone: independent of time zone

　　　　-- TIMESTAMP:

　　　　Range: 1970 to 2038 (up to 2038-01-19 03:14:07 UTC)

　　　　Storage: 4 bytes, stored as UTC, the same as a UNIX timestamp

　　　　Time zone: converted from the current time zone to UTC for storage, and converted back to the current time zone on retrieval

　　　　TIMESTAMP is usually the better choice, since it takes less space and time-zone conversion happens automatically, so you need not worry about regional time differences.
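Here is a minimal Python sketch of the TIMESTAMP behaviour described above. It assumes TIMESTAMP is a 4-byte signed count of seconds since the UNIX epoch, which is where the 2038 limit comes from. The two session time zones are hypothetical fixed offsets chosen for illustration; MySQL itself uses the session time_zone setting.

```python
from datetime import datetime, timedelta, timezone

# Upper bound of a 4-byte signed seconds counter, read as UTC.
max_timestamp = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(max_timestamp)  # 2038-01-19 03:14:07+00:00

# Hypothetical session time zones (fixed offsets for simplicity).
writer_zone = timezone(timedelta(hours=8))   # UTC+8
reader_zone = timezone(timedelta(hours=-5))  # UTC-5

# On write, the session-local value is converted to UTC and stored.
written = datetime(2024, 1, 1, 8, 0, tzinfo=writer_zone)
stored_utc = written.astimezone(timezone.utc)

# On read, the stored UTC value is converted to the reader's session zone.
read_back = stored_utc.astimezone(reader_zone)

print(stored_utc)  # 2024-01-01 00:00:00+00:00
print(read_back)   # 2023-12-31 19:00:00-05:00
```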

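　　　　Before MySQL 5.6.4, DATETIME and TIMESTAMP store values only to the second; to store timestamps with microsecond precision, you can use BIGINT. MySQL 5.6.4 and later also support fractional seconds up to microseconds, e.g. DATETIME(6).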
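If you do store microsecond timestamps in a BIGINT, the application has to convert them. This Python sketch assumes the column holds microseconds since 1970-01-01 00:00:00 UTC; the helper names are hypothetical. A signed 64-bit BIGINT has ample room for such values.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
BIGINT_MAX = 2**63 - 1  # largest signed 64-bit value

def to_epoch_micros(dt: datetime) -> int:
    """Aware datetime -> integer microseconds since the UNIX epoch."""
    delta = dt - EPOCH
    # Pure integer arithmetic, so the microseconds are never rounded.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

def from_epoch_micros(micros: int) -> datetime:
    """Integer microseconds since the UNIX epoch -> aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)

dt = datetime(2024, 5, 1, 12, 30, 15, 123456, tzinfo=timezone.utc)
micros = to_epoch_micros(dt)
print(from_epoch_micros(micros) == dt)  # True

# Roughly how many years of microseconds fit in a signed BIGINT.
print(BIGINT_MAX // (1_000_000 * 86_400 * 365))  # about 292 thousand years
```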

　　d. BLOB and TEXT (large data)

　　　　BLOB and TEXT are string data types designed to store large amounts of data, but it is generally recommended to avoid them.

　　　　MySQL treats each BLOB and TEXT value as an object with its own identity, and storage engines handle them specially: when a value is too large, InnoDB stores it in a separate external storage area, keeping a pointer in the row and the actual value in external storage. This causes serious performance overhead.

- Declare columns NOT NULL wherever possible

　　a. Nullable columns may use more storage space.

　　b. For nullable columns, MySQL needs special handling in indexes and value comparisons, which costs some performance.

　　Recommendation: it is usually best to declare columns NOT NULL, unless you really need to store NULL values.

- Use integer primary keys wherever possible

　　a. Integer types are usually the best choice for identifier columns, because they are fast and work with AUTO_INCREMENT.

　　b. Avoid string types for identifier columns, because they consume more space and are usually slower than numeric types.

　　c. Be especially careful with completely "random" strings, such as those produced by MD5(), SHA1(), or UUID(). These functions spread new values randomly over a large space, which can make INSERT and some SELECT statements very slow.
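Why random keys hurt InnoDB inserts can be sketched with a toy model. A sorted Python list stands in for the clustered index, and the code counts inserts that land somewhere other than the end (in a real B+ tree those touch existing pages and can cause page splits). Random 128-bit integers stand in for MD5()/UUID() values; this is an illustration, not a measurement of InnoDB.

```python
import bisect
import random

def count_middle_inserts(keys):
    """Insert keys one by one into a sorted list (a stand-in for the
    clustered index) and count the inserts that did not land at the end."""
    index = []
    middle = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos != len(index):
            middle += 1  # would have to write into an existing page
        index.insert(pos, key)
    return middle

n = 1000
sequential = range(1, n + 1)                             # like AUTO_INCREMENT
rng = random.Random(42)                                  # fixed seed for repeatability
random_keys = [rng.getrandbits(128) for _ in range(n)]   # like MD5()/UUID() values

print(count_middle_inserts(sequential))   # 0
print(count_middle_inserts(random_keys))  # the vast majority of the 1000 inserts
```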

3. Indexes

- Why indexes are fast

　　An index holds a small amount of data compared with the data itself.

　　An index is ordered, so the location of the data can be found quickly.

　　InnoDB tables are index-organized: the table's data is laid out in primary key order.

- Index storage structures

　　a. B+ tree

　　b. Hash (key-value structure)

　　In InnoDB, the primary key index is a B+ tree. Non-primary-key indexes can be declared as B+ tree or hash, but explicit hash indexes are only supported by engines such as MEMORY; InnoDB builds B+ trees (it uses hashing only internally, in its adaptive hash index).

　　B+ tree indexes are generally recommended, because hash indexes have several drawbacks: they cannot be used for sorting or for range queries, and with large data volumes there may be many hash collisions, which hurts efficiency.
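The sorting and range-query limitation of hash indexes can be sketched in Python. A dict stands in for a hash index, and a sorted key list searched with bisect stands in for B+ tree leaf order. The sample rows are hypothetical.

```python
import bisect

# Hypothetical rows keyed by an integer column.
rows = {17: "alice", 3: "bob", 42: "carol", 8: "dave", 25: "erin"}

# Hash index stand-in: fast equality lookup...
hash_index = dict(rows)
print(hash_index[42])  # carol

# ...but the hash gives no key order, so a range condition means checking
# every entry and sorting the result afterwards.
range_via_hash = sorted(k for k in hash_index if 5 <= k <= 30)

# B+ tree stand-in: keys kept in order, so a range is two binary searches
# plus a sequential read, and the output is already sorted.
sorted_keys = sorted(rows)
lo = bisect.bisect_left(sorted_keys, 5)
hi = bisect.bisect_right(sorted_keys, 30)
range_via_btree = sorted_keys[lo:hi]
print(range_via_btree)  # [8, 17, 25]
```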

- Index types

　　By function:

　　　　1. Primary key index

　　　　2. Ordinary index: no special restrictions; duplicate values are allowed

　　　　3. Unique index: duplicate values are not allowed; slightly faster than an ordinary index

　　　　4. Full-text index: used for full-text search matching, but rarely used in practice. The default parser only splits words on spaces (suited to English; MySQL 5.7.6 added an ngram parser for Chinese, Japanese, and Korean), and maintaining the index is expensive.

　　By data storage structure:

　　　　1. Clustered index

　　　　Definition: the physical order of the data rows matches the logical order of the indexed column values (usually the primary key column); a table can have only one clustered index.

　　　　The primary key index is the clustered index, and the data is stored in primary key order.

　　　　2. Non-clustered index

　　　　Definition: the logical order of the index differs from the physical storage order of the rows on disk; a table can have multiple non-clustered indexes.

　　　　Every index other than the clustered index is a non-clustered index. These are subdivided into ordinary, unique, and full-text indexes, and are also called secondary indexes.

　　　　In InnoDB, the leaf nodes of the primary key index store the full data rows. (In MyISAM, index leaf nodes store a "row pointer" that points directly to the row in the data file.)

　　　　The leaf nodes of a secondary index store primary key values.

　　Covering index: the query can be answered directly from the index, without going back to the table to read the rows.

　　For example:

　　　　Suppose table t has a multi-column index on (clo1, clo2)

　　　　select clo1, clo2 from t where clo1 = 1;

　　　　With this query, the data can be read directly from the (clo1, clo2) index tree, without going back to the table. So list only the columns you actually need after SELECT, to increase the chance of using a covering index.
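Here is a toy Python model of an InnoDB lookup through a secondary index. It assumes a hypothetical table t(id, clo1, clo2, clo3) with primary key id and an index on (clo1, clo2). InnoDB secondary index leaves also hold the primary key, so a query touching only clo1, clo2, and id is covered. Any other column forces a lookup back into the clustered index.

```python
# Clustered index stand-in: primary key -> full row.
clustered = {
    1: {"id": 1, "clo1": 1, "clo2": "a", "clo3": "x"},
    2: {"id": 2, "clo1": 1, "clo2": "b", "clo3": "y"},
    3: {"id": 3, "clo1": 2, "clo2": "c", "clo3": "z"},
}

# Secondary index on (clo1, clo2): leaf entries hold the indexed columns
# plus the primary key, sorted by (clo1, clo2).
INDEX_COLS = ("clo1", "clo2", "id")
secondary = sorted((r["clo1"], r["clo2"], r["id"]) for r in clustered.values())

def query(select_cols, clo1_value):
    """SELECT <select_cols> FROM t WHERE clo1 = clo1_value, answered via the
    secondary index; also returns how many lookups went back to the table."""
    covered = set(select_cols) <= set(INDEX_COLS)
    rows, lookups = [], 0
    for entry in secondary:  # a real B+ tree would seek straight to clo1_value
        if entry[0] != clo1_value:
            continue
        if covered:
            source = dict(zip(INDEX_COLS, entry))  # answered from the index alone
        else:
            source = clustered[entry[2]]           # back to the table via the PK
            lookups += 1
        rows.append({col: source[col] for col in select_cols})
    return rows, lookups

covered_rows, covered_lookups = query(["clo1", "clo2"], 1)
uncovered_rows, uncovered_lookups = query(["clo1", "clo3"], 1)
print(covered_lookups, uncovered_lookups)  # 0 2
```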

　　Multi-column index: an index over several columns, such as (clo1, clo2)

　　　　Usage: when queries often filter on both clo1 and clo2, a composite index is appropriate and is faster than separate single-column indexes

　　　　Note that multi-column indexes follow the leftmost-prefix principle

　　　　Creating a multi-column index (A, B, C) is equivalent to creating these three indexes:

　　　　1.index(A,B,C)

　　　　2.index(A,B)

　　　　3.index(A)

　　　　This is the leftmost-prefix principle: the index is used from its leftmost column onward.
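The leftmost-prefix rule for equality conditions can be sketched as a small Python function. It is a simplification. It ignores range conditions, which stop index use after the range column. It also ignores optimizer features such as the index skip scan available in newer MySQL 8.0 releases.

```python
def usable_prefix(index_cols, equality_cols):
    """Return the leading index columns that equality conditions can use:
    matching stops at the first index column without a condition."""
    prefix = []
    for col in index_cols:
        if col not in equality_cols:
            break
        prefix.append(col)
    return prefix

index = ("A", "B", "C")
print(usable_prefix(index, {"A", "B", "C"}))  # ['A', 'B', 'C']
print(usable_prefix(index, {"A", "B"}))       # ['A', 'B']
print(usable_prefix(index, {"A", "C"}))       # ['A']  (B is missing, so C cannot be used)
print(usable_prefix(index, {"B", "C"}))       # []     (no condition on the leading A)
```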

- Index tuning

　　1. More indexes are not always better; every index has a maintenance cost

　　2. Index the fields used in join conditions

　　3. Choose highly selective columns for indexes. Selectivity is count(distinct col) / count(*), the proportion of distinct values in the field; the larger the ratio, the fewer rows are scanned. Low-selectivity fields such as status or gender are not suitable for indexing

　　4. When several fields often appear together in a WHERE clause joined by AND, you can create a composite index; otherwise consider single-column indexes

　　5. Move calculations into the application (service) layer rather than the database layer

　　6. With ORDER BY or GROUP BY, take advantage of the index's ordering.

　　　　Make the ORDER BY field part of a composite index, placed last in the index column order, to avoid a filesort (external sort) that hurts query performance

　　　　For example, for the statement where a = ? and b = ? order by c, you can create the composite index (a, b, c).

　　　　If the index is used for a range lookup, its ordering cannot be used for sorting: for example, with WHERE a > 10 ORDER BY b, index (a, b) cannot be used to sort.
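Two of the tips above can be checked with a small Python sketch on hypothetical data. The first part computes the selectivity ratio from tip 3. The second part, for tip 6, shows why the order of a composite index (a, b, c) serves ORDER BY c after equality conditions on a and b, but not ORDER BY b after a range condition on a. The sorted tuple list is only a stand-in for index order.

```python
import itertools

def selectivity(values):
    """count(DISTINCT col) / count(*); closer to 1 means a better index candidate."""
    return len(set(values)) / len(values)

genders = ["M", "F"] * 4                             # hypothetical low-selectivity column
emails = [f"user{i}@example.com" for i in range(8)]  # hypothetical unique column
print(selectivity(genders))  # 0.25
print(selectivity(emails))   # 1.0

def is_sorted(seq):
    return all(x <= y for x, y in zip(seq, seq[1:]))

# Entries in the order a composite index (a, b, c) keeps them.
index_abc = sorted(itertools.product([5, 12, 20], [1, 2], [9, 3, 7]))

# WHERE a = 12 AND b = 2 ORDER BY c: the matching entries are adjacent and
# already ordered by c, so no filesort is needed.
eq_match = [c for a, b, c in index_abc if a == 12 and b == 2]
print(eq_match, is_sorted(eq_match))  # [3, 7, 9] True

# WHERE a > 10 ORDER BY b: b starts over for each value of a, so the index
# order does not give ORDER BY b and a sort is required.
range_match = [b for a, b, c in index_abc if a > 10]
print(is_sorted(range_match))  # False
```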

- Situations that may prevent an index from being used

　　1. IS NULL and IS NOT NULL

　　2. != and <> (use IN instead where possible)

　　3. "Dependent columns": the indexed column is part of an expression or a function argument

　　　　For example, part of an expression: select id from t where id + 1 = 5; function argument: select id from t where to_days(date_clo) >= 10

　　4. LIKE patterns beginning with %

　　5. OR (the index can be used only if the columns on both sides of OR are indexed)

　　6. Type mismatch

　　　　If the column is a string type, the value in the condition must be quoted; otherwise the index cannot be used. For example, the following cannot use an index on email:

　　　　select * from tb1 where email = 999;
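Here is a simplified Python model of why the last query cannot use the index. When a string column is compared with a number, MySQL compares them as floating-point numbers, so every stored string has to be converted. The conversion function only approximates MySQL's rules (use the leading numeric part, otherwise 0), and the column values are hypothetical. The point is that many different strings equal 999, and they are not adjacent in the index's string order.

```python
import re

# Leading numeric part of a string, roughly as read in numeric context.
_NUMERIC_PREFIX = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def mysql_like_to_number(s: str) -> float:
    """Simplified model of string-to-number conversion: use the leading
    numeric part, or 0 if there is none."""
    m = _NUMERIC_PREFIX.match(s)
    return float(m.group(0)) if m else 0.0

column_values = ["999", "0999", "999.0", "999abc", "9.99e2", "1000", "abc"]

# Every one of these strings satisfies email = 999 ...
matches = [v for v in column_values if mysql_like_to_number(v) == 999]
print(matches)  # ['999', '0999', '999.0', '999abc', '9.99e2']

# ... but in the index's string order they are not one contiguous range,
# so a single index range scan cannot find them.
index_order = sorted(column_values)
positions = [i for i, v in enumerate(index_order) if mysql_like_to_number(v) == 999]
contiguous = positions == list(range(positions[0], positions[-1] + 1))
print(positions, contiguous)  # [0, 2, 3, 4, 5] False
```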

4. SQL optimization tips

　　1. First, look at the order in which SQL is executed; this helps us optimize

　　　　(1) FROM: load data from disk into the data buffer for the following operations

　　　　(2) ON: in a multi-table join, apply the ON conditions first, then join the tables

　　　　(3) JOIN: join the tables on both sides according to the join conditions

　　　　(4) WHERE: select the tuples that satisfy the condition from the base table or view

　　　　(5) GROUP BY: group rows, usually together with aggregate functions

　　　　(6) HAVING: filter the grouped tuples, keeping the qualifying ones (usually used with GROUP BY)

　　　　(7) SELECT: pick the required columns from the resulting tuples

　　　　(8) DISTINCT: remove duplicates

　　　　(9) UNION: merge multiple query results

　　　　(10) ORDER BY: sort the results

　　　　(11) LIMIT: limit the number of records output

　　　　　　For multi-table queries, prefer joins (JOIN ... ON) over subqueries (a subquery may create a temporary table, which costs performance)

　　　　　　Avoid filtering data with HAVING when WHERE can do it: WHERE removes rows before grouping

　　　　　　Index the fields after ORDER BY to use the index's sort order and avoid an external sort

　　　　　　If you know exactly one result will be returned, LIMIT 1 can improve efficiency
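The logical order above can be sketched as a Python pipeline over hypothetical emp and dept rows, with each step labelled by its number in the list (UNION is left out). This is the logical order only; the MySQL optimizer may physically execute a query differently. The sketch also shows why a plain row condition belongs in WHERE: it removes rows before grouping, while HAVING can only discard whole groups afterwards.

```python
from collections import defaultdict

emp = [
    {"name": "Ann", "dept_id": 1, "salary": 5000},
    {"name": "Bob", "dept_id": 1, "salary": 7000},
    {"name": "Cid", "dept_id": 2, "salary": 6500},
    {"name": "Dee", "dept_id": 2, "salary": 8000},
    {"name": "Eve", "dept_id": 3, "salary": 9000},
]
dept = [{"id": 1, "dname": "Sales"}, {"id": 2, "dname": "R&D"}, {"id": 3, "dname": "Ops"}]

# Models: SELECT DISTINCT d.dname, COUNT(*) AS cnt
#         FROM emp e JOIN dept d ON e.dept_id = d.id
#         WHERE e.salary > 6000
#         GROUP BY d.dname HAVING SUM(e.salary) > 7000
#         ORDER BY cnt DESC, d.dname LIMIT 1

# (1)-(3) FROM / ON / JOIN: pair rows and keep those meeting the ON condition.
joined = [{**e, **d} for e in emp for d in dept if e["dept_id"] == d["id"]]
# (4) WHERE: filter individual rows.
filtered = [r for r in joined if r["salary"] > 6000]
# (5) GROUP BY: bucket rows by the grouping key.
groups = defaultdict(list)
for r in filtered:
    groups[r["dname"]].append(r)
# (6) HAVING: filter whole groups; aggregates are only available now.
kept = {k: v for k, v in groups.items() if sum(r["salary"] for r in v) > 7000}
# (7)-(8) SELECT / DISTINCT: project the columns and drop duplicate rows.
projected = list(dict.fromkeys((k, len(v)) for k, v in kept.items()))
# (10) ORDER BY, then (11) LIMIT.
result = sorted(projected, key=lambda t: (-t[1], t[0]))[:1]
print(result)  # [('R&D', 2)]
```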

　　2. Preferably do not join more than three tables

　　3. Avoid SELECT *: the more data read from the database, the slower the query

　　4. Use NOT NULL columns as much as possible: nullable columns take extra space, and indexes and comparisons on them need special handling, which affects performance

　　5. Choosing between EXISTS / NOT EXISTS and IN / NOT IN

　　　　The principle is to let the smaller result set drive the query

　　　　select * from t1 where x in (select y from t2)

　　　　select * from t1 where exists (select null from t2 where y = x)

　　　　IN suits a large outer table with a small inner (subquery) table; EXISTS suits a small outer table with a large inner table
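The rule of thumb can be illustrated with a toy Python cost model, which is an assumption of this sketch rather than how MySQL's optimizer costs queries. In the model, IN materializes the subquery once and probes it for each outer row. EXISTS instead performs one index lookup (a binary search here) into the inner table for each outer row. Newer MySQL versions can also rewrite both forms as semi-joins, so check with EXPLAIN on your own data.

```python
import bisect
import math

def in_semi_join(outer, inner):
    """t1.x IN (SELECT y FROM t2): run the subquery once, hash it, probe it."""
    inner_set = set(inner)
    rows = [x for x in outer if x in inner_set]
    cost = len(inner) + len(outer)  # build once + one probe per outer row
    return rows, cost

def exists_semi_join(outer, inner_sorted):
    """EXISTS (SELECT ... FROM t2 WHERE y = x): one index probe per outer row."""
    rows = []
    for x in outer:
        i = bisect.bisect_left(inner_sorted, x)
        if i < len(inner_sorted) and inner_sorted[i] == x:
            rows.append(x)  # first match found: stop looking
    probe_cost = max(1, math.ceil(math.log2(len(inner_sorted) + 1)))
    return rows, len(outer) * probe_cost

small = list(range(0, 20, 2))  # 10 values
large = list(range(100_000))   # already sorted, like an index

# Small outer table, large inner table: EXISTS is cheaper in this model.
rows_in, cost_in = in_semi_join(small, large)
rows_ex, cost_ex = exists_semi_join(small, large)
print(cost_in, cost_ex)  # 100010 170

# Large outer table, small inner table: IN is cheaper in this model.
rows_in2, cost_in2 = in_semi_join(large, small)
rows_ex2, cost_ex2 = exists_semi_join(large, sorted(small))
print(cost_in2, cost_ex2)  # 100010 400000
```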

　　6. Use EXISTS instead of DISTINCT

　　　　When a query involves one-to-many tables (such as a department table and an employee table), avoid DISTINCT in the SELECT clause. Consider EXISTS instead, which is generally faster, because the subquery returns as soon as its condition is met.

　　　　Inefficient:

　　　　　　select distinct d.dept_no, d.dept_name from dept d, emp e where d.dept_no = e.dept_no

　　　　Efficient:

　　　　　　select dept_no, dept_name from dept d where exists (select 'x' from emp e where e.dept_no = d.dept_no)

　　　　　　Note: 'x' is used because EXISTS only checks whether the subquery returns any rows, not what it returns, so writing a constant is recommended and performs well. EXISTS can indeed replace DISTINCT, but the example above works only because dept_no is the unique primary key of dept. If you need to remove duplicates, refer to the following form:

　　　　　　SELECT * FROM emp WHERE EXISTS (SELECT MAX(d.dept_no) FROM dept d, emp e WHERE e.dept_no = d.dept_no GROUP BY d.dept_no)
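A small Python model of the dept/emp example shows where the saving comes from. It uses nested loops without indexes and hypothetical rows. The join builds one row per matching employee and then removes duplicates, while EXISTS stops scanning emp at the first match for each department.

```python
dept = [(10, "Sales"), (20, "R&D"), (30, "Ops")]
# Hypothetical employees as (emp_no, dept_no); department 30 has none.
emp = [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20)]

def join_then_distinct(dept, emp):
    """SELECT DISTINCT d.dept_no, d.dept_name FROM dept d, emp e WHERE ..."""
    joined, comparisons = [], 0
    for d_no, d_name in dept:
        for _, e_dept in emp:
            comparisons += 1
            if e_dept == d_no:
                joined.append((d_no, d_name))  # one row per matching employee
    return list(dict.fromkeys(joined)), comparisons, len(joined)

def exists_semi_join(dept, emp):
    """SELECT ... FROM dept d WHERE EXISTS (SELECT 'x' FROM emp e WHERE ...)"""
    result, comparisons = [], 0
    for d_no, d_name in dept:
        for _, e_dept in emp:
            comparisons += 1
            if e_dept == d_no:
                result.append((d_no, d_name))
                break  # the first match is enough
    return result, comparisons

distinct_rows, join_cmp, intermediate = join_then_distinct(dept, emp)
exists_rows, exists_cmp = exists_semi_join(dept, emp)
print(join_cmp, intermediate, exists_cmp)  # 15 5 10
```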

　　7. Avoid implicit data type conversion
　　　　Implicit data type conversion prevents the index from being used, resulting in a full table scan! Suppose the phonenumber column of table t_tablename is of type VARCHAR.
　　　　The following code does not follow the convention:
　　　　　　SELECT column1 INTO i_l_variable1 FROM t_tablename WHERE phonenumber = 18519722169;
　　　　It should be written as follows:
　　　　　　SELECT column1 INTO i_l_variable1 FROM t_tablename WHERE phonenumber = '18519722169';

　　8. Segmented queries
　　　　On some paginated query pages, when the user selects too large a time range, the query becomes slow, mainly because too many rows are scanned. In that case the program can split the query into segments, run them in a loop, and merge the results for display.
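The segmented query can be sketched in Python as follows. The chunk size and the run_query callback are hypothetical. In a real program the callback would execute something like SELECT ... WHERE created_at >= lo AND created_at < hi against the database. Half-open ranges make sure rows on a chunk boundary are counted exactly once.

```python
from datetime import date, timedelta

def split_range(start, end, days_per_chunk):
    """Split the half-open range [start, end) into consecutive chunks of at
    most days_per_chunk days."""
    chunks = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days_per_chunk), end)
        chunks.append((cur, nxt))
        cur = nxt
    return chunks

def segmented_query(start, end, days_per_chunk, run_query):
    """Run one smaller query per chunk and merge the results in chunk order."""
    results = []
    for lo, hi in split_range(start, end, days_per_chunk):
        results.extend(run_query(lo, hi))
    return results

# Usage with an in-memory stand-in for the table (hypothetical data).
orders = [date(2024, 1, 1) + timedelta(days=i) for i in range(0, 90, 3)]

def fake_query(lo, hi):
    # Stands in for: SELECT ... WHERE created_at >= lo AND created_at < hi
    return [d for d in orders if lo <= d < hi]

all_rows = segmented_query(date(2024, 1, 1), date(2024, 3, 31), 30, fake_query)
print(len(all_rows))  # 30
```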

5. Analyzing the execution plan with EXPLAIN
　　EXPLAIN shows how MySQL uses indexes to process a SELECT statement and join tables. It helps you choose better indexes and write more optimized queries.
　　Example:
　　　　EXPLAIN SELECT User FROM mysql.user;
　　+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
　　| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
　　+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
　　|  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 276     | NULL |    6 |   100.00 | Using index |
　　+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

Column	Meaning
id	SELECT identifier: the sequence number of the SELECT within the query
select_type	Type of SELECT. SIMPLE: a simple SELECT (no UNION or subqueries). PRIMARY: the outermost SELECT of a query that contains complex subparts. UNION: the second or later SELECT in a UNION. DEPENDENT UNION: the second or later SELECT in a UNION, dependent on the outer query (MySQL applies some internal optimizations). UNION RESULT: the result of a UNION. SUBQUERY: the first SELECT in a subquery. DEPENDENT SUBQUERY: the first SELECT in a subquery, dependent on the outer query (MySQL optimizes some of these, turning some dependent subqueries directly into SIMPLE). DERIVED: the SELECT of a derived table (a subquery in the FROM clause)
table	The table the row refers to. Sometimes this is not a real table name but a virtual table such as <derivedN>, where the trailing number N is the id of the SELECT that produced it
type	Access type. There are many values; here we focus on those developers meet most often: system, const, eq_ref, ref, range, index, ALL. From best to worst performance: system > const > eq_ref > ref > range > index > ALL (be sure to remember this). system: the table has only one row; a special case of const that rarely appears and can be ignored. const: the row is found with a single index lookup, used when comparing against a primary key or unique index; since only one row matches, it is very fast. eq_ref: unique index scan; exactly one row in this table matches each row from the previous tables, usually because the join condition uses a primary key or unique index. ref: non-unique index scan that returns all rows matching a single value. range: retrieves rows in a given range using an index, typically for conditions with >, <, IN, BETWEEN, and so on. index: a full scan of the index tree; usually faster than ALL because the index file is usually smaller than the data file. Both index and ALL read the whole table, but index reads from the index while ALL reads the table data from disk. ALL: a full table scan to find matching rows
possible_keys	The indexes that could be used for this table; they are not necessarily used by the actual query
key	The index actually used; NULL if none
key_len	The number of bytes of the index used, from which you can work out how much of the index the query uses. Generally, a longer key length means higher precision but lower efficiency, and a shorter one means higher efficiency but lower precision. It is the maximum possible length of the index columns, computed from the table definition, not the length actually used
ref	Which columns (or constants) are compared with the index; const means the column is compared with a constant
rows	The estimated number of rows that must be read to find the required records
filtered	The estimated percentage of the rows read that are selected; 100 means 100% are selected, 80 means 80%
Extra	Important additional information. Using filesort: an external sort is used instead of reading rows in index order (usually needs optimization). Using temporary: a temporary table holds intermediate results, common with ORDER BY and GROUP BY (best optimized). Using index: the SELECT uses a covering index, taking values directly from the index without going back to the rows (reading data from disk). Using where: a WHERE filter is applied. Using index condition: new in 5.6 (index condition pushdown); when the query involves non-indexed columns, the index conditions are checked first to reduce disk I/O. Using join buffer: the join buffer is used. Impossible WHERE: the WHERE clause is always false
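key_len can be estimated by hand with the commonly used rules sketched below in Python:

- CHAR(n) and VARCHAR(n) take n times the character set's maximum bytes per character.
- VARCHAR adds a 2-byte length prefix.
- A nullable column adds 1 byte.

This is a simplification that ignores prefix indexes, temporal types, and so on. Assuming the MySQL 5.7 definition of mysql.user, whose primary key is Host CHAR(60) plus User CHAR(32) in utf8, the rules reproduce the 276 in the example above.

```python
# Maximum bytes per character for a few character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "ascii": 1, "utf8": 3, "utf8mb3": 3, "utf8mb4": 4}

# Storage size in bytes of some fixed-size types.
FIXED_SIZES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8, "date": 3}

def column_key_len(col_type, length=0, charset="utf8mb4", nullable=False):
    """Estimate one index column's contribution to EXPLAIN key_len."""
    if col_type in ("char", "varchar"):
        size = length * MAX_BYTES_PER_CHAR[charset]
        if col_type == "varchar":
            size += 2  # 2-byte length prefix for variable-length strings
    else:
        size = FIXED_SIZES[col_type]
    if nullable:
        size += 1  # 1-byte NULL flag
    return size

# mysql.user PRIMARY KEY (Host, User), assuming the MySQL 5.7 definition.
user_pk = column_key_len("char", 60, "utf8") + column_key_len("char", 32, "utf8")
print(user_pk)  # 276

# A hypothetical nullable VARCHAR(50) utf8mb4 column.
print(column_key_len("varchar", 50, "utf8mb4", nullable=True))  # 203
```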
