MySQL Optimization Notes

1. Choice of storage engine

Storage engine: how MySQL stores data, indexes, and other objects.

MyISAM was the default storage engine in older versions of MySQL; since MySQL 5.5 the default has been InnoDB.

Differences:

  Difference            | MyISAM                                                       | InnoDB
  File layout           | Data and index stored separately: .MYD (data), .MYI (index) | Data and index stored together in .ibd
  Can files be moved?   | Yes; a table maps to 3 files: .frm, .MYD, .MYI               | No; other files are associated with the data
  Record storage order  | Records kept in insertion order                              | Rows inserted in primary key order
  Space fragmentation   | Produced; clean up regularly with OPTIMIZE TABLE             | Not produced
  Transactions          | Not supported                                                | Supported
  Foreign keys          | Not supported                                                | Supported
  Lock granularity      | Table-level locking                                          | Row-level locking

The MyISAM engine has a simple design and stores data in a compact format, so it gives very good read performance in some scenarios.

If there are no special requirements, use the default InnoDB.

MyISAM: read- and insert-heavy applications, such as a blog system or a news portal.

InnoDB: frequent updates and deletes, or cases where data integrity must be guaranteed; high concurrency, with transaction and foreign-key support to ensure data integrity. For example, an OA (office automation) system.

The book "High Performance MySQL" describes many storage engines, but it strongly recommends using InnoDB.

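A minimal sketch of checking and switching a table's engine; the table name demo_blog is hypothetical:

  -- Show the engine currently used by each table in the current schema
  SELECT table_name, engine FROM information_schema.tables WHERE table_schema = DATABASE();

  -- Create a table with an explicit engine (InnoDB is already the default on current versions)
  CREATE TABLE demo_blog (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, title VARCHAR(200)) ENGINE = InnoDB;

  -- Convert an existing MyISAM table to InnoDB (rebuilds the table; may take a while on large tables)
  ALTER TABLE demo_blog ENGINE = InnoDB;
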
2. Column (field) design

- The three normal forms of database design

  • First normal form: every column is atomic
  • Second normal form: every non-key column depends on the whole primary key
  • Third normal form: every column depends on the primary key directly, not indirectly (no transitive dependencies)

Normalized design is generally recommended, since normalization often makes operations faster. But this is not absolute; normal forms have drawbacks: they usually force join queries, which are not only expensive but may also defeat some indexing strategies.

- Do not put too many columns in a single table

  A maximum of about 30 is recommended.

  Too many columns degrade performance and make development harder (a field list too long to take in at a glance slows everyone down).

- Use small, simple data types where appropriate

  a. String types

  Use CHAR for fixed-length strings and VARCHAR for variable-length strings, and allocate only as much space as is actually needed.

  Note that CHAR strips trailing spaces when the value is read.

  b. Decimal types

  FLOAT or DOUBLE is usually fine: they have a small footprint, but stored values may lose precision.

  DECIMAL stores exact decimal values; use it when precision matters, such as for financial data or latitude/longitude.

  c. Date and time types

    -- DATETIME:

    Range: years 1000 - 9999

    Storage: 8 bytes, stored in the format YYYYMMDDHHMMSS

    Time zone: independent of the time zone

    -- TIMESTAMP:

    Range: 1970 - 2038

    Storage: 4 bytes, stored internally as UTC, the same as a UNIX timestamp

    Time zone: converted from the current time zone when stored, and converted back to the current time zone when retrieved

    Prefer TIMESTAMP where possible: it takes less space, and time zone conversion happens automatically, so you do not have to worry about time zone differences yourself.

    Both DATETIME and TIMESTAMP store a minimum granularity of one second; for microsecond precision the timestamp can be stored in a BIGINT column. A combined sketch of the string, decimal, and date/time advice follows item d below.

  d. BLOB and TEXT (large data)

    BLOB and TEXT are string types designed to store large amounts of data, but it is generally recommended to avoid them.

    MySQL treats each BLOB and TEXT value as a separate object and handles it specially in the storage engine. When the value is too large, InnoDB uses a dedicated external storage area: the row itself keeps only a pointer, and the actual value lives in the external storage. All of this causes serious performance overhead.
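
  A minimal sketch pulling together the string, decimal, and date/time advice above; the table demo_account and all of its columns are hypothetical:

    CREATE TABLE demo_account (
      id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      country    CHAR(2)       NOT NULL,                       -- fixed length: always 2 characters
      nickname   VARCHAR(50)   NOT NULL,                       -- variable length: just enough space
      balance    DECIMAL(12,2) NOT NULL DEFAULT 0,             -- exact value, suitable for money
      weight     DOUBLE,                                       -- approximate value is acceptable here
      birthday   DATETIME,                                     -- stored as-is, no time zone conversion
      created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP  -- stored as UTC, shown in the session time zone
    );

    -- Changing the session time zone changes how TIMESTAMP values are displayed, but not DATETIME values
    SET time_zone = '+08:00';
    SELECT birthday, created_at FROM demo_account;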

- Declare columns NOT NULL where possible

  a. A nullable column may take more storage space.

  b. When a nullable column is indexed and its values are compared, MySQL needs special handling, which costs some performance.

  Recommendation: declare columns NOT NULL unless you really need to store NULL values.

- Prefer integer primary keys

  a. Integer types are usually the best choice for identifier columns: they are fast and work with AUTO_INCREMENT.

  b. Avoid string types for identifier columns: they consume more space and are usually slower than numeric types.

  c. Be especially careful with completely "random" strings, such as values produced by MD5(), SHA1() or UUID(). These functions generate values that are randomly distributed over a large space, which can make many SELECT and INSERT statements very slow. See the sketch below.
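
  A minimal sketch contrasting the recommended integer key with a random string key; both table names are hypothetical:

    -- Recommended: compact, monotonically increasing integer key
    CREATE TABLE demo_user (
      id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      email VARCHAR(100) NOT NULL
    );

    -- Usually slower: random 36-character key, so inserts land at random positions in the clustered index
    CREATE TABLE demo_user_uuid (
      id    CHAR(36) PRIMARY KEY,   -- e.g. filled with UUID()
      email VARCHAR(100) NOT NULL
    );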

3. Indexes

- Why indexes are fast

  Compared with the data itself, an index is small.

  An index is ordered, so the position of the data can be located quickly.

  InnoDB tables are index-organized: the rows are stored sorted by primary key.

- Index storage structures

  a. B+ tree

  b. Hash (key-value pairs)

  In MySQL the primary key index is a B+ tree; for a non-primary-key index either a B+ tree or a hash can be chosen (depending on the engine).

  B+ tree indexes are generally recommended, because hash indexes have more drawbacks: they cannot be used for sorting or for range queries, and with a large amount of data there may be many hash collisions, which is inefficient.
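
  A minimal sketch of choosing the index structure explicitly; hash indexes are only honoured by engines that support them (for example MEMORY), and the table names demo_session and demo_session_cache are hypothetical:

    -- B+ tree index (the default for InnoDB and MyISAM)
    CREATE INDEX idx_user_id ON demo_session (user_id) USING BTREE;

    -- Hash index on a MEMORY table: fast equality lookups, but no range scans and no sorting
    CREATE TABLE demo_session_cache (
      token   CHAR(32) NOT NULL,
      user_id INT UNSIGNED NOT NULL,
      INDEX idx_token (token) USING HASH
    ) ENGINE = MEMORY;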

- Types of indexes

  By function:

    1. Primary key index

    2. Ordinary index: no special restrictions, duplicate values are allowed

    3. Unique index: duplicate values are not allowed; slightly faster than an ordinary index

    4. Full-text index: used for full-text search matching, but rarely used in practice; it only indexes English words and is expensive to maintain

  By data storage structure:

    1. Clustered index

    Definition: the physical order of the rows is the same as the logical order of the indexed column values (usually the primary key column); a table can have only one clustered index.

    The primary key index is the clustered index, and the rows are stored in primary key order.

    2. Non-clustered index

    Definition: the logical order of the index differs from the physical order of the rows on disk; a table can have several non-clustered indexes.

    Every index other than the clustered index is a non-clustered index; this covers ordinary, unique, and full-text indexes. They are also called secondary indexes.

  

    In MyISAM, the leaf nodes of the primary key index store a "row pointer" that points directly to the data row in the physical file.

    In InnoDB, the leaf nodes of a secondary index store the primary key value, so fetching columns that are not in the index requires a second lookup on the primary key (going "back to the table").

  Covering index: the query can be answered directly from the index, without going back to the table for the row data.

  For example:

    Suppose table t has a multi-column index on (clo1, clo2).

    select  clo1, clo2  from  t  where  clo1 = 1;

    This query can be answered entirely from the (clo1, clo2) index tree, with no lookup back to the table. So list only the columns you actually need after SELECT, to increase the chance that an index covers the query.

  Multi-column (composite) index: an index built on several columns, such as (clo1, clo2).

    Usage scenario: when queries often filter on clo1 and clo2 together, a composite index is faster than separate single-column indexes.

    Note that multi-column indexes follow the leftmost-prefix principle.

    Creating a multi-column index index(A, B, C) is equivalent to creating the following three indexes:

    1. index(A, B, C)

    2. index(A, B)

    3. index(A)

    This is the leftmost-prefix principle: columns are matched as a prefix from the left.
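
    A minimal sketch of the leftmost-prefix rule on a hypothetical table demo_t with a composite index (a, b, c):

      CREATE TABLE demo_t (
        a INT, b INT, c INT, d INT,
        INDEX idx_abc (a, b, c)
      );

      -- Can use idx_abc (a leftmost prefix of the index is present)
      SELECT d FROM demo_t WHERE a = 1;
      SELECT d FROM demo_t WHERE a = 1 AND b = 2;
      SELECT d FROM demo_t WHERE a = 1 AND b = 2 AND c = 3;

      -- Typically cannot use idx_abc for filtering (the leftmost column a is missing)
      SELECT d FROM demo_t WHERE b = 2 AND c = 3;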

- Index tuning

  1. Do not index everything; indexes have a maintenance cost.

  2. Index the columns used in join conditions.

  3. Choose highly selective columns for indexes. Selectivity, count(distinct col) / count(*), is the proportion of distinct values in the column; the higher the ratio, the fewer rows need to be scanned per lookup. Low-selectivity columns such as status or gender are not worth indexing.

  4. When several columns regularly appear together (combined with AND) in the WHERE clause, consider a composite index; otherwise consider single-column indexes.

  5. Push calculations into the service layer rather than the database layer.

  6. If there are ORDER BY or GROUP BY clauses, take advantage of the ordering of the index.

    Make the ORDER BY column the last part of the composite index, keeping the columns in that order, to avoid a file_sort (external sort) that hurts query performance.

    For example, for the statement WHERE a = ? AND b = ? ORDER BY c, the composite index (a, b, c) can be created.

    If the index is used for a range lookup, its ordering can no longer be used for sorting; for example, with WHERE a > 10 ORDER BY b, the index (a, b) cannot be used for the sort.
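
    A minimal sketch of the ORDER BY advice above; demo_orders is a hypothetical table, and the EXPLAIN Extra column shows whether a filesort is needed:

      CREATE TABLE demo_orders (
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        a INT, b INT, c INT,
        INDEX idx_abc (a, b, c)
      );

      -- Equality on a and b, then ORDER BY c: the index order can be used, so no filesort is expected
      EXPLAIN SELECT id FROM demo_orders WHERE a = 1 AND b = 2 ORDER BY c;

      -- Range on a, then ORDER BY b: the index ordering cannot be used, and Extra typically shows "Using filesort"
      EXPLAIN SELECT id FROM demo_orders WHERE a > 10 ORDER BY b;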

- Things that may prevent index use

  1. IS NULL and IS NOT NULL

  2. != and <> (replace them where a substitute is available)

  3. The indexed column is not "independent": it is used as part of an expression or as a function argument

    For example, as part of an expression: select id from t where id + 1 = 5; as a function argument: select id from t where to_days(date_clo) >= 10

  4. LIKE queries that begin with %

  5. OR (the index can be used only if the columns on both sides of the OR are indexed)

  6. Inconsistent types

    If the column is a string type, the value in the condition must be quoted, otherwise the index cannot be used:

    select * from tb1 where email = 999;

  A sketch of rewriting such predicates follows below.
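
  A minimal sketch of rewriting a few of the patterns above so that the index can be used; t, date_clo, tb1 and email come from the examples and are assumed to be indexed:

    -- An expression on the column defeats the index ...
    SELECT id FROM t WHERE id + 1 = 5;
    -- ... so move the arithmetic to the constant side instead
    SELECT id FROM t WHERE id = 4;

    -- A function on the column defeats the index ...
    SELECT id FROM t WHERE TO_DAYS(date_clo) >= 10;
    -- ... so compare the raw column against a constant instead
    SELECT id FROM t WHERE date_clo >= FROM_DAYS(10);

    -- A string column compared with a number forces an implicit conversion ...
    SELECT * FROM tb1 WHERE email = 999;
    -- ... so quote the value and the index on email can be used
    SELECT * FROM tb1 WHERE email = '999';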

4. SQL optimization tips

  1. First understand the logical execution order of a SQL statement, so we can optimize it better

    (1) FROM: load the data from disk into the data buffer, ready for the following operations

    (2) ON: for a multi-table JOIN ... ON query, filter by the ON condition first, then join the tables

    (3) JOIN: join the tables on both sides of the JOIN according to the join condition

    (4) WHERE: select the tuples that satisfy the condition from the base tables or views

    (5) GROUP BY: group the rows; usually used together with aggregate functions

    (6) HAVING: from the grouped tuples, select the ones that qualify (normally used together with GROUP BY)

    (7) SELECT: list the columns the query should return

    (8) DISTINCT: remove duplicates

    (9) UNION: merge the results of multiple queries

    (10) ORDER BY: sort the result

    (11) LIMIT: restrict the number of records in the output

      For multi-table queries, prefer JOIN ... ON; do not use subqueries (a subquery creates a temporary table, which costs performance).

      Avoid filtering data with HAVING; filter with WHERE instead.

      Index the columns behind ORDER BY and sort in index order, to avoid an external sort.

      If you know for sure that only one result is returned, LIMIT 1 can improve efficiency.

  2. It is best not to join more than three tables.

  3. Avoid SELECT *: reading more data from the database makes the query slower.

  4. Use NOT NULL columns where possible; NULL columns may take extra space, and NULL values need special handling when indexed and compared, which hurts performance.

  5. exists / not exists versus in / not in

    The principle: choose the form whose subquery produces the smaller result set.

    select *  from  t1  where  x in  (select y from t2)

    select  *  from  t1  where  exists  ( select  null  from  t2  where  y = x )

    IN suits the case where the outer table is large and the inner table is small; EXISTS suits the case where the outer table is small and the inner table is large.

  6. Use exists instead of distinct

    When a query returns information from a one-to-many relationship (such as a department table and an employee table), avoid DISTINCT in the SELECT clause; in general, consider rewriting with EXISTS to make the query faster, because the subquery returns as soon as its condition is met.

    Inefficient version:

      select  distinct  dept_no,dept_name  from  dept d,emp e  where  d.dept_no=e.dept_no

    Efficient version:

      select dept_no, dept_name from dept d where exists (select 'x' from emp e where e.dept_no = d.dept_no)

      Note on the 'x': since EXISTS only checks whether the subquery returns a row, and does not care what it returns, it is recommended to select a constant, which performs better. DISTINCT can indeed be replaced with EXISTS, but the version above only works because dept_no is a unique primary key; if duplicates need to be removed in other cases, refer to the following form:

      select * from emp where dept_no in (select max(dept_no) from dept d, emp e where e.dept_no = d.dept_no group by d.dept_no)

  7. Avoid implicit data type conversions
    An implicit type conversion prevents the index from being used and results in a full table scan. Suppose the phonenumber column of table t_tablename is of type VARCHAR.
    The following does not meet the specification:
      SELECT column1 INTO i_l_variable1 FROM t_tablename WHERE phonenumber = 18519722169;
    It should be written as:
      SELECT column1 INTO i_l_variable1 FROM t_tablename WHERE phonenumber = '18519722169';
  8. Query in segments
    In some paged queries, when the user selects a time range that is too large, the query becomes slow; the main reason is the excessive number of rows scanned. In that case the query can be split programmatically into segments, looping over the segments and merging the results for display.
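
    A minimal sketch of splitting one large time range into segments; the table demo_order_log and its created_at column are hypothetical:

      -- Instead of one query over the whole year ...
      SELECT id, amount FROM demo_order_log
      WHERE created_at >= '2019-01-01' AND created_at < '2020-01-01';

      -- ... run the query month by month from application code and merge the results
      SELECT id, amount FROM demo_order_log
      WHERE created_at >= '2019-01-01' AND created_at < '2019-02-01';

      SELECT id, amount FROM demo_order_log
      WHERE created_at >= '2019-02-01' AND created_at < '2019-03-01';
      -- ... and so on for the remaining months
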
5. EXPLAIN: analyzing the execution plan
  EXPLAIN shows how MySQL uses indexes to process a SELECT statement and join tables. It can help you choose better indexes and write better optimized queries.
  Example:
    EXPLAIN SELECT User FROM mysql.user;
  
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    |  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 276     | NULL |    6 |   100.00 | Using index |
    +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

    

Column meanings:

id
  The identifier of the select; the sequence number of the select within the query.

select_type
  The type of the select:

  SIMPLE: a simple select (no UNION and no subquery)

  PRIMARY: in any query containing subquery parts, the outermost select is labelled PRIMARY

  UNION: the second or later select statement in a UNION

  DEPENDENT UNION: the second or later select in a UNION that depends on the outer query (MySQL also applies some internal optimizations)

  UNION RESULT: the result of a UNION

  SUBQUERY: the first select in a subquery

  DEPENDENT SUBQUERY: the first select in a subquery that depends on the outer query (MySQL applies some optimizations; some dependent subqueries are optimized directly into SIMPLE)

  DERIVED: a select for a derived table (a subquery in the FROM clause)

table
  The table this row of output refers to. Sometimes it is not a real table name but a virtual table; in that case the name ends with a number, which is the id of the select that produced it.

type
  This column has more possible values, but here we focus on the ones frequently met in development: system, const, eq_ref, ref, range, index, all.

  Performance from best to worst: system > const > eq_ref > ref > range > index > all (be sure to remember this).

  system: the table has only one row; a special case of const; it rarely appears and can be ignored

  const: the row is found at most once via the index; const is used when a primary key or unique index is compared with a constant. Only one row can match, so it is very fast.

  eq_ref: unique index scan; exactly one row in the table matches. Usually seen when two tables are joined and the join column is a primary key or unique index.

  ref: non-unique index scan; returns all rows matching a single value

  range: retrieves rows in a given range; typically appears when the query condition uses >, <, IN, BETWEEN and the like

  index: scans the whole index tree. Usually faster than ALL because the index file is usually smaller than the data file. Both index and all read the whole table, but index reads from the index, while all reads the data from disk.

  all: scans the whole table to find matching rows

 
possible_keys
  The indexes that could be applied to this table; they are not necessarily used in the actual query.

key
  The index actually used; NULL if none is shown.

key_len
  The number of bytes used in the index; this column can be used to work out the length of the index used in the query. In general, the longer the index, the higher the precision but the lower the efficiency; the shorter, the higher the efficiency but the lower the precision. It is not the true length of the index actually used; it is an estimate.

ref
  Shows which column is used for the comparison; const means the column is compared with a constant.

rows
  A rough estimate of the number of rows that must be read to find the required records.

filtered
  The percentage of the rows read that are selected; 100 means 100% were selected, 80 means 80%.

Extra
  Some important extra information:

  • Using filesort: an external sort is used instead of reading in the table's index order (usually needs optimization)
  • Using temporary: a temporary table is used to hold intermediate results; common with ORDER BY sorting and GROUP BY grouping (best to optimize)
  • Using index: the SELECT uses a covering index; values are read directly from the index, with no need to go back to the row (i.e. to read the data from disk)
  • Using where: a WHERE filter is used
  • Using index condition: added in 5.6; the query refers to columns that are not in the index, so the index condition is checked first to reduce disk IO
  • Using join buffer: the join buffer is used
  • impossible where: the WHERE clause is always false
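
  A minimal sketch of using EXPLAIN to spot a filesort and remove it with an index; demo_article is a hypothetical table, and the plans described in the comments are what one would typically see:

    CREATE TABLE demo_article (
      id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      author_id  INT UNSIGNED NOT NULL,
      created_at DATETIME NOT NULL,
      title      VARCHAR(200) NOT NULL
    );

    -- Without a suitable index: type=ALL and Extra typically shows "Using where; Using filesort"
    EXPLAIN SELECT id, title FROM demo_article WHERE author_id = 42 ORDER BY created_at;

    -- Add a composite index that matches the WHERE column followed by the ORDER BY column
    CREATE INDEX idx_author_created ON demo_article (author_id, created_at);

    -- Now type=ref, key=idx_author_created, and the filesort disappears
    EXPLAIN SELECT id, title FROM demo_article WHERE author_id = 42 ORDER BY created_at;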

Source: www.cnblogs.com/occl/p/11317047.html