1. Choosing a storage engine
Storage engine: the component that determines how MySQL stores data, indexes and other objects.
Before MySQL 5.5 (that is, up to 5.1), the default storage engine was MyISAM; from 5.5 onward the default is InnoDB.
Differences:
| Difference | MyISAM | InnoDB |
| --- | --- | --- |
| File format | Data and indexes stored separately: data in .MYD, indexes in .MYI | Data and indexes stored together in .ibd |
| Can the files be moved? | Yes: each table corresponds to three files, .frm, .MYD and .MYI | No: other files hold associated data |
| Record storage order | Insertion order | Primary key order |
| Fragmentation | Yes; clean up regularly with OPTIMIZE TABLE | Less prone (free space in pages is reused), though heavy deletes can still fragment a table |
| Transactions | Not supported | Supported |
| Foreign keys | Not supported | Supported |
| Lock granularity | Table-level locking | Row-level locking |
MyISAM has a simple design and stores data in a compact format, so its read performance is very good in some scenarios.
If you have no special requirements, use the default, InnoDB.
MyISAM: read- and insert-heavy applications, such as blogs and news portals.
InnoDB: applications with frequent updates and deletes, or that must guarantee data integrity; it offers high concurrency, plus transactions and foreign keys to ensure integrity. An example is an OA (office automation) system.
The book High Performance MySQL covers many storage engines, but strongly recommends InnoDB.
2. Field design
- The three normal forms of database design
- First normal form (1NF): every column is atomic
- Second normal form (2NF): every non-key column depends on the whole primary key
- Third normal form (3NF): every non-key column depends directly on the primary key, not indirectly through another column
Following the normal forms is generally recommended, because in a normalized schema updates often run faster (each fact is stored only once). But this is not absolute: normalization has drawbacks. It usually requires joins at query time, which are not only expensive but can also make some indexing strategies ineffective.
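To make the update-versus-join trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the tables emp_flat, dept and emp are hypothetical. Renaming a department touches one row per employee in the denormalized table but a single row in the normalized one, while reading the name back from the normalized schema needs a join.

```python
import sqlite3

# sqlite3 stands in for MySQL; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: the department name is repeated on every employee row.
cur.execute("CREATE TABLE emp_flat (emp_id INTEGER PRIMARY KEY, dept_no INTEGER, dept_name TEXT)")
cur.executemany("INSERT INTO emp_flat VALUES (?, ?, ?)",
                [(i, 10, "Sales") for i in range(1, 101)])

# Normalized (3NF): dept_name depends only on dept_no, so it lives in dept.
cur.execute("CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_no INTEGER)")
cur.execute("INSERT INTO dept VALUES (10, 'Sales')")
cur.executemany("INSERT INTO emp VALUES (?, ?)", [(i, 10) for i in range(1, 101)])

# Renaming the department: how many rows does each design have to change?
flat_rows = cur.execute(
    "UPDATE emp_flat SET dept_name = 'Marketing' WHERE dept_no = 10").rowcount
norm_rows = cur.execute(
    "UPDATE dept SET dept_name = 'Marketing' WHERE dept_no = 10").rowcount

# The cost of normalizing: reading the name back now requires a join.
joined = cur.execute(
    "SELECT e.emp_id, d.dept_name FROM emp e JOIN dept d ON e.dept_no = d.dept_no "
    "WHERE e.emp_id = 1").fetchone()

print(flat_rows, norm_rows, joined)  # 100 1 (1, 'Marketing')
```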
- Keep the number of columns in a single table small
We recommend at most about 30.
More columns degrade performance and make development harder (faced with a table whose columns go on forever, we developers are simply stunned).
- Use small, simple, appropriate data types
a. String types
Use CHAR for fixed-length strings and VARCHAR for variable-length ones, and allocate enough space, but no more.
Trailing spaces are removed from CHAR values when they are retrieved.
b. Decimal types
FLOAT or DOUBLE is usually sufficient and takes little space, but stored values may lose precision.
DECIMAL stores exact decimal values; use it where precision requirements are high, such as financial data or longitude/latitude.
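A quick way to see the precision issue, sketched in Python: Python's float is an IEEE 754 double (the same binary representation as MySQL's DOUBLE), and the decimal module stands in for an exact DECIMAL column by analogy. This does not exercise MySQL itself.

```python
from decimal import Decimal

# float is an IEEE 754 double, the same binary format as MySQL DOUBLE.
float_total = 0.1 + 0.2
# Decimal built from strings keeps exact decimal digits, like a DECIMAL column.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total)                       # 0.30000000000000004
print(float_total == 0.3)                # False
print(decimal_total)                     # 0.30
print(decimal_total == Decimal("0.30"))  # True
```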
c. Date and time
-- datetime:
Range: years 1000 to 9999 ('1000-01-01 00:00:00' to '9999-12-31 23:59:59')
Storage: 8 bytes, in the format YYYYMMDDHHMMSS (MySQL 5.6.4 and later use 5 bytes plus storage for any fractional seconds)
Time zone: independent of time zone
-- timestamp:
Range: 1970 to 2038 ('1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC)
Storage: 4 bytes, stored as UTC, the same as a UNIX timestamp
Time zone: converted from the current time zone to UTC when stored, and back to the current time zone when retrieved
TIMESTAMP is usually the better choice: it takes little space and converts time zones automatically, so you don't have to worry about regional time differences. Keep its 2038 upper limit in mind, though.
By default DATETIME and TIMESTAMP store values only to the second; MySQL 5.6.4 and later support fractional seconds (for example DATETIME(6)). Alternatively, a BIGINT column can hold a microsecond-precision timestamp.
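The 2038 limit follows directly from TIMESTAMP's 4-byte storage, and the BIGINT alternative is easy to check. A minimal Python sketch, assuming the TIMESTAMP upper bound is 2^31 - 1 seconds after the UNIX epoch and a signed 64-bit BIGINT holding microseconds since the epoch (the sample instant is arbitrary):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# TIMESTAMP's upper bound is 2**31 - 1 seconds after the UNIX epoch.
ts_limit = EPOCH + timedelta(seconds=2**31 - 1)
print(ts_limit)  # 2038-01-19 03:14:07+00:00

# A signed 64-bit BIGINT holding microseconds since the epoch (arbitrary sample instant).
BIGINT_MAX = 2**63 - 1
sample = datetime(2024, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)
micros = (sample - EPOCH) // timedelta(microseconds=1)  # exact integer division
print(micros)  # 1704110400123456

# Converting back loses nothing, and there is plenty of headroom.
restored = EPOCH + timedelta(microseconds=micros)
print(restored == sample, micros <= BIGINT_MAX)  # True True
```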
d. Large text and binary data: BLOB and TEXT
BLOB and TEXT are string data types designed to store large amounts of data, but it is generally recommended to avoid them.
MySQL treats each BLOB and TEXT value as an object with its own identity, and storage engines handle them specially when storing them: when a value is too large, InnoDB stores it in a separate external storage area, keeping a pointer in the row and the actual value outside it. All of this can cause serious performance overhead.
- Declare columns NOT NULL where possible
a. A nullable column may take more storage space
b. When a nullable column is indexed and used in comparisons, MySQL needs special handling, which costs some performance
Recommendation: it is usually best to declare columns NOT NULL, unless you really need to store NULL values
- Prefer an integer primary key
a. Integer types are usually the best choice for identifier columns, because they are fast and support AUTO_INCREMENT
b. Avoid string types for identifier columns: they take more space and are usually slower than numeric types
c. Be especially careful with completely "random" strings, such as those produced by MD5(), SHA1() or UUID(). These functions spread new values randomly over a large space, which can make INSERT and some SELECT statements very slow
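Why random keys hurt can be sketched without a database. Assuming an InnoDB-style table whose rows are kept in primary key order, the Python below inserts keys into a sorted list and counts how many land at the end. Sequential keys (as AUTO_INCREMENT produces) always append, while random 128-bit values (standing in for MD5-style hash strings) land all over the key space. It models only the insert position, not InnoDB pages, page splits or the buffer pool.

```python
import bisect
import random

def count_appends(keys):
    """Insert keys into a sorted list (standing in for primary-key order)
    and count how many land at the very end."""
    ordered = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):
            appends += 1  # new key is the largest so far: a pure append
        ordered.insert(pos, key)
    return appends

N = 1000
sequential_keys = list(range(1, N + 1))                  # like AUTO_INCREMENT
rng = random.Random(42)
random_keys = [rng.getrandbits(128) for _ in range(N)]  # like 128-bit hash values

seq_appends = count_appends(sequential_keys)
rand_appends = count_appends(random_keys)
print(seq_appends)  # 1000
print(rand_appends)
```

With random keys only the few values that happen to exceed every earlier key are appends; every other insert lands somewhere in the middle of the ordered data.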
3. Indexes
- Why indexes make queries fast
An index is small compared with the data itself
An index is ordered, so the location of the data can be found quickly
InnoDB tables are index-organized: table data is stored sorted by primary key
- Index storage structures
a. B+ tree
b. Hash (key-value structure)
In InnoDB the primary key index is a B+ tree; whether other indexes can use a hash depends on the storage engine (the MEMORY engine supports HASH indexes, while InnoDB's ordinary user-defined indexes are B+ trees)
B+ tree indexes are generally recommended because hash indexes have several drawbacks: they cannot be used for sorting or range queries, and with large data volumes there may be many hash collisions, which makes them inefficient
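The sorting and range limitations can be illustrated with plain Python, using a dict as a stand-in for a hash index and a sorted list searched with bisect as a stand-in for B+ tree leaf order. The id/age data is hypothetical; this models only the lookup patterns, not MySQL's on-disk structures.

```python
import bisect

# Hypothetical data: user id -> age.
rows = {101: 25, 205: 31, 150: 19, 330: 42, 275: 28}

# "Hash index": a dict answers equality lookups directly...
hash_index = dict(rows)
print(hash_index[150])  # 19

# ...but a range query (ids 150..300) must check every key and sort afterwards.
range_via_hash = sorted(k for k in hash_index if 150 <= k <= 300)

# "B+ tree": keys kept in sorted order, so a range is one contiguous slice
# located with two binary searches, and the result is already sorted.
sorted_keys = sorted(rows)
lo = bisect.bisect_left(sorted_keys, 150)
hi = bisect.bisect_right(sorted_keys, 300)
range_via_tree = sorted_keys[lo:hi]
print(range_via_tree)  # [150, 205, 275]
```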
- Index types
By function:
1. Primary key index
2. Ordinary index: no special restrictions; duplicate values are allowed
3. Unique index: duplicate values are not allowed; slightly faster than an ordinary index
4. Full-text index: used for full-text search, but rarely used in practice; its default parser splits text on word boundaries, which really suits only English (MySQL 5.7.6 and later add an ngram parser for Chinese, Japanese and Korean), and it is expensive to maintain
By data storage structure:
1. Clustered index
Definition: the physical order of the rows matches the logical order of the key values (usually the primary key column). A table can have only one clustered index.
The primary key index is the clustered index: data is stored in primary key order.
2. Non-clustered index
Definition: the logical order of the index differs from the physical order of the rows on disk. A table can have multiple non-clustered indexes.
Every index other than the clustered index is non-clustered; this includes ordinary, unique and full-text indexes, which are also called secondary indexes.
In InnoDB, the leaf nodes of the primary key index store the complete data rows (MyISAM index leaves instead hold a pointer to the row in the data file).
The leaf nodes of a secondary index store the primary key value, so a lookup through a secondary index usually needs a second lookup in the primary key index ("going back to the table").
Covering index: when every column a query needs is in the index, the result is returned directly from the index without going back to the table.
For example:
Suppose table t has a multi-column index on (clo1, clo2):
select clo1, clo2 from t where clo1 = 1;
This query can get its data directly from the (clo1, clo2) index tree without going back to the table. So list only the columns you actually need after SELECT, to increase the chance that an index covers the query.
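One way to watch this happen without a MySQL server is SQLite's EXPLAIN QUERY PLAN, driven from Python's sqlite3 module. SQLite is only a stand-in here: its planner and output differ from MySQL's (MySQL reports a covered query as "Using index" in EXPLAIN's Extra column), and the exact wording varies between SQLite versions. The table mirrors the t(clo1, clo2) example, with an extra hypothetical column clo3 that is not in the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, clo1 INTEGER, clo2 INTEGER, clo3 TEXT)")
conn.execute("CREATE INDEX idx_clo1_clo2 ON t (clo1, clo2)")

def plan(sql):
    """Join the detail column (the 4th field) of each EXPLAIN QUERY PLAN row."""
    return " ".join(str(row[3]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only indexed columns are requested: answered from the index alone.
covered = plan("SELECT clo1, clo2 FROM t WHERE clo1 = 1")
# clo3 is not in the index, so each match needs a lookup in the table.
not_covered = plan("SELECT clo1, clo2, clo3 FROM t WHERE clo1 = 1")
print(covered)
print(not_covered)
```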
Multi-column (composite) index: an index on several columns, such as (clo1, clo2)
When to use: if queries often filter on clo1 and clo2 together, a composite index will be faster than single-column indexes
Note that composite indexes follow the leftmost-prefix rule.
Creating a composite index on (A, B, C) is equivalent to creating these three indexes:
1. index(A, B, C)
2. index(A, B)
3. index(A)
This is the leftmost-prefix rule: an index is matched column by column, starting from the leftmost.
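The rule can be written as a tiny Python function. This is a simplified model that only considers equality conditions; it ignores range conditions (a range on a column stops the match after that column) and optimizations such as the index skip scan in MySQL 8.0.13 and later. The function name usable_prefix is hypothetical.

```python
def usable_prefix(index_cols, equality_cols):
    """Return the leading index columns usable for equality conditions
    on equality_cols (simplified leftmost-prefix model)."""
    used = []
    for col in index_cols:
        if col not in equality_cols:
            break  # the first missing column stops the match
        used.append(col)
    return used

index = ("A", "B", "C")
print(usable_prefix(index, {"A", "B", "C"}))  # ['A', 'B', 'C']
print(usable_prefix(index, {"A", "B"}))       # ['A', 'B']
print(usable_prefix(index, {"A", "C"}))       # ['A']
print(usable_prefix(index, {"B", "C"}))       # []
```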
- Index tuning
1. More indexes are not always better: every index has a maintenance cost
2. Index the columns used in join conditions
3. Index highly selective columns. Selectivity is count(distinct col) / count(*), the proportion of distinct values in the column; the higher the ratio, the fewer records a lookup scans. Low-selectivity columns such as status or gender are not suitable for indexing
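The selectivity formula is easy to compute on sample data. A Python sketch over three hypothetical 1,000-row columns; against a real table the equivalent is SELECT COUNT(DISTINCT col) / COUNT(*) FROM t.

```python
def selectivity(values):
    """count(DISTINCT col) / count(*) for a list of column values."""
    return len(set(values)) / len(values) if values else 0.0

# Hypothetical columns of a 1,000-row table.
ids = list(range(1000))
emails = [f"user{i % 900}@example.com" for i in range(1000)]  # 900 distinct values
genders = ["M" if i % 2 else "F" for i in range(1000)]         # 2 distinct values

for name, column in [("id", ids), ("email", emails), ("gender", genders)]:
    print(name, selectivity(column))
# id 1.0
# email 0.9
# gender 0.002
```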
4. If several columns often appear together in a WHERE clause joined with AND, create a composite index on them; otherwise consider single-column indexes
5. Move calculations to the application (service) layer rather than the database layer
6. For ORDER BY and GROUP BY, make use of the index's ordering.
Put the ORDER BY column last in the composite index, after the filter columns, to avoid a filesort (external sort), which hurts query performance.
For example, for where a = ? and b = ? order by c, create the composite index (a, b, c).
If a range condition is used on the index, its ordering can no longer be used for sorting: for WHERE a > 10 ORDER BY b, the index (a, b) cannot avoid the sort.
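The same contrast can be sketched with SQLite's EXPLAIN QUERY PLAN from Python's sqlite3 module, where "USE TEMP B-TREE FOR ORDER BY" plays roughly the role of MySQL's "Using filesort". SQLite is a stand-in and its planner is not MySQL's; the table t and index idx_abc are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

def plan(sql):
    """Join the detail column (the 4th field) of each EXPLAIN QUERY PLAN row."""
    return " ".join(str(row[3]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on a and b: matching entries in idx_abc are already ordered by c.
ordered = plan("SELECT a, b, c FROM t WHERE a = 1 AND b = 2 ORDER BY c")
# Range on a: index entries come out ordered by (a, b), not by b, so a sort is added.
needs_sort = plan("SELECT a, b, c FROM t WHERE a > 10 ORDER BY b")
print(ordered)
print(needs_sort)
```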
- Cases that can make an index unusable
1. IS NULL and IS NOT NULL
2. != and <> (use IN instead where possible)
3. Columns that are not isolated: the indexed column appears as part of an expression or as a function argument
For example, as part of an expression: select id from t where id + 1 = 5; as a function argument: select id from t where to_days(date_clo) >= 10
4. LIKE patterns that begin with %
5. OR (the index can be used only if the columns on both sides of the OR are indexed)
6. Type mismatch
If the column is a string type, the value in the condition must be quoted; otherwise the index cannot be used:
select * from tb1 where email = 999;
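Case 3 can be reproduced with SQLite's EXPLAIN QUERY PLAN from Python's sqlite3 module. SQLite is only a stand-in (its type handling differs from MySQL's, so case 6 is not shown here); the table t and index idx_id are hypothetical. With id isolated the plan is an index SEARCH, while id + 1 = 5 forces a full SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_id ON t (id)")

def plan(sql):
    """Join the detail column (the 4th field) of each EXPLAIN QUERY PLAN row."""
    return " ".join(str(row[3]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

isolated = plan("SELECT * FROM t WHERE id = 4")            # column stands alone
in_expression = plan("SELECT * FROM t WHERE id + 1 = 5")  # column wrapped in an expression
print(isolated)
print(in_expression)
```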
4. SQL optimization tips
1. First, understand the order in which SQL clauses are executed; this makes optimization easier
(1) FROM: load the data from disk into the buffer, ready for the following steps
(2) ON: in a multi-table join, filter by the ON condition first, then join the tables
(3) JOIN: join the tables on both sides according to the join condition
(4) WHERE: select the rows that satisfy the condition from the base table or view
(5) GROUP BY: group the rows, usually together with aggregate functions
(6) HAVING: filter the groups produced by GROUP BY (normally used together with GROUP BY)
(7) SELECT: choose which columns of the resulting rows to return
(8) DISTINCT: remove duplicates
(9) UNION: merge the results of multiple queries
(10) ORDER BY: sort the results
(11) LIMIT: return only the specified number of rows
For multi-table queries, prefer joins to subqueries (a subquery may create a temporary table, which costs performance)
Filter data with WHERE rather than HAVING where possible
Index the ORDER BY columns so the index order can be used and an external sort avoided
If you know that exactly one row will be returned, LIMIT 1 can improve efficiency
2. Preferably join no more than three tables
3. Avoid SELECT *: reading more data from the database makes the query slower
4. Use NOT NULL columns as much as possible: nullable columns take extra space and need special handling in comparisons and indexes, which affects performance
5. Choosing between EXISTS / NOT EXISTS and IN / NOT IN
The principle is to choose based on which query produces the smaller result set:
select * from t1 where x in (select y from t2)
select * from t1 where exists (select null from t2 where y = x)
IN suits a large outer table with a small inner table; EXISTS suits a small outer table with a large inner table. (Newer MySQL versions can often rewrite both forms into the same semi-join, so check with EXPLAIN.)
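Whatever their relative speed, the positive forms return the same rows, but the negated forms are not interchangeable when the subquery can return NULL. A small check with Python's sqlite3 module as a stand-in (the NULL semantics shown are standard SQL, which MySQL also follows); tables t1 and t2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO t2 VALUES (2), (4), (4), (6);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Positive forms: same rows; the duplicate 4 in t2 does not duplicate output rows.
in_rows = rows("SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x")
exists_rows = rows("SELECT x FROM t1 WHERE EXISTS "
                   "(SELECT NULL FROM t2 WHERE t2.y = t1.x) ORDER BY x")
print(in_rows, exists_rows)  # [(2,), (4,)] [(2,), (4,)]

# Negated forms diverge once the subquery can yield NULL.
conn.execute("INSERT INTO t2 VALUES (NULL)")
not_in_rows = rows("SELECT x FROM t1 WHERE x NOT IN (SELECT y FROM t2) ORDER BY x")
not_exists_rows = rows("SELECT x FROM t1 WHERE NOT EXISTS "
                       "(SELECT NULL FROM t2 WHERE t2.y = t1.x) ORDER BY x")
print(not_in_rows, not_exists_rows)  # [] [(1,), (3,)]
```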
6. Use EXISTS instead of DISTINCT
When querying one-to-many tables (such as a department table and an employee table), avoid DISTINCT in the SELECT clause. Consider EXISTS instead, which is generally faster because the subquery returns as soon as its condition is satisfied.
Inefficient:
select distinct d.dept_no, d.dept_name from dept d, emp e where d.dept_no = e.dept_no
Efficient:
select dept_no, dept_name from dept d where exists (select 'x' from emp e where e.dept_no = d.dept_no)
Note: 'x' is used because EXISTS only checks whether the subquery returns any rows, not what it returns, so selecting a constant is recommended for performance. EXISTS can indeed replace DISTINCT, but the example above relies on dept_no being the unique primary key of dept. If duplicates still need to be removed, refer to the following form:
SELECT * FROM emp WHERE dept_no IN (SELECT MAX(d.dept_no) FROM dept d, emp e WHERE e.dept_no = d.dept_no GROUP BY d.dept_no);
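Both the DISTINCT join and the EXISTS rewrite return one row per department that has employees. A Python sqlite3 sketch (SQLite standing in for MySQL; the dept and emp rows are hypothetical) shows they agree when dept_no is the primary key of dept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, dept_no INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'R&D'), (30, 'HR');
    INSERT INTO emp VALUES (1, 10), (2, 10), (3, 10), (4, 20);
""")

# Join produces one row per employee; DISTINCT then collapses the duplicates.
distinct_rows = conn.execute(
    "SELECT DISTINCT d.dept_no, d.dept_name FROM dept d, emp e "
    "WHERE d.dept_no = e.dept_no ORDER BY d.dept_no").fetchall()

# EXISTS only asks whether at least one employee exists for each department.
exists_rows = conn.execute(
    "SELECT dept_no, dept_name FROM dept d WHERE EXISTS "
    "(SELECT 'x' FROM emp e WHERE e.dept_no = d.dept_no) ORDER BY dept_no").fetchall()

print(distinct_rows)  # [(10, 'Sales'), (20, 'R&D')]
print(exists_rows)    # [(10, 'Sales'), (20, 'R&D')]
```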
7. Avoid implicit data type conversion
Implicit type conversion prevents the index from being used and results in a full table scan. For example, the phonenumber column of table t_tablename is VARCHAR.
The following code does not follow this guideline:
SELECT column1 INTO i_l_variable1 FROM t_tablename WHERE phonenumber = 18519722169;
It should be written as follows:
SELECT column1 INTO i_l_variable1 FROM t_tablename WHERE phonenumber = '18519722169';
8. Segmented queries
On some query pages, when the user selects a very large time range, the query becomes slow, mainly because too many rows are scanned. In that case the program can split the query into segments, run them in a loop, and merge the results for display.
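A minimal sketch of that loop in Python, assuming half-open [start, end) time windows and a caller-supplied fetch function. The names split_range, run_segmented and fake_fetch are hypothetical, and fake_fetch filters an in-memory list where a real program would run one SQL query per window.

```python
from datetime import datetime, timedelta

def split_range(start, end, step):
    """Split [start, end) into consecutive half-open windows no longer than step."""
    windows = []
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        windows.append((cur, nxt))
        cur = nxt
    return windows

def run_segmented(fetch, start, end, step):
    """Call fetch(lo, hi) once per window and concatenate the results in order."""
    results = []
    for lo, hi in split_range(start, end, step):
        results.extend(fetch(lo, hi))
    return results

# Hypothetical data: one row every 5 hours in January and February 2024.
start, end = datetime(2024, 1, 1), datetime(2024, 3, 1)
events = [start + timedelta(hours=h) for h in range(0, 60 * 24, 5)]

def fake_fetch(lo, hi):
    # A real program would run something like
    #   SELECT ... WHERE created_at >= %s AND created_at < %s
    # here (created_at is a hypothetical column).
    return [t for t in events if lo <= t < hi]

windows = split_range(start, end, timedelta(days=7))
merged = run_segmented(fake_fetch, start, end, timedelta(days=7))
print(len(windows))  # 9
```

Half-open windows make each row fall into exactly one segment, so the merged result has no gaps or duplicates.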
5. Analyzing the execution plan with EXPLAIN
EXPLAIN shows how MySQL executes a SELECT statement, including how it uses indexes and joins tables. It helps you choose better indexes and write better-optimized queries.
Example:
EXPLAIN SELECT User FROM mysql.user;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | NULL | PRIMARY | 276 | NULL | 6 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| Column | Meaning |
| --- | --- |
| id | The SELECT identifier: the sequence number of each SELECT in the query |
| select_type | The type of SELECT. SIMPLE: a simple SELECT (no UNION or subqueries). PRIMARY: the outermost SELECT of a query that contains subqueries. UNION: the second or later SELECT in a UNION. DEPENDENT UNION: the second or later SELECT in a UNION that depends on the outer query (MySQL applies some internal optimizations). UNION RESULT: the result of a UNION. SUBQUERY: the first SELECT in a subquery. DEPENDENT SUBQUERY: the first SELECT in a subquery that depends on the outer query (MySQL optimizes some of these, turning them directly into SIMPLE). DERIVED: a SELECT for a derived table (a subquery in the FROM clause) |
| table | The table the row refers to. Sometimes this is not a real table name but a virtual one such as <derivedN>, where the trailing number is the id of the query it comes from |
| type | The access type. There are many values; the ones developers meet most often are system, const, eq_ref, ref, range, index and ALL. From best to worst: ==system > const > eq_ref > ref > range > index > ALL== (worth memorizing). system: the table has only one row; a special case of const that rarely appears and can be ignored. const: the row is found with a single index lookup, used when a primary key or unique index is compared with a constant; only one row matches, so it is very fast. eq_ref: unique index scan where exactly one row in the table matches; typical when two tables are joined on a primary key or unique index. ref: non-unique index scan that returns all rows matching a single value. range: retrieves rows within a given range, typically for >, <, IN or BETWEEN conditions. index: traverses the whole index tree; usually faster than ALL because the index file is usually smaller than the data file (both read every entry, but index reads the index while ALL reads the table data from disk). ALL: a full table scan to find matching rows |
| possible_keys | Indexes that could be used for this table; they are not necessarily used by the query |
| key | The index actually used; NULL if none is used |
| key_len | The number of bytes of the index that are used, from which you can work out how much of the index (for example, how many columns of a composite index) the query uses. In general, a longer key_len means more precise matching but lower efficiency, and a shorter one means higher efficiency but less precision. It is the maximum possible length derived from the index definition, not the length actually used by the data |
| ref | Which columns or constants are compared with the index named in key; const means the index column is compared with a constant |
| rows | The estimated number of rows MySQL must read to find the required records |
| filtered | The estimated percentage of the rows read that remain after the table condition is applied: 100 means all of them are kept, 80 means 80% are kept |
| Extra | Additional important information, such as Using index (as in the example above), Using where, Using filesort or Using temporary |
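When reviewing many plans, the type ranking can be turned into a simple check. A hypothetical Python helper, assuming EXPLAIN rows have already been fetched as dictionaries keyed by column name; types outside the seven listed (such as ref_or_null or index_merge) are simply ignored, and the threshold of "range" is an arbitrary choice.

```python
# Access types from best to worst, as in the type row of the table above.
ACCESS_TYPES = ["system", "const", "eq_ref", "ref", "range", "index", "ALL"]
RANK = {name: i for i, name in enumerate(ACCESS_TYPES)}

def flag_slow(plan_rows, worst_allowed="range"):
    """Return (table, type) for rows whose access type ranks below worst_allowed.
    Types not in ACCESS_TYPES (e.g. ref_or_null, index_merge) are skipped."""
    limit = RANK[worst_allowed]
    return [(row["table"], row["type"]) for row in plan_rows
            if row["type"] in RANK and RANK[row["type"]] > limit]

# Hypothetical plan rows, except the first, which matches the mysql.user example.
sample_plan = [
    {"table": "user", "type": "index"},
    {"table": "dept", "type": "eq_ref"},
    {"table": "emp", "type": "ALL"},
]
print(flag_slow(sample_plan))  # [('user', 'index'), ('emp', 'ALL')]
```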