Java Web: MySQL database indexes

Summary post for the Java Web series: Java Web knowledge summary


Index Overview

Advantages and disadvantages

Advantages: an index speeds up retrieval by reducing the number of I/O operations; grouping and sorting on indexed columns are also accelerated.

Disadvantages: an index is itself a table and therefore takes up storage space; in general the index table occupies about 1.5 times the space of the data table. Creating and maintaining the index takes time, and that cost grows with the amount of data. Indexes also reduce the efficiency of modification operations on the data table (delete, insert, update), because the index table must be modified along with the data table.

Index Classification

Common index types: primary key index, unique index, ordinary index, full-text index, and composite index.
1. Primary key index: the primary index, built on the primary key pk_column(length); duplicates are not allowed and NULLs are not allowed.
2. Unique index: the values in the indexed column must be unique; NULLs are allowed.
3. Ordinary index: an index built on an ordinary column of the table, with no restrictions.
4. Full-text index: an index built on columns that hold large text objects.
5. Composite index: an index built on a combination of several columns; ideally these columns contain no NULL values (MySQL itself does not enforce this).

Reference:
An in-depth look at the principles and implementation of MySQL indexes: why indexes speed up queries

MySQL index principles and optimization

When we talk about indexes we usually mean B-Tree indexes, the most effective and most commonly used index structure for finding data in relational databases; most storage engines support them. The InnoDB engine implements its indexes with a B+ tree structure.

As the data in a database grows, the index grows with it and can no longer be held entirely in memory, so the index is usually stored on disk as an index file. Index lookups then incur disk I/O, which costs several orders of magnitude more than memory access. Imagine how deep a binary tree with millions of nodes would be. If a binary tree that deep were stored on disk, reading each node would require one disk I/O, and the total lookup time would clearly be unacceptable. So how can the number of I/O accesses during a lookup be reduced?

An effective solution is to reduce the depth of the tree by turning the binary tree into an m-way tree (a multi-way search tree); the B+ tree is one such multi-way search tree. To understand the B+ tree you only need its two most important features. First, all keys (which can be thought of as the data) are stored in the leaf nodes (leaf pages); non-leaf nodes (index pages) store no actual data, and all records are kept in key order in leaf nodes on the same level. Second, all leaf nodes are connected by pointers. The simplified figure below shows a B+ tree of height 3.
B+Tree
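To make the depth argument concrete, here is a minimal Python sketch that counts how many levels a perfectly balanced tree needs for a given number of keys. The fan-out of 1000 keys per node and the figure of 10 million keys are assumptions chosen for illustration, and the model simply treats each level as one disk read.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Return the smallest h such that fanout ** h >= num_keys."""
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout   # each extra level multiplies the reachable keys by the fan-out
        levels += 1
    return levels


NUM_KEYS = 10_000_000                          # assumed table size
binary_levels = levels_needed(NUM_KEYS, 2)     # binary tree: two children per node
wide_levels = levels_needed(NUM_KEYS, 1000)    # multi-way tree: assumed fan-out of 1000

print("binary tree levels:", binary_levels)
print("multi-way tree levels:", wide_levels)
```

Under these assumptions the binary tree needs 24 levels while the multi-way tree needs 3, which is why widening each node cuts the number of disk reads per lookup so sharply.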

Databases use the B+ tree for indexes mainly because the B-tree, while it improves disk I/O performance, does not solve the inefficiency of traversing elements. The B+ tree was created precisely to solve this problem: traversing the leaf nodes alone is enough to traverse the entire tree. Range queries are very frequent in databases, and the B-tree does not support them (or supports them only inefficiently). Because the B+ tree traverses elements efficiently, its structure is particularly well suited to range lookups. For example, to find the students aged 18 to 22 in a school, you first do a random lookup from the root node to find the first 18-year-old student (arriving at a leaf node), and then walk along the leaf nodes in order to find every record in the range.
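The student-age lookup can be sketched in a few lines of Python. This is a simplified model, not InnoDB's implementation: the leaf level is a chain of small sorted pages, and a binary search over each page's largest key stands in for the descent from the root. The names Leaf, build_leaves, and range_scan, as well as the sample ages, are made up for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]                      # sorted keys stored in this leaf page
    next_leaf: Optional["Leaf"] = None   # pointer to the right-hand neighbour


def build_leaves(sorted_keys: List[int], page_size: int) -> List[Leaf]:
    """Pack sorted keys into leaf pages and link each page to the next one."""
    leaves = [Leaf(sorted_keys[i:i + page_size])
              for i in range(0, len(sorted_keys), page_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next_leaf = right
    return leaves


def range_scan(leaves: List[Leaf], low: int, high: int) -> List[int]:
    """Return every key in [low, high]: one descent, then follow leaf pointers."""
    # Stand-in for the root-to-leaf descent: first leaf whose largest key >= low.
    largest = [leaf.keys[-1] for leaf in leaves]
    start = bisect_left(largest, low)
    found: List[int] = []
    leaf = leaves[start] if start < len(leaves) else None
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return found             # past the range: stop without visiting more leaves
            if key >= low:
                found.append(key)
        leaf = leaf.next_leaf
    return found


ages = [17, 18, 18, 19, 20, 21, 22, 22, 23, 25]   # hypothetical student ages, sorted
leaves = build_leaves(ages, page_size=3)
print(range_scan(leaves, 18, 22))
```

The scan touches only the leaves that overlap the range and never goes back up to the root, which is the property the paragraph above attributes to the linked leaf level.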

The B in B+ tree stands for balance. Note that a B+ tree index cannot locate a specific row for a given key; it can only find the page that contains the row. The database then reads that page into memory and searches within it to get the data you want.

More:
Why does MySQL use the B+ tree for database indexes?
The principles of MySQL indexes
How database indexes work and how to optimize them
Using and optimizing MySQL indexes

How much data can an InnoDB B+ tree store?

How many rows of data can a single InnoDB B+ tree store? The short answer is: about 20 million.

Why?
The InnoDB storage engine has its own minimum storage unit, the page, and a page is 16 KB in size.
Table data is stored in pages, so how many rows fit in one page? Assuming a row is 1 KB, one page can hold 16 rows.
The number of records in a single leaf node is therefore 16 KB / 1 KB = 16. (This assumes a row size of 1 KB; in practice, the records of many Internet services are about 1 KB.)

Next we need to work out how many pointers a non-leaf node can store, which is also easy. Assume the primary key ID is of type bigint, 8 bytes long, and the pointer size is 6 bytes as set in the InnoDB source code, for a total of 14 bytes. The number of such units that fit in one page is the number of pointers: 16 × 1024 / 14 = 16384 / 14 ≈ 1170. A B+ tree of height 2 can therefore store 1170 × 16 = 18,720 such records.

By the same reasoning, a B+ tree of height 3 can store 1170 × 1170 × 16 = 21,902,400 such records. So an InnoDB B+ tree is generally 1 to 3 levels high, which is enough to hold tens of millions of rows. Each page visited during a lookup costs one I/O, so a query through the primary key index usually needs only 1 to 3 I/O operations to find the data.
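The arithmetic above can be reproduced with a short Python sketch. It uses only the figures given in the text (16 KB pages, 1 KB rows, an 8-byte bigint key, and a 6-byte pointer) and, like the text, ignores page headers and partially filled pages, so the results are rough upper bounds rather than exact capacities.

```python
PAGE_SIZE = 16 * 1024    # InnoDB page size in bytes (16 KB)
ROW_SIZE = 1024          # assumed row size: 1 KB
KEY_SIZE = 8             # bigint primary key
POINTER_SIZE = 6         # page pointer size quoted in the text

rows_per_leaf = PAGE_SIZE // ROW_SIZE              # rows held by one leaf page
fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)    # pointers held by one non-leaf page


def capacity(height: int) -> int:
    """Rows a B+ tree of this height can hold: height - 1 index levels above one leaf level."""
    return fanout ** (height - 1) * rows_per_leaf


for height in (1, 2, 3):
    print(height, capacity(height))
```

Changing ROW_SIZE or KEY_SIZE shows how quickly the estimate moves: wider rows or a longer primary key both shrink the number of rows a tree of a given height can hold.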

Details:
How many rows of data can an InnoDB B+ tree store?
How many rows of data a B+ tree in InnoDB can store

Clustered and non-clustered (secondary) indexes

Overview

Indexes are divided into clustered and non-clustered indexes. A clustered index stores the data in the same order as its physical location, whereas a non-clustered index does not. Clustered indexes speed up multi-row retrieval, while non-clustered indexes are quick at retrieving a single row.

In MySQL, different storage engines implement indexes differently. Broadly speaking, there are two storage engines to consider: MyISAM and InnoDB.
In MyISAM, the data field of a B+ tree leaf node stores not the data itself but the address of the data. The primary index and the secondary indexes are essentially the same, except that the keys of the primary index must be unique. All of these indexes are non-clustered.
MyISAM can also store indexes in compressed form. For example, if the first key is "her" and the second is "here", the second is stored as "3,e". The drawback is that keys within the same node can only be searched sequentially.
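The "her" / "here" example can be illustrated with a small Python sketch of prefix compression. This is not MyISAM's actual on-disk format, only the idea: each key is stored as the length of the prefix it shares with the previous key plus the remaining suffix. The function names and the extra sample keys are made up for the example.

```python
from typing import List, Tuple


def compress(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Store each key as (length of prefix shared with the previous key, remaining suffix)."""
    entries = []
    prev = ""
    for key in sorted_keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decompress(entries: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the keys; each one depends on the key decoded just before it."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


packed = compress(["her", "here", "hers", "hip"])
print(packed)
print(decompress(packed))
```

Because every entry can only be decoded from its predecessor, a lookup has to walk the entries in order, which matches the drawback described above: binary search inside a compressed node is not possible.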

In InnoDB the data file is itself the index file: the data field of a B+ tree leaf node holds the row itself, keyed by the primary key. This is a clustered index. In a non-clustered index, the data field of a leaf node holds the primary key (so the clustered index key should not be too long). Why store the primary key rather than the address of the record? The reason is simple: the address of a record is not guaranteed to stay the same, but the primary key is.
So why is an auto-increment id generally recommended as the primary key?
A: With a clustered index, the physical storage order of the data matches the index order: if two index entries are adjacent, the corresponding rows are stored adjacently on disk. If the primary key is not an auto-increment id, you can imagine what happens: the engine keeps adjusting physical addresses and splitting pages. There are measures that reduce these operations, but they cannot be avoided entirely. With an auto-increment key it is simple: rows are written page after page, the index structure is relatively compact, disk fragmentation is low, and efficiency is high.
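The effect of insertion order can be illustrated with a toy Python simulation. It is only a sketch under simple assumptions: pages hold 16 rows, a full page that receives a key at the right edge of the table just starts a new page, and any other full page is split in half. InnoDB's real split heuristics are more involved, and the function name insert_all is made up for the example.

```python
import random
from bisect import insort
from typing import List, Tuple

PAGE_CAPACITY = 16  # assumed rows per page


def insert_all(keys: List[int], capacity: int = PAGE_CAPACITY) -> Tuple[List[List[int]], int]:
    """Insert keys one by one into sorted pages; return (pages, rows moved by splits)."""
    pages: List[List[int]] = [[]]
    moved = 0
    for key in keys:
        # Target page: the first page whose largest key is >= key, else the last page.
        idx = len(pages) - 1
        for i, page in enumerate(pages):
            if page and key <= page[-1]:
                idx = i
                break
        page = pages[idx]
        if len(page) < capacity:
            insort(page, key)                 # room left: insert in place
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])               # append at the right edge: nothing moves
        else:
            mid = capacity // 2               # split in the middle: upper half moves
            upper = page[mid:]
            del page[mid:]
            moved += len(upper)
            pages.insert(idx + 1, upper)
            insort(page if key <= page[-1] else upper, key)
    return pages, moved


sequential = list(range(1, 1001))             # auto-increment style keys
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)           # random-order keys, fixed seed

seq_pages, seq_moved = insert_all(sequential)
rnd_pages, rnd_moved = insert_all(shuffled)
print("sequential:", len(seq_pages), "pages,", seq_moved, "rows moved")
print("random:    ", len(rnd_pages), "pages,", rnd_moved, "rows moved")
```

In this model the sequential keys fill pages one after another and never move a row, while the shuffled keys trigger splits that move rows and leave pages partly empty.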
Recommended use:

Excerpt:
MySQL clustered index and non-clustered index

MyISAM: non-clustered index

  • The MyISAM storage engine uses non-clustered indexes. The primary index and the secondary indexes are almost identical, except that the primary index allows neither duplicates nor NULLs; in both, the leaf node entry for a key stores the physical address of the corresponding row.
  • With a non-clustered index, the table data and the index are stored separately.
  • The table data is stored in insertion order, so a non-clustered index is better suited to single-row lookups, and inserts are not affected by key order.
  • FULLTEXT indexes can only be used with MyISAM. (Since MySQL 5.6, InnoDB also supports full-text indexes.)
  • If the primary index and the secondary indexes of a non-clustered table point to the same content, what are secondary indexes for? Indexes exist to serve queries, and they are used in WHERE and ORDER BY clauses. When the query condition is not the primary key, a secondary index is what you need.

InnoDB: clustered index

  • In the primary (clustered) index, the leaf node entry for a key stores the row data itself; in a secondary index, the leaf node entry for a key stores the primary key value of the corresponding row. The primary key should therefore be as short as possible and of as simple a type as possible.
  • With a clustered index, the data is stored together with the primary key index.
  • Clustered index data is stored in primary key order, so range lookups on the primary key may need less disk I/O and run faster. For the same reason, rows are best inserted in monotonically increasing primary key order; otherwise frequent page splits will seriously hurt performance.
  • In InnoDB, if a query only needs the indexed columns, try not to select other columns; this improves query performance.

When querying through the primary index, a clustered index is the better fit: it needs only one lookup, whereas a non-clustered index first finds the address of the data and then needs one more I/O to fetch the data.

Because the secondary indexes of a clustered table store the primary key, moving a row or splitting a page costs less: the secondary indexes do not have to be maintained when that happens. The secondary indexes do, however, take up more space.

Inserting new data into a clustered index is much slower than into a non-clustered index, because each insert must check whether the primary key is a duplicate, and that means reading leaf nodes of the primary index. The leaf nodes of a non-clustered index hold only data addresses, take up less space, and are therefore densely packed, so the check needs less I/O. The primary index of a clustered index stores the data itself, which takes more space and is spread over many more sectors, so the check needs more I/O.

The figure below illustrates the difference between clustered and non-clustered indexes.

More:
Clustered indexes, secondary indexes, covering indexes, and composite indexes

Covering indexes

If an index contains all the data needed to satisfy a query, it is called a covering index. A covering index is a very powerful tool that can greatly improve query performance. Reading only the index, without reading the row data, has the following advantages:
(1) Index entries are usually smaller than full records, so MySQL accesses less data;
(2) Indexes store values in sorted order, so they need less I/O than random access to records;
(3) Most storage engines cache indexes better than data. For example, MyISAM caches only indexes.
(4) Covering indexes are especially useful for InnoDB tables: because InnoDB organizes data in a clustered index, if a secondary index contains the data the query needs, there is no further lookup in the clustered index.

The InnoDB storage engine supports covering indexes: the queried records can be obtained from the secondary index alone, without looking up the records in the clustered index.
What are the benefits of using a covering index?

  • It can eliminate a large number of I/O operations
  • It helps with statistical queries

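Not every kind of index can act as a covering index; only B-Tree indexes store the corresponding values. Different storage engines also implement covering indexes differently, and not all storage engines support them (Memory and Falcon do not).
For an index-covered query, EXPLAIN shows "Using index" in the Extra column. For example, the inventory table in sakila has a composite index (store_id, film_id); for queries that only need to access these two columns, MySQL can use the index, as follows: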

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 5007
        Extra: Using index
1 row in set (0.17 sec)

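In most engines, an index covers a query only when every column the query accesses is part of the index. InnoDB goes further: its secondary indexes store the primary key value in their leaf nodes. The sakila.actor table uses InnoDB and has an index on last_name, so that index can also cover queries that access actor_id, for example: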

mysql> EXPLAIN SELECT actor_id, last_name
    -> FROM sakila.actor WHERE last_name = 'HOPPER'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 137
          ref: const
         rows: 2
        Extra: Using where; Using index
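Why the last_name index can answer a query for actor_id can be sketched with a toy Python model. It is not how InnoDB is implemented: a dict stands in for the clustered index, a sorted list of (last_name, actor_id) pairs stands in for the secondary index, and the sample rows are invented rather than taken from sakila.

```python
from bisect import bisect_left
from typing import List, Tuple

# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    1: {"actor_id": 1, "first_name": "ANN", "last_name": "GRANT"},
    2: {"actor_id": 2, "first_name": "BOB", "last_name": "HOPPER"},
    3: {"actor_id": 3, "first_name": "CARA", "last_name": "ALLEN"},
    4: {"actor_id": 4, "first_name": "DAN", "last_name": "HOPPER"},
}

# Secondary index on last_name: each entry also carries the primary key.
secondary = sorted((row["last_name"], pk) for pk, row in clustered.items())


def query(last_name: str, columns: List[str]) -> Tuple[List[tuple], int]:
    """Return (matching rows, number of lookups made in the clustered index)."""
    lookups = 0
    results = []
    i = bisect_left(secondary, (last_name,))      # seek to the first matching entry
    while i < len(secondary) and secondary[i][0] == last_name:
        name, pk = secondary[i]
        entry = {"last_name": name, "actor_id": pk}
        if all(col in entry for col in columns):
            row = entry                           # covered: the index entry is enough
        else:
            row = clustered[pk]                   # not covered: go back to the full row
            lookups += 1
        results.append(tuple(row[col] for col in columns))
        i += 1
    return results, lookups


print(query("HOPPER", ["actor_id", "last_name"]))
print(query("HOPPER", ["first_name", "last_name"]))
```

The first query is answered from the secondary index alone, while the second has to visit the clustered index once per matching row; that extra visit is what a covering index saves.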

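Index usage recommendations

When should you use an index?

  • A primary key automatically gets a unique index;
  • Columns that frequently appear as query conditions in WHERE or ORDER BY clauses should be indexed;
  • Columns used for sorting should be indexed;
  • Fields used to join with other tables in queries (foreign key relationships) should be indexed;
  • Under high concurrency, prefer composite indexes;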

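When should you not use an index?

  • Do not index columns that are frequently inserted, deleted, or updated;
  • Do not index columns with many duplicate values;
  • Do not index tables with very few rows;
  • Avoid NULL values in the columns of a composite index; a column that contains NULLs is treated here as ineffective for the composite index (MySQL B-Tree indexes do store NULLs, so take this as a design guideline);
  • MySQL typically uses a single index per table in a SELECT statement, so if an index is used for WHERE, a different index will generally not also be used for ORDER BY;
  • In LIKE operations, '%aaa%' does not use the index (the index is ineffective), but 'aaa%' can use it;
  • Applying an expression or function to an indexed column defeats the index. For example, select * from users where YEAR(adddate)<2007 evaluates the function on every row, so the index is not used and a full table scan results; rewrite it as: select * from users where adddate<'2007-01-01'.
  • When a regular expression is used in a query condition, the index can be used only if the first character of the search pattern is not a wildcard.
  • Using <> in a query condition often prevents the index from being used.
  • Using IS NULL in a query condition may prevent the index from being used (MySQL itself can optimize col IS NULL with an index, so check with EXPLAIN).
  • Joining several conditions with OR can prevent the index from being used; in that case, split the query into two and combine them with UNION ALL.
  • Avoid sorting on multiple columns where possible; if you must, build a composite index on those columns;
  • Performance test results are meaningful only when the database already holds enough test data. If the test database has only a few hundred records, they are often all loaded into memory after the first query, which makes every subsequent query very fast whether or not an index is used. Results become meaningful only when the database holds more than 1000 records and the total data volume exceeds the memory available on the MySQL server.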

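Other recommendations

1. MySQL uses indexes only for the following operators: <, <=, =, >, >=, BETWEEN, IN, and sometimes LIKE (when the pattern does not start with the wildcard % or _).

2. Indexes created by default are non-clustered, which is not always optimal. Under a non-clustered index, data is physically stored on data pages in random order. Sound index design rests on analyzing and predicting the queries that will be run. In general:

  • a. For columns with many duplicate values that are frequently used in range queries (>, <, >=, <=) and in ORDER BY or GROUP BY, consider a clustered index;
  • b. When several columns are frequently accessed together and each contains duplicate values, consider a composite index;
  • c. Try to make the composite index cover the key queries; its leading column must be the most frequently used column. Indexes can improve performance, but more is not better: too many indexes make the system inefficient, because every index added to a table has to be maintained on each update.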

3. ORDER BY and GROUP BY: when a query uses ORDER BY or GROUP BY clauses, any kind of index can help improve SELECT performance.

4. Indexed columns should not contain NULL values.

5. Before a multi-table operation is actually executed, the query optimizer enumerates several possible join plans based on the join conditions and picks the one with the lowest system overhead. Join conditions should take full account of which tables have indexes and which tables have many rows. The choice of inner and outer table can be determined by the formula (number of matching rows in the outer table) × (number of lookups per match in the inner table); the plan with the smallest product is best.

6. Any operation on a column causes a table scan, including database functions and computed expressions; when querying, move the operation to the right of the equals sign wherever possible.

7. IN and OR clauses often use work tables, which defeats the index. If they do not produce many duplicate values, consider splitting the clause apart; each resulting clause should include an indexed column.

Index Tuning

  • Leftmost prefix: put the columns most frequently used for grouping and sorting on the left, and so on
  • Optimize fuzzy queries for the index: with LIKE, '%aaa%' does not use the index
  • Build a full-text index for the search conditions, then use it
  • Use short indexes: when indexing string columns, specify a prefix length if possible
  • Avoid NULL values in indexed columns
  • Sort on indexed columns
  • Do not perform operations on indexed columns
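The leftmost-prefix rule at the top of this list can be illustrated with a toy Python model of a composite index. The index is just a sorted list of (store_id, film_id) tuples with invented data, and the two helper functions are made up for the example; it shows only why the leading column matters, not how MySQL executes queries.

```python
from bisect import bisect_left
from typing import List, Tuple

# Composite index (store_id, film_id): entries sorted by store_id, then film_id.
index = sorted((store, film) for store in (1, 2) for film in range(1, 501))


def seek_by_first(entries: List[Tuple[int, int]], store_id: int) -> Tuple[list, int]:
    """Leading column given: binary search to the first match, then read neighbours."""
    i = bisect_left(entries, (store_id,))
    examined = 0
    hits = []
    while i < len(entries):
        examined += 1
        if entries[i][0] != store_id:
            break                        # matches are contiguous, so stop at the first miss
        hits.append(entries[i])
        i += 1
    return hits, examined


def scan_by_second(entries: List[Tuple[int, int]], film_id: int) -> Tuple[list, int]:
    """Only the second column given: matches are scattered, so every entry is checked."""
    hits = [entry for entry in entries if entry[1] == film_id]
    return hits, len(entries)


store_hits, store_examined = seek_by_first(index, 2)
film_hits, film_examined = scan_by_second(index, 42)
print(len(store_hits), "hits after examining", store_examined, "entries")
print(len(film_hits), "hits after examining", film_examined, "entries")
```

A condition on store_id reads only the contiguous block of matching entries, whereas a condition on film_id alone has to look at the whole index; this is why the most frequently used column belongs on the left.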

More:
MySQL index summary: index types and how to create them
Why the database index you created does not take effect: conditions under which indexes fail


Origin blog.csdn.net/zangdaiyang1991/article/details/91386549