Speechless: an interviewer nearly broke me with questions about MySQL indexes

A while ago I went out for another interview, and it broke my spirit. The interviewer's questions on how MySQL indexes work and how to use them left me stunned, so I resolved to write up a summary. Then I never found the time (in truth, I was lazy...). Are you ready?


One, the syntax of indexes in MySQL

Create index

Add indexes when creating tables

CREATE TABLE mytable(  
    ID INT NOT NULL,   
    username VARCHAR(16) NOT NULL,  
    INDEX [indexName] (username(length))  
); 

Add indexes after creating the table

ALTER TABLE my_table ADD [UNIQUE] INDEX index_name(column_name);
-- or
CREATE INDEX index_name ON my_table(column_name);

Note:

1. An index takes up disk space, so check that enough disk space is available before creating one.

2. Creating an index can lock the table, so on a production system do it during idle business periods. (InnoDB has supported online index creation since MySQL 5.6, which allows concurrent reads and writes in most cases, but the build still costs I/O.)

Query by index

-- Exact-match query (column_1 is indexed):
SELECT * FROM table_name WHERE column_1 = 'value';
-- Or a fuzzy query:
SELECT * FROM table_name WHERE column_1 LIKE '%三';
SELECT * FROM table_name WHERE column_1 LIKE '三%';
SELECT * FROM table_name WHERE column_1 LIKE '%三%';
SELECT * FROM table_name WHERE column_1 LIKE '_好_';
-- To match strings that contain both A and B:
SELECT * FROM table_name WHERE column_1 LIKE '%A%' AND column_1 LIKE '%B%';
-- In LIKE patterns, % matches any run of zero or more characters, and _ matches
-- exactly one character (often used to constrain the string length).
-- MySQL's LIKE has no [] or [^] character classes (those are SQL Server syntax).
-- To match 张三, 李三, or 王三:
SELECT * FROM table_name WHERE column_1 IN ('张三', '李三', '王三');
-- To match any other single character followed by 三:
SELECT * FROM table_name WHERE column_1 LIKE '_三' AND column_1 NOT IN ('张三', '李三', '王三');
-- Or a search through a full-text index:
SELECT * FROM table_name WHERE MATCH(content) AGAINST('word1 word2');

Delete index

DROP INDEX my_index ON tablename;
-- or
ALTER TABLE table_name DROP INDEX index_name;

View the index in the table

SHOW INDEX FROM tablename

Check whether a query uses an index

-- EXPLAIN followed by the query
EXPLAIN SELECT * FROM table_name WHERE column_1='123';

Two, the advantages and disadvantages of the index

Advantages: an index speeds up retrieval by reducing the number of I/O operations; grouping and sorting on an indexed column are faster as well.

Disadvantages: an index is itself stored as a table-like structure, so it takes up storage space (a rough rule of thumb sometimes quoted is about 1.5 times the size of the data table, though the real ratio depends on which columns are indexed); creating and maintaining an index costs time, and that cost grows with the amount of data; and an index slows down modifications to the table (insert, delete, update), because the index must be updated along with the data.

Three, the classification of the index

Common index types are: primary key index, unique index, ordinary index, full-text index, composite index

1. Primary key index: the primary index, built on the primary key pk_column(length); duplicates and NULL values are not allowed.

ALTER TABLE `table_name` ADD PRIMARY KEY (`col`);

2. Unique index: the values of the indexed column must be unique, but NULL values are allowed

ALTER TABLE `table_name` ADD UNIQUE index_name(`col`);

3. Ordinary index: an index constructed with ordinary columns in the table without any restrictions

ALTER TABLE `table_name` ADD INDEX index_name(`col`);

4. Full-text index: an index built with columns of large text objects (explained in the next part)

ALTER TABLE `table_name` ADD FULLTEXT INDEX ft_index(`col`);

5. Composite index: an index built on a combination of several columns. (In MySQL the columns may contain NULL values unless the index is the primary key.)

ALTER TABLE `table_name` ADD INDEX index_name(`col1`,`col2`,`col3`);

*Follow the "leftmost prefix" principle: put the columns used most often for retrieval or sorting furthest to the left, in descending order of use. This composite index is equivalent to three indexes, on (col1), (col1, col2), and (col1, col2, col3); a query on col2 or col3 alone cannot use it.

*In a composite index the index key can become too long because the column values are long, which reduces efficiency. Where the data allows, index only the first few characters of col1 and col2:

ALTER TABLE `table_name` ADD INDEX index_name(col1(4),col2(3));

This uses the first 4 characters of col1 and the first 3 characters of col2 as the index key.

Four, how indexes are implemented

MySQL supports many storage engines, and each engine supports different kinds of indexes, so MySQL offers several index types, such as BTree indexes, B+Tree indexes, hash indexes, and full-text indexes.

1. Hash index:

Among the common storage engines, only MEMORY supports explicit hash indexes. A hash index computes the hash code of the indexed column's value and stores the physical location of the row at the slot for that hash code. Because it relies on hashing, lookups are very fast. However, each value maps to a single hash code and the codes are scattered with no relation to the order of the values, so a hash index supports neither range searches nor sorting.
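The trade-off can be sketched with a toy model in Python. This is only an illustration of the idea, not how the MEMORY engine is implemented; the rows, the "addresses", and the function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical rows, keyed by a made-up "row address" (not MySQL's real layout).
rows = {
    100: {"id": 1, "username": "alice"},
    101: {"id": 2, "username": "bob"},
    102: {"id": 3, "username": "carol"},
    103: {"id": 4, "username": "bob"},
}

# Build the hash index: hash of the column value -> addresses of matching rows.
hash_index = defaultdict(list)
for address, row in rows.items():
    hash_index[hash(row["username"])].append(address)


def find_equal(value):
    """Equality lookup: one hash computation, then a direct bucket access."""
    candidates = hash_index.get(hash(value), [])
    # Re-check the value because different values may share a hash code.
    return [a for a in candidates if rows[a]["username"] == value]


def find_range(low, high):
    """Range lookup: hash codes carry no order, so every row must be checked."""
    return [a for a, row in rows.items() if low <= row["username"] <= high]


print(find_equal("bob"))       # addresses of both "bob" rows
print(find_range("a", "bz"))   # found only by scanning all rows
```

The equality lookup touches one bucket, while the range lookup gets no help from the index at all, which is why a hash index is a poor fit for range conditions and ORDER BY.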

2. Full-text index:

A FULLTEXT index is available only in MyISAM and InnoDB. On large data sets, building a full-text index costs a lot of time and space. For large text objects or long CHAR data, an ordinary index can still match the first few characters of the text, but to match words in the middle of the text you have to use LIKE '%word%', which takes a long time to process and greatly increases response time. This is the case for a FULLTEXT index: when it is built, a list of words is extracted from the text and the index is built on that word list. A FULLTEXT index can be declared when the table is created, or added later with ALTER TABLE or CREATE INDEX:

-- Add a FULLTEXT index when creating the table
CREATE TABLE my_table(
    id INT(10) PRIMARY KEY,
    name VARCHAR(10) NOT NULL,
    my_text TEXT,
    FULLTEXT(my_text)
)ENGINE=MyISAM DEFAULT CHARSET=utf8;
-- Add a FULLTEXT index later, after the table exists
ALTER TABLE my_table ADD FULLTEXT INDEX ft_index(column_name);

A full-text query has its own syntax; the LIKE '%query string%' fuzzy syntax does not use the full-text index. MATCH takes the indexed column, not the index name:

SELECT * FROM my_table WHERE MATCH(my_text) AGAINST('query string');

Note:

*For large data sets, loading data into a table without a FULLTEXT index and then adding the index is faster than loading data into a table that already has one.

*Before MySQL 5.6, the built-in full-text index worked only with the MyISAM storage engine; on other engines it had no effect. Since 5.6, InnoDB supports full-text indexes too.

*MySQL's built-in full-text parser works for English but not for Chinese. Since version 5.7.6, Chinese is supported through the ngram parser plugin.

*If the search string is too short, the expected results are not returned. By default a word must be at least 4 characters long in MyISAM (ft_min_word_len) and 3 in InnoDB (innodb_ft_min_token_size). In addition, stop words in the search string are ignored.

3. BTree index and B+Tree index

  • BTree index

BTree is a balanced multi-way search tree. Suppose the degree of the tree is 2d (d>1) and its height is h; then a BTree must satisfy the following conditions:

  • Every leaf node is at the same depth, equal to h;
  • Each non-leaf node consists of n-1 keys and n pointers, where d<=n<=2d; keys and pointers alternate, and both ends of the node are pointers;
  • The pointers of leaf nodes are all null;
  • Every entry in a node is a [key, data] pair, where key is the indexed key and data is the data of the row that the key belongs to;

The structure of BTree is as follows:

[Figure: structure of a BTree]

 

With a BTree you can use binary search inside each node while descending from the root. The search cost is about h*log(n), where n is the number of keys in a node. The height of the tree is usually very small, typically around 3, so BTree is a very efficient search structure.
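A minimal Python sketch of that search, assuming a tiny hand-built tree: the node layout (a dict with keys, data, and children) and the sample values are invented for illustration and say nothing about MySQL's on-disk format.

```python
from bisect import bisect_left


def node(keys, data, children=()):
    """One BTree node: sorted keys, the data stored with each key, child nodes."""
    return {"keys": keys, "data": data, "children": list(children)}


# A two-level tree with made-up keys; leaves have no children.
root = node(
    [15, 56],
    ["row15", "row56"],
    [
        node([3, 9], ["row3", "row9"]),
        node([20, 49], ["row20", "row49"]),
        node([77, 90], ["row77", "row90"]),
    ],
)


def btree_search(current, key):
    """Return (data, nodes_visited); data is None if the key is absent."""
    visited = 0
    while True:
        visited += 1
        i = bisect_left(current["keys"], key)      # binary search inside the node
        if i < len(current["keys"]) and current["keys"][i] == key:
            return current["data"][i], visited     # BTree nodes hold data themselves
        if not current["children"]:                # reached a leaf without a match
            return None, visited
        current = current["children"][i]           # follow the pointer between keys


print(btree_search(root, 56))   # found in the root, one node visited
print(btree_search(root, 49))   # found one level down, two nodes visited
print(btree_search(root, 50))   # absent
```

Each loop iteration stands for one node read, so the number of nodes visited never exceeds the height of the tree.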

  • B+Tree index

B+Tree is a variant of BTree. Let d be the degree of the tree and h its height. The main differences between B+Tree and BTree are:

  • Non-leaf nodes in a B+Tree store only keys, not data;
  • Leaf nodes in a B+Tree have no child pointers; every key appears in a leaf node, where it is stored together with the physical address of its data;
  • Each non-leaf node of a B+Tree consists of n keys and n pointers;

The structure of B+Tree is as follows:

[Figure: structure of a B+Tree]

 

Advantages of B+Tree compared to BTree:

1. Lower disk read and write costs

Generally speaking, B+Tree is better suited than BTree to index structures on external storage, because storage engine designers exploit how external storage (disk) is organized. The smallest storage unit of a disk is a sector, and an operating system block is usually a whole multiple of the sector size. The operating system manages memory in pages, typically 4K each, and a database page is usually set to a whole multiple of the operating system page. An index node is designed to be exactly one page in size, so that with the disk's read-ahead behavior an entire node is read into memory in one go and then searched in memory. Memory reads are hundreds of times faster than disk I/O or more, so the key to faster searches is to minimize disk I/O. The more keys each node holds, the lower the tree and the fewer I/Os a search needs. In general, therefore, B+Tree is faster than BTree: its non-leaf nodes store no data and so have room for more keys.

2. The query speed is more stable

Because the non-leaf nodes of a B+Tree store no data, every lookup has to go down to a leaf node, and all leaf nodes are at the same depth, so every lookup takes the same number of steps.
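The link between node capacity and tree height can be checked with some rough arithmetic in Python. All sizes below are assumptions chosen for illustration (a 16 KB page, 8-byte keys, 6-byte pointers, 200 bytes of row data); the model ignores page headers and treats every level alike, so the results are an estimate rather than a measurement.

```python
def fanout(page_bytes, key_bytes, pointer_bytes, data_bytes=0):
    """Entries that fit in one node when each entry is key + pointer (+ data)."""
    return page_bytes // (key_bytes + pointer_bytes + data_bytes)


def levels_needed(total_keys, entries_per_node):
    """Smallest h such that entries_per_node ** h >= total_keys."""
    h = 1
    while entries_per_node ** h < total_keys:
        h += 1
    return h


# Assumed sizes, for illustration only.
PAGE, KEY, POINTER, ROW = 16 * 1024, 8, 6, 200
ROWS = 10_000_000

bplus_fanout = fanout(PAGE, KEY, POINTER)        # non-leaf nodes hold keys only
btree_fanout = fanout(PAGE, KEY, POINTER, ROW)   # every node also holds row data

print(bplus_fanout, levels_needed(ROWS, bplus_fanout))   # 1170 entries, 3 levels
print(btree_fanout, levels_needed(ROWS, btree_fanout))   # 76 entries, 4 levels
```

Under these assumptions, keeping row data out of the non-leaf nodes raises the fan-out from 76 to 1170 and saves one level, that is, one disk read per lookup.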


  • B+Tree with sequential access pointers

Many storage engines optimize the basic B+Tree by adding a pointer from each leaf node to the adjacent leaf node, forming a B+Tree with sequential access pointers. This makes range searches more efficient: once the first value is found, the following values can be read in sequence.

The structure of B+Tree is as follows:

 

[Figure: structure of a B+Tree with sequential access pointers]
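How the sequential access pointers help a range search can be sketched in Python. The Leaf class, the sample keys, and the separators list (a stand-in for the non-leaf levels) are hypothetical and exist only for this example.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys    # sorted keys stored in this leaf
        self.next = None    # sequential access pointer to the right sibling


# Hand-built leaf level with made-up keys, linked left to right.
leaves = [Leaf([3, 9]), Leaf([15, 20]), Leaf([49, 56]), Leaf([77, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

# Stand-in for the non-leaf levels: the smallest key of every leaf but the first.
separators = [leaf.keys[0] for leaf in leaves[1:]]


def range_scan(low, high):
    """Collect keys in [low, high]: one descent, then a walk along the leaf chain."""
    leaf = leaves[bisect_right(separators, low)]   # the single "tree descent"
    result = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result                      # passed the upper bound: stop
            if key >= low:
                result.append(key)
        leaf = leaf.next                           # no return to the upper levels
    return result


print(range_scan(10, 60))   # [15, 20, 49, 56]
```

Without the next pointers, each step to the following leaf would need another descent from the root.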

 

Clustered index and non-clustered index

Having analyzed how MySQL's index structures work, let's look at how specific storage engines implement them. The two most common storage engines in MySQL are MyISAM and InnoDB, which implement non-clustered and clustered indexes respectively.

The explanation of the clustered index is: the order of the clustered index is the physical storage order of the data

The explanation of non-clustered index is: index order has nothing to do with the physical order of data

(This is hard to grasp at first, so keep reading; both sentences are explained below the illustration.)

First, a few concepts. Indexes can be divided into the "primary index" and "secondary indexes" according to whether the index key is the primary key. An index built on the primary key is called the primary index; any other index is called a secondary index (also called an auxiliary index below). A table can therefore have only one primary index but many secondary indexes.

MyISAM-non-clustered index

  • The MyISAM storage engine uses non-clustered indexes. The primary index and the secondary indexes are almost identical, except that the primary index allows neither duplicates nor NULL values. In both, a leaf node stores, for each key, the physical address of the row that the key points to.
  • The data file and the index file of a non-clustered index are stored separately.
  • Rows in a non-clustered layout are stored in insertion order, independent of key values. A non-clustered index is therefore better suited to single-row lookups.
  • Before MySQL 5.6, FULLTEXT indexes could be used only in MyISAM. (InnoDB has supported them since 5.6.)

At first I did not understand why secondary indexes are needed if the primary and secondary indexes of a non-clustered index point to the same content. Then it clicked: where are indexes used in a query? In the WHERE and ORDER BY clauses. And what if the query condition is not the primary key? That is when a secondary index is needed.

InnoDB-clustered index

  • In a clustered index, the leaf nodes of the primary index store the row data itself, and the leaf nodes of a secondary index store the primary key value of the matching row. The primary key should therefore be as small as possible and of as simple a type as possible.
  • The data of a clustered index is stored together with the primary key index.
  • The data of a clustered index is stored in primary key order, which suits range searches on the primary key: they need less disk I/O and run faster. For the same reason, rows are best inserted in monotonically increasing primary key order; otherwise page splits occur frequently and hurt performance badly.
  • In InnoDB, if a query needs only indexed columns, avoid selecting other columns; the query can then be answered from the index alone, which is more efficient.

For lookups through the primary index, a clustered index is the better fit: it needs only one search, whereas a non-clustered index must perform one more I/O to fetch the row after finding its address.

*Because a secondary index in a clustered layout stores the primary key value rather than a row address, it does not need to be updated when rows move or pages split, which lowers maintenance cost. On the other hand, because the primary index stores the data itself, a clustered index takes up more space.

*Inserting new data can be slower with a clustered index than with a non-clustered one. Each insert has to check whether the primary key already exists, which means searching the primary index down to its leaf nodes. The leaf nodes of a non-clustered index store only data addresses, which are small and densely packed, so this takes little I/O. The leaf nodes of a clustered primary index store the rows themselves, which are larger and spread across many more pages, so more I/O is needed.

The following figure can vividly illustrate the difference between clustered index and non-clustered index

[Figure: clustered index (InnoDB) compared with non-clustered index (MyISAM)]

 

The figure shows that in a clustered index the leaf nodes of a secondary index store primary key values, while the leaf nodes of the primary index store the row data itself. In other words, the data and the index are stored together, and what the index lookup arrives at is the data itself, so the order of the index is the order of the data.

In a non-clustered index, the leaf nodes of both the primary index and the secondary indexes store the physical addresses of the rows. The index and the data are stored apart, and the order of the data has nothing to do with the order of the index.
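The two layouts can be mimicked with plain Python dictionaries. This is a deliberately simplified model: the table, the column names, and the use of list positions as "physical addresses" are all assumptions made for the example, and real engines store these structures as B+Trees on disk.

```python
# Made-up table: primary key id, secondary-indexed column username.

# MyISAM-style (non-clustered): rows sit in a data file in insertion order,
# and BOTH indexes map a key to the row's position in that file.
data_file = [
    {"id": 30, "username": "carol"},
    {"id": 10, "username": "alice"},
    {"id": 20, "username": "bob"},
]
myisam_primary = {row["id"]: pos for pos, row in enumerate(data_file)}
myisam_secondary = {row["username"]: pos for pos, row in enumerate(data_file)}


def myisam_by_username(name):
    return data_file[myisam_secondary[name]]     # index -> address -> row


# InnoDB-style (clustered): the primary index holds the rows themselves in
# primary key order; the secondary index maps a key to the primary key value.
innodb_primary = {row["id"]: row for row in sorted(data_file, key=lambda r: r["id"])}
innodb_secondary = {row["username"]: row["id"] for row in data_file}


def innodb_by_username(name):
    pk = innodb_secondary[name]                  # first lookup: secondary index
    return innodb_primary[pk]                    # second lookup: primary index


print(myisam_by_username("bob"), innodb_by_username("bob"))
print([row["id"] for row in data_file])          # insertion order: 30, 10, 20
print(list(innodb_primary))                      # primary key order: 10, 20, 30
```

Both lookups return the same row, but the clustered model goes through the primary index a second time, which is why a small primary key matters for every secondary index.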

In addition, the differences between MyISAM and InnoDB are summarized as follows:

[Figure: comparison of MyISAM and InnoDB]

 

Summarized as follows:

  • InnoDB supports transactions, row-level locking, supports indexes such as B-tree and Full-text, and does not support Hash indexes;
  • MyISAM does not support transactions, supports table-level locking, supports indexes such as B-tree and Full-text, and does not support Hash indexes;

In addition, Memory does not support transactions, supports table-level locking, supports indexes such as B-tree and Hash, and does not support Full-text indexes;

Five, index use strategy

When should I use the index?

  • A primary key automatically gets a unique index;
  • Columns that often appear as query conditions in WHERE or ORDER BY clauses should be indexed;
  • Columns used for sorting should be indexed;
  • Columns that join to other tables should be indexed, that is, index the foreign key relationships;
  • Under high concurrency, prefer composite indexes;
  • Columns used in aggregate functions can be indexed; for example, column_1 should be indexed when max(column_1) or count(column_1) is used

When not to use indexes?

  • Do not index columns that are inserted, deleted, or updated very frequently;
  • Do not index columns with many duplicate values;
  • Do not index tables with very few records. Performance test results are meaningful only when the database holds enough test data. If the test database has only a few hundred records, they are often all loaded into memory by the first query, so subsequent queries run very fast whether or not an index is used. Only when the database holds more than 1,000 records and the total data exceeds the memory of the MySQL server do performance test results have practical value.

Index failure situation:

  • Avoid nullable columns in a composite index where possible. MySQL does index NULL values, but nullable columns make indexes and value comparisons more complicated.
  • MySQL generally uses only one index per table in a SELECT statement; if an index is used for WHERE, a different column in ORDER BY will not get its own index.
  • In a LIKE operation, '%aaa%' cannot use the index, but 'aaa%' can. The same holds for other wildcards: the index can be used only if the first character of the search pattern is not a wildcard.
  • Applying an expression or function to an indexed column invalidates the index. For example, select * from users where YEAR(adddate)<2007 evaluates the function on every row, so the index is not used and the whole table is scanned. It can be rewritten as: select * from users where adddate<'2007-01-01'.
  • Inequality comparisons in query conditions (<, >, and !=) can cause the index not to be used. As exceptions, != on a primary key index does not invalidate the index, and neither does < or > on a primary key index or an integer index. (As reader erwkjrfhjwkdb pointed out, an inequality does not invalidate the index when the matching rows are a small share of all records.)
  • IS NULL or IS NOT NULL in the query conditions can cause the index not to be used.
  • Strings without single quotes invalidate the index. More precisely, a type mismatch does: if the email column is a string type, WHERE email=99999 cannot use the index and should be changed to WHERE email='99999'.
  • Joining conditions with OR invalidates the index unless every condition in the OR has an index. Otherwise, split the statement into two queries and combine them with UNION ALL.
  • If the sort column uses an index, the selected columns must also be indexed columns, otherwise the index is not used. As an exception, when sorting by the primary key index, select * does not invalidate the index.
  • Avoid sorting on multiple columns; if you must, build a composite index on those columns.
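The rule about functions on indexed columns can be observed with a query planner. The sketch below uses SQLite as a stand-in, only because it ships with Python; it is not MySQL, its EXPLAIN QUERY PLAN output looks different from MySQL's EXPLAIN, and its optimizer makes its own choices. The users table and the idx_adddate index are made up to mirror the example in the list above, with strftime playing the role of YEAR().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, adddate TEXT)")
conn.execute("CREATE INDEX idx_adddate ON users(adddate)")


def plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)   # the last column is the description


# Function applied to the indexed column: the index on adddate cannot be used.
wrapped = plan("SELECT * FROM users WHERE strftime('%Y', adddate) < '2007'")
# Bare column compared with a constant: the index can serve a range search.
bare = plan("SELECT * FROM users WHERE adddate < '2007-01-01'")

print(wrapped)
print(bare)
```

The second plan should mention idx_adddate while the first falls back to scanning the table. On a MySQL server the equivalent check is to prefix each query with EXPLAIN and compare the key column of the output.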

Six, index optimization

1. The leftmost prefix

The leftmost prefix of an index relates to the "leftmost prefix principle" of B+Tree indexes. For example, given the composite index <col1,col2,col3>, the index can be used in full by conditions on col1, <col1,col2>, or <col1,col2,col3>. Conditions on <col2,col3>, col2, or col3 cannot use it, and a condition on <col1,col3> can use only the col1 part.

Following the leftmost prefix principle, we generally put the column that is queried or sorted most frequently on the left, and so on.
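The leftmost prefix rule can be written as a few lines of Python. This is a simplified model that considers only equality conditions and ignores anything else a real optimizer might do; usable_prefix is a hypothetical helper, not a MySQL function.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break                  # a gap ends the usable prefix
        used.append(column)
    return used


index = ["col1", "col2", "col3"]

print(usable_prefix(index, {"col1"}))                   # ['col1']
print(usable_prefix(index, {"col1", "col2"}))           # ['col1', 'col2']
print(usable_prefix(index, {"col1", "col2", "col3"}))   # ['col1', 'col2', 'col3']
print(usable_prefix(index, {"col2", "col3"}))           # []
print(usable_prefix(index, {"col1", "col3"}))           # ['col1']
```

The last case shows that a condition on col1 and col3 still benefits from the col1 part of the index; col3 is then checked row by row.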

2. Fuzzy query optimization with index

As mentioned above, a LIKE fuzzy query with '%aaa%' cannot use the index. In that case the only index-based optimization is a full-text index (described above).

3. Build a full-text index for the search conditions, and then use

SELECT * FROM tablename WHERE MATCH(index_column) AGAINST('word');

4. Use short indexes

When indexing a string column, specify a prefix length where possible. For example, for a CHAR(255) column, if most values are unique within the first 10 or 20 characters, do not index the whole column. A short index improves query speed and also saves disk space and I/O.
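One way to pick a prefix length is to compare how many distinct prefixes each length yields against the number of rows. A small Python sketch, using made-up email values; prefix_selectivity is a hypothetical helper written for this example.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the number of rows."""
    return len({value[:length] for value in values}) / len(values)


# Made-up column values for illustration.
emails = [
    "alice@example.com",
    "alicia@example.com",
    "bob@example.com",
    "bobby@example.org",
    "carol@example.com",
    "carol@example.org",
]

full = len(set(emails)) / len(emails)   # selectivity of indexing the whole column
for length in (3, 5, 10, 17):
    print(length, round(prefix_selectivity(emails, length), 2), "full:", full)
```

The shortest length whose selectivity comes close to that of the full column is a reasonable candidate for the prefix. With this sample data, 5 characters already separate most rows, but the two carol addresses differ only at the very end, which shows that the right length depends on the actual data.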

Origin blog.csdn.net/GYHYCX/article/details/108629027