What is an index in MySQL? How to optimize?

      An index works like the catalogue of a university library: it makes data retrieval more efficient and reduces the database's I/O cost. MySQL's performance is often said to start declining gradually at around 3 million rows per table (the official documentation is said to put the figure at 5 to 8 million), so building indexes is essential for large data volumes. MySQL provides EXPLAIN, which displays detailed information about how a SQL statement is executed and can be used to guide index optimization.

1. Reasons for slow SQL execution:

      1. Hardware problems, such as a slow network, insufficient memory, low I/O throughput or a full disk.

      2. There is no index, or the index is not being used. (At Internet companies, DBAs commonly lock the table in the middle of the night and rebuild the index, because deleting rows leaves gaps in the index's tree structure. For the same reason such companies usually mark rows as deleted instead of physically deleting them (soft deletion): first, the data stays available for analysis; second, the index is not damaged.)

      3. Too much data (addressed by splitting the data across databases and tables, i.e. sharding).

      4. Server tuning and parameter settings (adjust my.cnf).
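
The soft deletion mentioned in point 2 can be sketched as follows. This is only an illustration: the table demo_article and the column is_deleted are hypothetical names that do not appear elsewhere in this article.

```sql
-- demo_article and is_deleted are hypothetical names used only for this sketch.
CREATE TABLE demo_article (
  id         BIGINT       NOT NULL AUTO_INCREMENT,
  title      VARCHAR(100) NOT NULL DEFAULT '',
  is_deleted TINYINT      NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

INSERT INTO demo_article (title) VALUES ('first'), ('second');

-- "Delete" a row by flagging it; the row and its index entries stay in place.
UPDATE demo_article SET is_deleted = 1 WHERE id = 2;

-- Normal reads skip the flagged rows.
SELECT id, title FROM demo_article WHERE is_deleted = 0;
```

The flagged rows remain available for later analysis, which is the first of the two reasons given above.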

2. When analyzing the cause, start from the right entry point:

      1. Observe first. Enable the slow query log and set a threshold (for example, anything that takes more than 3 seconds counts as slow SQL); after a day of running in the production environment, see which statements are slow.

      2. Analyze the slow SQL with EXPLAIN. Typical findings: the statement is badly written, an index is missing or not used, or the query joins too many tables (sometimes a design flaw or an unreasonable requirement).

      3. SHOW PROFILE goes a step further into the execution details than EXPLAIN: it shows which stages each statement went through and how many seconds each stage took.

      4. Ask the DBA or the operations team to tune the MySQL server parameters.
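
Steps 1 and 3 above might look like the following in a MySQL session. This is a minimal sketch: it assumes an account with the privileges needed to change global variables, and it queries the user_info table that is created in section 4. Note that SHOW PROFILE has been deprecated since MySQL 5.6.7 in favour of the Performance Schema, although it is still available.

```sql
-- Enable the slow query log at runtime (requires a privileged account).
SET GLOBAL slow_query_log = 'ON';
-- Treat anything slower than 3 seconds as slow SQL. The new global value
-- applies to connections opened after this statement.
SET GLOBAL long_query_time = 3;
-- Check the current settings, including the log file location.
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- Per-stage timings for statements run in the current session.
SET profiling = 1;
SELECT * FROM user_info WHERE name = 'xys';
SHOW PROFILES;               -- lists recent statements with their Query_ID
SHOW PROFILE FOR QUERY 1;    -- stage-by-stage timing; use a Query_ID from SHOW PROFILES
```

Settings changed with SET GLOBAL are lost on restart; to make them permanent they would also have to go into my.cnf.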

3. What is an index?

      MySQL's official definition is: an index is a data structure that helps MySQL retrieve data efficiently. Put simply, it is a data structure for fast searching and sorting. MySQL indexes mainly come in two structures: B+Tree indexes and hash indexes. Unless otherwise specified, "index" here means an index organized as a B+Tree. Its structure is described below.

      In the figure, the light blue outer box is disk block 1, which contains the data items 17 and 35 (dark blue) and the pointers P1, P2 and P3 (yellow). P1 points to the disk block holding values smaller than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values larger than 35. The real data lives in the leaf nodes, that is, the bottom layer: 3, 5, 9, 10, 13... Non-leaf nodes do not store real data; they store only the data items that guide the search, such as 17 and 35.

      Search process: to look up data item 28, disk block 1 is first loaded into memory (the first I/O). A binary search in memory finds that 28 lies between 17 and 35, which selects pointer P2. Disk block 3 is then loaded through the address in P2 (the second I/O), where 28 is found to lie between 26 and 30. The matching pointer leads to disk block 8, which is loaded in the same way (the third I/O), and a final binary search in memory locates 28.

      In practice, a three-level B+Tree can index millions of rows, so finding one row among millions costs three I/Os rather than the millions of I/Os a row-by-row scan could need, which is a huge saving in time.
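
A rough back-of-the-envelope calculation shows why three levels are enough for millions of rows. The numbers below are illustrative assumptions, not measurements: a 16 KB page (the InnoDB default), about 14 bytes per non-leaf entry, and rows of about 1 KB each.

```sql
-- Illustrative assumptions, not measurements:
--   16384 bytes per page (the InnoDB default page size)
--   14 bytes per non-leaf entry (an 8-byte BIGINT key plus an assumed 6 bytes of pointer)
--   1024 bytes per row, so 16 rows fit in one leaf page
SELECT
  FLOOR(16384 / 14)                                           AS entries_per_non_leaf_page,
  FLOOR(16384 / 1024)                                         AS rows_per_leaf_page,
  FLOOR(16384 / 14) * FLOOR(16384 / 14) * FLOOR(16384 / 1024) AS rows_in_three_levels;
```

Under these assumptions each non-leaf page holds 1170 entries, so a root page plus one more non-leaf level can address 1170 × 1170 leaf pages, or roughly 21.9 million rows. Real tables differ with key size and row size, but the order of magnitude explains the claim above.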

4. Explain Analysis

      With the background covered, we move on to practice. First, create the test tables and insert the test data:

CREATE TABLE `user_info` (
  `id`   BIGINT(20)  NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL DEFAULT '',
  `age`  INT(11)              DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `name_index` (`name`)
)ENGINE = InnoDB DEFAULT CHARSET = utf8;

INSERT INTO user_info (name, age) VALUES ('xys', 20);
INSERT INTO user_info (name, age) VALUES ('a', 21);
INSERT INTO user_info (name, age) VALUES ('b', 23);
INSERT INTO user_info (name, age) VALUES ('c', 50);
INSERT INTO user_info (name, age) VALUES ('d', 15);
INSERT INTO user_info (name, age) VALUES ('e', 20);
INSERT INTO user_info (name, age) VALUES ('f', 21);
INSERT INTO user_info (name, age) VALUES ('g', 23);
INSERT INTO user_info (name, age) VALUES ('h', 50);
INSERT INTO user_info (name, age) VALUES ('i', 15);

CREATE TABLE `order_info` (
  `id`           BIGINT(20)  NOT NULL AUTO_INCREMENT,
  `user_id`      BIGINT(20)           DEFAULT NULL,
  `product_name` VARCHAR(50) NOT NULL DEFAULT '',
  `productor`    VARCHAR(30)          DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `user_product_detail_index` (`user_id`, `product_name`, `productor`)
)ENGINE = InnoDB DEFAULT CHARSET = utf8;

INSERT INTO order_info (user_id, product_name, productor) VALUES (1, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, 'p2', 'WL');
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, 'p1', 'DX');
INSERT INTO order_info (user_id, product_name, productor) VALUES (2, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (2, 'p5', 'WL');
INSERT INTO order_info (user_id, product_name, productor) VALUES (3, 'p3', 'MA');
INSERT INTO order_info (user_id, product_name, productor) VALUES (4, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (6, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (9, 'p8', 'TE');

A first look at what EXPLAIN produces:

Index usage is shown in the three columns possible_keys, key and key_len. The sections below go through the output columns from left to right.
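
As a minimal sketch of how the output is obtained, any SELECT can be prefixed with EXPLAIN. The column list in the comment is what MySQL 5.7 and later print; the partitions and filtered columns are not covered in this article.

```sql
-- Prefix a SELECT with EXPLAIN to see its execution plan instead of its result.
EXPLAIN SELECT * FROM user_info WHERE id = 2;

-- Output columns in MySQL 5.7 and later, left to right:
--   id, select_type, table, partitions, type, possible_keys,
--   key, key_len, ref, rows, filtered, Extra
```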

1.id

-- Same id: rows are executed from top to bottom
explain select u.*,o.* from user_info u,order_info o where u.id=o.user_id;

-- Different ids: the larger the id, the earlier it is executed
explain select * from  user_info  where id=(select user_id from order_info where  product_name ='p8');

2.select_type

select_type is the type of each SELECT in the statement (the examples under id above show it in use). The common values are:

  • SIMPLE: The query contains no UNION and no subquery
  • PRIMARY: The outermost query
  • SUBQUERY: The first SELECT in a subquery
  • UNION: The second or a later SELECT in a UNION
  • DEPENDENT UNION: The second or a later SELECT in a UNION that depends on the outer query
  • UNION RESULT: The result of a UNION
  • DEPENDENT SUBQUERY: The first SELECT in a subquery that depends on the outer query, that is, on the result of the outer query
  • DERIVED: The SELECT of a derived table (a subquery in the FROM clause)
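
Two short queries against the test tables illustrate several of these values. The expected select_type values in the comments are what the optimizer normally reports for statements of this shape; the exact plan can vary between MySQL versions.

```sql
-- A UNION: the first SELECT is expected to be PRIMARY, the second UNION,
-- and the merged result UNION RESULT.
EXPLAIN
SELECT id FROM user_info WHERE id = 1
UNION
SELECT id FROM user_info WHERE id = 2;

-- A correlated subquery in the select list refers to the outer row (u.id),
-- so it is expected to be reported as DEPENDENT SUBQUERY.
EXPLAIN
SELECT u.name,
       (SELECT COUNT(*) FROM order_info o WHERE o.user_id = u.id) AS order_count
FROM user_info u;
```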

3.table

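The table column names the table that the row reads from, which may also be a derived table:

explain select tt.* from (select u.* from user_info u,order_info o where u.id=o.user_id and u.id=1) tt;

In the row with id 1, the table column shows <derived2>: that row reads from the derived table produced by the SELECT with id 2, which joins u and o. (From MySQL 5.7 onward the optimizer may merge the derived table into the outer query, in which case no <derived2> row appears.)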

4.type

The type column is important: it is a key basis for judging whether a query is efficient, because it tells us whether the query does a full table scan or uses an index.

Commonly used values for type are:

  • system: The table contains only one row. This is a special case of const.
  • const: An equality lookup on the primary key or a unique index that returns at most one row. A const query is very fast because the row is read only once. For example, the following query uses the primary key index, so its type is const: explain select * from user_info where id = 2;
  • eq_ref: Usually appears in multi-table joins. For each row from the preceding table, exactly one row is matched in this table, which is the case when the join uses a primary key or unique index. The comparison is usually =, and the query is efficient. For example: explain select * from user_info, order_info where user_info.id = order_info.user_id;
  • ref: Usually appears in multi-table joins on a non-unique, non-primary-key index, or in queries that use only a leftmost prefix of an index. For example, the following query has type ref: explain select * from user_info, order_info where user_info.id = order_info.user_id AND order_info.user_id = 5;
  • range: An index range scan: only the rows within a given range of the indexed column are read. It typically appears with the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN and IN() operators. For example, the following is a range query: explain select * from user_info where id between 2 and 8;
  • index: A full index scan. It is similar to ALL, except that ALL scans the whole table while index scans the whole index without touching the table rows. It usually appears when all the requested data can be read directly from the index tree; in that case the Extra column shows Using index.
  • ALL: A full table scan, one of the worst-performing access types. Queries should generally avoid it, because with a large amount of data it can be disastrous for database performance. It can usually be avoided by adding an index on the relevant column.

      Generally speaking, the access types rank as follows, from slowest to fastest:
      ALL < index < range ~ index_merge < ref < eq_ref < const < system
      ALL is a full table scan, so under the same query conditions it is the slowest. index is not a full table scan, but it still scans the entire index, so it is only slightly faster than ALL. The remaining types use the index to locate rows, so they filter out some or most of the data and are correspondingly more efficient.
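
The two slowest types have no example in the list above, so here is a small sketch using the user_info test table. The types in the comments are what is expected for this schema; on such a tiny table the optimizer's choice could in principle differ.

```sql
-- Only the indexed column is requested, so the whole name_index is scanned
-- and the table rows are not read: type is expected to be index,
-- with Using index in the Extra column.
EXPLAIN SELECT name FROM user_info;

-- age has no index, so every row must be examined: type is expected to be ALL.
EXPLAIN SELECT * FROM user_info WHERE age = 20;
```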

5.possible_keys

      The indexes that MySQL could use for the query. Note that an index appearing in possible_keys will not necessarily be used; which index MySQL actually uses is shown in the key column.

6.key

      The index that MySQL actually uses for the current query. By analogy with a dinner party, possible_keys is the list of guests who were expected and key is the guests who actually came. Without an additional index:

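explain select o.* from order_info o where  o.product_name= 'p1' and  o.productor='whh';

Create a composite index and then run the same query again:

create index idx_name_productor on order_info(product_name, productor);
explain select o.* from order_info o where  o.product_name= 'p1' and  o.productor='whh';

-- Drop the index again once the comparison is done
drop index idx_name_productor on order_info;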

7.key_len

      The number of bytes of the index that the query optimizer uses. It can be used to judge whether a composite index is fully used or only its leftmost columns.
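
As a worked example, the key_len values for user_product_detail_index can be derived from the column definitions. The sketch assumes the utf8 character set from the table definition (up to 3 bytes per character) and that idx_name_productor is not present; the expected values are calculated from those rules, not copied from a run.

```sql
-- Byte cost of each column in user_product_detail_index:
--   user_id      BIGINT, nullable:      8 + 1 (NULL flag)                    =   9
--   product_name VARCHAR(50), NOT NULL: 50 * 3 + 2 (length bytes)            = 152
--   productor    VARCHAR(30), nullable: 30 * 3 + 2 (length) + 1 (NULL flag)  =  93

-- Only user_id is used: expected key_len 9
EXPLAIN SELECT * FROM order_info WHERE user_id = 1;

-- user_id and product_name are used: expected key_len 9 + 152 = 161
EXPLAIN SELECT * FROM order_info WHERE user_id = 1 AND product_name = 'p1';

-- All three columns are used: expected key_len 9 + 152 + 93 = 254
EXPLAIN SELECT * FROM order_info
WHERE user_id = 1 AND product_name = 'p1' AND productor = 'WHH';
```

A key_len smaller than the full 254 therefore signals that only a leftmost part of the composite index is doing the filtering.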

8.ref

      Shows which columns or constants are compared with the index named in the key column. Do not confuse it with the ref value of the type column described above.

9.rows

      rows is also an important column. Based on statistics, the MySQL query optimizer estimates how many rows it has to scan and read to find the result set. This value is a very direct indicator of how efficient the SQL is: in principle, the fewer rows the better. Compare the example under key: without the index, rows is 9; after the index is created, rows is 4.

10.extra

Much of the additional information in EXPLAIN is shown in the Extra column. The common values are:

  • Using filesort: MySQL needs an additional sorting pass because the required order cannot be obtained from the index order. It is generally recommended to optimize such queries so that the filesort disappears, because it consumes a lot of CPU.
  • Using index: A covering index scan. The query finds all the data it needs in the index tree without reading the table rows, which usually means good performance.
  • Using temporary: The query uses a temporary table, which typically happens with sorting, grouping and multi-table joins. Such queries are not efficient, and optimization is recommended.
  • Using where: A WHERE condition is used to filter the rows read from the table.
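
The difference between sorting with and without the index can be sketched on order_info, whose only secondary index is ordered by (user_id, product_name, productor). The Extra values in the comments are what is expected for this schema.

```sql
-- Sorting by product_name alone cannot follow the index order,
-- so Extra is expected to include Using filesort.
EXPLAIN SELECT * FROM order_info ORDER BY product_name;

-- Sorting by a leftmost prefix of the index can follow the index order,
-- so the filesort is expected to disappear.
EXPLAIN SELECT * FROM order_info ORDER BY user_id, product_name;
```

Both queries can be answered from the index alone, so Using index is expected in both cases; only the first needs the extra sort.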

5. Optimization case

explain select u.*,o.* from user_info u LEFT JOIN  order_info o on u.id=o.user_id;

Execution result: type is ALL and no index is used:

Now optimize by creating an index on the join column. The ALL in the type column becomes ref, the index is used, and the number of rows scanned drops from 9 to 1:

A general rule applies here: for a LEFT JOIN, add the index to the join column of the right table; for a RIGHT JOIN, add it to the join column of the left table.
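
The optimization step could be written as below. The index name idx_user_id is a hypothetical name chosen for this sketch. It also assumes that order_info has no other index starting with user_id; with the schema from section 4, user_product_detail_index already starts with that column, so the optimizer may choose it instead.

```sql
-- idx_user_id is a hypothetical name chosen for this sketch.
CREATE INDEX idx_user_id ON order_info (user_id);

-- order_info is the right table of the LEFT JOIN, so its join column is the one to index.
EXPLAIN SELECT u.*, o.* FROM user_info u LEFT JOIN order_info o ON u.id = o.user_id;
```

The left table of a LEFT JOIN has to be read in full anyway, which is why the index pays off on the right table.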

6. Do you need to create an index?

      Although indexes can greatly speed up queries, they slow down updates to the table, because every INSERT, UPDATE and DELETE must also maintain the index. An index is in effect a table of its own: it stores the indexed columns together with the primary key, which points back to the rows of the actual table, so an index also takes up disk space.
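
To see what the indexes on the test tables cost in space, the following sketch can be run in the database that holds them. It assumes that database is the currently selected one.

```sql
-- List the indexes defined on a table.
SHOW INDEX FROM order_info;

-- Compare the space used by row data and by secondary indexes, in bytes.
SELECT table_name, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('user_info', 'order_info');
```

For InnoDB tables, data_length covers the clustered (primary key) index, which holds the rows themselves, and index_length covers the secondary indexes.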

              

      I am an ordinary programmer of limited ability, so this article inevitably contains mistakes. Readers who have given their valuable time to it are welcome to comment directly on its content. My only aim is to help readers.

 


Origin http://43.154.161.224:23101/article/api/json?id=324378239&siteId=291194637