Index Creation and Design Principles

1. Classification of indexes

By function, indexes are divided into normal indexes, unique indexes, primary key indexes, and full-text indexes.

  • Normal index (NORMAL): An index without any restrictions; it exists only to improve query efficiency.
  • Unique index (UNIQUE): The indexed values must be unique, but NULL values are allowed. A table can have multiple unique indexes.
  • Primary key index: A special unique index that adds a NOT NULL constraint. A table has at most one primary key index.
  • Full-text index (FULLTEXT): Uses word segmentation and other algorithms to analyze the frequency and importance of keywords, which makes it suitable for large data sets. Full-text indexes can only be created on CHAR, VARCHAR, or TEXT columns (including the other types in the TEXT family). When querying string columns that hold large amounts of data, a full-text index can improve query speed. For example, the field information in the table student is of type TEXT and contains a lot of text. After a full-text index is created on information, queries against that field become faster. With the advent of the big data era, the full-text indexes of relational databases increasingly struggle to meet demand, and they are gradually being replaced by specialized search engines such as Solr and Elasticsearch.
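As a rough sketch of how these four kinds of index are declared, the statements below create one of each in a single table. The table student_demo, its columns, and the index names are hypothetical names chosen for illustration (only the TEXT field information echoes the example above), and the sketch assumes MySQL 5.6 or later so that InnoDB accepts FULLTEXT.

```sql
-- Hypothetical table for illustration; names are not from the original article.
CREATE TABLE student_demo (
    id          INT NOT NULL,
    student_no  VARCHAR(20),      -- NULL is allowed, but non-NULL values must be unique
    name        VARCHAR(50),
    information TEXT,
    PRIMARY KEY (id),                              -- primary key index: unique and NOT NULL
    UNIQUE INDEX uk_student_no (student_no),       -- unique index
    INDEX idx_name (name),                         -- normal index
    FULLTEXT INDEX ft_information (information)    -- full-text index
) ENGINE = InnoDB;

-- A full-text index is queried with MATCH ... AGAINST rather than LIKE.
SELECT id, name
FROM student_demo
WHERE MATCH(information) AGAINST('database');
```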

By physical implementation, indexes are divided into clustered indexes and non-clustered indexes.

  • Clustered index: An index built on the primary key. It is really a way of storing data: the table's data rows are stored in the leaf pages of the index tree.
  • Non-clustered index: An index built on non-primary-key columns (also called a secondary index). It does not store complete records, and its entries are sorted by the index columns.

By the number of fields involved, indexes are divided into single-column indexes and joint (multi-column) indexes.

  • Single-column index: An index created on a single field in the table. A single-column index can be a normal index, a unique index, or a full-text index, as long as it covers exactly one field. A table can have multiple single-column indexes.
  • Multi-column index: An index created on a combination of several fields in a table. The index covers all of those fields, but it is used only when the query conditions include the first of them. For example, if a multi-column index idx_id_name_gender is created on the fields id, name, and gender, the index is used only when the field id appears in the query conditions. This is the leftmost prefix rule, which applies to all joint indexes.
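The leftmost prefix behaviour can be checked with EXPLAIN. The following is a minimal sketch: the table student_prefix_demo and the column remark are hypothetical, and only the index name idx_id_name_gender comes from the text. The exact plan depends on the MySQL version and on the data, so treat the comments as what one would typically expect rather than guaranteed output.

```sql
-- Hypothetical table; remark is an extra column that is NOT part of the index,
-- so SELECT * cannot be answered from the index alone.
CREATE TABLE student_prefix_demo (
    id     INT NOT NULL,
    name   VARCHAR(50),
    gender CHAR(1),
    remark VARCHAR(100),
    INDEX idx_id_name_gender (id, name, gender)
);

-- The leftmost column id is present: the index is a candidate
-- (look at the possible_keys and key columns of the EXPLAIN output).
EXPLAIN SELECT * FROM student_prefix_demo WHERE id = 1;
EXPLAIN SELECT * FROM student_prefix_demo WHERE id = 1 AND name = 'Tom';

-- The leftmost column id is missing: the index cannot be used for the lookup,
-- and the plan typically falls back to a full table scan.
EXPLAIN SELECT * FROM student_prefix_demo WHERE name = 'Tom' AND gender = 'M';
```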

Different storage engines support different index types.

  • InnoDB: supports B+ tree, Full-text and other indexes, does not support Hash index
  • MyISAM: supports B+ tree, Full-text and other indexes, does not support Hash index
  • Memory: supports B+ tree, Hash and other indexes, does not support Full-text index
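To illustrate the engine differences, here is a small sketch that declares both index types on a MEMORY table. The table session_cache_demo and its columns are hypothetical names used only for this example.

```sql
-- MEMORY tables support both index types; HASH is the default for this engine,
-- and BTREE can be requested explicitly with USING.
CREATE TABLE session_cache_demo (
    session_id INT NOT NULL,
    user_id    INT NOT NULL,
    INDEX idx_session USING HASH  (session_id),
    INDEX idx_user    USING BTREE (user_id)
) ENGINE = MEMORY;

-- The Index_type column of the result reports the type of each index.
SHOW INDEX FROM session_cache_demo;
```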

2. Index design principles

Suitable for creating indexes

1. Fields frequently used as WHERE conditions

If a field is frequently used in the WHERE condition of a SELECT, UPDATE, or DELETE statement, you should create an index on it. Especially when the amount of data is large, even a normal index can greatly improve query efficiency.

2. Columns frequently used in GROUP BY and ORDER BY

An index stores data, and lets it be retrieved, in a defined order. Therefore, when we group data with GROUP BY or sort it with ORDER BY, we should index the grouping or sorting fields. If several columns need to be indexed, create a joint index on them.

  • When only one column needs to be indexed, for example, a GROUP BY on student_id:
SELECT student_id, COUNT(*) AS num
FROM student_info
GROUP BY student_id;

We can add an index on student_id to speed up this query. Likewise, for ORDER BY, add an index on the sorting field.
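A minimal sketch of that step follows. The index name idx_student_id is a hypothetical choice, and the table student_info is assumed to exist with the student_id column used in the query above.

```sql
-- idx_student_id is a hypothetical index name.
ALTER TABLE student_info ADD INDEX idx_student_id (student_id);

-- If the index is chosen, the key column of the EXPLAIN output names
-- idx_student_id; without a usable index, Extra often shows "Using temporary".
EXPLAIN
SELECT student_id, COUNT(*) AS num
FROM student_info
GROUP BY student_id;
```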

  • When several columns need to be indexed, for example, when a query has both GROUP BY and ORDER BY:
SELECT student_id, COUNT(*) AS num
FROM student_info
GROUP BY student_id
ORDER BY create_time DESC;

If you create separate indexes on student_id and create_time, you will find that only the student_id index is actually used, and execution is slow. In a multi-condition query, usually only one of several single-column indexes takes effect (MySQL picks the most restrictive one), so it is best to create a joint index for such queries.

Normally we should order the columns of a joint index according to the order of SQL execution. Since GROUP BY is executed first, we should create the joint index (student_id, create_time) rather than (create_time, student_id). Note that if a joint index on (create_time, student_id) coexists with a single-column index on student_id, the single-column student_id index is still the one used, because GROUP BY is executed first.
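The joint index described here could be created as sketched below. The index name idx_sid_create_time is hypothetical. The EXPLAIN also rests on an assumption about the server: sql_mode must not include ONLY_FULL_GROUP_BY, since with that mode enabled (the default since MySQL 5.7) the query is rejected because create_time is neither grouped nor aggregated.

```sql
-- Hypothetical index name; the GROUP BY column comes first, as the text recommends.
ALTER TABLE student_info ADD INDEX idx_sid_create_time (student_id, create_time);

-- Assumption: sql_mode does not include ONLY_FULL_GROUP_BY (see the note above).
EXPLAIN
SELECT student_id, COUNT(*) AS num
FROM student_info
GROUP BY student_id
ORDER BY create_time DESC;

-- List the table's indexes to confirm which ones exist and in what column order.
SHOW INDEX FROM student_info;
```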

3. Columns frequently used with DISTINCT

Sometimes we need to deduplicate a field with DISTINCT. Creating an index on that field also improves query efficiency.
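A short sketch of checking this with EXPLAIN, reusing student_info and student_id from the earlier queries. The index name idx_student_id is hypothetical; the CREATE INDEX line is commented out so that it can be skipped when an index on student_id already exists.

```sql
-- Uncomment if student_id is not indexed yet (idx_student_id is a hypothetical name).
-- CREATE INDEX idx_student_id ON student_info (student_id);

-- With an index on student_id, the deduplication can read the already-sorted
-- index entries instead of building a temporary table.
EXPLAIN SELECT DISTINCT student_id FROM student_info;
```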

4. Things to note when creating indexes for multi-table JOIN operations

  1. First, try to join no more than 3 tables. Each additional table is equivalent to adding another level of nested loop, so the cost grows by orders of magnitude and seriously hurts query efficiency.
  2. Second, create indexes on the columns used in WHERE conditions, because WHERE is what filters the data. When the amount of data is very large, joining without a WHERE filter is extremely expensive.
  3. Finally, create an index on the fields used for the join, and make sure those fields have the same data type in every table. If the types differ, MySQL has to apply an implicit conversion function, and the index becomes ineffective.
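The three points above can be sketched as follows. The tables course_demo and enrollment_demo, their columns, and the index names are all hypothetical; the point is only that the join column course_id has the same type (INT) on both sides and that both the join column and the WHERE column are indexed.

```sql
-- Hypothetical tables; course_id is INT in both, so no implicit conversion is needed.
CREATE TABLE course_demo (
    course_id   INT NOT NULL PRIMARY KEY,
    course_name VARCHAR(50)
);

CREATE TABLE enrollment_demo (
    id         INT NOT NULL PRIMARY KEY,
    student_id INT NOT NULL,
    course_id  INT NOT NULL,
    INDEX idx_student_id (student_id),   -- supports the WHERE filter
    INDEX idx_course_id  (course_id)     -- supports the join
);

EXPLAIN
SELECT e.student_id, c.course_name
FROM enrollment_demo AS e
JOIN course_demo AS c ON e.course_id = c.course_id
WHERE e.student_id = 10086;
```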

5. The data type of the column should be as small as possible

Type size here refers to the range of values the type can represent. For example, among the integer types TINYINT, MEDIUMINT, INT, and BIGINT, each occupies more storage and covers a larger range than the one before.

  • The smaller the data type, the faster comparisons are at query time.
  • The smaller the data type, the less storage space the index takes up and the more records fit in one data page. This reduces the cost of disk I/O and also means more data pages can be cached in memory, which speeds up reads and writes.

This advice applies even more strongly to the table's primary key, because the primary key value is stored not only in the clustered index but also in every node of all secondary indexes. A smaller primary key type therefore saves more storage space and makes I/O more efficient.
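As an illustration of the primary key point, here is a minimal sketch with a hypothetical table user_small_pk_demo. The byte sizes in the comments are the standard MySQL storage sizes for these integer types.

```sql
-- Hypothetical table. Every entry of idx_name also stores the primary key value,
-- so a 4-byte INT key keeps the secondary index smaller than an 8-byte BIGINT
-- or a long string key would.
CREATE TABLE user_small_pk_demo (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 4 bytes per value
    age  TINYINT UNSIGNED,                      -- 1 byte, range 0 to 255
    name VARCHAR(50),
    PRIMARY KEY (id),
    INDEX idx_name (name)
) ENGINE = InnoDB;
```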

6. Create indexes on string prefixes

Suppose a string column is very long, so that each value takes up a lot of storage space. When we need to index this column, two problems arise:

  • String comparison is slow
  • The index takes up a lot of storage space

We can create the index on only the leading part of the field; this is called a prefix index. Although a lookup can no longer pinpoint the record directly from the index, it can locate the matching prefix and then go back to the table, using the primary key values of the records that share that prefix, to fetch and compare the complete string. This saves space and also reduces string comparison time.

Calculate the selectivity (distinctness) of prefixes of different lengths

For example, to check the selectivity of a column (such as an address) truncated to a given prefix length:

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

The closer the result is to 1, the better: a value near 1 means the prefix is highly selective.
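Putting the two steps together, a sketch might look like this. The table shop_demo, the candidate lengths 10, 15, and 20, and the final choice of 15 are all hypothetical; in practice the length is picked from the measured selectivity on real data.

```sql
-- Hypothetical table with a long string column.
CREATE TABLE shop_demo (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    address VARCHAR(255) NOT NULL
);

-- Run after loading data: compare candidate prefix lengths with the full column.
SELECT COUNT(DISTINCT address) / COUNT(*)           AS sel_full,
       COUNT(DISTINCT LEFT(address, 10)) / COUNT(*) AS sel_10,
       COUNT(DISTINCT LEFT(address, 15)) / COUNT(*) AS sel_15,
       COUNT(DISTINCT LEFT(address, 20)) / COUNT(*) AS sel_20
FROM shop_demo;

-- Suppose 15 is the shortest length whose selectivity is close to sel_full:
-- index only the first 15 characters of address.
ALTER TABLE shop_demo ADD INDEX idx_address_prefix (address(15));
```

One trade-off to keep in mind: because a prefix index stores only part of each value, MySQL cannot use it to sort by the full column or to answer a query from the index alone.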

7. Columns with high distinctness (high selectivity) are suitable as indexes

The cardinality of a column is the number of distinct values in it. For example, if a column contains the values 2, 5, 8, 2, 5, 8, 2, 5, 8, there are 9 records but the cardinality is 3. In other words, for a fixed number of rows, the larger a column's cardinality, the more dispersed its values; the smaller the cardinality, the more concentrated they are.

Cardinality matters a great deal because it directly affects whether an index can be used effectively. The value of an index is that it helps you locate rows quickly; if a lookup still matches a large share of the rows, the index loses its value, as with a typical gender field.

You can calculate the degree of distinctness with the formula below. The closer it is to 1, the better; generally, a value above 33% is considered a relatively efficient index. This is why unique fields are well suited to indexing.

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;
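Applied to the nine values from the cardinality example above, the formula works out as in the sketch below. The table cardinality_demo and the column val are hypothetical names.

```sql
-- Hypothetical table holding the nine sample values 2, 5, 8, 2, 5, 8, 2, 5, 8.
CREATE TABLE cardinality_demo (val INT);
INSERT INTO cardinality_demo (val)
VALUES (2), (5), (8), (2), (5), (8), (2), (5), (8);

SELECT COUNT(DISTINCT val)            AS cardinality,   -- 3 distinct values
       COUNT(*)                       AS total_rows,    -- 9 rows
       COUNT(DISTINCT val) / COUNT(*) AS selectivity    -- 3 / 9 = 0.3333
FROM cardinality_demo;
```

A selectivity of about 0.33 sits right at the 33% threshold mentioned above, while a unique column would score exactly 1.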

8. Place the most frequently used columns on the left side of the joint index

This also means fewer indexes need to be created. At the same time, because of the leftmost prefix rule, the joint index can be used by more queries.

9. When multiple fields need to be indexed, a joint index is better than several single-column indexes

Not suitable for creating indexes

1. It is best not to use indexes for tables with small data volumes

If a table has very few records, for example fewer than 1,000, there is no need to create an index. With so few rows, an index has little effect on query efficiency; a direct scan may even take less time than traversing the index, so the index may bring no optimization at all.

2. Avoid creating too many indexes on frequently updated tables

  • Frequently updated fields do not necessarily need an index, because whenever the data is updated the index must be updated too. With too many indexes, updates put pressure on the server and hurt efficiency.
  • Avoid creating too many indexes on frequently updated tables, and keep the number of columns in each index as small as possible. Indexes speed up queries, but they also slow down updates to the table.

3. Do not create indexes on columns with a large amount of duplicate data

Create indexes on columns that have many distinct values and are often used in conditional expressions. If a field contains a large amount of duplicate data, there is no need to index it. For example, the gender field in the student table has only two values, male and female, so it does not need an index. An index there would not improve query efficiency and would instead noticeably slow down data updates.

3. Limit the number of indexes

In practice we also need balance: more indexes are not always better. We need to limit the number of indexes on each table; it is best to have no more than 6 indexes on a single table. The reasons:

  1. Each index requires disk space. The more indexes, the more disk space is required.
  2. Indexes affect the performance of INSERT, DELETE, UPDATE, and similar statements, because when the data in the table changes, the indexes must also be adjusted and updated, which adds overhead.
  3. When choosing how to execute a query, the optimizer evaluates every available index based on statistics in order to generate the best execution plan. If many indexes can serve the same query, the MySQL optimizer needs more time to generate the execution plan, which reduces query performance.
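One way to audit the number of indexes is to query information_schema, as in the sketch below. The threshold of 6 is the guideline from the text; the table student_info is reused from the earlier examples and idx_unused is a hypothetical index name, so the DROP statement is left commented out.

```sql
-- Count indexes per table in the current database and list the tables
-- that exceed the guideline of 6.
SELECT TABLE_NAME, COUNT(DISTINCT INDEX_NAME) AS index_count
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
GROUP BY TABLE_NAME
HAVING COUNT(DISTINCT INDEX_NAME) > 6;

-- Inspect the indexes of one table before deciding what to remove.
SHOW INDEX FROM student_info;

-- idx_unused is a hypothetical index name; uncomment to drop a real unused index.
-- DROP INDEX idx_unused ON student_info;
```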

Origin blog.csdn.net/qq_62767608/article/details/132018029