MySQL Database - A First Look at MySQL Database Indexes

An index is a data structure that the storage engine uses to find records quickly.

Index types by storage structure:

(1) B-Tree index 

A B-Tree index stores the values of the indexed columns in sorted order, so it is well suited to range lookups. A multi-column index sorts its entries by the order in which the columns were listed when the index was defined in the CREATE TABLE statement.

Limitations:

The index cannot be used unless the search starts from the leftmost column of the index.

If the query applies a range condition to one of the indexed columns (for example LIKE with a trailing wildcard, >, or BETWEEN), the columns to the right of it in the index cannot be used to narrow the lookup.
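The two limitations above can be sketched with a three-column index. The table, column, and index names below are hypothetical and chosen only for illustration; the actual plan depends on the data and the MySQL version, and newer versions may apply additional optimizations.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE people (
    id         INT NOT NULL PRIMARY KEY,
    last_name  VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    dob        DATE NOT NULL,
    KEY idx_name_dob (last_name, first_name, dob)
);

-- Starts at the leftmost column: the index can be used for the lookup.
SELECT * FROM people
WHERE last_name = 'Allen' AND first_name = 'Cuba';

-- Skips the leftmost column: the B-Tree cannot be searched by first_name alone.
SELECT * FROM people
WHERE first_name = 'Cuba';

-- Range condition on last_name: first_name, to its right in the index,
-- no longer narrows the range of index entries that is scanned.
SELECT * FROM people
WHERE last_name LIKE 'A%' AND first_name = 'Cuba';
```

Prefixing each SELECT with EXPLAIN is one way to check which of these the optimizer actually resolves through idx_name_dob.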

(2) Hash index

A hash index is built on a hash table, so only queries that exactly match all columns of the index can use it. Because the index only stores the hash value of the indexed columns, its structure is very compact, which also makes hash index lookups very fast.

Limitations:

Cannot be used for sorting

Lookups that match only some of the index columns are not supported

Range queries such as where price > 100 are not supported
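A minimal sketch of these limitations, assuming the MEMORY storage engine (which supports explicit hash indexes in MySQL). The table and index names are hypothetical.

```sql
-- Hypothetical lookup table; MEMORY tables support USING HASH.
CREATE TABLE session_lookup (
    token   CHAR(32) NOT NULL,
    user_id INT NOT NULL,
    KEY idx_token (token) USING HASH
) ENGINE = MEMORY;

-- Exact match on the indexed column: the hash index can be used.
SELECT user_id FROM session_lookup WHERE token = 'abc';

-- Range condition: the hash index cannot help.
SELECT user_id FROM session_lookup WHERE token > 'abc';

-- Sorting: hash values carry no ordering, so the index cannot help.
SELECT user_id FROM session_lookup ORDER BY token;
```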

(3) Other index types include R-Tree (spatial) indexes, full-text indexes, fractal tree indexes, Patricia tries, etc.

The advantages of indexing

  • First, by creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.
  • Second, it can greatly speed up data retrieval, which is the main reason for creating indexes.
  • Third, it can speed up table-to-table joins, especially in terms of achieving referential integrity of data.
  • Fourth, when using the grouping and sorting clauses for data retrieval, the time for grouping and sorting in the query can also be significantly reduced.
  • Fifth, with indexes available, the query optimizer can choose more efficient execution plans, which improves the performance of the system.

Disadvantages of Indexing

  • First, creating and maintaining indexes takes time, which increases with the amount of data.
  • Second, indexes need to occupy physical space. In addition to the data space occupied by the data table, each index also occupies a certain amount of physical space. If a clustered index is to be established, the required space will be larger.
  • Third, whenever rows are added, deleted, or modified, the indexes must be updated as well, which slows down writes.

When should an index be created?

  • On columns that are searched frequently, to speed up the search;
  • On primary key columns, to enforce uniqueness and to determine how the rows of the table are organized;
  • On columns that are often used in joins, which are mostly foreign keys, to speed up the join;
  • On columns that are often searched by range, because the index is already sorted and the requested range is contiguous in it;
  • On columns that are often used for sorting, because the query can reuse the order of the index instead of sorting the rows itself;
  • On columns that are often used in the WHERE clause, to speed up evaluation of the conditions.
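As an illustration of the cases listed above, here is a small sketch with two hypothetical tables (customers and orders, names assumed for this example). It indexes a join column and a column used for range searches and sorting.

```sql
-- Hypothetical schema; names are illustrative assumptions.
CREATE TABLE customers (
    id   INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(64) NOT NULL,
    PRIMARY KEY (id)                 -- primary key: uniqueness and row organization
);

CREATE TABLE orders (
    id          INT NOT NULL AUTO_INCREMENT,
    customer_id INT NOT NULL,
    created_at  DATETIME NOT NULL,
    amount      DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_customer (customer_id),  -- column used in joins
    KEY idx_created (created_at)     -- column used for range search and sorting
);

-- Join on the indexed foreign-key column.
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id;

-- Range search and sort on the same indexed column.
SELECT id, amount
FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'
ORDER BY created_at;
```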

When should an index not be added?

  • First, indexes should not be created on columns that are rarely used or referenced in queries. Since these columns are rarely used, an index on them does little for query speed, while it still slows down maintenance of the table and increases the space requirement.
  • Second, indexes should not be added to columns with few distinct values, such as the gender column of a personnel table. A query on such a column returns a large share of the rows in the table, so the index does not significantly speed up retrieval.
  • Third, indexes should not be added to columns defined with text, image, or bit data types, because the data in these columns is either very large or has very few distinct values.
  • Fourth, indexes should not be created when write performance matters much more than read performance. The two pull in opposite directions: adding indexes improves retrieval but slows down modification, and removing indexes does the reverse.

Index strategy:

1. Single-column index: ordinary index (INDEX), unique index (UNIQUE INDEX), primary key index (PRIMARY KEY)

2. Combined index: one index that contains multiple columns

3. Prefix index:

Sometimes a very long character column has to be indexed, which makes the index large and slow. One strategy is a simulated hash index: compute a hash of the column value, store it in a separate column, and index that column instead. Another is to index only the first few characters of the column, which can save a great deal of index space. To create a prefix index:

ALTER TABLE table_name ADD KEY (column_name(prefix_length))

However, MySQL cannot use prefix indexes for ORDER BY or GROUP BY.
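A rough sketch of both strategies follows. The pages table, its url column, and the url_crc column are assumptions made for this example, and CRC32 is just one possible hash function. The prefix lengths 10 and 20 are arbitrary candidates to compare.

```sql
-- Hypothetical table with a long character column.
CREATE TABLE pages (
    id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL
);

-- Compare the selectivity of candidate prefix lengths with the full column.
-- A prefix whose selectivity is close to the full value is usually long enough.
SELECT
    COUNT(DISTINCT url) / COUNT(*)           AS full_selectivity,
    COUNT(DISTINCT LEFT(url, 10)) / COUNT(*) AS prefix10_selectivity,
    COUNT(DISTINCT LEFT(url, 20)) / COUNT(*) AS prefix20_selectivity
FROM pages;

-- Prefix index on the first 20 characters (length chosen from the query above).
ALTER TABLE pages ADD KEY idx_url_prefix (url(20));

-- Simulated hash index: store a hash of the column and index the hash.
ALTER TABLE pages
    ADD COLUMN url_crc INT UNSIGNED NOT NULL DEFAULT 0,
    ADD KEY idx_url_crc (url_crc);

UPDATE pages SET url_crc = CRC32(url);

-- Look up by hash, then compare the full value to rule out hash collisions.
SELECT id
FROM pages
WHERE url_crc = CRC32('http://example.com/')
  AND url = 'http://example.com/';
```

In this sketch the url_crc column has to be kept in sync by the application (or by triggers) whenever url changes.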

Applicable scenarios:

For very small tables, a simple full table scan is more efficient in most cases.

For medium to large tables, indexes are very effective. On columns with many distinct values, it pays most to index the columns that appear in WHERE and ORDER BY.

For very large tables, the cost of creating and using indexes will increase accordingly. For TB-level data, it is not meaningful to locate a single record, so block-level metadata technology is often used instead of indexes.

Using indexes, with the B-Tree index as an example:

1. There are three ways to create an index:

CREATE INDEX index_name ON table_name (column_name1, column_name2...)
ALTER TABLE table_name ADD INDEX index_name (column_name1, column_name2...)
CREATE TABLE mytable (
    ID INT NOT NULL,
    username VARCHAR(16) NOT NULL,
    INDEX [indexName] (username(length))
);

2. Delete the index:

ALTER TABLE table_name DROP INDEX index_name
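Filled in with concrete names, the statements above could look like the following sketch. The demo_users table and the index names are hypothetical.

```sql
-- Hypothetical table used only for this walkthrough.
CREATE TABLE demo_users (
    id       INT NOT NULL,
    username VARCHAR(16) NOT NULL,
    email    VARCHAR(64) NOT NULL,
    INDEX idx_username (username)             -- index created with the table
);

-- Add indexes to the existing table in the two other ways.
CREATE INDEX idx_email ON demo_users (email);
ALTER TABLE demo_users ADD INDEX idx_email_prefix (email(10));

-- List the indexes currently defined on the table.
SHOW INDEX FROM demo_users;

-- Drop indexes; both forms are accepted by MySQL.
ALTER TABLE demo_users DROP INDEX idx_email_prefix;
DROP INDEX idx_email ON demo_users;
```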

Index optimization tips:

1. Try to avoid NULL checks on columns in the WHERE clause, since they can lead the engine to give up the index and perform a full table scan.

  It is recommended to use NOT NULL constraints and default values

2. Indexes usually cannot be used efficiently for negative conditions such as select * from `order` where status != 0; the query can be rewritten with an IN clause that lists the wanted values.

3. Indexes cannot be used for fuzzy queries with a leading wildcard: select * from stu where name like '%xx'. A pattern without a leading wildcard, such as like 'xx%', can use the index.

4. Columns with low selectivity, such as a gender column, should not be indexed. As a rule of thumb, an index is worthwhile when the condition filters out about 80% of the rows.

5. Try to avoid combining conditions with OR in the WHERE clause, which can trigger additional index merge operations or even a full table scan. UNION ALL is recommended instead.

6. For single-row exact-match lookups, a hash index performs better.

7. If it is known that only one result will be returned, LIMIT 1 can improve efficiency:

select * from user where login_name = ? limit 1; the database can stop scanning as soon as the first match is found.

8. Prefer numeric columns, and use numeric types to store strings that contain only digits. When processing queries and joins, the engine compares a string character by character, whereas a numeric value needs only a single comparison.

9. Use "EXPLAIN + query statement" to see how indexes are used. The type column shows the access type, and the key column shows the index that was actually chosen. If the type is "index_merge", an index merge is taking place.

10. How to choose the appropriate index column order?

Put the column with the higher selectivity on the left. Selectivity is calculated as COUNT(DISTINCT column_name)/COUNT(*), so the column with more distinct values goes on the left.
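Several of the tips above can be tried out on one small table. The sketch below assumes a hypothetical app_user table; the status values 1, 2, 3 and the sample literals are made up for illustration, and the plans reported by EXPLAIN will vary with data volume and MySQL version.

```sql
-- Hypothetical table; names and values are illustrative assumptions.
CREATE TABLE app_user (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    login_name VARCHAR(32) NOT NULL,
    city       VARCHAR(32) NOT NULL,
    status     TINYINT NOT NULL DEFAULT 0,   -- tip 1: NOT NULL with a default
    KEY idx_login (login_name),
    KEY idx_city (city),
    KEY idx_status (status)
);

-- Tip 2: list the wanted values instead of writing status != 0.
SELECT * FROM app_user WHERE status IN (1, 2, 3);

-- Tip 3: no leading wildcard, so idx_login can be used.
SELECT * FROM app_user WHERE login_name LIKE 'xx%';

-- Tip 5: OR rewritten as UNION ALL. The second branch excludes rows
-- already returned by the first so that no row appears twice.
SELECT * FROM app_user WHERE login_name = 'alice'
UNION ALL
SELECT * FROM app_user WHERE city = 'Paris' AND login_name <> 'alice';

-- Tip 7: stop after the first match.
SELECT * FROM app_user WHERE login_name = 'alice' LIMIT 1;

-- Tip 9: inspect the plan; check the type and key columns of the output.
EXPLAIN SELECT * FROM app_user WHERE login_name = 'alice' OR city = 'Paris';

-- Tip 10: compare selectivity to decide the column order of a combined index.
SELECT
    COUNT(DISTINCT login_name) / COUNT(*) AS login_name_selectivity,
    COUNT(DISTINCT city) / COUNT(*)       AS city_selectivity
FROM app_user;
```

If login_name turns out to be the more selective column, a combined index for these two columns would list it first, for example (login_name, city).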



