In our work, we often need to optimize slow SQL, and our first reaction is usually: add an index! An index often does make the query much faster than before. But if you add indexes blindly without understanding the underlying principles, it can backfire, and it is not a long-term solution.

Database classification


Relational Database

Advantages: powerful querying, strong data consistency, high data security, and support for secondary indexes
Disadvantages: performance is somewhat lower than NoSQL stores such as MongoDB; once a table grows beyond about a million rows, it becomes prone to slow queries

Non-relational database (NoSQL, "Not Only SQL")

Advantages: high performance, strong scalability, flexible schema, and especially strong performance in high-concurrency scenarios
Disadvantages: data consistency, data security, and support for complex queries still lag somewhat behind relational databases

Comprehensive understanding of database indexes

1. The nature of the index

1.1 What is an index

According to Baidu Baike, an index is a storage structure: a "directory" of the data.

All the data in a database is stored on disk in the form of files, and each row of data has its own disk address. Without an index, retrieving one row out of 10 million means traversing the table row by row until that row is found, which is far too inefficient.

If instead we add a directory to the table and look up data through it, just as we find a character in a Chinese dictionary by its pinyin or its strokes, efficiency improves dramatically.
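To see the "directory" effect concretely, here is a small sketch that uses Python's built-in sqlite3 module instead of MySQL (an assumption made so the example runs anywhere; the table users, its columns, and the index name idx_users_name are hypothetical). EXPLAIN QUERY PLAN reports whether SQLite scans the whole table or searches through an index; the exact wording of its output differs between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the fourth column of each plan row.
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [(f"user{i}", i % 80) for i in range(10_000)],
)

sql = "SELECT * FROM users WHERE name = ?"

# No index on name yet: SQLite has to scan every row.
before = query_plan(conn, sql, ("user42",))

# With an index ("directory") on name, SQLite can search it instead.
conn.execute("CREATE INDEX idx_users_name ON users (name)")
after = query_plan(conn, sql, ("user42",))

print(before)  # a full-table SCAN of users
print(after)   # a SEARCH that uses idx_users_name
```

The query and its result are identical in both cases; only the access path changes.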

1.2 How to create an index

Basic syntax

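-- Add an ordinary index (also called a non-unique index: the most basic kind, with no restrictions).
-- <column_list> is one column, or several comma-separated columns for a composite index.
ALTER TABLE <table_name> ADD INDEX <index_name> (<column_list>) USING BTREE;

-- Add a unique index (key values must not repeat).
ALTER TABLE <table_name> ADD UNIQUE <index_name> (<column_list>);

-- Add a primary key index (a special unique index with one extra constraint: key values cannot be NULL).
ALTER TABLE <table_name> ADD PRIMARY KEY (<column_name>);

-- Add a full-text index. Use it for large text, such as message bodies several KB long,
-- when LIKE queries are too slow. Only text-type columns (CHAR, VARCHAR, TEXT) can have one.
-- Both MyISAM and InnoDB (since MySQL 5.6) support full-text indexes.
ALTER TABLE <table_name> ADD FULLTEXT <index_name> (<column_name>);

-- Using a full-text index ('明天天气怎么样' means "What's the weather like tomorrow?")
SELECT * FROM <table_name> WHERE MATCH(<column_name>) AGAINST ('明天天气怎么样' IN NATURAL LANGUAGE MODE);

-- Drop an index
DROP INDEX <index_name> ON <table_name>;

-- Show the indexes on a table
SHOW INDEX FROM <table_name>;
-- In the mysql client, end the statement with \G instead of ; to print each row vertically, e.g.:
SHOW INDEX FROM table_name\G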
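The difference between an ordinary and a unique index can be tried without a MySQL server. This sketch uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite has no ALTER TABLE ... ADD INDEX, so it uses CREATE [UNIQUE] INDEX, and PRAGMA index_list plays the role of SHOW INDEX. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")

# Ordinary (non-unique) index: duplicate values are allowed.
conn.execute("CREATE INDEX idx_users_city ON users (city)")
# Unique index: a second row with the same email must be rejected.
conn.execute("CREATE UNIQUE INDEX uk_users_email ON users (email)")

conn.execute("INSERT INTO users (email, city) VALUES ('a@example.com', 'Beijing')")
conn.execute("INSERT INTO users (email, city) VALUES ('b@example.com', 'Beijing')")  # same city is fine

try:
    conn.execute("INSERT INTO users (email, city) VALUES ('a@example.com', 'Shanghai')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

# SQLite's counterpart of SHOW INDEX FROM users: (seq, name, unique, origin, partial)
for row in conn.execute("PRAGMA index_list('users')"):
    print(row[1], "unique" if row[2] else "non-unique")

# SQLite's DROP INDEX takes only the index name, without ON <table>.
conn.execute("DROP INDEX idx_users_city")
```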

Note: when adding a full-text index, pay attention to the WITH PARSER ngram option, which tells MySQL to use its built-in ngram full-text parser for Chinese, Japanese, and Korean text.
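What an ngram parser puts into the index can be illustrated with a short sketch. MySQL's ngram parser breaks text into overlapping runs of n characters, where n comes from the ngram_token_size system variable (default 2). The function ngram_tokens below is a hypothetical name; it only mimics that splitting so you can see which tokens a phrase like '明天天气怎么样' contributes. It is not MySQL's implementation and ignores details such as stopword handling.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping n-character tokens, roughly like an ngram parser.

    Assumption of this sketch: whitespace separates chunks, so no token spans a
    space, and chunks shorter than n produce no tokens.
    """
    tokens = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens

print(ngram_tokens("明天天气怎么样"))
# ['明天', '天天', '天气', '气怎', '怎么', '么样']
```

Because every two-character window is indexed, a search for 天气 ("weather") can match this text even though Chinese has no spaces between words.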

2. Deriving the index storage model

2.1 Binary search

When I was first learning Java, there was a classic problem: find a given value in an array with as few lookups as possible. What I learned then was to sort the array first and then apply binary search, implemented recursively.

Ordered array
Each step of binary search discards half of the remaining candidates, so finding a value among n sorted elements takes only about log2(n) comparisons. If I were designing a database index with my current knowledge, I might choose an ordered array as its storage structure. But here comes the problem: an ordered array gives fast queries, yet updates are slow, because inserting or deleting an element means shifting every element after it to keep the array ordered. Ordered arrays are therefore only suitable for static data.
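Both halves of that trade-off fit in a short Python sketch (the function name binary_search and the sample data are illustrative). The search is recursive, as in the classic exercise; the update uses the standard bisect.insort, which keeps the list sorted by shifting later elements.

```python
import bisect

def binary_search(arr, target, lo=0, hi=None):
    """Recursively search a sorted list; return target's position or -1."""
    if hi is None:
        hi = len(arr) - 1
    if lo > hi:
        return -1  # candidate range is empty: not found
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:
        return binary_search(arr, target, mid + 1, hi)  # discard the left half
    return binary_search(arr, target, lo, mid - 1)      # discard the right half

data = [3, 8, 15, 21, 42, 57, 63]
print(binary_search(data, 42))  # 4
print(binary_search(data, 5))   # -1

# Updates are the weak spot: keeping the array ordered means moving elements.
bisect.insort(data, 10)  # 15, 21, 42, 57, 63 each shift one slot to the right
print(data)  # [3, 8, 10, 15, 21, 42, 57, 63]
```

The search touches about log2(n) elements, while the single insert above had to move five of them; with millions of rows that shifting dominates.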
Singly linked list
To support frequent modifications such as inserts, we could use a linked list instead. A singly linked list, however, still has to be searched node by node from the head, so lookups remain slow (O(n)).
That is where the BST (binary search tree) comes in.
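The linked-list side of the trade-off can be sketched in a few lines of Python (class and function names are illustrative): once the position is known, inserting only relinks pointers, but finding a value still means walking from the head.

```python
class Node:
    """One element of a singly linked list."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.nxt = nxt

def find(head, target):
    """Walk from the head until target is found; return (node, nodes_visited)."""
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if node.value == target:
            return node, visited
        node = node.nxt
    return None, visited

def insert_after(node, value):
    """O(1) insertion once the position is known: only two links change."""
    node.nxt = Node(value, node.nxt)

def to_list(head):
    """Collect the values from head to tail."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.nxt
    return out

head = None
for v in reversed([10, 20, 40, 50, 60]):  # build 10 -> 20 -> 40 -> 50 -> 60
    head = Node(v, head)

node20, _ = find(head, 20)
insert_after(node20, 30)   # no shifting, unlike an ordered array
print(to_list(head))       # [10, 20, 30, 40, 50, 60]
print(find(head, 60)[1])   # 6: the search visited every node
```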

2.2 Binary Search Tree (BST)

Characteristics of a binary search tree:

All nodes in the left subtree are smaller than the parent node, and all nodes in the right subtree are larger than it. Projected onto a horizontal line (that is, read from left to right), the nodes form an ordered linear list.

As shown in the figure: nodes in the left subtree < parent node < nodes in the right subtree.

Advantages of a binary search tree:

A binary search tree combines the strengths of ordered arrays and linked lists: it supports both fast search and fast insertion.
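A minimal binary search tree in Python makes both properties visible. The names BSTNode, insert, contains, and inorder are illustrative, and this sketch simply ignores duplicate keys. Each comparison sends the search left or right, and an in-order traversal (the code equivalent of "projecting onto a line") returns the keys in ascending order.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    """Descend one level per comparison, going left or right like binary search."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def inorder(root):
    """Left subtree, node, right subtree: yields keys in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in [5, 3, 8, 1, 4, 7, 9]:
    root = insert(root, k)

print(contains(root, 4), contains(root, 6))  # True False
print(inorder(root))  # [1, 3, 4, 5, 7, 8, 9]
```

Inserting a key only adds a leaf, so nothing has to be shifted as in an ordered array.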

Disadvantages of a binary search tree:

Search time depends on the depth of the tree. In the worst case, the time complexity degrades to O(n).

What is the worst case?
As the figure shows, the binary tree has degenerated into a linked list (we call this an "oblique tree"). Retrieval is then no faster than a sequential scan: to find 6, the maximum value, we have to visit every node. That is a linear-time operation, O(n).
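The degeneration is easy to reproduce. The sketch below builds a plain, unbalanced BST (helper names are illustrative) from the same six keys in two different orders, then reports the tree height and the number of comparisons needed to find 6.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Plain (unbalanced) BST insertion; equal keys go right in this sketch."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

def comparisons_to_find(root, key):
    """Count the nodes inspected while searching for key."""
    count = 0
    while root is not None:
        count += 1
        if key == root.key:
            return count
        root = root.left if key < root.key else root.right
    return count

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

oblique = build([1, 2, 3, 4, 5, 6])  # sorted input: every new node goes right
bushy = build([4, 2, 5, 1, 3, 6])    # same keys, different arrival order

print(height(oblique), comparisons_to_find(oblique, 6))  # 6 6
print(height(bushy), comparisons_to_find(bushy, 6))      # 3 3
```

Same keys, same algorithm: only the insertion order decides whether the tree behaves like a tree or like a list.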

What caused it to tilt?
Because the depths of the left and right subtrees differ too much: the left subtree of this tree has no nodes at all, so the tree is not balanced.
So is there a more balanced tree, one whose left and right subtrees never differ much in depth?
Yes: the balanced binary search tree, classically the AVL tree (the name comes from the initials of its inventors, Adelson-Velsky and Landis).

2.3 Balanced binary tree (AVL tree): left and right rotations

Definition: for every node, the absolute difference between the depths of its left and right subtrees must not exceed 1.

For example, if a node's left subtree has depth 2, its right subtree can only have depth 1, 2, or 3.
Now if we insert 1, 2, 3, 4, 5, and 6 in order, the result looks like the figure and never becomes an "oblique tree".
How does a balanced binary tree ensure "balance"?
Example:
Insert 1, 2, 3 in order. After 1 and 2 are in place, the BST rule puts 3 to the right of 2. The right subtree of root 1 now has depth 2, while its left subtree, which has no nodes, has depth 0. That violates the definition of a balanced binary tree.
What do we do? The new node hangs off the right child of a right child (the "right-right" case), so we lift 2 up to become the root. This operation is called a left rotation.
The left side is handled symmetrically: the "left-left" case is fixed by a right rotation. To keep itself balanced, an AVL tree therefore performs these checks and adjustments every time data is inserted or updated.
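Putting the pieces together, here is a compact AVL insertion sketch in Python (names are illustrative, and duplicates are ignored). Each node caches its height; rotate_left handles the right-right case and rotate_right the left-left case described above. For completeness it also handles the two mixed cases, right-left and left-right, which are standard AVL behavior: the child is rotated first so that the main rotation can finish the job.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # a single node has depth 1

def h(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(h(node.left), h(node.right))

def balance(node):
    """Left depth minus right depth; AVL requires -1, 0, or 1 at every node."""
    return h(node.left) - h(node.right)

def rotate_left(x):
    """Right-right case: lift x's right child y above x."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def rotate_right(y):
    """Left-left case: lift y's left child x above y."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x

def insert(node, key):
    """Insert key and return the new subtree root, rebalancing on the way up."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # ignore duplicates in this sketch
    update(node)
    b = balance(node)
    if b < -1:                       # right side too deep
        if balance(node.right) > 0:  # right-left: turn it into right-right first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    if b > 1:                        # left side too deep
        if balance(node.left) < 0:   # left-right: turn it into left-left first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    return node

def preorder(node):
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)

root = None
for k in [1, 2, 3]:
    root = insert(root, k)
print(preorder(root))  # [2, 1, 3]: 2 was lifted up by a left rotation

root = None
for k in [1, 2, 3, 4, 5, 6]:
    root = insert(root, k)
print(root.key, root.height)  # 4 3
print(preorder(root))         # [4, 2, 1, 3, 5, 6]
```

Compared with the plain BST, sorted input now yields a tree of height 3 instead of 6; the price is the extra height bookkeeping and rotations on every insert.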