MySQL index 1 - basic concepts and index structure (B tree, R tree, Hash, etc.)

Table of contents

Index (INDEX) basic concept

Index Structure Classification

B+Tree tree index structure

Hash index structure

Full-Text index

R-Tree index


Index (INDEX) basic concept

What is an index

An index is an ordered data structure that helps MySQL efficiently retrieve data

Creating an index on certain columns of a table means organizing (sorting) the values of those columns in a dedicated data structure

Once columns are indexed, the database maintains, in addition to the data itself, data structures that support specific search algorithms. These structures point to the data in some way, so that fast lookups can be run against them. Such a data structure is the index

The role of the index

Indexes can change unordered data into ordered data, enabling quick access to specific information in database tables
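
As a rough illustration of this idea (not how MySQL stores an index internally), the following Python sketch keeps the values of one column in sorted order, each paired with a row position, and uses binary search instead of scanning every row. The sample rows and the names `rows`, `age_index`, and `find_by_age` are made up for this example.

```python
from bisect import bisect_left

# A tiny "table": the position in the list plays the role of a row pointer.
rows = [
    {"id": 1, "name": "Tom", "age": 34},
    {"id": 2, "name": "Ann", "age": 22},
    {"id": 3, "name": "Bob", "age": 45},
    {"id": 4, "name": "Eve", "age": 22},
]

# The "index": column values kept in sorted order, each pointing back to a row.
age_index = sorted((row["age"], pos) for pos, row in enumerate(rows))


def find_by_age(age):
    """Return all rows with the given age using binary search on the index."""
    i = bisect_left(age_index, (age, -1))  # first entry with this age, if any
    result = []
    while i < len(age_index) and age_index[i][0] == age:
        result.append(rows[age_index[i][1]])
        i += 1
    return result


print([r["name"] for r in find_by_age(22)])
```

The table itself stays in insertion order; only the small sorted structure has to be kept up to date, which is also why every insert, update, and delete becomes slightly more expensive once an index exists.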

Advantages and disadvantages

Advantages

Improve the efficiency of data retrieval and reduce the IO cost of the database

Sort data through indexes, reduce the cost of data sorting, and reduce CPU consumption

Disadvantages

Indexes take up space

The index improves the query efficiency of the table, but reduces the speed of updating the table (Insert, Update, Delete)

An index is only one factor in improving efficiency. For a table with a large amount of data, you need to spend time working out the best indexes, that is, which columns to index to get the most benefit: a query generally uses only one index per table, and a common rule of thumb is to keep the number of indexes on a table to about 5 or fewer


Index Structure Classification

Index structures are mainly divided into four categories

B+Tree index - (B+ tree)

The most common index type, most storage engines support this index

Hash index - (Hash table)

The underlying data structure is implemented with a hash table. Only queries that exactly match index columns are valid, and range queries are not supported.

Full-Text Index - (Inverted Index)

A way to quickly match documents by building an inverted index

R-Tree index - (R-Tree)

Also known as a spatial index. It was originally specific to the MyISAM engine (InnoDB has also supported spatial indexes since MySQL 5.7). It is mainly used for geographic location data and is rarely used

Storage engine support for different indexes (the default is the B+Tree index)

In MySQL, the Memory engine supports the Hash index; InnoDB has an adaptive hash index feature, which it builds automatically on top of the B+Tree index under certain conditions.

B+Tree tree index structure

The B+Tree evolved from the binary tree → red-black tree (self-balancing binary tree) → B-Tree. We introduce these three data structures before introducing the B+Tree

binary tree

Each node of a binary tree has at most two child nodes (two subtrees), and the two child nodes are ordered

Taking a single node of a binary search tree as an example: every key in its left subtree is smaller than the node, and every key in its right subtree is larger

Disadvantages

  1. With a large amount of data the tree becomes deep, and retrieval is slow
  2. It easily degenerates into a skewed tree (leaning left or right), for example when keys are inserted in sorted order

 How Binary Trees Work

 Data insertion of binary tree (insert 30, 40, 20, 19, 21, 39, 35 in sequence)

 Binary tree data traversal

 Data lookup of binary tree (find 39, 21, 25)

 Data deletion of binary tree (delete 19, 39, 30 in sequence)
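
The insertion, traversal, and lookup steps listed above can be sketched in Python as follows. This is a minimal unbalanced binary search tree written for illustration; deletion is left out, and the helper names (`insert`, `contains`, `in_order`, `height`, `build`) are choices made for this example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the subtree rooted at root and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def contains(root, key):
    """Walk down from the root, going left or right at each node."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def in_order(root):
    """In-order traversal visits the keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


tree = build([30, 40, 20, 19, 21, 39, 35])    # the sequence used above
skewed = build([19, 20, 21, 30, 35, 39, 40])  # same keys, inserted in sorted order

print(in_order(tree))
print([contains(tree, k) for k in (39, 21, 25)])
print(height(tree), height(skewed))
```

Inserting the same seven keys in sorted order gives a tree of height 7, which is effectively a linked list; this is the skewed-tree problem from the list of disadvantages.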

Red-black tree (self-balancing binary tree)

A red-black tree is a variant of the binary search tree that solves the skewed-tree problem that arises when inserting values into a plain binary tree

Every node has a color (red or black), and the colors are used to keep the tree balanced during insertion and deletion

The root node must be black; Null (leaf) nodes are considered black; the two child nodes of every red node are black

Consequently, two consecutive red nodes cannot appear on any path from a leaf node to the root

Every path from a given node down to its descendant leaf nodes must pass through the same number of black nodes

When inserting into or deleting from a red-black tree, the structure is repaired by left rotation, right rotation, and recoloring to maintain the balance of the tree

Disadvantages

  1. With a large number of insertions and deletions, the tree may have to be rebalanced (rotated and recolored) frequently, which affects performance
  2. The implementation of a red-black tree is more complicated, because the colors and the balance of the nodes need to be maintained
  3. A red-black tree is still a binary tree in essence. With a large amount of data the tree becomes deep, and retrieval slows down

How Red-Black Trees Work

Data insertion of red-black tree (insert 30, 40, 20, 19, 21, 39, 35 in sequence; rebalancing uses a right rotation)

Data traversal of red-black tree

Data lookup of red-black tree (find 39, 21, 25)

Data deletion of red-black tree (delete 19, 39, 30 in sequence)
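
To make the coloring rules and the rotation step concrete, here is a small Python sketch. It does not implement full insertion or deletion. Instead it checks the red-black rules on a hand-built tree and applies one right rotation plus recoloring. The tree shape is what the textbook insertion rules give for the sequence 30, 40, 20, 19, 21, 39, 35 just after 35 has been added as a red node; that derivation, and the names `black_height`, `is_red_black`, and `rotate_right`, are assumptions of this example.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def black_height(node):
    """Return the black-node count on every path from node down to a null leaf.

    Raises ValueError if a red node has a red child or if two paths disagree.
    """
    if node is None:
        return 1  # null leaves count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node %r has a red child" % node.key)
    left, right = black_height(node.left), black_height(node.right)
    if left != right:
        raise ValueError("black heights differ below %r" % node.key)
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    if root is not None and root.color != BLACK:
        return False  # the root must be black
    try:
        black_height(root)
    except ValueError:
        return False
    return True


def rotate_right(node):
    """Right rotation: the left child becomes the root of this subtree."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


# Just after inserting 35 as a red node: 39 and 35 are two consecutive reds.
right_subtree = Node(40, BLACK, left=Node(39, RED, left=Node(35, RED)))
root = Node(30, BLACK, Node(20, BLACK, Node(19, RED), Node(21, RED)), right_subtree)
valid_before = is_red_black(root)

# Repair: rotate right at 40, then recolor (39 becomes black, 40 becomes red).
repaired = rotate_right(right_subtree)
repaired.color, repaired.right.color = BLACK, RED
root.right = repaired
valid_after = is_red_black(root)

print(valid_before, valid_after)
```

After the repair the right subtree is rooted at 39 with the red children 35 and 40, and every path from the root to a null leaf passes through the same number of black nodes.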

B-Tree tree (multi-way balanced search tree)

A node of a binary tree can store only one Key and one Value and has at most two child nodes. A node of a multi-way tree, by comparison, can store more Keys and Values and have more child nodes, so the height of the tree is lower than that of a binary tree

How many Keys and Values a B-Tree node can store, and how many child nodes it can have, is determined by the maximum degree (Max-Degree, also called the order)

For a B-Tree of order m

       Each node in the tree has at most m child nodes and m-1 Keys and Values (each Key and Value is sandwiched between the pointers to two subtrees)

       The root node has at least one Key and Value and, unless it is also a leaf, at least two child nodes

Disadvantages

Both leaf and non-leaf nodes of a B-Tree store data, so each non-leaf node has room for fewer keys and child pointers

As a result, storing a large amount of data increases the height of the tree, which means more IO operations and lower query performance

How the B-Tree tree works

Data insertion of B-Tree tree with Max-Degree 3 (insert 30, 40, 20, 19, 21, 39, 35 in sequence)

Data traversal of B-Tree tree

Data lookup of B-Tree tree (find 39, 21, 25)

Data deletion of B-Tree tree (delete 19, 39, 30 in sequence)
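
The insertion and lookup steps above can be sketched in Python with a minimal B-Tree that stores keys only. It assumes the common rule of splitting a node at its median key as soon as it holds Max-Degree keys; deletion is omitted, and the names `Node`, `insert`, and `search` are choices made for this example.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list for a leaf node


def _insert(node, key, max_degree):
    """Insert key below node; return (median, right_node) if node had to split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key, max_degree)
        if split:
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) < max_degree:  # a node holds at most max_degree - 1 keys
        return None
    mid = len(node.keys) // 2
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, key, max_degree=3):
    """Insert key and return the root, which changes when the old root splits."""
    split = _insert(root, key, max_degree)
    if split:
        median, right = split
        return Node([median], [root, right])
    return root


def search(node, key):
    """Descend from the root, choosing the child whose key range contains key."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False
        node = node.children[i]


root = Node()
for k in [30, 40, 20, 19, 21, 39, 35]:
    root = insert(root, k)

print(root.keys, [child.keys for child in root.children])
print([search(root, k) for k in (39, 21, 25)])
```

With Max-Degree 3, the seven keys end up in a tree three levels high with 30 at the root. A larger Max-Degree keeps more keys per node and the tree correspondingly lower.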

B+Tree tree

The B+Tree is a variant of the B-Tree and is also a multi-way search tree. Its definition is basically the same as that of the B-Tree

A B+Tree stores data only in leaf nodes, and every element appears in a leaf node. All leaf nodes form a one-way linked list; within a leaf node the data is sorted by key, and adjacent leaf nodes are also arranged in key order

Non-leaf nodes do not store data, only keys, which serve purely as an index. For the same amount of data, a B+Tree is therefore shorter and wider than a B-Tree
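
A back-of-the-envelope calculation shows why keeping data out of non-leaf nodes makes the tree so shallow. The sketch below uses InnoDB's default 16 KB page size; the key size, pointer size, and average row size are assumptions picked for illustration, and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size in bytes
KEY_SIZE = 8           # assumption: BIGINT key
POINTER_SIZE = 6       # assumption: size of a child-page pointer
ROW_SIZE = 1024        # assumption: average row size in bytes

# Non-leaf pages hold only (key, pointer) pairs; leaf pages hold whole rows.
keys_per_internal_page = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
rows_per_leaf_page = PAGE_SIZE // ROW_SIZE


def max_rows(height):
    """Approximate number of rows reachable in a B+Tree with `height` levels."""
    return keys_per_internal_page ** (height - 1) * rows_per_leaf_page


for h in (1, 2, 3):
    print(h, max_rows(h))
```

Under these assumptions a three-level tree already covers roughly 21.9 million rows, so a lookup touches only about three pages. If non-leaf pages also had to hold 1 KB rows, each of them could fan out to only a handful of children and the tree would need many more levels.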

How the B+Tree tree works

Data insertion of B+Tree tree with Max-Degree 3 (insert 30, 40, 20, 19, 21, 39, 35 in sequence)

Data traversal of B+Tree tree

Data lookup of B+Tree tree (find 39, 21, 25)

Data deletion of B+Tree tree (delete 19, 39, 30 in sequence)

MySQL's B+Tree index structure

MySQL's index data structure optimizes the classic B+Tree: in addition to the original one-way link, each leaf node also holds a pointer to its neighbor in the other direction, so all leaf nodes form a two-way (doubly) linked list, which speeds up traversal in either direction.

MySQL locates the key (Key) that matches the query condition and then fetches the data (Value) associated with that key
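
The benefit of linked leaf nodes shows up in range queries. The Python sketch below uses a hand-built leaf level and a single level of separator keys, with no insertion or splitting logic; the way the keys are grouped into leaves and the names `Leaf`, `find_leaf`, and `range_scan` are assumptions of this example, not MySQL's actual page layout.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys stored in this leaf
        self.values = values  # the row data for each key
        self.prev = None
        self.next = None


def link(leaves):
    """Connect neighboring leaves in both directions (doubly linked list)."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves


leaves = link([
    Leaf([19, 20], ["row19", "row20"]),
    Leaf([21, 30], ["row21", "row30"]),
    Leaf([35, 39, 40], ["row35", "row39", "row40"]),
])
# The non-leaf level stores keys only: the smallest key of every leaf but the first.
separators = [21, 35]


def find_leaf(key):
    """Use the separator keys to pick the leaf that may contain key."""
    return leaves[bisect_right(separators, key)]


def range_scan(low, high):
    """Return the values with low <= key <= high by walking the leaf list."""
    leaf = find_leaf(low)
    i = bisect_left(leaf.keys, low)
    out = []
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > high:
                return out
            out.append(leaf.values[i])
            i += 1
        leaf, i = leaf.next, 0  # follow the link instead of going back to the root
    return out


print(range_scan(20, 35))
```

Only the first key of the range needs a descent from the top; the rest of the scan follows the `next` pointers. The `prev` pointers would serve the same purpose for a scan in descending order.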

Hash index structure

A hash index uses a hash algorithm to convert the key value into a hash value and maps that hash value to a slot (bucket) in the hash table; the bucket stores pointers to all data rows whose keys have the same hash value

When querying, MySQL first computes the hash value of the query condition with the hash function, looks up the corresponding bucket in the hash table, and then finds the matching data rows through that bucket

Hash collision

If two or more key values are mapped to the same slot (bucket), a hash collision occurs; it is resolved with a linked list

Features

  1. A hash index can only be used for equality comparisons (=, IN, etc.); range queries (BETWEEN, >, <, etc.) are not supported
  2. A hash index cannot be used for sorting, because it stores the hash values produced by the hash function, and the order of those values does not necessarily match the order of the original key values
  3. A hash index cannot avoid reading the table data. The index stores the hash of each key together with the corresponding row pointers, and because different index keys may have the same hash value (a hash collision), the rows matching a given key cannot be determined from the hash index alone; the actual rows in the table still have to be read and compared to get the correct result
  4. For a composite (joint) index, a hash index cannot be used with only part of the index key: either all of the indexed columns are used, or the index is not used at all
  5. A hash lookup needs only one computation to find the bucket that holds the data, instead of walking from the root node to a leaf node as in a tree structure, so in theory a hash index is faster than a B+Tree for equality lookups. When many keys share the same hash value, however, its performance is not necessarily better than that of a B+Tree
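
The features above can be seen in a small Python sketch of a hash index with chained buckets. It is a simplification for illustration, not the Memory engine's real layout: the bucket count, the use of Python's built-in `hash`, and the names `HashIndex`, `lookup`, and `range_lookup` are assumptions of this example.

```python
class HashIndex:
    def __init__(self, bucket_count=4):
        # Each bucket is a list (chain) of (key, row_id) entries.
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def lookup(self, key):
        """Equality lookup: one hash computation, then a scan of a single bucket."""
        # Colliding keys share the bucket, so the real key must still be compared.
        return [row_id for k, row_id in self._bucket(key) if k == key]

    def range_lookup(self, low, high):
        """Range lookup: bucket order says nothing about key order, so every
        entry in every bucket has to be examined."""
        return sorted(
            row_id
            for bucket in self.buckets
            for k, row_id in bucket
            if low <= k <= high
        )


index = HashIndex()
for key, row_id in [(30, 1), (34, 2), (21, 3), (30, 4)]:
    index.add(key, row_id)

print(index.lookup(30))
print(index.lookup(34))
print(index.range_lookup(20, 31))
```

With 4 buckets, the keys 30 and 34 land in the same bucket, so `lookup` has to compare keys to separate them; with many such collisions the chain grows and the lookup degrades toward a linear scan.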

Full-Text index

A Full-Text index is built as an inverted index (Inverted Index) to make text retrieval more efficient

An inverted index is a data structure that maps the words (or Chinese characters) in documents to the positions where they occur. It is mainly used to determine whether the value of a field contains a given word or character

For simple workloads or small amounts of data, we can use the LIKE operator for this check; with a large amount of data, however, LIKE becomes very inefficient

Storage engine support for Full-Text indexes

Before MySQL 5.6, only the MyISAM storage engine supported full-text indexing

Since MySQL 5.6, InnoDB also supports full-text indexing. At first it handled only space-delimited text such as English, not Chinese; later (MySQL 5.7.6), Chinese was supported through the built-in ngram tokenizer

Configure the token size of ngram

Add the following setting to the MySQL configuration file

ngram_token_size = 2 #The number of characters in each token produced by word segmentation; the default is 2

That is: a sentence is split into groups of consecutive characters, and with a token size of 2 each group contains 2 characters

    For example, the Chinese sentence for "I want to learn databases" is split into overlapping groups of 2 characters each, in the same way that the string abcd is split into ab, bc, cd

The token size is normally chosen to match the length of the terms you want to search for; we generally do not modify it and keep the default of 2

The process of full-text indexing

The user enters the search text → SQL execution engine → ngram segments the search text into tokens → the tokens are looked up in the inverted index → the matching records are returned

The ngram tokenizer tokenizes the field values when the index is built, and it also tokenizes the search text at query time
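
The flow above can be sketched in Python: tokenize with 2-character n-grams, build an inverted index from token to document ids, and tokenize the search text the same way at query time. This is a simplified model, not MySQL's implementation; in particular, requiring every token of the query to match is an assumption of this example (MySQL's behavior depends on the search mode), and the sample documents are made up.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Split each whitespace-separated chunk into overlapping n-character tokens."""
    tokens = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


def build_index(docs, n=2):
    """Inverted index: token -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in ngrams(text, n):
            index[token].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query."""
    tokens = ngrams(query, n)
    if not tokens:
        return set()  # query shorter than the token size
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {1: "我想学习数据库", 2: "数据结构", 3: "学习索引"}
index = build_index(docs)

print(ngrams("abcd"))
print(ngrams("我想学习数据库"))
print(search(index, "数据"), search(index, "数据库"), search(index, "学习"))
```

A query shorter than the token size produces no tokens and therefore matches nothing in this sketch, which is one reason the token size should fit the terms you intend to search for.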

R-Tree index

There are a variety of data structures for building spatial indexes, such as quadtrees and R-Tree trees

In MySQL, the spatial index is built through the R-Tree tree, which is a technology to speed up the query of spatial data.

R-tree divides spatial data into a series of rectangular areas, each node can represent a rectangular area, and can contain other nodes or data items. This hierarchical structure allows MySQL to locate the required data faster in spatial queries, reducing the search range, thereby improving query performance

For example:

A certain field in a table stores the latitude and longitude location information of a local restaurant. Now we need to find a restaurant within 1 kilometer based on our location.

We can find the latitude and longitude range within 1 kilometer by calculating our position, and then query the value in the table that satisfies this latitude and longitude; in order to speed up the retrieval efficiency, we can create a spatial index for the field that stores the latitude and longitude location information

The construction process of R-Tree - the R-tree extends the idea of the B-tree to multi-dimensional space

1. Data division

Each data item, also called an object (a point, line, or area), is represented by a single bounding rectangle

2. Build leaf nodes (leaf nodes are the bottom level of the R-tree)

Group the rectangles from step 1 and build leaf nodes; each leaf node contains multiple objects and their corresponding rectangles

3. Merge leaf nodes

When the number of leaf nodes exceeds the maximum capacity specified for the R-Tree, adjacent leaf nodes are grouped together to keep the tree low and improve query efficiency

4. Build non-leaf nodes

Each group of leaf nodes becomes a new non-leaf node; a non-leaf node is also a rectangle, one that covers the rectangles of all its child nodes

5. Recursive construction

Repeat the above steps until the root node of the whole R-tree is built (the topmost node of the R-tree covers the entire data range)
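
The steps above can be sketched in Python with a hand-built two-level tree. The sketch shows only the bounding-rectangle idea and how a search prunes whole subtrees; it does not implement node splitting or any grouping heuristic. The coordinates, the grouping into leaves, and the names `mbr`, `intersects`, `RNode`, and `search` are assumptions of this example.

```python
def mbr(rects):
    """Minimum bounding rectangle of (min_x, min_y, max_x, max_y) rectangles."""
    return (
        min(r[0] for r in rects), min(r[1] for r in rects),
        max(r[2] for r in rects), max(r[3] for r in rects),
    )


def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def point(x, y):
    """A point is stored as a rectangle with zero width and height."""
    return (x, y, x, y)


class RNode:
    def __init__(self, children=None, entries=None):
        self.children = children or []  # child RNodes (non-leaf node)
        self.entries = entries or []    # (rect, name) pairs (leaf node)
        rects = [c.rect for c in self.children] + [r for r, _ in self.entries]
        self.rect = mbr(rects)          # covers everything below this node


def search(node, window):
    """Return the names of all objects whose rectangle intersects window."""
    if not intersects(node.rect, window):
        return []  # the whole subtree lies outside the window: prune it
    hits = [name for rect, name in node.entries if intersects(rect, window)]
    for child in node.children:
        hits.extend(search(child, window))
    return hits


# Steps 1-2: objects become rectangles and are grouped into leaf nodes.
leaf_a = RNode(entries=[(point(1, 1), "A"), (point(2, 3), "B")])
leaf_b = RNode(entries=[(point(8, 8), "C"), (point(9, 6), "D")])
# Steps 3-5: the leaves are grouped under a node whose rectangle covers them.
root = RNode(children=[leaf_a, leaf_b])

print(root.rect)
print(search(root, (0, 0, 3, 3)))
print(search(root, (7, 7, 10, 10)))
```

In the restaurant example, the search window would be the latitude and longitude range computed around the current position; any node whose rectangle lies outside that window is skipped without looking at the restaurants inside it.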

The specific R tree construction method can refer to the following articles

From B tree, B+ tree, B* tree to R tree_v_JULY_v's blog - CSDN blog https://blog.csdn.net/v_JULY_v/article/details/6530142

Origin blog.csdn.net/m0_49864110/article/details/132092917