MySQL advanced index (super detailed!!!)

1. What is an index?

The index is like the directory of a dictionary.
We usually go to the directory to find the key radicals or letters and then look it up. It is
much faster than looking up the dictionary directly.
Insert picture description here

2. Why should there be an index?

However, when we use the mysql database, we also query like a dictionary with an index. It must be much faster

2.1 Question:

1.Where is mysql data stored?

Disk

2. Querying data is slow, where is it usually stuck?

I

3. Go to the disk to read the data, how much is used to read?

Disk read ahead

The principle of locality: data and programs have a tendency to cluster together, and the data that has been visited before is likely to be queried again, spatial locality, temporal locality

Disk read-ahead: When there is data interaction between memory and disk, there is generally a smallest logical unit, page. The page size is generally considered by the operating system, 4k or 8k, and when we are interacting with data, we can take an integer multiple of the page to read

The innodb storage engine reads data every time, reading 16k

4. Where is the index stored?

Disk, when querying data, the index will be loaded into the memory first

5. What information is needed when the index is stored? What field values ​​need to be stored?

key: the value stored in the actual data row

File address

offset: offset

6. What kind of data structure should be used to store data in this format?

key-values

Hash table, tree (binary tree, red-black tree, AVL tree, B tree, B+ tree)

7. The mysql index system is not stored in the format just mentioned, why?

OLAP: Online Analytical Processing----Analyze massive historical data and generate decision-making strategies----Data Warehouse—Hive

OLTP: online transaction processing-requires a very short time to return the corresponding results-database-relational database (mysql, oracle)

Three, mysql index data structure

3.1 Hash table:

Insert picture description here
The structure of HashMap array and linked list is not suitable for indexing reasons:

1. Hash conflict will cause uneven data hashing, will generate a large number of linear queries, and waste time

2. Range query is not supported. When performing range query, it must be traversed one by one

3. The requirements for memory space are relatively high


Advantages: if it is equivalent query, very fast


Is there a hash index in mysql?

1. The memory storage engine uses a hash index

2.innodb supports adaptive hash

create table test(id int primary key,name varchar(30))
engine='innodb/memory/myisam'
-- 5.1之后默认innodb

3.2 Tree:

There are many data structures such as trees. The common ones are:
binary tree, BST, AVL, red-black tree, B tree, B+ tree

① Binary tree: unordered insertion
Insert picture description here
This is the structure diagram of our tree, but the data insertion of the binary tree is unordered, which means that when you need to find it, you still have to traverse and find one by one.

②BST (Binary Search Tree): The
inserted data is ordered, the left subtree must be smaller than the root node, and the right subtree must be larger than the root node--------Use binary search to improve efficiency
Insert picture description here
. If you want to query data, The binary search can be used to quickly narrow the scope and reduce the time complexity.
However, if the insertion order is ascending or descending, the shape of the tree will become as follows:
Insert picture description here
At this time, the binary search tree will degenerate into a linked list, and the time complexity will be Becomes O(n)

③AVL: Balanced Binary Tree
In order to solve the above problems, the tree is balanced by rotating left or right. The
shortest subtree and the longest subtree can not exceed 1 in height
Insert picture description here
. As we can see from the figure, when inserting sequentially, it will automatically rotate. In order to achieve a balance,
but will compensate for the query performance improvement by inserting performance loss.

When we insert a lot of data, and the query is very small, because the inserted data will rotate, it will also consume a lot of time.
④ Red-black tree (solving read and write The same number of requests)
is also the behavior of balancing the tree through left and right rotation, and changing the color. The
longest subtree does not exceed twice the shortest subtree. The
Insert picture description here
query performance and the insertion performance are approximately balanced,
but as the data is inserted, the tree is discovered The depth of the tree will become deeper and the depth of the tree will become deeper, which means that the more IO times, the efficiency of data reading will be affected

⑤ B-tree
In order to solve the problem of excessive data insertion and deepening of the tree depth, we use B-tree
to turn the original ordered binary tree into an ordered multi-tree.
Insert picture description here
Example: If you want to query select * from table where id=14?

  1. The first step is to load the disk one into the memory and find that 14<16, look for the address disk 2
  2. The second step is to load disk two into the memory, find 14>11, look for the address disk 7
  3. The third step is to load Disk Seven into the memory and find that 14=14, read data, take out data, and finish
    thinking: Is the B-tree perfect?
    Question 1: B-tree does not support fast search of range query. If we query a range of data and find a boundary of the range, we need to go back to the root node and traverse the search again. We need to traverse multiple times from the root node, even if we find the range Another boundary, query efficiency will be reduced.
    Question 2: If data stores row records, the size of the row will increase as the number of columns increases. At this time, the amount of data that can be stored in a page will decrease, the tree will be correspondingly taller, and the number of disk IOs will increase.
    Thinking 2: How many records can a three-level B-tree store?
    Answer: Assuming that a data is 1k, the innodb storage engine reads 16k data at a time, and the three layers are 16 16 16 = 4096;
    but often in development, the data of a table is much larger than 4096, should we continue to add layers? Wouldn't it increase IO

4. Why use B+ tree?

When actually storing table data, how to store it?
Key
complete data row
transformation B+ tree
Insert picture description here
B+ tree improves the B tree, putting all the data in the leaf nodes, and using two-way pointer connections between the leaf nodes, the lowest leaf node forms a two-way ordered linked list.
For example: query range select * from table where id between 11 and 35?

  1. The first step is to load the disk one into the memory, find that 11<28, look for the address disk 2
  2. The second step is to load disk two into the memory, find 10>11>17, look for the address disk 5
  3. The third step is to load disk five into the memory, find that 11=11, read data
  4. The fourth step, continue to query to the right, read disk 5, find that 35=35, read the data between 11-35, and it can be seen from the end
    that this range query is much faster than the B-tree

Compare B-tree and B+tree?

  • Only put data in leaf nodes

  • No data is stored in non-leaf nodes

  • Each node of the B+ tree contains more nodes. The advantage of this is that the height of the tree can be reduced, and the data range can be changed into multiple intervals. The more intervals the faster the query

Question: Use int or varchar when creating an index?

Answer: It depends on the situation, but remember to make the key as small as possible

Five, index creation

Before creating the index, let me talk about the storage engine
storage engine: it represents the different manifestations of different data on the disk. You can
observe the mysql disk file and you will find
innodb: innodb data and indexes are stored in one file. idb
myisam: The index of myisam is stored in the .MYI file, and the data is stored in the .MYD

5.1 clustered index and non-clustered index

Concept: Judging whether it is a clustered index depends on whether the data and index are in a file
innodb:

  1. There can be only one clustered index, but there are many non-clustered indexes
  2. When inserting data into innodb, it must contain the key value of an index
  3. The key value of this index can be the primary key. If there is no primary key, then it is a unique key. If there is no unique key, then it is a self-generated 6-byte rowid.

myisam: non-clustered index

MySQL—innodb—B+tree
index and data are stored together, find the index to read the corresponding data
Insert picture description here
MySQL—myisam—B+ tree
index and the address of the stored data together, find the index to get the address value, and then Find the corresponding data by address
Insert picture description here

5.2 Back to the table

Next, I will create a case table to show you

CREATE TABLE user_test(
id INT PRIMARY KEY AUTO_INCREMENT,-- id为主键
uname VARCHAR(20) ,
age INT,
gender VARCHAR(10),
 KEY `idx_uname` (`uname`) -- 索引选择为名字
)ENGINE = INNODB;

INSERT INTO user_test VALUES(1,'张三',18,'男');
INSERT INTO user_test VALUES(NULL,'马冬梅',19,'女');
INSERT INTO user_test VALUES(NULL,'赵四',18,'男');
INSERT INTO user_test VALUES(NULL,'王老七',22,'男');
INSERT INTO user_test VALUES(NULL,'刘燕',16,'女');
INSERT INTO user_test VALUES(NULL,'万宝',26,'男');
select * from user_test where uname = '张三';
-- 当我们表中有主键索引的时候,我们再去设置一个uname为索引,那么此时这条sql语句的查询过程应该如下:

Insert picture description here
First query the id based on uname, and then query the row information based on the id.
This operation walks two B+ trees, which is to return to the table.
After querying the key value of the clustered index based on the ordinary index, cluster it according to the key value. Obtaining data from the index,
we can find that such an operation is a waste of time, so in our daily operations, try to reduce the number of times back to the table

5.3 Covering Index

select id,uname from table where uname = '张三';
-- 根据uname 可以直接查询到id,uname两个列的值,直接返回即可
-- 不需要从聚簇索引查询任何数据,此时叫做索引覆盖

5.4 Leftmost match

Before talking about the leftmost match, let's talk about a few nouns.
Primary key (usually a column) -------->Joint primary key (multiple columns)
Index -------->Joint index ( May contain multiple index columns)

-- 假设有一张表,有id,name,age,gender四个字段,id是主键,name,age是组合索引列
-- 组合索引使用的时候必须先匹配name,然后匹配age

select * from table where name = ? and age = ? ;-- 生效
select * from table where name = ?;-- 生效
select * from table where age = ? ;-- 不生效
select * from table where age = ? and name = ? ;-- 生效

--在mysql内部有优化器会调整对应的顺序

5.5 Index Push Down

After mysql5.7, a feature supported by default is
an example:

select * from table where name = ? and age = ? ;
-- mysql里的三层架构:
-- 客户端:JDBC
-- 服务端:server
-- 存储引擎:数据存储
在没有索引下推之前,根据name从存储引擎中获取符合规则的数据,在server层对age进行过滤
有索引下推之后,根据name、age两个条件从存储引擎中获取对应的数据

Analysis: There is the benefit of index push-down. If we have 50 data, we will get 10 data through filtering. If there is no index push-down, we will first get 50 and then exclude to get 10. And with push-down, We will filter them into 10 directly in the storage engine

Guess you like

Origin blog.csdn.net/zjdzka/article/details/112148860