Get to know the data structures behind MySQL indexes, so you are no longer clueless

GitHub address: https://github.com/stt0626/JavaGreat (continuously updated with the material covered here)

Foreword

  When database optimization comes up, the first thing people blurt out is "add an index". If you have never looked at what an index actually is, this installment of the "Unlock the Database" series is for you. If, like me, you never understood the data structures underneath a database index, leave a comment and let me know I am not the only one; it will make me feel better. I understand them thoroughly now, and I believe that once you have read the whole article you will, like me, see the fog lift and say with a smile: "Is that all?" Yes, that is all. Enough talk, let's get started!

Index data structure

  The main candidate data structures for a database index are the hash table, binary tree, red-black tree, B-tree, and B+Tree. MySQL uses the B+Tree!

Hash index

Introduction

  A hash index is built on a hash table. For each row, the storage engine computes a hash code over all of the indexed columns. The hash code is a small value, and the index stores every hash code together with a pointer to the corresponding row. If several rows produce the same hash code, their row pointers are kept in a linked list under that code, much like Java's HashMap. A hash index is well suited to exact-match queries.

An example

  Suppose we have the following table:
[Figure: the example t_user table]
  If we build an index on the name column, the database applies a hash function to the name value of each row and stores the result. Hash values look random, so collisions are possible. Suppose the computed values are:
[Figure: hash values computed for the name column]
  The hash index then has the following structure:
[Figure: hash index structure, with hash values pointing to rows]

Given the SQL SELECT id, name, age FROM t_user WHERE name = '石小添'; the engine hashes 石小添, uses that hash value to find the matching row pointer, follows the pointer to the row in the table, and finally compares the row's name with 石小添 to make sure it is the row being sought.
Given SELECT id, name, age FROM t_user WHERE name > '石小添'; however, the hash index is powerless, because a hash table supports fast exact-match lookups but not range queries.

Hash index summary

  • A hash index contains only hash values and row pointers, not the column values, so the index alone cannot be used to avoid reading the rows
  • Hash index entries are not stored in index-value order, so the index cannot be used for sorting
  • A hash index does not support lookups on a partial set of the indexed columns, because the hash value is always computed from the full contents of all indexed columns
  • A hash index supports only equality comparisons; it does not support range queries of any kind
  • Access through a hash index is very fast unless there are many hash collisions (different column values with the same hash value). On a collision, the storage engine must walk every row pointer in the linked list and compare row by row until it has found all matching rows
  • With many collisions, some index maintenance operations become expensive. For example, if a hash index is built on a column with low selectivity (many collisions), then deleting a row forces the storage engine to walk the linked list for that hash value to find and remove the reference to the row; the more collisions, the higher the cost
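
The points above can be sketched in a few lines of Python. This is a toy model only, not how any MySQL storage engine is implemented: the class name ToyHashIndex, the bucket count, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict


class ToyHashIndex:
    """Toy hash index: bucket -> list of (hash code, row position)."""

    def __init__(self, rows, column, buckets=8):
        self.rows = rows
        self.column = column
        self.buckets = buckets
        self.table = defaultdict(list)
        for pos, row in enumerate(rows):
            code = hash(row[column])
            # Only the hash code and a row pointer are stored, not the value.
            self.table[code % buckets].append((code, pos))

    def lookup(self, value):
        """Exact match: hash the value, walk one bucket, re-check the row."""
        code = hash(value)
        return [
            self.rows[pos]
            for stored, pos in self.table[code % self.buckets]
            if stored == code and self.rows[pos][self.column] == value
        ]

    def range_scan(self, low):
        """Hash order says nothing about value order, so a range
        predicate falls back to checking every row."""
        return [row for row in self.rows if row[self.column] > low]


# Hypothetical sample rows, not the table from the article's figure.
rows = [
    {"id": 1, "name": "alice", "age": 20},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 30},
]
index = ToyHashIndex(rows, "name")
print(index.lookup("bob"))
print(len(index.range_scan("alice")))
```

The exact-match lookup touches a single bucket, while the range predicate never uses the hash table at all, which is the limitation the summary describes.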

Binary Tree

  A binary tree is a tree structure in which each node has at most two subtrees, usually called the left subtree and the right subtree. Binary trees are often used to implement binary search trees and binary heaps.
[Figure: a binary tree]

An example

[Figure: the example table]

Add an index on the id column, stored as a binary tree, as shown below:

[Figure: the id index stored as a binary tree]

If the data only ever grows in one direction, the binary tree can eventually degenerate into a linked list. Looking up data then proceeds as in the following figure:

[Figure: lookup in a binary tree that has degenerated into a linked list]

Take the SQL SELECT id,name,age FROM tb_user WHERE id=7. Even with an index on id, if that index is maintained as a binary tree the lookup takes six steps, which is no faster than having no index at all!
A binary tree index performs poorly when the indexed column holds consecutive, steadily increasing values, because the tree becomes heavily skewed and unbalanced. This leads us to the red-black tree.
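
A small Python sketch makes the skew visible. It assumes the six ids 1, 2, 3, 4, 6, 7 that the B-tree example later in this article uses, and an unbalanced insert with no rebalancing; the function names are made up for this illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain binary search tree insert, no rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def nodes_visited(root, key):
    """Count the nodes touched while searching for key."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return visited


skewed = build([1, 2, 3, 4, 6, 7])  # ascending inserts: effectively a list
mixed = build([4, 2, 6, 1, 3, 7])   # same keys, friendlier insert order
print(nodes_visited(skewed, 7))     # 6
print(nodes_visited(mixed, 7))      # 3
```

With ascending inserts every new key becomes a right child, so finding 7 walks all six nodes; the same keys in a different order need only three.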

Binary Tree Features

  • If the left subtree is not empty, the values of all nodes in it are less than the value of the root node;
  • If the right subtree is not empty, the values of all nodes in it are greater than the value of the root node;
  • The left and right subtrees are themselves binary search trees

Red-black tree

  A red-black tree is a self-balancing binary search tree in which every node stores a color, either red or black.

In the same way, add an index on the id column and store it as a red-black tree, as shown below. The tree adjusts itself as rows are inserted so that it stays relatively balanced, with smaller values placed to the left of their parent node and larger values to the right.

[Figure: the id index stored as a red-black tree]

Look up the same two values, 4 and 7, as in the figure:

[Figure: looking up 4 and 7 in the red-black tree]

Clearly the red-black tree is more balanced than the plain binary tree and finds data faster. So why does MySQL still not use it to maintain index data? Let's analyze it carefully.

Drawbacks of the red-black tree

  • The table currently has six rows. Maintained as a red-black tree, the tree has height h = 4, running through the four nodes 2, 4, 6, 7. No problem so far, right?
  • Real tables do not hold just a few rows; they hold millions or tens of millions. How tall is a red-black tree that maintains that much data? Do the math: to store 1,000,000 rows the tree needs 1,000,000 nodes, and each node has two branches, so a completely full tree satisfies 2^n = 1,000,000, where n is the height h. That gives n ≈ 20
  • From this analysis, a red-black tree used to maintain index data is simply too deep
  • If the data you are looking for is in a leaf node, the lookup takes that many steps

The number of steps needed to find data in a red-black tree is determined by its height: the taller the tree, the more steps a query needs. If we control the height of the tree, we control the number of steps, and that is exactly what the B-tree is for. Have a cup of tea and think about it: if we took the red-black tree as a starting point, kept the height at 3 to 5 levels, and still had to store ten million rows, what would you do?
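
The arithmetic can be checked with a short Python sketch. Both helpers are simplifications introduced here: the first assumes a perfectly balanced binary tree (a real red-black tree is only approximately balanced and can be taller), and the second treats a tree of a given height as holding fan-out to the power of height keys.

```python
def min_binary_height(n):
    """Levels in a perfectly balanced binary tree holding n keys.

    For n >= 1, n.bit_length() equals ceil(log2(n + 1)).
    """
    return n.bit_length()


def min_fanout(n, height):
    """Smallest per-node fan-out b such that b ** height >= n."""
    b = 2
    while b ** height < n:
        b += 1
    return b


for n in (1_000_000, 10_000_000):
    print(n, min_binary_height(n), min_fanout(n, 3))
```

Under these assumptions, one million keys need at least 20 binary levels, while three levels are enough once each node fans out to a few hundred children. That is the hint for the question above: make every node wider.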

B-tree

  A red-black tree is a binary tree that stores one data item per node. A B-tree builds on that idea by storing several data items in each node. BTree, B-Tree, and B tree all refer to the same thing. The name is often expanded as Balance-tree and translated as "balanced multiway search tree": "balanced" means the tree is evenly distributed between left and right, and "multiway" contrasts with a binary search tree, which offers only two ways to go at each step, whereas a B-tree parent node can have many children. Let a picture do the talking:

[Figure: a B-tree in which each node holds several keys]

  • The node at the top holds 18, 25, and 60, and 20 and 23 form another node: a single node stores several data items
  • data is the payload stored with each key in a node. If MySQL used a B-tree, the data would be a disk address: the on-disk location of the row we are looking for
  • A lookup uses the index to find the right node, finds the key inside that node, and then takes the disk address to fetch the row

Now store our data in a B-tree. There are four nodes in total: one holding 2 and 4, one holding 1, one holding 3, and one holding 6 and 7. 1 is smaller than 2, so it sits to the left of 2; 3 is larger than 2 and smaller than 4, so it sits between 2 and 4, which share a node; 6 and 7 are larger than 4, so they are stored in a node to the right of 4.

[Figure: the keys 1, 2, 3, 4, 6, 7 stored in a B-tree]

Now fetch data from the B-tree. 4 is found directly in the root node on the first read; 7 is found on the second read. Finding the two values touches only two nodes.

[Figure: looking up 4 and 7 in the B-tree]

B-Tree Features

  • A B-tree greatly reduces the number of intermediate steps needed to locate a record, which speeds up access. This is why it is commonly used for database indexes and gives good overall efficiency
  • The set of keys is distributed across the whole tree, and each key appears in exactly one node
  • Within a node, keys are ordered from left to right
  • A search may end at a non-leaf node. A leaf node is a node of degree 0, that is, a node with no children
  • A B-tree search starts at the root and runs a binary search over the (ordered) keys in the node. On a hit the search ends; otherwise it descends into the child whose range contains the key. This repeats until the child pointer is null or a leaf node has been reached
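
The search procedure in the last bullet can be sketched in Python against the four-node tree described above (2 and 4 in the root; 1, 3, and 6, 7 in the children). The class and function names are assumptions for this illustration, and the sketch covers searching only, not insertion or node splitting.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys held in this node
        self.children = children or []  # empty list for a leaf


def search(node, key):
    """Return (found, nodes_visited) for a key lookup."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited         # hit, possibly in a non-leaf node
        if not node.children:            # leaf reached without a hit
            return False, visited
        node = node.children[i]          # descend into the matching range
    return False, visited


root = BTreeNode([2, 4], [BTreeNode([1]), BTreeNode([3]), BTreeNode([6, 7])])
print(search(root, 4))  # (True, 1)
print(search(root, 7))  # (True, 2)
print(search(root, 5))  # (False, 2)
```

Key 4 is answered from the root alone, and 7 needs one more node, matching the two reads described in the text.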

B+Tree

  The B+Tree is a variant of the B-Tree, and it is what MySQL uses as its index data structure, as shown in the figure.
[Figure: a B+Tree]

  • Non-leaf nodes do not store data
  • Keys are stored redundantly: the leaf nodes contain every key that appears in the non-leaf nodes
  • Data is stored in the leaf nodes, and adjacent leaf nodes are linked by pointers

Storing data in a B+Tree

[Figure: storing data in a B+Tree]

Fetching data from a B+Tree

[Figure: fetching data from a B+Tree]

Why the B+Tree keeps data only in the leaf nodes, with redundant keys above them

Storing data in a node takes space. If the data is moved out of the non-leaf nodes, each of them can hold more keys. In MySQL each B+Tree node can hold up to 16KB, which you can check with the SQL SHOW GLOBAL STATUS LIKE 'Innodb_page_size';. Within those 16KB a B+Tree can store more index entries. If the table's id is a bigint used as the index, the key takes 8 bytes, and the pointer that records the location of the child node takes 6 bytes, so one index entry takes 8 + 6 = 14 bytes. 16KB / 14 bytes ≈ 1170, so each node can store about 1170 entries.
[Figure: index entries packed into one 16KB node]

How much data a full B+Tree can hold

We calculated above that each non-leaf node can hold 1170 entries, each pointing to a child node. Take a tree of height 3 and assume each row in a leaf takes 1KB, which is already generous, so a 16KB leaf holds 16 rows. The tree can then store 1170 * 1170 * 16 = 21,902,400 rows, roughly 21.9 million, which is more than enough for querying tables at the ten-million-row scale.
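
The same estimate, written out in Python so the numbers can be varied. The sizes are the ones the text assumes (16KB node, 8-byte bigint key, 6-byte child pointer, 1KB per row); real pages also carry headers and other overhead, so treat the result as a rough upper bound.

```python
PAGE_SIZE = 16 * 1024  # bytes per node, the 16KB from the text
KEY_SIZE = 8           # bigint key
POINTER_SIZE = 6       # child-node pointer, as assumed in the text
ROW_SIZE = 1024        # assumed 1KB per row stored in a leaf

fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # entries per non-leaf node
rows_per_leaf = PAGE_SIZE // ROW_SIZE            # rows per leaf node

# Height 3: two levels of non-leaf nodes above one level of leaves.
capacity = fanout * fanout * rows_per_leaf

print(fanout)         # 1170
print(rows_per_leaf)  # 16
print(capacity)       # 21902400
```

Halving ROW_SIZE doubles the capacity, which shows how strongly the estimate depends on the assumed row size.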

What the pointers between B+Tree leaf nodes are for

In a B+Tree the keys in the non-leaf nodes are redundant copies of keys in the leaf nodes, and the leaf nodes are connected to one another by pointers. As we saw at the start, a hash index does not support range queries, and a binary tree grows too tall, which leaves only the B-tree and the B+Tree. A B-tree node can store several entries, so the whole tree is much shorter than a red-black tree and disk I/O is more efficient. The B+Tree is an upgraded B-tree: the redundancy in the non-leaf nodes exists to make range searches more efficient, and the improvement comes from nothing more than each leaf node holding a pointer to the next leaf node.

B + Tree Features

  • All keys appear in the linked list of leaf nodes (a dense index), and the keys in that list are in sorted order
  • It builds on the B-tree by adding linked-list pointers between leaf nodes. All keys appear in the leaf nodes, and the non-leaf nodes act as an index over the leaves
  • A B+Tree lookup always ends at a leaf node; a key can never be hit in a non-leaf node
  • It is better suited to file index systems
  • A single node holds more entries, so a query needs fewer I/O operations, which makes the B+Tree a better fit as the underlying data structure for MySQL
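
The value of the leaf pointers shows up in a range scan. The Python sketch below models only the leaf level, with hypothetical names (Leaf, link, range_scan); a real lookup would first descend from the root to the starting leaf, and that step is omitted here.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf
        self.next = None  # pointer to the right-hand sibling leaf


def link(leaves):
    """Chain the leaves from left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, low, high):
    """Collect keys in [low, high] by walking the leaf chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys[bisect_left(leaf.keys, low):]:
            if key > high:
                return result  # keys are sorted, so nothing further matches
            result.append(key)
        leaf = leaf.next       # follow the pointer instead of going back up
    return result


first = link([Leaf([1, 2]), Leaf([3, 4]), Leaf([6, 7])])
print(range_scan(first, 3, 6))  # [3, 4, 6]
```

Once the scan is at the leaf level it never climbs back to a parent node, which a B-tree without sibling pointers would have to do.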

So far we have introduced the hash table, binary tree, red-black tree, B-Tree, and B+Tree one by one, and concluded that MySQL uses the B+Tree to maintain index data because of its disk I/O efficiency: it speeds up index lookups, it speeds up range queries, and its entries are kept in order. Next we look at how the two common MySQL storage engines actually use the index.

Index implementation in the MyISAM storage engine

Create a table; note ENGINE=MyISAM on the last line

CREATE TABLE `tb_myisam` (
  `id` int(11) NOT NULL,
  `col1` varchar(255) DEFAULT NULL,
  `col2` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Disk Storage

[Figure: the files created on disk for tb_myisam]

A MyISAM table is maintained as three files on disk: frm, MYD, and MYI
frm: stores the table structure
MYD: stores the table data
MYI: stores the table indexes

Adding data

insert into tb_myisam (id,col1,col2) VALUES
(2,"测试数据2","测试数据22"),
(4,"测试数据4","测试数据44"),
(5,"测试数据5","测试数据55"),
(7,"测试数据7","测试数据77"),
(1,"测试数据1","测试数据11"),
(3,"测试数据3","测试数据33"),
(6,"测试数据6","测试数据66");

View data

SELECT id,col1,col2 FROM tb_myisam

[Figure: query result for tb_myisam]

You can see that the rows come back in the order in which they were inserted.

MyISAM index maintenance

With an index on the id column, the structure looks like the figure: the B+Tree is in the upper left and the table data in the lower right. Each leaf node of the B+Tree stores the disk address of the corresponding row. Suppose we run SELECT id FROM tb_myisam WHERE id=3. MySQL first goes to the index file (MYI) to find the entry for 3, then takes the disk address stored there and goes to the MYD file to locate the row.

[Figure: MyISAM B+Tree index whose leaf nodes point to rows in the data file]
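
The two-step lookup can be mimicked in Python. This is only a conceptual sketch: a list stands in for the MYD data file, a dict stands in for the B+Tree in the MYI file, and the row contents are placeholders rather than the article's test data.

```python
# Rows in insertion order, standing in for the MYD data file.
data_file = [
    (2, "row 2"), (4, "row 4"), (5, "row 5"), (7, "row 7"),
    (1, "row 1"), (3, "row 3"), (6, "row 6"),
]

# Stand-in for the MYI index file: key -> position in data_file.
# A dict is used only for brevity; the real index is a B+Tree.
index_file = {row[0]: offset for offset, row in enumerate(data_file)}


def find_by_id(key):
    offset = index_file.get(key)  # step 1: find the key in the index
    if offset is None:
        return None
    return data_file[offset]      # step 2: follow the address to the row


print(index_file[3])  # 5
print(find_by_id(3))  # (3, 'row 3')
```

The index holds only an address, so the data file stays in insertion order and every lookup makes a second hop to reach the row.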

Index implementation in the InnoDB storage engine

Create a table

CREATE TABLE `tb_innodb` (
  `id` int(11) NOT NULL,
  `col1` varchar(255) DEFAULT NULL,
  `col2` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Disk Storage

[Figure: the files created on disk for tb_innodb]

An InnoDB table is maintained as two files on disk: frm and ibd
frm: stores the table structure
ibd: stores the table data and indexes

Adding data

insert into tb_innodb (id,col1,col2) VALUES
(2,"测试数据2","测试数据22"),
(4,"测试数据4","测试数据44"),
(5,"测试数据5","测试数据55"),
(7,"测试数据7","测试数据77"),
(1,"测试数据1","测试数据11"),
(3,"测试数据3","测试数据33"),
(6,"测试数据6","测试数据66");

Query data

[Figure: query result for tb_innodb]

MySQL automatically creates the primary key index, and the InnoDB storage engine returns the rows ordered by primary key.

InnoDB index maintenance

In the InnoDB storage engine the table's data file, the ibd file, is itself an index file organized as a B+Tree, and each leaf node contains the complete data record.
[Figure: InnoDB B+Tree index whose leaf nodes hold the full rows]
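
For contrast with MyISAM, here is a minimal Python sketch of the clustered layout. A sorted list stands in for the leaf level of the B+Tree, and the row contents are placeholders; the point is only that the row lives inside the index, so there is no separate data file to visit.

```python
from bisect import bisect_left, insort

# Leaf level of a toy clustered index: full rows kept sorted by primary key.
clustered = []

for row in [(2, "row 2"), (4, "row 4"), (5, "row 5"), (7, "row 7"),
            (1, "row 1"), (3, "row 3"), (6, "row 6")]:
    insort(clustered, row)  # each insert keeps primary-key order


def find_by_id(key):
    # (key,) sorts just before any (key, ...) row, so this finds its slot.
    i = bisect_left(clustered, (key,))
    if i < len(clustered) and clustered[i][0] == key:
        return clustered[i]  # the row itself is stored in the index
    return None


print([row[0] for row in clustered])  # [1, 2, 3, 4, 5, 6, 7]
print(find_by_id(3))                  # (3, 'row 3')
```

The rows were inserted in the same scrambled order as before, yet they come back sorted by id, which matches the query result described above.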

What does InnoDB do if there is no index? Is the data simply not stored?

  You have probably heard the saying that an InnoDB table must have a primary key and that an auto-increment integer is the recommended choice. If the table has a primary key, InnoDB builds the index on it. If you create a table without specifying a primary key, the database looks for a column whose values are unique and uses that instead. If it finds no such column, it adds a hidden column of its own to maintain the index.
  Integers are recommended first of all because they take less storage and are faster to sort and compare. Some companies use a UUID as the primary key. A UUID is a random string: it has to be converted before it can be compared and it takes a lot of space, so it is not recommended.
  Auto-increment is recommended because the data in the leaf nodes is ordered from left to right in ascending order, which is convenient for range queries. If the key values are random, an insert may have to modify the existing tree structure and cause a node to split, and splits hurt performance. The figure shows what changes when we finally add 8:
[Figure: the B+Tree after inserting 8]
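
A rough way to see the difference between increasing and random keys is to count how many inserts land in the middle of already stored keys rather than at the right-hand end. The Python sketch below uses that count as a stand-in for "inserts that may disturb existing nodes"; it does not model real pages or splits, and the function name is made up for this illustration.

```python
import random
from bisect import bisect_left


def mid_inserts(keys):
    """Count inserts that land before existing keys instead of at the end."""
    ordered = []
    count = 0
    for key in keys:
        pos = bisect_left(ordered, key)
        if pos < len(ordered):
            count += 1  # would hit a node that already holds data
        ordered.insert(pos, key)
    return count


sequential = list(range(1, 1001))   # auto-increment style keys
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # random-order keys, fixed seed

print(mid_inserts(sequential))  # 0
print(mid_inserts(shuffled))
```

Increasing keys always append at the end, while almost every key in the shuffled sequence lands somewhere in the middle.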

What does a composite index look like?

  In real projects we usually do not create a separate index per column; instead we create a composite index over several key columns. Once you understand the principle underneath the composite index, you can work out any MySQL index optimization rule you read online from first principles instead of memorizing it. I hate memorizing things: it is boring and soon forgotten. This is what a composite index looks like:
[Figure: a composite index on (col1, col2, col3)]

Suppose the composite index is (col1, col2, col3). The three rows of data in the green boxes in the figure are col1, col2, and col3, and entries are sorted by col1, then col2, then col3; the purple boxes are the other, non-indexed columns. Once you understand what order a composite index is sorted in, questions such as the leftmost-prefix rule and why an index sometimes goes unused become easy to answer. Be sure to think it through!
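
One way to think it through is to model the index entries as tuples in Python, since tuples compare by the first element, then the second, then the third. The sample entries and function names below are invented for this sketch, and col1 is assumed to be an integer.

```python
from bisect import bisect_left

# Entries of a toy composite index on (col1, col2, col3), in tuple order.
entries = sorted([
    (1, "b", 30), (1, "a", 20), (2, "a", 10),
    (2, "c", 40), (3, "b", 50), (1, "a", 10),
])


def prefix_range(col1):
    """col1 is the leftmost column, so matching entries are contiguous
    and both ends of the run can be found with binary search."""
    lo = bisect_left(entries, (col1,))
    hi = bisect_left(entries, (col1 + 1,))
    return entries[lo:hi]


def scan_col2(col2):
    """col2 on its own is not in sorted order across the index,
    so every entry has to be checked."""
    return [e for e in entries if e[1] == col2]


print(prefix_range(1))  # [(1, 'a', 10), (1, 'a', 20), (1, 'b', 30)]
print(scan_col2("a"))   # [(1, 'a', 10), (1, 'a', 20), (2, 'a', 10)]
```

Filtering on col1 uses the sort order, while filtering on col2 alone cannot, because its matches are scattered between entries with other col1 values. That is the intuition behind the leftmost-prefix rule.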

References

"High Performance MySQL"
"Inside MySQL: InnoDB Storage Engine"

A few honest thoughts

First of all, thank you to the readers who care about what goes on inside this humble author's head, whether you came out of concern or just out of curiosity, and thank you for following and liking. Second, some thoughts. This article really took a lot of time and energy. The more I dug into the underlying principles, the more I found I had not understood, and to make the material easier to follow this is the first article in which I added animated diagrams. Learning the animation software took some effort, but I picked up a new skill along the way. My earlier posts were all about how to use a technology, because I thought I could cover each one from beginner to expert in one place. But my energy is limited, and introductory material can be found everywhere online. I think readers would rather see articles about principles and ways of thinking that help them level up and land offers at big companies, instead of staying at the beginner stage. So future articles will focus on principles, ways of thinking, products, and learning paths. Beginners can read the introductory series; readers with 1 to 3 years of experience can read the articles on principles, thinking, and architecture; readers with around 5 years can read the product articles to learn how to design a good product from the big picture. My own ability is limited, so I cannot write the really advanced material yet, which is a little embarrassing.


Summary

This article covered N data structures and the underlying principles of MySQL indexes. This salted fish is about to change the direction of the blog's updates: the only constant in the world is change. Believing in yourself beats relying on others to believe in you, so let's learn together and be salted fish that stand back up.
The road is long; let's walk it happily and steadily.


Origin blog.csdn.net/qq_36386908/article/details/104730786