"Layman's language, then the data structure" is what the series B tree, B + tree? Why not do a binary search tree?

This article introduces B-trees and B+ trees. It first covers the scenario B-trees are used in and why they are needed, then the B-tree search and insertion process, and finally the improvements the B+ tree makes over the B-tree.
Before discussing the B-tree itself, consider where it is used. What is a B-tree for? The B-tree is a data structure designed for secondary storage and is widely used in databases and file systems. Take databases, which you are surely familiar with: suppose a table holds one million records and you need to find one particular row. How do you locate that record quickly among a million? Most people's first thought is a binary search tree, so let's first look at why a binary search tree won't do.

Why not use a binary search tree?

Continuing the example: the table has one million records, and we build a binary search tree on its primary key. Set aside the unbalanced case where the tree degenerates into a linked list, and assume the ideal case of a perfectly balanced binary tree. Once built, it should look like the figure below (schematic).
Figure: binary search tree
The height of the tree should be about log2(1,000,000) ≈ 20.

Now start the search. Suppose we want to find the node with value 100. How? We first need the root node of the tree. In general, a table's index is itself large and cannot be kept entirely in memory, so it is usually stored on disk as an index file. The binary search tree we built is too big to fit in memory, so it lives on disk and is read into memory as needed. Assume we know where the root is on disk. We first read the root node into memory, which is one I/O operation, and then compare the target with the root's value: 100 is greater than 4, so we search the right subtree. How do we find node 6? The root node stores the disk location of node 6. That node must also be read into memory before we can compare and continue downward, which is another I/O operation. The remaining steps are similar and are not spelled out here. In short, finding the record takes about 20 I/O operations, equal to the height of the tree, because each level we descend costs one I/O operation.
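The one-read-per-level cost can be illustrated with a small simulation. This is only a sketch under stated assumptions: the keys are taken to be 1..1,000,000, the tree is perfectly balanced, and every node visit counts as one disk read. The function name `bst_search_reads` and the key range are illustrative and do not come from the original diagram, which shows a much smaller tree.

```python
def bst_search_reads(target, lo=1, hi=1_000_000):
    """Count node visits when searching a perfectly balanced BST over lo..hi.

    The tree is implicit: the root of any key range is its midpoint,
    so no nodes are allocated. Each visit stands for one disk read.
    """
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2   # key stored in the current node
        reads += 1             # loading this node costs one I/O
        if target == mid:
            return reads
        if target < mid:
            hi = mid - 1       # descend into the left subtree
        else:
            lo = mid + 1       # descend into the right subtree
    return reads               # key absent; reads = number of levels visited


# Sample a few keys, including both ends of the range.
worst = max(bst_search_reads(k) for k in (1, 100, 999_999, 1_000_000))
print(bst_search_reads(100), worst)
```

None of the sampled lookups needs more than 20 reads, which matches the height estimate for one million records.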
Why emphasize I/O operations rather than the number of in-memory comparisons? Because disk is extremely slow compared with memory. On a common 7200 RPM hard drive, one revolution of the platter takes 60 / 7200 s ≈ 8.33 ms. In other words, finding data that requires a full revolution takes 8.33 ms, on the order of 100,000 times slower than a typical 100 ns memory access, and that does not even include the time to move the disk arm. So the search speed is limited not by the number of comparisons but by the number of I/O operations. Put differently, if we can cut down the I/O operations, even 100 extra comparisons in memory hardly matter, given the roughly 100,000-fold speed gap. We can assume the number of I/O operations is roughly equal to the height of the tree. So how is the height of the tree calculated? Look at this formula:
h ≈ log_2 N (where N is the number of records)
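The arithmetic behind the disk-versus-memory comparison is easy to check. The sketch below uses only the two figures quoted in the text (7200 RPM and 100 ns); the variable names are illustrative.

```python
RPM = 7200
MEMORY_ACCESS_NS = 100                  # rough memory latency quoted in the text

rotation_ms = 60 / RPM * 1000           # time for one full revolution, in ms
rotation_ns = rotation_ms * 1_000_000   # the same duration in nanoseconds
ratio = rotation_ns / MEMORY_ACCESS_NS  # how many memory accesses fit in one revolution

print(f"one revolution: {rotation_ms:.2f} ms")
print(f"disk/memory ratio: about {ratio:,.0f}x")
```

The ratio comes out a little above 83,000, which is the "about 100,000 times" order of magnitude used in the argument.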

We want this value to be as small as possible, which means either a smaller argument or a larger base for the logarithm. The argument is the number of records, which we cannot control. What about the base? Where does the 2 come from? From the two branches of a binary tree, of course. Can the base be made larger? Certainly! Instead of a binary tree we use a multiway tree, and that is the B-tree we are about to discuss.
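To see how much a larger base helps, the sketch below approximates the height as the smallest h for which m^h ≥ N, where m is the number of branches per node and N the number of records. It assumes every node is completely full, which real trees rarely are, so treat the results as rough lower bounds. The function name `min_height` is illustrative.

```python
def min_height(n_records, fanout):
    """Smallest h such that fanout**h >= n_records (integer arithmetic only)."""
    height, capacity = 0, 1
    while capacity < n_records:
        capacity *= fanout   # one more level multiplies the reachable nodes
        height += 1
    return height


for m in (2, 10, 100, 1000):
    print(m, min_height(1_000_000, m))
```

For one million records, moving from 2 branches to 100 branches per node reduces the estimate from 20 levels to 3.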

What is a B-tree?

The B-tree, also known as the B- tree, is a balanced multiway search tree. Both the B-tree and the B+ tree discussed later evolved from the simple binary search tree. A B-tree is always described with an order, which is the maximum number of child nodes a node may have and is usually denoted by the letter m. Let us look at the definition of a B-tree:
(1) Each node in the tree has at most m subtrees (m is the order of the tree);
Explanation: Some definitions say each node has at most m-1 keys. This is equivalent, because the number of subtrees of a node equals its number of keys plus 1, as illustrated below.
(2) If the root node is not a leaf node, it has at least two subtrees;
Explanation: When the root is a leaf node, it may have no subtrees; when it is not a leaf, it has at least one key and therefore at least two subtrees.
(3) Every non-leaf node other than the root has at least ⌈m/2⌉ child nodes;
Explanation: ⌈m/2⌉ denotes rounding up. For example, when m = 5, ⌈m/2⌉ = 3, so such a node has at least three children and, of course, at most five.
(4) Every non-leaf node contains the following data: (n, A0, K1, A1, K2, ..., Kn, An)
Explanation: n is the number of keys in the node. Ki (i = 1, 2, ..., n) are the keys, with Ki < K(i+1).
Ai (i = 0, 1, ..., n) are pointers to child nodes. All keys in the subtree pointed to by A(i-1) are less than Ki (i = 1, 2, ..., n), and all keys in the subtree pointed to by An are greater than Kn. (This also shows that each node has one more child than it has keys.)
(5) All leaf nodes appear on the same level; that is, all leaves have the same depth, equal to the height of the tree. Besides keys and the record pointers for those keys, a leaf node also has child pointers, but they are all null.
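Rules (1) to (5) can be turned into a small checker. This is a minimal sketch, not a full B-tree implementation: `BTreeNode` and `check_btree` are hypothetical names, record pointers are omitted, and a leaf is modeled as a node with an empty child list rather than with null pointers.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import List


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # K1..Kn, ascending
    children: List["BTreeNode"] = field(default_factory=list)  # A0..An; empty for a leaf

    @property
    def is_leaf(self):
        return not self.children


def check_btree(root, m):
    """Return True if the tree rooted at `root` satisfies rules (1)-(5) for order m."""
    leaf_depths = set()

    def visit(node, depth, lo, hi, is_root):
        n = len(node.keys)
        # Rule (4): keys strictly ascending and inside the range set by ancestors.
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            return False
        if any(not (lo < k < hi) for k in node.keys):
            return False
        if node.is_leaf:
            leaf_depths.add(depth)   # collected for rule (5)
            return n <= m - 1        # at most m-1 keys per node
        # A non-leaf node always has one more child than it has keys.
        if len(node.children) != n + 1:
            return False
        # Rule (1): at most m children. Rules (2) and (3): minimum child count.
        min_children = 2 if is_root else ceil(m / 2)
        if not (min_children <= len(node.children) <= m):
            return False
        bounds = [lo] + node.keys + [hi]
        return all(
            visit(child, depth + 1, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    ok = visit(root, 0, float("-inf"), float("inf"), True)
    return ok and len(leaf_depths) == 1   # rule (5): all leaves at one depth


good = BTreeNode([50], [BTreeNode([10, 30]), BTreeNode([60, 70])])
bad = BTreeNode([50], [BTreeNode([10, 60]), BTreeNode([70])])  # 60 sits left of 50
print(check_btree(good, 4), check_btree(bad, 4))
```

The two sample trees are made up for illustration; the second one is rejected because a key in the left subtree is not smaller than the root key.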
The concrete example below should give you a deeper understanding of the definition above.
Figure: example of a B-tree of order 4

Searching a B-tree

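Taking the figure above as an example, suppose we want to find the node with key 15. The search proceeds as follows:
(1) Read the root node's keys and compare. The root's key is 50, and 50 > 15, so follow the pointer to the left child node;
(2) That node holds the keys 10 and 30. Since 10 < 15 < 30, follow the pointer between 10 and 30 to the child node it points to;
(3) That node holds the key 15, which is the target, so return the key and its pointer information (if the tree does not contain the key being searched for, return null).
That completes the B-tree search. It is fairly simple and similar to searching a binary search tree.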
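The three steps above can be sketched in code. The tree built here reproduces only the part of the example that the walkthrough mentions (root key 50, a left child with keys 10 and 30, and a node holding 15 between them); every other key, the `Node` class, and the `btree_search` function are hypothetical fillers for illustration.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys K1..Kn
        self.children = children or []  # A0..An; empty list for a leaf


def btree_search(node, key):
    """Return (node, index, nodes_read) for `key`, or None if it is absent."""
    nodes_read = 0
    while node is not None:
        nodes_read += 1                  # one node loaded = one I/O
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i, nodes_read
        # Not in this node: follow child pointer A_i, or stop at a leaf.
        node = node.children[i] if node.children else None
    return None


root = Node([50], [
    Node([10, 30], [Node([5]), Node([15]), Node([40])]),
    Node([70], [Node([60]), Node([80])]),
])

hit = btree_search(root, 15)
print(hit[0].keys, hit[2])     # node that holds 15, and nodes read on the way
print(btree_search(root, 16))  # absent key
```

Within a node the sketch uses binary search (`bisect_left`) to pick the child pointer; a linear scan over the keys would work equally well for small nodes.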
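For insertion into a B-tree, see the detailed description of 2-3 tree insertion in the post "Why red-black trees? What is a red-black tree? After reading this you will understand"; a 2-3 tree is in fact a special kind of B-tree. For reasons of space, it is not repeated here.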

From the B-tree to the B+ tree

The B+ tree is derived from the B-tree and has several advantages over it. The B+ tree makes two major improvements on the B-tree:
(1) Non-leaf nodes serve only as an index; all record-related information is stored in the leaf nodes.
Explanation: Non-leaf nodes of a B+ tree do not store the record pointers for their keys and act only as an index over the data. The leaf nodes store the record pointers for all keys, including those that appear in their parent nodes, so the address of any record can only be obtained by reaching a leaf. Take the B-tree example from before, whose root holds the key 50. To find the record with primary key 50 in the B-tree, a single I/O operation suffices: read the root node into memory and it is a direct hit. A B+ tree is different: every query must descend to a leaf node, so every lookup costs the same number of reads. This is a major difference from the B-tree.
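The "every lookup reaches a leaf" behaviour can be sketched as follows. This is an illustrative model, not a reference implementation: `BPlusNode`, `bplus_lookup`, and the record labels are hypothetical, and the convention that a key equal to a separator lives in the right-hand subtree is an assumption chosen for this sketch.

```python
from bisect import bisect_right


class BPlusNode:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys                # separator keys (internal) or data keys (leaf)
        self.children = children or []  # child nodes; empty for a leaf
        self.records = records or []    # record pointers; only leaves carry them


def bplus_lookup(root, key):
    """Return (record, nodes_read); record is None when the key is absent."""
    node, nodes_read = root, 1
    while node.children:
        # Assumed convention: keys equal to a separator go to the right subtree.
        node = node.children[bisect_right(node.keys, key)]
        nodes_read += 1
    for k, rec in zip(node.keys, node.records):
        if k == key:
            return rec, nodes_read
    return None, nodes_read


root = BPlusNode([50], [
    BPlusNode([10, 30], records=["r10", "r30"]),
    BPlusNode([50, 70], records=["r50", "r70"]),
])

print(bplus_lookup(root, 50))  # key 50 is in the root, yet the leaf is still read
print(bplus_lookup(root, 10))
```

Both lookups read the same number of nodes, whereas the B-tree search for 50 in the earlier example stops at the root after a single read.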
(2) All leaf nodes of the tree form an ordered linked list, so all records can be traversed in key order.
Explanation: This has two benefits. First, range queries are faster: the data is packed more densely, so the cache hit rate is higher than with a B-tree. Second, traversing the whole tree is faster: a B+ tree only needs to walk all of its leaf nodes rather than visit every level as a B-tree must, which helps the database perform full table scans.
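A range query over the linked leaves might look like the sketch below. The names `Leaf`, `link_leaves`, and `range_scan` are hypothetical, as are the sample keys. For brevity the scan starts at the first leaf; a real implementation would presumably descend from the root to the leaf containing the lower bound first.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf
        self.next = None  # link to the next leaf in key order


def link_leaves(leaves):
    """Chain leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(first_leaf, lo, hi):
    """Yield all keys k with lo <= k <= hi by walking the leaf chain."""
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return   # keys are sorted, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next


head = link_leaves([Leaf([5, 10]), Leaf([15, 30]), Leaf([40, 50])])
print(list(range_scan(head, 10, 40)))
print(list(range_scan(head, float("-inf"), float("inf"))))  # full scan
```

The full scan touches only leaves and never revisits an internal node, which is the property that makes full table scans cheap.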
Recommended Reading
Why red-black trees? What is a red-black tree? After reading this you will understand
It's already 2020 and you still can't do merge sort? A hands-on guide to writing the merge sort algorithm
Why multithreading? What is thread safety? How to ensure thread safety?


Origin www.cnblogs.com/exzlc/p/12208793.html