Principles underlying index structures: the B-tree and the B+ tree

  File storage systems use B-trees to store data. Why is that?

  Suppose we need to look up a record in a large data set. The simplest approach is to compare entries one by one, which has time complexity O(n) and is slow. A natural improvement is the AVL tree, a balanced binary search tree whose lookup complexity is O(log n): the data is kept in sorted order and located with the idea of binary search. If an AVL tree already gives O(log n), why introduce the B-tree at all? First, let's look at what a B-tree is.

  A B-tree is a balanced multiway search tree, not a binary one. A B-tree of order m has the following properties:

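1. Each node in the tree has at most m children.
2. Every node other than the root and the leaves has at least ceil(m/2) children.
3. Every node other than the root holds between ceil(m/2)-1 and m-1 keys: ceil(m/2)-1 <= number of keys <= m-1.
Note: ceil is the ceiling function (round up).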
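
As a quick check on these limits, here is a minimal sketch that computes them for a given order. It assumes the usual definition in which the minimum number of children is ceil(m/2); the function name `btree_bounds` is made up for this illustration.

```python
def btree_bounds(m):
    """Return the child and key limits for a non-root node of an order-m B-tree."""
    min_children = (m + 1) // 2          # integer form of ceil(m / 2)
    return {
        "max_children": m,
        "min_children": min_children,    # applies to internal nodes other than the root
        "min_keys": min_children - 1,
        "max_keys": m - 1,
    }


print(btree_bounds(5))
# {'max_children': 5, 'min_children': 3, 'min_keys': 2, 'max_keys': 4}
```

For order 5, a non-root node therefore holds 2 to 4 keys, which is the case used in the insertion example.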

These properties alone may not make it obvious why the minimum numbers of children and keys are what they are. Let's first look at B-tree insertion and then come back to explain.

B-tree insertion

As an illustration, insert the following letters, in order, into a B-tree of order 5:
CNGAHEKQMFWLTZDPRXYS
Note that insertion must preserve the search-tree ordering: smaller keys to the left, larger keys to the right. First, insert the first five letters (C, N, G, A, H).

[Figure: the tree after inserting the first five letters]
Because the order is 5, a node can hold at most 4 keys, so the node must split: G moves up, with AC as its left child and HN as its right child.
[Figure: root G with children AC and HN]
Insert E, K, and Q, each in its proper position, as follows:
[Figure: the tree after inserting E, K, Q]
Inserting M brings its node to 5 keys, so that node splits upward:
[Figure: the tree after inserting M and splitting]
Insert F, W, L, and T, each in its proper position:
[Figure: the tree after inserting F, W, L, T]
Inserting Z makes its node NQTWZ, which has reached 5 keys, so it must split upward, as shown:
[Figure: the tree after inserting Z and splitting]
Insert D. The node becomes ACDEF and must split upward. After the split:
[Figure: the tree after inserting D and splitting]
Insert P, R, X, and Y:
[Figure: the tree after inserting P, R, X, Y]
Insert S. The node becomes NPQRS, so Q moves up. After the split:
[Figure: the tree after inserting S and splitting the leaf]
The root now holds 5 keys, DGMQT, so it must split as well, and M moves up:
[Figure: the tree after splitting the root, with M as the new root]
  This completes the insertion; no further steps are drawn. It also shows why a B-tree is always perfectly balanced: the tree grows only by splitting upward, so every leaf stays at the same depth. To summarize the insertion process: if the target node is not full, the key is inserted directly into it. If the target node is full, it splits, which amounts to inserting a new key into the parent node. If the parent is not full, that key is inserted directly; if the parent is full, it splits in turn, and the process repeats upward.
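
The insertion procedure summarized above can be sketched in Python. This is a minimal illustration, not a production implementation: the class names `Node` and `BTree` and the helper `levels` are made up for this example, keys are assumed to be unique, and nodes live in memory rather than in disk blocks. As in the walkthrough, a key is inserted first and the node is split afterwards if it has reached m keys.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf

    def is_leaf(self):
        return not self.children


class BTree:
    def __init__(self, order):
        self.order = order               # m: a node may hold at most m - 1 keys
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                        # the root overflowed: the tree grows one level
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if node.is_leaf():
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split:                    # child split: take its median and new right sibling
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.order:
            return None                  # node is still within its limit
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2
        median = node.keys[mid]          # this key moves up to the parent
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right

    def levels(self):
        """Return the keys of every node, grouped level by level."""
        out, level = [], [self.root]
        while level:
            out.append(["".join(n.keys) for n in level])
            level = [c for n in level for c in n.children]
        return out


tree = BTree(order=5)
for letter in "CNGAHEKQMFWLTZDPRXYS":
    tree.insert(letter)
for row in tree.levels():
    print(row)
# ['M']
# ['DG', 'QT']
# ['AC', 'EF', 'HKL', 'NP', 'RS', 'WXYZ']
```

The final tree has M as its root and all six leaves at the same depth, which is the balance property described above.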

  Now we can explain properties 2 and 3 of the B-tree. When a node splits, one key moves up to the parent and the remaining keys are divided as evenly as possible between the two resulting nodes, so each gets about half. If the node held m keys before the split, each resulting node holds at least (m-1)/2 keys, rounded down, which is the same value as ceil(m/2)-1. A node has one more child than it has keys, so a node with at least ceil(m/2)-1 keys has at least ceil(m/2) children.
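
The claim that the two bounds agree can be checked by splitting on the parity of m. The derivation below is a worked sketch, with k an integer introduced only for this argument:

```latex
\[
m = 2k:\quad
\left\lfloor \frac{m-1}{2} \right\rfloor = \left\lfloor k - \tfrac{1}{2} \right\rfloor = k - 1,
\qquad
\left\lceil \frac{m}{2} \right\rceil - 1 = k - 1
\]
\[
m = 2k+1:\quad
\left\lfloor \frac{m-1}{2} \right\rfloor = k,
\qquad
\left\lceil \frac{m}{2} \right\rceil - 1 = (k+1) - 1 = k
\]
```

For m = 5, a node that has reached 5 keys splits into 2 keys on the left, 1 key moved up, and 2 keys on the right, matching the minimum of 2 keys per node.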

[Figure: a B-tree]
The figure shows a B-tree: blue marks the keys, yellow the child pointers, and red the data rows bound to each key. Now we can explain why a B-tree is used even though an AVL tree would also work:
  A query descends the tree one level at a time. Because a large data set lives on disk, each level visited costs one disk I/O. The operating system manages memory in pages and the disk in blocks, so no matter how little data is requested, a disk I/O transfers at least one whole block. Reading a whole block to fetch a node that holds a single key, as in an AVL tree, wastes most of that block. A B-tree node holds many keys, so the tree is very shallow, and its height is the number of disk I/Os a query needs. In real systems the order of a B-tree can be several hundred, which lets each disk I/O make full use of one block. In short, a B-tree is used instead of an AVL tree to reduce the number of disk I/Os and to reduce waste.
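
To get a rough feel for the difference in height, the sketch below counts the minimum number of levels needed to hold a given number of keys. It is a best-case estimate under the assumption that every node is completely full, so a real AVL tree or B-tree may be somewhat taller. The function name `levels_needed` and the figures of one million keys and order 500 are chosen only for illustration.

```python
def levels_needed(n_keys, max_children):
    """Smallest number of levels whose completely full tree can hold n_keys.

    A full tree with h levels and fan-out f holds f**h - 1 keys
    (f = 2 for a binary tree, f = m for a B-tree of order m).
    """
    levels = 0
    capacity = 0
    while capacity < n_keys:
        levels += 1
        capacity = max_children ** levels - 1
    return levels


n = 1_000_000
print(levels_needed(n, 2))     # best-case binary search tree: 20
print(levels_needed(n, 500))   # B-tree of order 500: 3
```

If each level costs one disk I/O, that is roughly 20 reads against 3 for the same lookup.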

In practice, however, what is actually used is the B+ tree, an evolved version of the B-tree. A B+ tree is shown below:
[Figure: a B+ tree]
As before, blue marks the keys, yellow the child pointers, and red the data rows bound to the keys. The B+ tree differs from the B-tree in the following ways:

  1. Data appears only in the leaf nodes, so every lookup takes the same time to reach its data. The internal nodes are used only for searching.
  2. The leaves are linked in a chain that runs through the entire data table, which makes range queries easier.
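
The effect of the leaf chain on range queries can be sketched as follows. This is a simplified illustration, not a full B+ tree: the internal nodes are replaced by a binary search over the largest key of each leaf, and the names `Leaf`, `build_leaf_chain`, and `range_query` are made up for this example.

```python
import bisect


class Leaf:
    def __init__(self, keys, rows):
        self.keys = keys      # sorted keys stored in this leaf
        self.rows = rows      # row data (or row addresses) paired with the keys
        self.next = None      # link to the next leaf in key order


def build_leaf_chain(items, per_leaf):
    """Pack sorted (key, row) pairs into linked leaves; return the list of leaves."""
    leaves = []
    for start in range(0, len(items), per_leaf):
        chunk = items[start:start + per_leaf]
        leaf = Leaf([k for k, _ in chunk], [r for _, r in chunk])
        if leaves:
            leaves[-1].next = leaf
        leaves.append(leaf)
    return leaves


def range_query(leaves, low, high):
    """Return rows with low <= key <= high by walking the leaf chain."""
    # Stand-in for the descent through the internal nodes:
    # pick the first leaf whose largest key is >= low.
    last_keys = [leaf.keys[-1] for leaf in leaves]
    idx = bisect.bisect_left(last_keys, low)
    leaf = leaves[idx] if idx < len(leaves) else None
    result = []
    while leaf is not None:
        for key, row in zip(leaf.keys, leaf.rows):
            if key > high:
                return result
            if key >= low:
                result.append(row)
        leaf = leaf.next      # follow the chain instead of going back up the tree
    return result


items = [(k, f"row{k}") for k in range(1, 13)]
leaves = build_leaf_chain(items, per_leaf=4)
print(range_query(leaves, 3, 9))
# ['row3', 'row4', 'row5', 'row6', 'row7', 'row8', 'row9']
```

Once the first matching leaf is found, the scan only follows `next` links, so the tree is searched once per range query rather than once per key.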

Compared with the B-tree, the B+ tree has the following three advantages for queries:

  1. The keys in non-leaf nodes are used only for searching and store no data row addresses, so one block can hold more keys. The more keys per block, the lower the tree and the more efficient the query.
  2. Because all data row addresses are in the leaf nodes, query time is relatively stable.
  3. The linked-list pointers between the leaves make range queries more efficient.
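
The first point can be made concrete with a back-of-the-envelope estimate. All sizes below are assumptions chosen for illustration (a 4096-byte block and 8 bytes each for a key, a child pointer, and a row address); real systems differ, and the estimate ignores block headers and the one extra child pointer per node.

```python
BLOCK_SIZE = 4096      # bytes per disk block (assumed)
KEY_SIZE = 8           # bytes per key (assumed)
CHILD_PTR_SIZE = 8     # bytes per child pointer (assumed)
ROW_ADDR_SIZE = 8      # bytes per data row address (assumed)


def keys_per_block(entry_size):
    """How many fixed-size entries fit in one block, ignoring any header."""
    return BLOCK_SIZE // entry_size


# B-tree internal node: every key carries a child pointer and a row address.
btree_keys = keys_per_block(KEY_SIZE + CHILD_PTR_SIZE + ROW_ADDR_SIZE)
# B+ tree internal node: every key carries only a child pointer.
bplus_keys = keys_per_block(KEY_SIZE + CHILD_PTR_SIZE)

print(btree_keys, bplus_keys)   # 170 256
```

Under these assumptions a B+ tree internal node holds about half again as many keys per block, which is what allows the tree to be lower.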

Origin blog.csdn.net/weixin_42220532/article/details/104411533