Java data structures and algorithms: multiway search trees (2-3 tree, B-tree, B+ tree, B* tree) [day11]

Multiway search trees


Binary trees and B-trees

Problems with binary trees

Binary tree operations are efficient, but binary trees also have problems. Consider the following binary tree:

[Figure: binary tree]

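A binary tree has to be loaded into memory. If it has few nodes this is not a problem,
but if it has a very large number of nodes (say 100 million), the following problems arise:

Problem 1: Building the binary tree requires many I/O operations (the massive data set lives in a database or in files),
so with a huge number of nodes the build is slow.

Problem 2: A huge number of nodes also makes the binary tree very tall, which slows down operations.

Solution to the problems above -> multiway trees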

Multiway trees

1. In a binary tree, each node holds one data item and has at most two child nodes.
If each node is allowed to hold more data items and have more child nodes, the tree
is a multiway tree.

2. The 2-3 tree and the 2-3-4 tree described later are multiway trees. A multiway tree can optimize a binary tree by reorganizing the nodes to reduce the height of the tree.

3. Example: (The 2-3 tree below is a multiway tree)

[Figure: 2-3 tree]


Basic introduction to B-trees

The B-tree improves efficiency by reorganizing nodes, which reduces the height of the tree and therefore the number of reads and writes.
[Figure: B-tree]

1. As shown in the figure, the B-tree reduces the height of the tree by reorganizing the nodes.

2. Designers of file systems and database systems take advantage of disk read-ahead
by setting the size of a node equal to one page (the page size is usually 4 KB),
so that each node can be fully loaded with a single I/O.

3. With the degree M of the tree set to 1024, the desired element among 60 billion elements can be reached with at most 4 I/O operations. B-trees (and B+ trees) are widely used in file storage systems and database systems.
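The "at most 4 I/O operations" figure can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: one I/O per node visited, and the usual B-tree minimum occupancy of ceil(M/2) children per non-root internal node (the article does not state the occupancy rule). The class and method names are made up for this example.

```java
// Sketch: worst-case number of levels (node reads) needed to reach a key.
// Assumptions (not from the article): one I/O per node visited, and every
// non-root internal node has at least ceil(m/2) children.
class BTreeHeight {

    // Fewest keys a B-tree of order m with the given height can hold:
    // the root has 2 children, every other internal node has ceil(m/2).
    static long minKeys(int m, int height) {
        long t = (m + 1) / 2;          // ceil(m/2) children per node
        long power = 1;                // becomes t^(height-1)
        for (int i = 1; i < height; i++) {
            power *= t;
        }
        return 2 * power - 1;
    }

    // Largest height a B-tree of order m can have while holding n keys.
    static int maxHeight(int m, long n) {
        int h = 1;
        while (minKeys(m, h + 1) <= n) {
            h++;
        }
        return h;
    }

    // Smallest possible height of a binary tree with n nodes (n >= 1):
    // the bit length of n, because h levels hold at most 2^h - 1 nodes.
    static int minBinaryHeight(long n) {
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        long n = 60_000_000_000L;                  // 60 billion elements
        System.out.println(maxHeight(1024, n));    // prints 4
        System.out.println(minBinaryHeight(n));    // prints 36
    }
}
```

Under these assumptions, even a sparsely filled B-tree of order 1024 needs no more than 4 levels for 60 billion keys, while even a perfectly balanced binary tree needs 36.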


2-3 trees

Basic introduction to 2-3 trees:

  • The 2-3 tree is the simplest B-tree structure.
  • All leaf nodes of a 2-3 tree are on the same level. (Every B-tree satisfies this condition.)
  • A node with two child nodes is called a 2-node. A 2-node holds one data item and has either no child nodes
    or exactly two child nodes.
  • A node with three child nodes is called a 3-node. A 3-node holds two data items and has either no child nodes or exactly three child nodes.
  • A 2-3 tree is a tree composed of 2-nodes and 3-nodes.

2-3 tree application case
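Build a 2-3 tree from the sequence {16, 24, 12, 32, 14, 26, 34, 10, 8, 28, 38, 20},
keeping the data in sorted order as it is inserted.

(The process of building the 2-3 tree is demonstrated below.)

Insertion rules:
1. All leaf nodes of a 2-3 tree are on the same level. (Every B-tree satisfies this condition.)
2. A node with two child nodes is called a 2-node. A 2-node has either no child nodes
or two child nodes.
3. A node with three child nodes is called a 3-node. A 3-node has either no child nodes
or three child nodes.
4. If inserting a value into a node according to these rules would violate the three requirements above,
the node must be split: first split upward, and if the level above is full, split the current level.
After the split, the three conditions above must still hold.
5. The values in the subtrees of a 3-node still follow the ordering rule of a BST (binary search tree).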
[Figures: step-by-step construction of the 2-3 tree]

Description:

  1. When 10 is inserted, it belongs in the node 10-12-14, but that node is then overfull. Looking up, the parent node 16-26 is also full.
  2. So 10-12-14 is split into 10 <- 12 -> 14, because no other way of splitting satisfies the 2-node and 3-node requirements.

[Figure: 10-12-14 split into 10 <- 12 -> 14]

  3. At this point the leaf nodes are no longer all on the same level, so the position of 26 has to be adjusted (as shown in the figures).

[Figures: remaining insertion steps]
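The construction can also be reproduced in code. The sketch below is not the exact procedure drawn in the figures: it uses the common textbook variant in which an overfull node (three values) is split and its middle value is pushed up into the parent, so intermediate steps may differ from the pictures. The class `TwoThreeTree` and all of its members are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of 2-3 tree insertion using the standard "split and push the middle
// value up" rule (an assumption; the figures may order the steps differently).
class TwoThreeTree {
    static class Node {
        final List<Integer> keys = new ArrayList<>();   // 1 or 2 values (3 only while overfull)
        final List<Node> children = new ArrayList<>();  // empty for a leaf
        boolean isLeaf() { return children.isEmpty(); }
    }

    Node root = new Node();

    void insert(int key) {
        insert(root, key);
        if (root.keys.size() == 3) {        // root overfull: the tree grows one level
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
    }

    private void insert(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        if (node.isLeaf()) {
            node.keys.add(i, key);          // new values always go into a leaf
            return;
        }
        insert(node.children.get(i), key);
        if (node.children.get(i).keys.size() == 3) splitChild(node, i);
    }

    // Splits the overfull child at index i into two 2-nodes and moves
    // its middle value up into the parent.
    private void splitChild(Node parent, int i) {
        Node full = parent.children.get(i);
        Node right = new Node();
        right.keys.add(full.keys.remove(2));
        int middle = full.keys.remove(1);
        if (!full.isLeaf()) {               // hand the two rightmost subtrees to the new node
            right.children.add(full.children.remove(2));
            right.children.add(full.children.remove(2));
        }
        parent.keys.add(i, middle);
        parent.children.add(i + 1, right);
    }

    // Number of levels; every leaf is on the same level, so follow the leftmost path.
    int height() {
        int h = 1;
        for (Node n = root; !n.isLeaf(); n = n.children.get(0)) h++;
        return h;
    }

    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    private void inOrder(Node n, List<Integer> out) {
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.isLeaf()) inOrder(n.children.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.isLeaf()) inOrder(n.children.get(n.keys.size()), out);
    }

    public static void main(String[] args) {
        TwoThreeTree tree = new TwoThreeTree();
        for (int k : new int[] {16, 24, 12, 32, 14, 26, 34, 10, 8, 28, 38, 20}) tree.insert(k);
        System.out.println(tree.inOrder()); // [8, 10, 12, 14, 16, 20, 24, 26, 28, 32, 34, 38]
        System.out.println(tree.height());  // 3
    }
}
```

With this variant the final tree has 16 at the root, a 2-node 12 and a 3-node 26-32 below it, and the leaves 8-10, 14, 20-24, 28 and 34-38 all on the third level.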

Other notes

Besides the 2-3 tree there are also the 2-3-4 tree and so on. The concept is similar to that of the 2-3 tree, and these are B-trees as well.


B-tree, B+ tree and B* tree

Introduction to B-trees

"B-tree" simply means B tree; the B is commonly said to stand for Balanced. Some people render "B-tree" as "B- tree" (B-树, reading the hyphen as a minus sign), which invites a misunderstanding: one might think that the B- tree is one kind of tree and the B tree another. In fact, both names refer to the same B-tree.

The 2-3 tree and 2-3-4 tree introduced earlier are both B-trees. One more note here: when learning MySQL, we often hear that a certain type of index is based on a B-tree or a B+ tree, as shown in the figure:
[Figure: B-tree example]

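Notes on the B-tree:
1. Order of a B-tree: the maximum number of child nodes a node may have. For example, the order of a 2-3 tree is 3 and the order of a 2-3-4 tree is 4.

2. B-tree search: starting from the root, do a binary search over the (sorted) keys inside the node.
On a hit the search ends; otherwise it descends into the child whose range covers the search key.
This repeats until the corresponding child pointer is null or a leaf node has been reached.

3. The key set is distributed across the whole tree; that is, both leaf nodes and non-leaf nodes store data.

4. A search may end at a non-leaf node.

5. Search performance is equivalent to one binary search over the complete key set.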
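The search procedure in note 2 can be sketched as follows. This is a minimal illustration, not a full B-tree: the tree is built by hand as an order-3 sample, and `BTreeSearch`, `Node`, `leaf`, `sample` and `contains` are names invented here. `Arrays.binarySearch` is the standard library call; on a miss it returns `-(insertionPoint) - 1`, which identifies the child to descend into.

```java
import java.util.Arrays;

// Sketch: searching a B-tree by binary search inside each node.
class BTreeSearch {
    static class Node {
        final int[] keys;        // sorted keys stored in this node
        final Node[] children;   // null for a leaf, otherwise keys.length + 1 children

        Node(int[] keys, Node... children) {
            this.keys = keys;
            this.children = children.length == 0 ? null : children;
        }
    }

    static Node leaf(int... keys) {
        return new Node(keys);
    }

    // Hand-built sample tree (order 3), used only for this illustration.
    static Node sample() {
        Node left = new Node(new int[] {12}, leaf(8, 10), leaf(14));
        Node right = new Node(new int[] {26, 32}, leaf(20, 24), leaf(28), leaf(34, 38));
        return new Node(new int[] {16}, left, right);
    }

    static boolean contains(Node node, int key) {
        while (node != null) {
            int pos = Arrays.binarySearch(node.keys, key);
            if (pos >= 0) return true;                // hit, possibly at a non-leaf node
            if (node.children == null) return false;  // reached a leaf without a hit
            node = node.children[-pos - 1];           // child whose range covers the key
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(contains(root, 26));  // true (found in a non-leaf node)
        System.out.println(contains(root, 28));  // true (found in a leaf)
        System.out.println(contains(root, 15));  // false
    }
}
```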

Introduction to B+ trees

The B+ tree is a variant of the B-tree, and it is also a multiway search tree.

[Figure: B+ tree]

Notes on the B+ tree:
1. Searching a B+ tree is basically the same as searching a B-tree, except that a B+ tree search only hits once it reaches a leaf node (a B-tree search may hit at a non-leaf node). Its performance is likewise equivalent to one binary search over the complete key set.
2. All keys appear in the linked list of leaf nodes (that is, data is stored only in leaf nodes, also called a dense index), and the keys (data) in the linked list are in sorted order.
3. A search can never hit at a non-leaf node.
4. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (key) data.
5. The B+ tree is better suited to file index systems.
6. B-trees and B+ trees each have their own application scenarios. It cannot be said that the B+ tree is strictly better than the B-tree, or vice versa.
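Notes 1, 2 and 4 can be illustrated with a small hand-built B+ tree. The sketch below makes some assumptions that the article does not spell out: a separator key k in a non-leaf node sends keys smaller than k to the left child and keys greater than or equal to k to the right child, and each leaf has a `next` pointer to the following leaf. All names (`BPlusTreeSketch`, `findLeaf`, `range` and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: B+ tree lookup always ends in a leaf; leaves form a sorted linked list.
class BPlusTreeSketch {
    static class Node {
        int[] keys;        // non-leaf: separator keys; leaf: the stored keys
        Node[] children;   // null for a leaf
        Node next;         // leaf only: next leaf in key order
    }

    static Node leaf(int... keys) {
        Node n = new Node();
        n.keys = keys;
        return n;
    }

    // Hand-built sample: three linked leaves under one index node.
    static Node sample() {
        Node a = leaf(8, 10, 12), b = leaf(16, 20, 24), c = leaf(28, 32, 38);
        a.next = b;
        b.next = c;
        Node root = new Node();
        root.keys = new int[] {16, 28};   // separators only; the data lives in the leaves
        root.children = new Node[] {a, b, c};
        return root;
    }

    // Descends through the index nodes; never stops early, even if the key
    // equals a separator.
    static Node findLeaf(Node node, int key) {
        while (node.children != null) {
            int pos = Arrays.binarySearch(node.keys, key);
            node = node.children[pos >= 0 ? pos + 1 : -pos - 1];
        }
        return node;
    }

    static boolean contains(Node root, int key) {
        return Arrays.binarySearch(findLeaf(root, key).keys, key) >= 0;
    }

    // Range query: locate the first leaf, then walk the leaf linked list.
    static List<Integer> range(Node root, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Node leaf = findLeaf(root, lo); leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k > hi) return out;
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(contains(root, 16));   // true (hit in the leaf, not in the index node)
        System.out.println(range(root, 10, 28));  // [10, 12, 16, 20, 24, 28]
    }
}
```

The range query shows why the sorted leaf list matters: once the first leaf is found, the scan never has to go back up through the index nodes.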


Introduction to B* trees

The B* tree is a variant of the B+ tree that adds pointers to sibling nodes in the non-root, non-leaf nodes of the B+ tree.
[Figure: B* tree]

Notes on the B* tree:

  • The B* tree requires every non-leaf node to hold at least (2/3)*M keys, so the minimum block utilization is 2/3, whereas the minimum block utilization of a B+ tree is 1/2.

  • It follows from the first point that a B* tree allocates new nodes less often than a B+ tree and has higher space utilization.


Origin blog.csdn.net/SwaeLeeUknow/article/details/109082568