Data structures: the B-tree

Table of contents

1. The B-tree data structure

 1.1 Definition of 2-3-4 tree

2. Characteristics of the B-tree

3. Store data

4. Advantages

5. Comparison of red-black tree and B-tree

5.1 Memory performance comparison

5.2 Disk performance comparison


1. The B-tree data structure

1. A B-tree stores its data as key-value pairs.

2. The number of children (branches) per node is not fixed; it depends on the specific implementation.

3. In an order-M B-tree, each node holds at most M-1 keys, arranged in ascending order.

 1.1 Definition of 2-3-4 tree

1. 2-node

        Contains one key (and its corresponding value) and two links. The left link points to a 2-3-4 subtree whose keys are all smaller than this key, and the right link points to a 2-3-4 subtree whose keys are all larger.

2. 3-node

        Contains two keys (and their corresponding values) and three links. The keys in the subtree pointed to by the left link are all smaller than both keys of the node, the keys in the subtree pointed to by the middle link all lie between the node's two keys, and the keys in the subtree pointed to by the right link are all larger than both keys.

3. 4-node

        Contains three keys (and their corresponding values) and four links. The keys in the subtree pointed to by the left link are all smaller than the node's keys. The keys under the left-middle link lie between the first and second keys, and the keys under the right-middle link lie between the second and third keys. The keys in the subtree pointed to by the right link are all larger than the node's keys.

Characteristic of the 2-3-4 tree: all leaf nodes have the same depth.
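To make the link rules above concrete, here is a minimal sketch of a lookup in a 2-3-4 tree. The names `Node234` and `search`, and the sample keys, are made up for illustration and are not taken from any library. The tree is built by hand, so the sketch shows only how the links are followed, not how the tree is balanced.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node234:
    """A 2-, 3- or 4-node: 1 to 3 sorted keys and, if internal, len(keys) + 1 links."""
    keys: List[int]
    values: List[Any]
    children: List["Node234"] = field(default_factory=list)  # empty for a leaf


def search(node: Optional[Node234], key: int) -> Optional[Any]:
    """Return the value stored under key, or None if the key is absent."""
    while node is not None:
        i = 0
        # Skip keys smaller than the target; i ends up as the index of the link to follow.
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        node = node.children[i] if node.children else None
    return None


# Hand-built example: a 3-node root with a 2-node, a 3-node and a 4-node below it.
root = Node234(
    keys=[20, 40], values=["v20", "v40"],
    children=[
        Node234([10], ["v10"]),
        Node234([25, 30], ["v25", "v30"]),
        Node234([50, 60, 70], ["v50", "v60", "v70"]),
    ],
)

print(search(root, 30))  # v30
print(search(root, 45))  # None
```

Searching for 30 follows the middle link of the root, because 30 lies between 20 and 40.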

2. Characteristics of the B-tree

        A B-tree allows a node to contain multiple keys: 3, 4, 5 or even more. The exact number is not fixed and depends on the specific implementation. Suppose we choose a parameter M to build a B-tree, which we call an order-M B-tree. The tree then has the following characteristics:

  • Each node has at most M-1 keys, arranged in ascending order
  • Each node has at most M child nodes
  • The root node has at least two child nodes, unless it is a leaf
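The properties listed above can be written as a small checker. This is only a sketch: `BNode`, `check_node` and `check_tree` are hypothetical names, and the checker covers just the three listed properties, not every invariant a complete B-tree implementation maintains.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BNode:
    keys: List[int]
    children: List["BNode"] = field(default_factory=list)  # empty for a leaf


def check_node(node: BNode, m: int, is_root: bool = False) -> bool:
    """Check the three listed properties for one node of an order-m B-tree."""
    sorted_keys = all(a < b for a, b in zip(node.keys, node.keys[1:]))
    at_most_m_minus_1_keys = len(node.keys) <= m - 1
    at_most_m_children = len(node.children) <= m
    # A root that is not a leaf must have at least two children.
    root_ok = not is_root or not node.children or len(node.children) >= 2
    return sorted_keys and at_most_m_minus_1_keys and at_most_m_children and root_ok


def check_tree(root: BNode, m: int) -> bool:
    """Apply check_node to every node, treating only the top node as the root."""
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if not check_node(node, m, is_root):
            return False
        stack.extend((child, False) for child in node.children)
    return True


good = BNode([40], [BNode([21, 36]), BNode([50, 95])])
too_full = BNode([21, 36, 40, 50, 95])  # 5 keys, but order 5 allows at most 4

print(check_tree(good, 5))      # True
print(check_tree(too_full, 5))  # False
```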

3. Store data

        If M is chosen as 5, each node contains at most 4 key-value pairs. Let's take an order-5 B-tree as an example to see how a B-tree stores data.

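① Insert 36 into the empty tree. The root now holds the single key 36.

② Insert 21, 95 and 40. All four keys fit in one node, kept in ascending order: 21, 36, 40, 95.

③ Insert 50. It belongs between 40 and 95, which would give the node 5 keys (21, 36, 40, 50, 95), more than the maximum of M-1 = 4. The node is therefore split and the middle key, 40, moves up to become the new root, with 21, 36 in its left child and 50, 95 in its right child.

④ Insert 10 and 18. Both are smaller than 40, so they go into the left child, which now holds 10, 18, 21, 36.

⑤ Insert 37. It belongs after 36 in the left child, which would then hold 5 keys (10, 18, 21, 36, 37), again more than M-1 = 4. The node is split and the middle key, 21, moves up into the parent. The root now holds 21 and 40, and its three children hold 10, 18; 36, 37; and 50, 95.

Continuing in this way, one key at a time, the B-tree is built.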
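The walk-through above can be reproduced with a short sketch of B-tree insertion. It assumes the simplest overflow rule: insert into the leaf first and, if the node then has more than M-1 keys, split it and move the middle key up. The names `BNode`, `insert` and `dump` are illustrative, and duplicate keys and deletion are not handled.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

M = 5  # order used in the walk-through: at most M - 1 = 4 keys per node


class BNode:
    def __init__(self, keys: Optional[List[int]] = None,
                 children: Optional[List["BNode"]] = None) -> None:
        self.keys = keys or []
        self.children = children or []  # empty for a leaf


def _insert(node: BNode, key: int) -> Optional[Tuple[int, BNode]]:
    """Insert key below node; return (promoted_key, new_right_sibling) if node split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)          # leaf: place the key in sorted position
    else:
        split = _insert(node.children[i], key)
        if split is not None:             # child overflowed: absorb its middle key
            mid_key, right = split
            node.keys.insert(i, mid_key)
            node.children.insert(i + 1, right)
    if len(node.keys) <= M - 1:
        return None
    # Overflow: move the middle key up and split the rest into two nodes.
    mid = len(node.keys) // 2
    mid_key = node.keys[mid]
    right = BNode(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right


def insert(root: BNode, key: int) -> BNode:
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    mid_key, right = split
    return BNode([mid_key], [root, right])  # the tree grows by one level


def dump(node: BNode):
    """Nested-list view: keys for a leaf, [keys, child, child, ...] otherwise."""
    if not node.children:
        return node.keys
    return [node.keys] + [dump(c) for c in node.children]


root = BNode()
for k in [36, 21, 95, 40, 50, 10, 18, 37]:
    root = insert(root, k)
print(dump(root))  # [[21, 40], [10, 18], [36, 37], [50, 95]]
```

The printed result has the root keys first, followed by its three children from left to right.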

4. Advantages

        In practical applications the order of a B-tree is generally quite large (usually greater than 100), so even when a large amount of data is stored, the height of the B-tree remains small. This is where its advantage shows in certain application scenarios: a lookup has to visit only a few nodes, while the tree stays balanced.
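A rough calculation shows how quickly the height shrinks as the order grows. The sketch below counts the fewest levels needed when every node is completely full, which is the best case; a real tree with partly filled nodes can be somewhat taller. The function name `min_levels` and the sample sizes are chosen only for illustration. An order of 2 (one key and two children per node) is included to stand in for a binary tree.

```python
def min_levels(n_keys: int, m: int) -> int:
    """Fewest levels an order-m B-tree needs to hold n_keys (every node full)."""
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        # A completely full tree with `levels` levels holds m**levels - 1 keys.
        capacity = m ** levels - 1
    return levels


for order in (2, 5, 128):
    print(order, min_levels(1_000_000, order))
# 2 20
# 5 9
# 128 3
```

With one million keys, an order-128 tree needs only 3 levels in this best case, compared with 20 levels for a binary tree.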

5. Comparison of red-black tree and B-tree

5.1 Memory performance comparison

1. Why is there a B-tree?

        In memory, the red-black tree generally performs well: it needs relatively few comparisons per lookup, so for data held in memory a binary tree is preferred over a multiway tree.

        On disk, addressing with a B-tree is faster, so B-trees are mostly used for disk storage. The reason is that each node branches many ways, which lowers the height of the tree and so reduces both the number of disk addressing operations and the time spent on them.

2. How is the data addressed?

        Data addressing mainly happens on the disk. The time it takes is the physical time for the head to move (seek time) plus the average rotational delay, which is the time the platter needs to turn half a revolution.
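As a small worked example of this sum, the sketch below adds an assumed seek time to the time for half a revolution. The 9 ms seek time and 7200 rpm spindle speed are illustrative assumptions, not measurements, and `access_time_ms` is a made-up helper name.

```python
def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Seek time plus the average rotational delay (half a revolution), in milliseconds."""
    full_rotation_ms = 60_000.0 / rpm
    return seek_ms + full_rotation_ms / 2


# Assumed figures for illustration only: 9 ms average seek, 7200 rpm spindle.
per_access = access_time_ms(seek_ms=9.0, rpm=7200)
print(round(per_access, 2))  # 13.17
```

Every extra level of the tree that must be read from disk costs roughly one more such access, which is why a lower tree matters so much there.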

5.2 Disk performance comparison

B-trees are mostly used on disk because each node branches many ways, which lowers the height of the tree and reduces both the number of addressing operations and the time spent on them.

Whether a red-black tree or a B-tree is stored on disk, the time goes into data addressing: the physical time for the head to move plus, on average, half a revolution of the platter. Because the B-tree is much lower, a lookup needs far fewer of these accesses.


Origin blog.csdn.net/qq_54247497/article/details/131583031