B-tree, B+ tree, and red-black tree comparison

A B-tree is a balanced multi-way search tree designed to make lookups efficient on disks and other external storage devices.

A B+ tree is a variant of the B-tree, used for storage indexes in most databases and file systems.

 

The difference between B-tree and red-black tree

When storing data at large scale, a red-black tree is often inefficient because the tree is too deep, which leads to too many disk I/O operations. Why does this happen? To access data on a disk, the arm first moves the heads to the cylinder that holds the data, then the head for the right platter surface is selected, then the platter rotates until the target sector passes under the head, and only then is the data read or written. Most of the cost of a disk I/O is the seek, that is, the time spent moving the arm to the right cylinder.

Each level visited during a search typically costs one disk access, so the number of disk accesses is determined by the height of the tree. A deep tree therefore means frequent disk I/O, and the way to reduce it is to choose a structure that keeps the tree as short as possible. A B-tree node can have many children, from tens up to thousands, which greatly reduces the height of the tree.
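To make the effect of fan-out on height concrete, here is a minimal sketch that computes the fewest levels a completely full tree needs for a given number of keys. The function name `min_levels` and the index size are assumptions chosen for illustration. A real red-black tree is not perfectly balanced and can be up to about twice as deep as the binary figure shown here.

```python
def min_levels(num_keys: int, fanout: int) -> int:
    """Fewest levels a completely full tree needs to hold num_keys keys.

    A full tree with the given fan-out (must be >= 2) stores fanout - 1
    keys per node, so `levels` full levels hold fanout ** levels - 1 keys.
    """
    levels = 0
    while fanout ** levels - 1 < num_keys:
        levels += 1
    return levels


num_keys = 10_000_000  # hypothetical index size
for fanout in (2, 100, 1000):
    print(f"fan-out {fanout:>4}: at least {min_levels(num_keys, fanout)} levels")
```

With these assumed numbers the binary tree needs at least 24 levels, against 4 levels for a fan-out of 100 and 3 levels for a fan-out of 1000. If each level costs one disk access, that is the difference between roughly two dozen reads and a handful.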

 

The difference between B tree and B+ tree

1. In a B-tree, every node holds pointers (ROWIDs) to records, whereas in a B+ tree only the leaf nodes do. Because a B+ tree stores all satellite data (or pointers to it) in the leaves, its internal nodes hold only keys and child pointers. One block can therefore hold more index entries, which first reduces the height of the tree and second lets one internal node address more leaf nodes (advantage 1).

2. Each leaf node of a B+ tree contains a pointer to the next leaf node, so all leaves are linked together; a B-tree has no such links. With the leaves linked, a range scan is simple (advantage 2), while a B-tree has to move back and forth between leaf nodes and internal nodes.

   This gives the B+ tree one of its biggest benefits: traversal is more efficient, which makes full scans easy. A B-tree must use an in-order traversal to visit its keys in sorted order, while a B+ tree simply walks the leaf nodes one after another along the linked list. Range queries are therefore cheap on a B+ tree and comparatively expensive on a B-tree. This is the main reason databases choose B+ trees.

3. In the B+ tree definition used here, each node has as many child pointers as keys, while each B-tree node has one more child pointer than keys. (Many textbooks and implementations instead define B+ tree internal nodes the same way as B-tree nodes, with one more child pointer than keys.)

 

 

Why is the B+ tree more suitable than the B-tree for operating-system file indexes and database indexes in practice?

1) B+ tree disk read and write costs are lower

The internal nodes of a B+ tree carry no pointers to the records behind their keys, so each entry is smaller than in a B-tree. If all keys of one internal node are stored in the same disk block, that block can hold more keys, so more of the keys needed for a search are brought into memory by a single read. Relatively speaking, the number of I/O operations is reduced.

2) The query efficiency of the B+ tree is more stable

Non-leaf nodes do not point to the file contents themselves; they are only an index over the keys in the leaf nodes. Every key lookup therefore follows a path from the root to a leaf. All lookups have the same path length, so every key costs the same to query.

The main reason databases use B+ trees for indexes is that the B-tree improves disk I/O performance but does not solve the problem of inefficient traversal. The B+ tree was introduced to solve exactly this problem: traversing the leaf nodes is enough to traverse the whole tree. Range queries are very frequent in databases, and a B-tree handles them inefficiently.
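The block-capacity argument can be checked with simple arithmetic. The sketch below assumes a 4096-byte block and 8-byte keys, child pointers, and record pointers; all of these sizes are hypothetical, and per-block header overhead is ignored.

```python
def entries_per_block(block_size: int, key_size: int,
                      child_ptr_size: int, record_ptr_size: int = 0) -> int:
    """Number of index entries that fit in one block, ignoring headers."""
    return block_size // (key_size + child_ptr_size + record_ptr_size)


BLOCK = 4096  # assumed block size in bytes

# B-tree internal entry: key + child pointer + record pointer (ROWID)
btree_entries = entries_per_block(BLOCK, key_size=8, child_ptr_size=8,
                                  record_ptr_size=8)

# B+ tree internal entry: key + child pointer only
bplus_entries = entries_per_block(BLOCK, key_size=8, child_ptr_size=8)

print(btree_entries, bplus_entries)
```

Under these assumptions a B-tree block holds 170 entries and a B+ tree block holds 256, so one read of a B+ tree internal node narrows the search across about 50% more keys.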
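The leaf-chain traversal described above can be sketched as follows. This is a minimal illustration, not a full B+ tree: the classes `Leaf` and `Internal`, the functions `find_leaf` and `range_scan`, and the hand-built three-leaf tree are all hypothetical. The sketch also assumes the common layout in which an internal node with n separator keys has n + 1 children.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]              # values[i] is the record for keys[i]
    next: Optional["Leaf"] = None  # pointer to the next leaf in key order


@dataclass
class Internal:
    keys: List[int]                          # separator keys only, no records
    children: List[Union["Internal", Leaf]]  # len(children) == len(keys) + 1


def find_leaf(node: Union[Internal, Leaf], key: int) -> Leaf:
    """Descend from the root to the leaf that would contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root: Union[Internal, Leaf], lo: int, hi: int) -> Iterator[Tuple[int, str]]:
    """Yield (key, value) pairs with lo <= key <= hi by walking the leaf chain."""
    leaf: Optional[Leaf] = find_leaf(root, lo)
    while leaf is not None:
        start = bisect_left(leaf.keys, lo)
        for k, v in zip(leaf.keys[start:], leaf.values[start:]):
            if k > hi:
                return
            yield k, v
        leaf = leaf.next  # no need to go back up to an internal node


# Hand-built example tree: three linked leaves under one root.
leaf3 = Leaf([9, 11], ["row9", "row11"])
leaf2 = Leaf([5, 7], ["row5", "row7"], next=leaf3)
leaf1 = Leaf([1, 3], ["row1", "row3"], next=leaf2)
root = Internal(keys=[5, 9], children=[leaf1, leaf2, leaf3])

print(list(range_scan(root, 3, 9)))
```

The scan descends the tree once to find the starting leaf and then only follows `next` pointers, which is why the cost of a range query is one root-to-leaf path plus the leaves that actually hold matching keys.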

 

 

Each has its advantages: 

Advantages of B+ tree:

1. Non-leaf nodes carry no pointers (ROWIDs) to records, so a block can hold more index entries. This first reduces the height of the tree and second lets one internal node address more leaf nodes.

2. The leaf nodes are linked by pointers, so a range scan is simple. A B-tree has to move back and forth between leaf nodes and internal nodes.

 

Advantages of B-tree:

A key found in an internal node gives direct access to its data, with no need to descend to a leaf node.
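This early exit can be illustrated with a small B-tree lookup that counts how many nodes it visits. The class `BTreeNode`, the function `btree_search`, and the two-level example tree are assumptions made for this sketch.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    keys: List[int]
    records: List[str]  # records[i] belongs to keys[i], in every node
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(root: BTreeNode, key: int) -> Tuple[Optional[str], int]:
    """Return (record or None, number of nodes visited)."""
    node: Optional[BTreeNode] = root
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i], visited  # found, possibly above leaf level
        node = node.children[i] if node.children else None
    return None, visited


# Hand-built example: a root with two keys and three leaf children.
btree_root = BTreeNode(
    keys=[10, 20],
    records=["r10", "r20"],
    children=[
        BTreeNode([1, 5], ["r1", "r5"]),
        BTreeNode([12, 15], ["r12", "r15"]),
        BTreeNode([25, 30], ["r25", "r30"]),
    ],
)

print(btree_search(btree_root, 10))  # key stored in the root
print(btree_search(btree_root, 15))  # key stored in a leaf
```

In this example key 10 is found after visiting one node, while key 15 requires two. A B+ tree lookup would always descend to a leaf, which costs more for keys like 10 but gives every lookup the same path length.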


Origin blog.csdn.net/orzMrXu/article/details/102529925