B-trees in detail (1)

Preface

We have already covered many kinds of trees: ordinary binary trees, binary heaps, binary search trees, balanced binary trees, and so on. That raises a question: what are all these trees for? Each one was developed to solve a problem, and as problems grew in scale and depth, the solutions evolved with them. Most of these trees exist to make searching efficient or to keep search results in order.

In real business scenarios, everything comes down to reading and writing (deletes and updates count as writes). For writes, stability matters most: correctness comes first and speed second. Data is usually produced at a modest rate, so spending a little more time on each write is acceptable. Reads, on the other hand, are where efficiency is pursued relentlessly. Historical data keeps accumulating, and slow reads drag down system performance in a way that users feel directly. A series of data structures has therefore been developed to speed up searching, and this is the path that leads from the binary search tree to the balanced binary tree.

The search time complexity of a balanced binary tree is O(logN), which is already very fast. However, historical data has to be stored on disk so that it is not lost, and disk IO also has a major effect on search efficiency. With the advent of big data, data sets have grown to millions, tens of millions, hundreds of millions, or even billions of records, and disk IO has become the main bottleneck when searching. The nodes of a balanced binary tree are not stored contiguously on disk, so each disk IO reads only a limited number of elements, which increases the number of disk IOs and seriously hurts overall search performance.

The original solution no longer fits this scenario, so today we introduce the B-tree, which improves search efficiency on large-scale data.

Introduction

In Chinese the B-tree is written both as "B树" and as "B-树". The original English name is B-tree, and many people in China translate it literally as "B-树", keeping the hyphen. This is a poor literal translation that invites misunderstanding: people may think that "B-树" is one kind of tree and "B树" is another. In fact both names refer to the same structure, the B-tree.
B-trees emerged to bridge the huge gap in access speed between different levels of storage and to make I/O efficient. A balanced binary tree searches efficiently, and in general search gets faster as the tree gets shallower. However, each node of a binary tree holds only one element, so when the amount of data is very large the tree becomes very deep, and walking it causes far too many disk I/O operations, which makes queries slow. In addition, when the data set is too large, memory cannot hold all the nodes of the balanced binary tree. The B-tree is a structure well suited to this problem.

Concept

First of all, do not confuse the B-tree with the binary tree. In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree in that a node can have more than two child nodes. Unlike self-balancing binary search trees, the B-tree is well suited to storage systems that read and write relatively large blocks of data (such as disks). It is commonly used in databases and file systems.

Starting from an example

A database index is stored on disk (for example, MySQL's InnoDB engine uses a B+ tree as its index structure, and the B+ tree is a variant of the B-tree). When the amount of data is large, the index itself may be several gigabytes or more. Can we load the entire index into memory when we query through it? Obviously not. All we can do is load disk pages one at a time, where each disk page corresponds to a node of the index tree.

Supplementary note: The index tree is not laid out contiguously on disk; its nodes are connected by pointers. Disk IO has a read-ahead mechanism (described below), so each disk IO performed while looking up a node reads one whole disk page. The node's information is contained in that page, which is why a disk page is said above to correspond to a node of the index tree. The size of a disk page depends on the operating system and is generally 4 KB or 8 KB, so the information stored in one node cannot exceed the size of a disk page.

We know that the search time complexity of a balanced binary tree is O(logN), so why not use a balanced binary tree as the index structure? Suppose the tree has height 4 and the value being searched for is 10. The process is as follows:

[Figure: the balanced binary tree]

[Figure: first search, i.e. the first disk IO, which reads the first disk page; the later steps are analogous]

…(steps omitted)

[Figure: fourth disk IO]

As the example shows, in the worst case the number of disk IOs equals the height of the index tree.

To improve search efficiency, then, the goal is to reduce the number of disk IOs by turning the "tall and thin" tree into a "short and fat" one: reduce the height of the tree and store more elements in each node, while keeping each node no larger than a disk page. This is one of the defining characteristics of the B-tree.
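
To put rough numbers on the effect of storing more elements per node, here is a minimal sketch that counts how many levels a tree needs to hold a given number of elements. The function name `levels_needed` and its parameters are illustrative only, and the model assumes every node is completely full, which real trees are not.

```python
def levels_needed(n_keys: int, keys_per_node: int) -> int:
    """Smallest number of levels whose completely full nodes hold n_keys keys.

    A node with k keys has k + 1 children, so a full tree with h levels
    holds (k + 1) ** h - 1 keys.
    """
    if n_keys <= 0:
        return 0
    fanout = keys_per_node + 1
    levels = 0
    capacity = 0  # keys held by a full tree with `levels` levels
    while capacity < n_keys:
        levels += 1
        capacity = fanout ** levels - 1
    return levels


# One key per node models a perfectly balanced binary tree.
binary_levels = levels_needed(1_000_000, keys_per_node=1)
# 1000 keys per node is an arbitrary, illustrative node size for a multiway tree.
wide_levels = levels_needed(1_000_000, keys_per_node=1000)
print(binary_levels, wide_levels)
```

Under these assumptions, a million keys need 20 levels with one key per node but only 2 levels with 1000 keys per node. If each node is read with one disk IO, that is the difference between up to 20 reads and up to 2.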

Disk IO and read-ahead

We have been talking about disk IO above, so here is a brief introduction.

Computer storage devices generally fall into two categories: internal memory (main memory) and external memory.

Internal memory is the computer's main memory. It is fast to access, but it is small, expensive, and cannot hold data long term (its contents are lost when the power is off).

External memory here means the disk. A disk reads data through mechanical movement, and the time for each read can be divided into three parts: seek time, rotational delay, and transfer time. Seek time is the time the arm needs to move to the specified track; for mainstream disks it is generally under 5 ms. Rotational delay depends on the rotation speed we often hear quoted: a 7200 rpm disk makes 7200 revolutions per minute, i.e. 120 per second, so the average rotational delay is 1/120/2 s = 4.17 ms (on average the disk has to turn half a revolution, hence the division by 2). Transfer time is the time to read data from or write data to the disk; it is generally a few tenths of a millisecond and is negligible compared with the other two. The time for one disk access, that is, one disk IO, is therefore about 5 + 4.17 = 9 ms. That sounds pretty good, but consider that a 500-MIPS machine executes 500 million instructions per second, because instruction execution is electronic rather than mechanical. In other words, in the time of a single IO the machine could execute about 4.5 million instructions. A database can easily hold hundreds of thousands, millions, or even tens of millions of records, so paying 9 ms for every access would obviously be a disaster. The following figure compares the latencies of computer hardware for reference:

[Figure: comparison of computer hardware latencies]
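
The disk timing arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch: the constants are the figures quoted in the text (5 ms seek, 7200 rpm, 500 MIPS), the variable names are made up for illustration, and transfer time is ignored as in the text.

```python
SEEK_MS = 5.0   # seek time quoted in the text for mainstream disks
RPM = 7200      # rotation speed used in the text
MIPS = 500      # millions of instructions per second, as in the text

revolutions_per_second = RPM / 60
# On average the platter turns half a revolution before the data arrives.
rotational_delay_ms = 1000 / revolutions_per_second / 2
io_time_ms = SEEK_MS + rotational_delay_ms  # transfer time ignored

instructions_per_second = MIPS * 1_000_000
instructions_per_io = instructions_per_second * io_time_ms / 1000

print(f"rotational delay: {rotational_delay_ms:.2f} ms")
print(f"one disk IO:      {io_time_ms:.2f} ms")
print(f"instructions executed during one IO: {instructions_per_io:,.0f}")
```

With the unrounded 9.17 ms the result is roughly 4.6 million instructions per IO; with the rounded 9 ms it is 4.5 million. Either way, one disk access costs as much time as millions of instructions.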
Because disk IO is so expensive, the operating system applies some optimizations. On each IO it reads not only the data at the requested disk address but also the adjacent data into a memory buffer, because the principle of locality tells us that when a computer accesses the data at one address, the data near it is likely to be accessed soon as well. The data read by one IO is called a page, that is, a disk page. How much data a page holds depends on the operating system and is generally 4 KB or 8 KB; reading anything within a single page therefore costs just one IO. This fact is very helpful when designing index data structures.
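
As a simplified model of page-granular reads, the sketch below works out which pages a byte range falls in and treats each page touched as one IO. The 4096-byte page size is an assumption, and `pages_touched` is a hypothetical helper, not an operating system API.

```python
PAGE_SIZE = 4096  # bytes; an assumed value, the text cites 4 KB or 8 KB as typical


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> range:
    """Page numbers covered by a read of `length` bytes starting at byte `offset`."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


# In this model each page touched costs one IO.
print(len(pages_touched(0, 1)))        # a single byte
print(len(pages_touched(0, 1024)))     # 1 KB inside the same page
print(len(pages_touched(4000, 1024)))  # 1 KB straddling a page boundary
```

In this model, reading one byte and reading 1 KB from the same page both cost one IO. That is why it pays to pack as much useful data as possible into each page.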

Fact 1: Storage devices of different capacities differ enormously in access speed.
  • Disk (millisecond level) is far slower than memory (nanosecond level), by a factor of about 100,000
  • If a memory access took 1 second, one external memory access would take a day
  • To avoid a single external memory access, it is worth making 100 memory accesses instead, so keep the most frequently used data in the fastest storage
Fact 2: Reading 1 B from disk costs almost the same time as reading 1 KB

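From the above we can conclude that index queries are limited mainly by the disk's I/O speed: the fewer I/Os a query needs, the faster it is, and the B-tree's structure arose to meet exactly this need. All the elements of one B-tree node can be read in a single I/O, and the height of the tree is the maximum number of I/Os. For the same total number of elements, the more elements each node holds, the lower the tree and the fewer I/Os a query needs. Suppose one disk I/O reads 8 KB and the index is built on int keys (4 bytes each). In theory a node can then hold up to about 2000 elements, and 2000*2000*2000 = 8000000000, so 8 billion records can be searched with only 3 I/Os (a theoretical value). This shows how efficient a B-tree is as an index.

It can also be seen that, for the same total number of elements, query efficiency is closely tied to the height of the tree.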
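
The estimate above can be reproduced as follows. This is a rough sketch under the text's own assumptions (8 KB per I/O, 4-byte int keys); the names are illustrative, and the count is an upper bound because it ignores the child pointers and page headers a real index page also stores.

```python
PAGE_BYTES = 8 * 1024   # one I/O reads 8 KB, as assumed in the text
KEY_BYTES = 4           # a 4-byte int key, as assumed in the text

# Upper bound on keys per node: child pointers and per-page headers are ignored.
keys_per_node = PAGE_BYTES // KEY_BYTES


def reachable_with(ios: int, fanout: int) -> int:
    """Rough count of entries reachable with `ios` page reads at a given fan-out."""
    return fanout ** ios


print(keys_per_node)                     # exact quotient: 2048
print(reachable_with(3, 2000))           # the text's rounded figure of 2000 per node
print(reachable_with(3, keys_per_node))  # same estimate with the exact quotient
```

With the rounded figure of 2000 keys per node, three levels cover 8,000,000,000 entries, matching the theoretical value in the text.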

Definition

A B-tree is a balanced multiway search tree. We usually speak of a B-tree of order m: it is either an empty tree or satisfies all of the following conditions:

  • If the root is not a leaf node, it has at least two child nodes (otherwise the tree would degenerate into a single branch) and holds [1, m-1] elements
  • Each intermediate node (neither the root nor a leaf) contains [math.ceil(m/2)-1, m-1] elements and [math.ceil(m/2), m] child nodes
  • Each leaf node contains [math.ceil(m/2)-1, m-1] elements
  • Each node has at most m child nodes.
  • All leaf nodes are on the same level (the same height).
  • The elements within each node are sorted in ascending order, and they split the key ranges of the children: every element in the k-th child is greater than the (k-1)-th element of the node and less than its k-th element (where those elements exist).

Note:
math.ceil(x) rounds x up to the nearest integer. For example, math.ceil(1.2) returns 2.
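
To see what these ranges mean for a concrete order, the following sketch tabulates them. The function `btree_bounds` and its dictionary keys are made up for illustration, and the sketch assumes an order of at least 3.

```python
import math


def btree_bounds(m: int) -> dict:
    """Element and child limits for a B-tree of order m, per the rules above."""
    if m < 3:
        raise ValueError("this sketch assumes an order of at least 3")
    half = math.ceil(m / 2)
    return {
        "root_elements": (1, m - 1),
        "root_children_if_not_leaf": (2, m),
        "internal_elements": (half - 1, m - 1),
        "internal_children": (half, m),
        "leaf_elements": (half - 1, m - 1),
    }


for order in (3, 4, 5):
    print(order, btree_bounds(order))
```

For example, in a B-tree of order 5 every non-root node holds between 2 and 4 elements, and every intermediate node has between 3 and 5 children.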

Supplementary note:

The key point to grasp:

Compared with the binary search tree, the B-tree reduces the height of the tree by putting more elements into each node, so a parent node may have more than 2 child nodes and a node is no longer limited to a single element.
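
A minimal sketch of what such a node might look like, and of how a search walks down the tree, is given below. `BTreeNode` and `search` are hypothetical names for illustration, keys are assumed to be ints, the tree is built by hand rather than by insertion, and each visited node stands in for one disk page read.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BTreeNode:
    # Hypothetical minimal node: sorted keys plus child links.
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: BTreeNode, key: int) -> Tuple[bool, int]:
    """Return (found, nodes_visited) for a non-empty tree."""
    node = root
    visited = 0
    while True:
        visited += 1
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if node.is_leaf:
            return False, visited
        # Child i holds the keys that lie between keys[i-1] and keys[i].
        node = node.children[i]


# A hand-built tree that satisfies the order-4 rules: 1 to 3 keys per node.
root = BTreeNode(
    keys=[10, 20],
    children=[
        BTreeNode(keys=[3, 7]),
        BTreeNode(keys=[12, 15, 18]),
        BTreeNode(keys=[25, 30]),
    ],
)
print(search(root, 15))
print(search(root, 11))
```

Within a node, the binary search over the sorted keys happens entirely in memory. The expensive step is following a child link, which in a disk-based index means reading another page.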

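What is the order of a B-tree?
The order is the maximum number of child nodes over all nodes in the B-tree, denoted m. If that maximum is 10, the tree is a B-tree of order 10.

[Figure: a B-tree of order 4]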

B-tree height

What is the maximum height of a B-tree of order m that contains N keys in total?

h <= log_t((N+1)/2) + 1, where t = math.ceil(m/2); that is, the logarithm of (N+1)/2 to base math.ceil(m/2), plus 1
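
One way to see where this bound comes from is to count the fewest keys a tree with h levels can hold. The root has at least 1 key and 2 children, and every other node has at least math.ceil(m/2) children and math.ceil(m/2)-1 keys. The sketch below computes that minimum and then finds the largest h that still fits in N keys. The function names are illustrative, the root is counted as level 1, and an order of at least 3 is assumed.

```python
import math


def min_keys(m: int, h: int) -> int:
    """Fewest keys a B-tree of order m can hold with h levels (root is level 1)."""
    if m < 3:
        raise ValueError("this sketch assumes an order of at least 3")
    if h == 0:
        return 0
    t = math.ceil(m / 2)  # minimum number of children of a non-root internal node
    # Root: 1 key. Level i >= 2 has at least 2 * t**(i-2) nodes with t - 1 keys each,
    # which sums to 2 * (t**(h-1) - 1) keys below the root.
    return 1 + 2 * (t ** (h - 1) - 1)


def max_height(m: int, n_keys: int) -> int:
    """Largest h with min_keys(m, h) <= n_keys, found by direct search."""
    h = 0
    while min_keys(m, h + 1) <= n_keys:
        h += 1
    return h


print(max_height(4, 1_000_000))
print(max_height(1001, 1_000_000))
```

Solving 2 * t**(h-1) - 1 <= N for h gives the bound stated above. The direct search is used here instead of a floating-point logarithm to avoid rounding problems at exact powers.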



Origin blog.csdn.net/csdniter/article/details/111589099