MySQL Index Principles and Structure: A Detailed Look at B-Tree

1. The nature of the index

MySQL's official definition of an index is: an index (Index) is a data structure that helps MySQL retrieve data efficiently. Extracting the main clause of this sentence gives the essence of an index: an index is a data structure.

Querying is one of the most important functions of a database, and we all want queries to be as fast as possible, so database system designers optimize from the perspective of search algorithms. The most basic search algorithm is sequential (linear) search, whose O(n) time complexity is clearly poor for large amounts of data. Computer science offers many better search algorithms, such as binary search and binary tree search. A little analysis shows that each search algorithm can only be applied to a specific data structure: binary search requires the data being searched to be sorted, and binary tree search only works on a binary search tree. But the data itself cannot be organized to satisfy every data structure at once (for example, it is theoretically impossible to keep the rows physically ordered by two different columns at the same time). Therefore, in addition to the data, a database system also maintains data structures that satisfy particular search algorithms. These structures reference (point to) the data in some way, so that advanced search algorithms can be run on them. This data structure is the index.
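A small Python sketch (not from the original article) makes the contrast concrete. `linear_search` works on any list, while `binary_search` (built on the standard `bisect` module) only works on sorted input. That requirement is why a separate, ordered structure has to be maintained next to the unordered rows. The sample values are made up.

```python
from bisect import bisect_left


def linear_search(values, target):
    """Scan every element in turn: O(n) comparisons in the worst case."""
    for i, v in enumerate(values):
        if v == target:
            return i
    return -1


def binary_search(sorted_values, target):
    """Halve the search range each step: O(log n), but only on sorted input."""
    i = bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return -1


data = [34, 77, 5, 91, 22, 89, 3]   # unordered, as rows usually are on disk
sorted_data = sorted(data)           # binary search needs an ordered copy
print(linear_search(data, 22))         # 4
print(binary_search(sorted_data, 22))  # 2
```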

Consider the following example:

Figure 1: a data table (left) and a binary search tree index on Col2 (right)

Figure 1 shows one possible way of indexing. On the left is a data table with two columns and seven records; the leftmost column is the physical address of each data record (note that records that are logically adjacent are not necessarily physically adjacent on disk). To speed up lookups on Col2, we can maintain the binary search tree shown on the right. Each node contains an index key and a pointer to the physical address of the corresponding data record, so binary search can retrieve the matching data with $O(\log_2 n)$ complexity.
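The index in Figure 1 can be sketched in Python as a binary search tree whose nodes hold a Col2 key and the record's physical address. The table contents below are hypothetical stand-ins for the figure. Note that the $O(\log_2 n)$ bound only holds while the tree stays reasonably balanced; this plain BST does not rebalance itself.

```python
class Node:
    """One BST node: an index key plus a pointer (here, an address) to the record."""
    def __init__(self, key, address):
        self.key = key
        self.address = address
        self.left = None
        self.right = None


def insert(root, key, address):
    """Standard BST insertion; smaller keys go left, others go right."""
    if root is None:
        return Node(key, address)
    if key < root.key:
        root.left = insert(root.left, key, address)
    else:
        root.right = insert(root.right, key, address)
    return root


def lookup(root, key):
    """Walk down the tree; returns the record address or None."""
    while root is not None:
        if key == root.key:
            return root.address
        root = root.left if key < root.key else root.right
    return None


# Hypothetical table: (physical address, Col2). Values are illustrative only.
table = [(0x07, 34), (0x56, 77), (0x6A, 5), (0xF3, 91),
         (0x90, 22), (0x77, 89), (0x5A, 3)]
index = None
for address, col2 in table:
    index = insert(index, col2, address)

print(hex(lookup(index, 89)))  # 0x77
```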

Although this is a genuine index, almost no real database system implements its indexes with a binary search tree or its evolved variant, the red-black tree. The reasons are explained below.

2. B-Tree and B+Tree

Currently, most database systems and file systems use B-Tree or its variant B+Tree as the index structure. A later section of this article discusses, in light of how computers access memory and disks, why B-Tree and B+Tree are so widely used for indexes. This section first describes them purely from the data-structure point of view.

2.1 B-Tree

To describe a B-Tree, first define a data record as a tuple [key, data], where key is the record's key value (keys of different records are distinct) and data is the rest of the record apart from the key. A B-Tree is then a data structure that satisfies the following conditions:

  • d is a positive integer greater than 1, called the degree of the B-Tree.
  • h is a positive integer, called the height of the B-Tree.
  • Each non-leaf node consists of n-1 keys and n pointers, where d <= n <= 2d.
  • Each leaf node contains at least one key and two pointers, and at most 2d-1 keys and 2d pointers; all pointers in leaf nodes are null.
  • All leaf nodes have the same depth, which equals the tree height h.
  • Keys and pointers alternate, with pointers at both ends of a node.
  • Keys within a node are in non-decreasing order from left to right.
  • All nodes together form a tree structure.
  • Each pointer is either null or points to another node.
  • If the leftmost pointer of a node is not null, all keys in the node it points to are less than $v(key_1)$, where $v(key_1)$ is the value of that node's first key.
  • If the rightmost pointer of a node is not null, all keys in the node it points to are greater than $v(key_m)$, where $v(key_m)$ is the value of that node's last key.
  • If a pointer lies between the adjacent keys $key_i$ and $key_{i+1}$ of a node and is not null, all keys in the node it points to are less than $v(key_{i+1})$ and greater than $v(key_i)$.
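The conditions above can be turned into a checker. The following is a minimal Python sketch that encodes the list literally, including the node-size bounds for every node. It adds no special case for the root, which many textbook definitions relax. `BTreeNode` and `check_btree` are hypothetical names, and a leaf is represented by an empty child list, meaning all of its pointers are null.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # key_1 .. key_{n-1}
        self.children = children or []   # n child pointers; empty means a leaf


def check_btree(root, d):
    """Return True if the tree rooted at `root` satisfies the listed conditions."""
    leaf_depths = set()

    def visit(node, lo, hi, depth):
        keys, kids = node.keys, node.children
        # Keys within a node are non-decreasing from left to right.
        if any(a > b for a, b in zip(keys, keys[1:])):
            return False
        # Every key must lie strictly between the bounds set by the parent's keys.
        if any((lo is not None and k <= lo) or (hi is not None and k >= hi)
               for k in keys):
            return False
        if not kids:  # leaf: 1 to 2d-1 keys, all pointers null
            leaf_depths.add(depth)
            return 1 <= len(keys) <= 2 * d - 1
        # Non-leaf: n pointers and n-1 keys, with d <= n <= 2d.
        n = len(kids)
        if len(keys) != n - 1 or not (d <= n <= 2 * d):
            return False
        bounds = [lo] + keys + [hi]
        return all(visit(kid, bounds[i], bounds[i + 1], depth + 1)
                   for i, kid in enumerate(kids))

    # All leaves must sit at the same depth.
    return visit(root, None, None, 0) and len(leaf_depths) == 1
```

For example, with d = 2 the tree below passes, while moving 45 under the middle pointer (between keys 20 and 40) breaks the ordering rule.

```python
good = BTreeNode([20, 40], [BTreeNode([5, 10]), BTreeNode([25, 30]),
                            BTreeNode([45, 50, 60])])
print(check_btree(good, 2))  # True
```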

Figure 2 is a schematic diagram of a B-Tree with d = 2:

Because of these properties, looking up data by key in a B-Tree is very intuitive. Starting from the root node, binary-search the keys in the node. If the key is found, return the corresponding data; otherwise, recursively search the node pointed to by the pointer of the matching interval. This continues until either the key is found (the lookup succeeds) or a null pointer is reached (the lookup fails). Pseudocode for B-Tree search is as follows:

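BTree_Search(node, key) {
    if (node == null) return null;               // null pointer: lookup fails
    for (i = 0; i < node.n; i++)                 // node.n = number of keys in the node
    {
        if (node.key[i] == key) return node.data[i];                     // found
        if (node.key[i] > key) return BTree_Search(node.point[i], key);  // descend left of key[i]
    }
    return BTree_Search(node.point[node.n], key);  // key exceeds all keys: rightmost pointer
}
data = BTree_Search(root, my_key);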
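A runnable Python version of the pseudocode above follows, as a sketch under a few assumptions. Each node keeps parallel `keys` and `data` lists, and a leaf has an empty `children` list. The scan inside a node is done with `bisect_left`, i.e. the binary search the text describes. The class and function names are illustrative.

```python
from bisect import bisect_left


class BTreeNode:
    """keys[i] pairs with data[i]; children has len(keys)+1 entries, or is empty for a leaf."""
    def __init__(self, keys, data, children=None):
        self.keys = keys
        self.data = data
        self.children = children or []


def btree_search(node, key):
    # A null pointer means the key is not in the tree.
    if node is None:
        return None
    # Binary search inside the node for the first key >= the search key.
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data[i]
    # All pointers of a leaf are null: the lookup fails here.
    if not node.children:
        return None
    # Otherwise descend through the pointer between keys[i-1] and keys[i].
    return btree_search(node.children[i], key)


root = BTreeNode([20, 40], ["r20", "r40"], [
    BTreeNode([5, 10], ["r5", "r10"]),
    BTreeNode([25, 30], ["r25", "r30"]),
    BTreeNode([45, 50, 60], ["r45", "r50", "r60"]),
])
print(btree_search(root, 30))  # r30
print(btree_search(root, 33))  # None
```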

A B-Tree has a number of interesting properties. For example, for a B-Tree of degree d indexing N keys, the upper bound of the tree height h is $\log_d((N+1)/2)$, and the asymptotic complexity of retrieving a key, measured in the number of nodes visited, is $O(\log_d N)$. This shows that the B-Tree is a very efficient index data structure.
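To see how small that bound is in practice, the sketch below evaluates $\log_d((N+1)/2)$ for an illustrative degree d = 100 and N = 2 * 10^6 - 1 keys, chosen so the result is exact. For comparison, a balanced binary tree over the same keys needs on the order of $\log_2 N$ levels.

```python
import math


def height_upper_bound(n_keys, d):
    """Upper bound log_d((N + 1) / 2) on the height of a degree-d B-Tree with N keys."""
    return math.log((n_keys + 1) / 2, d)


N = 2 * 10**6 - 1                             # (N + 1) / 2 = 10**6
print(round(height_upper_bound(N, 100), 6))   # 3.0
print(round(math.log2(N), 2))                 # 20.93, the binary-tree scale
```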

In addition, because inserting and deleting data records can break the B-Tree properties, insertions and deletions must split, merge, and transfer nodes to preserve them. This article does not discuss these operations in full, since many references already detail the mathematical properties of B-Tree and its insertion and deletion algorithms; interested readers can consult the references at the end of this column.

2.2 B+Tree

B-Tree has many variants, the most common of which is B+Tree; for example, MySQL generally uses B+Tree to implement its index structures.

Compared with B-Tree, B+Tree differs in the following ways:

  • The upper limit on the number of pointers per node is 2d rather than 2d+1.
  • Internal nodes do not store data, only keys; leaf nodes do not store pointers.
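The two differences can be sketched as two node types in Python: internal nodes carry only separator keys and child pointers, and leaves carry keys with their data. The routing convention (child i holds keys >= keys[i-1] and < keys[i]) is an assumption of this sketch; implementations differ on how equal keys are routed. A consequence is that every lookup ends at a leaf, because only leaves hold data.

```python
from bisect import bisect_left, bisect_right


class Internal:
    """Internal node: separator keys and child pointers only, no data."""
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children  # len(keys) + 1 children


class Leaf:
    """Leaf node: keys with their data; no child pointers."""
    def __init__(self, keys, data):
        self.keys = keys
        self.data = data


def bplus_search(node, key):
    # Descend until a leaf is reached, since only leaves carry data.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data[i]
    return None


root = Internal([20, 40], [Leaf([5, 10], ["r5", "r10"]),
                           Leaf([20, 30], ["r20", "r30"]),
                           Leaf([40, 50], ["r40", "r50"])])
print(bplus_search(root, 20))  # r20
```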

Figure 3 is a simple schematic of a B+Tree:

Because not all nodes have the same fields, leaf nodes and internal nodes in a B+Tree generally differ in size. This differs from B-Tree: although different B-Tree nodes may hold different numbers of keys and pointers, every node has the same fields and the same upper limit, so B-Tree implementations usually allocate the same amount of space for every node.

In general, B+Tree is better suited than B-Tree for implementing indexes on external storage. The specific reasons relate to how external storage works and how computers access storage, and are discussed below.

2.2.1 B+Tree with sequential access pointers

The B+Tree used in database systems and file systems is generally an optimized version of the classic B+Tree with added sequential access pointers.

As shown in Figure 4, adding to each leaf node of a B+Tree a pointer to the adjacent leaf node yields a B+Tree with sequential access pointers. This optimization improves the performance of range access. In Figure 4, for example, to query all records with keys from 18 to 49, once 18 has been found, all the matching data can be visited in one pass simply by following the leaf pointers in order, which greatly improves range query efficiency.
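A range query over linked leaves can be sketched in Python: descend once to the leaf that would contain the lower bound, then follow the `next` pointers until a key exceeds the upper bound. The leaf contents below are hypothetical, chosen so the query for 18 to 49 mirrors the example in the text. The node classes and the `make_leaf` helper are illustrative, not a real storage engine's layout.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, data):
        self.keys = keys
        self.data = data
        self.next = None  # sequential access pointer to the right neighbour


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def range_query(root, lo, hi):
    """Return data for all keys in [lo, hi]: one descent, then follow leaf links."""
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, lo)]
    result = []
    i = bisect_left(node.keys, lo)
    while node is not None:
        while i < len(node.keys):
            if node.keys[i] > hi:
                return result
            result.append(node.data[i])
            i += 1
        node, i = node.next, 0  # jump to the adjacent leaf, no re-descent
    return result


def make_leaf(keys):
    return Leaf(keys, [f"r{k}" for k in keys])


leaves = [make_leaf(ks) for ks in ([5, 8, 10], [15, 18, 22], [33, 37, 49], [50, 56, 60])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([15, 33, 50], leaves)
print(range_query(root, 18, 49))  # ['r18', 'r22', 'r33', 'r37', 'r49']
```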

This section briefly introduced B-Tree and B+Tree. The next section explains, in light of storage access principles, why B+Tree is currently the preferred data structure for implementing indexes in database systems.

2.3 Why B-Tree (B + Tree)

As mentioned above, data structures such as red-black trees can also be used to implement indexes, yet file systems and database systems generally use B-/+Tree as their index structure. This section uses knowledge of computer organization to discuss the theoretical basis for using B-/+Tree as an index.

Generally, the index itself is also large and cannot be stored entirely in memory, so it is often stored on disk in the form of an index file. In this case, index lookups incur disk I/O, which is several orders of magnitude more expensive than memory access. Therefore, the most important metric for evaluating a data structure as an index is the asymptotic complexity of the number of disk I/O operations during a lookup. In other words, an index should be organized to minimize the number of disk I/O accesses during lookup. The following first introduces the principles of memory access and disk access, then analyzes the efficiency of B-/+Tree as an index in light of these principles.

2.3.1 Main memory access principles

The main memory used in computers today is basically random access memory (RAM). The structure and access principles of modern RAM are fairly complex; this article sets aside those specifics and uses a very simple abstract access model to illustrate how RAM works.

From an abstract point of view, main memory is a matrix of storage units, each storing a fixed-size piece of data. Each storage unit has a unique address. Modern main-memory addressing rules are complex; here they are simplified into a two-dimensional address, where a row address and a column address uniquely locate a storage unit. Figure 5 shows a 4 x 4 main memory model.

The main memory access process is as follows:

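  1. When the system needs to read main memory, it places the address signal on the address bus, which carries it to main memory. Main memory reads the address signal, decodes it, and locates the specified storage unit, then places that unit's data on the data bus for other components to read.
  2. Writing to main memory is similar: the system places the address of the unit to be written on the address bus and the data on the data bus, and main memory reads both buses and performs the corresponding write.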

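As this shows, main memory access time depends only linearly on the number of accesses. Because no mechanical operation is involved, the "distance" between the data of two accesses has no effect on the time; for example, fetching A0 then A1 takes the same time as fetching A0 then D3.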
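The simplified model can be expressed in a few lines of Python. A flat address is split into a row and a column, and every access is charged the same fixed cost no matter where it lands. The row letters A to D and column digits 0 to 3 are an assumed labeling matching cell names such as A0 and D3; the unit cost is hypothetical.

```python
COLS = 4
ROW_NAMES = "ABCD"  # hypothetical row labels for the 4 x 4 model


def decode(address):
    """Split a flat address into (row, column), as the simplified model does."""
    return divmod(address, COLS)


def name(address):
    row, col = decode(address)
    return f"{ROW_NAMES[row]}{col}"


def access_cost(addresses, cost_per_access=1):
    """In this model each access costs the same, regardless of where it lands."""
    return cost_per_access * len(addresses)


a0, a1, d3 = 0, 1, 15
print(name(a0), name(a1), name(d3))                   # A0 A1 D3
print(access_cost([a0, a1]) == access_cost([a0, d3]))  # True
```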

2.3.2 Disk access principles

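As mentioned above, indexes are generally stored on disk as files, and index retrieval requires disk I/O. Unlike main memory, disk I/O involves mechanical movement, so its time cost is huge.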

Figure 6 is a schematic of the overall structure of a disk.

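A disk consists of coaxial circular platters of the same size, which rotate (all platters must rotate in sync). On one side of the disk is a head assembly holding a set of heads, each responsible for accessing the contents of one platter. The heads cannot rotate, but they can move along the radius of the platters (in practice, a diagonal tangential motion). The heads must also stay coaxial at all times, i.e., viewed from directly above, all heads always overlap (although independent multi-head technology now exists that is not subject to this restriction).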

Figure 7 is a schematic of the disk structure.

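Each platter is divided into a series of concentric rings centered on the platter's center. Each ring is called a track, and all tracks with the same radius form a cylinder. Tracks are divided along radial lines into small segments, each called a sector; the sector is the smallest storage unit of the disk. For simplicity, the discussion below assumes the disk has only one platter and one head.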

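When data needs to be read from disk, the system passes the logical address of the data to the disk. The disk's control circuitry translates the logical address into a physical address according to its addressing logic, i.e., it determines which track and which sector hold the data. To read that sector, the head must be positioned over it. First the head moves to align with the corresponding track; this is called seeking, and the time it takes is the seek time. Then the disk rotates the target sector under the head; the time this takes is called the rotational latency.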
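The two mechanical delays can be added up in a quick Python estimate. On average the target sector is half a revolution away from the head, so the average rotational latency is half the time of one revolution. The 7200 RPM spindle speed and 9 ms seek time below are illustrative assumptions, and transfer time is ignored because it is small next to the mechanical delays.

```python
def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a revolution away from the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_read_ms(seek_ms, rpm):
    """Seek to the track, then wait for the sector to rotate under the head.
    Transfer time is ignored in this estimate."""
    return seek_ms + avg_rotational_latency_ms(rpm)


print(round(avg_rotational_latency_ms(7200), 2))  # 4.17
print(round(random_read_ms(9.0, 7200), 2))        # 13.17
```

Milliseconds per random read, versus the nanosecond scale of a RAM access, is the gap that makes the number of disk I/Os the dominant cost of an index lookup.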

2.3.3 The principle of locality and disk read-ahead

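Because of the characteristics of the storage medium, disk access is inherently much slower than main memory access, and with the added cost of mechanical movement, disks are often hundreds of times slower than main memory. To improve efficiency, disk I/O should therefore be minimized. To this end, the disk does not read strictly on demand but reads ahead every time: even if only one byte is needed, it starts at that position and sequentially reads a certain length of data into memory. The theoretical basis for this is the well-known principle of locality in computer science: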

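When a piece of data is used, nearby data is usually used soon afterward.

The data a program needs while running is usually concentrated in one place.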

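Because sequential disk reads are very efficient (no seek time and very little rotational latency), read-ahead improves I/O efficiency for programs that exhibit locality.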

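The read-ahead length is generally an integer multiple of a page. A page is the logical block in which a computer manages storage: hardware and operating systems usually divide main memory and disk storage into contiguous blocks of equal size, each called a page (in many operating systems the page size is 4 KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered. The system then issues a read request to the disk, which finds the starting position of the data and reads one or more consecutive pages into memory; the fault handler then returns and the program continues running.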
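A toy Python model shows the effect of page-granular reads. Memory holds whole 4 KB pages, each miss costs one simulated disk I/O, and `pages_per_read` controls how many consecutive pages are read ahead per fault. `PageCache` is a hypothetical class for illustration, not an operating system API, and it never evicts pages.

```python
PAGE_SIZE = 4096  # 4 KB, the common page size mentioned above


class PageCache:
    """Toy model: memory holds whole pages; touching an absent page faults it in."""
    def __init__(self, pages_per_read=1):
        self.resident = set()
        self.faults = 0
        self.pages_per_read = pages_per_read  # read-ahead: pages loaded per fault

    def read_byte(self, address):
        page = address // PAGE_SIZE
        if page not in self.resident:
            self.faults += 1  # one simulated disk I/O
            for p in range(page, page + self.pages_per_read):
                self.resident.add(p)


cache = PageCache()
for addr in range(10_000):  # scan 10,000 consecutive bytes
    cache.read_byte(addr)
print(cache.faults)  # 3
```

Reading 10,000 bytes one at a time costs only 3 I/Os instead of 10,000, and with `PageCache(pages_per_read=4)` the same scan costs a single I/O.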

2.3.4 Performance analysis of B-/+Tree indexes

We can now finally analyze the performance of B-/+Tree indexes.

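As mentioned above, the number of disk I/Os is generally used to evaluate an index structure. Start with B-Tree: by definition, a single lookup visits at most h nodes. Database system designers cleverly exploit disk read-ahead by making each node exactly one page in size, so each node can be loaded in full with a single I/O. To achieve this, real B-Tree implementations also use the following technique: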

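Each time a new node is created, a full page of space is allocated for it. This guarantees that the node is also physically stored within one page, and since storage allocation is page-aligned, reading a node requires only one I/O.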

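A single B-Tree lookup needs at most h-1 I/Os (the root node stays resident in memory), with asymptotic complexity $O(h) = O(\log_d N)$. In practice the out-degree d is a very large number, usually over 100, so h is very small (usually no more than 3).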

In summary, using a B-Tree as the index structure is very efficient.

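A red-black tree, by contrast, is much deeper, since its height grows on the order of $\log_2 N$. Moreover, logically close nodes (parent and child) may be physically far apart, so locality cannot be exploited. Therefore, although the I/O asymptotic complexity of a red-black tree is also O(h), it is clearly far less efficient than a B-Tree.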

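It was also mentioned above that B+Tree is better suited for external-storage indexes, and the reason is related to the out-degree d of internal nodes. The analysis above shows that the larger d is, the better the index performs, and the upper bound on the out-degree depends on the sizes of the key and data in a node: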

$d_{max} = \left\lfloor \frac{pagesize}{keysize + datasize + pointsize} \right\rfloor$

Here floor means rounding down. Because B+Tree internal nodes drop the data field, they can have a larger out-degree and therefore better performance.
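Plugging numbers into $d_{max}$ shows how much removing the data field helps. The sizes below are hypothetical: a 16 KB page, 8-byte keys, 6-byte pointers, and 100 bytes of data per entry in a B-Tree node. For the B+Tree internal node, datasize is simply 0.

```python
def max_out_degree(page_size, key_size, data_size, pointer_size):
    """d_max = floor(pagesize / (keysize + datasize + pointsize))."""
    return page_size // (key_size + data_size + pointer_size)


PAGE = 16 * 1024   # hypothetical node/page size in bytes
KEY, PTR = 8, 6    # hypothetical key and pointer sizes
DATA = 100         # hypothetical per-entry data size in a B-Tree node

btree_d = max_out_degree(PAGE, KEY, DATA, PTR)  # B-Tree: every node carries data
bplus_d = max_out_degree(PAGE, KEY, 0, PTR)     # B+Tree: internal nodes drop data
print(btree_d, bplus_d)  # 143 1170
```

With roughly eight times the fan-out per level, the same number of keys fits in a noticeably shallower tree, which means fewer I/Os per lookup.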

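This chapter discussed index-related data structures and algorithms from a theoretical perspective. The next chapter discusses how B+Tree is concretely implemented as indexes in MySQL, and, using the MyISAM and InnoDB storage engines, introduces two different index implementations: non-clustered indexes and clustered indexes.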

3. MySQL index implementation

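In MySQL, indexes are a storage-engine-level concept, and different storage engines implement indexes differently. This article mainly discusses the index implementations of the MyISAM and InnoDB storage engines.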

3.1 MyISAM index implementation

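The MyISAM engine uses B+Tree as its index structure, and the data field of each leaf node stores the address of a data record. The figure below illustrates a MyISAM index: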

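Suppose the table has three columns and Col1 is the primary key; Figure 8 then shows the primary index (Primary key) of a MyISAM table. As can be seen, a MyISAM index file only stores the addresses of data records. In MyISAM, the primary index and secondary indexes (Secondary key) are structurally identical, except that the primary index requires unique keys while a secondary index may contain duplicate keys. If we create a secondary index on Col2, its structure is shown in the figure below: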

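It is also a B+Tree whose data field stores the addresses of data records. Therefore, MyISAM's index lookup algorithm first searches the index using the B+Tree search algorithm; if the specified key exists, it takes the value of its data field and then reads the corresponding data record using that value as the address.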
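The two-step MyISAM lookup can be sketched in Python, with a dictionary standing in for the data file (address to row) and a sorted list of (key, address) pairs standing in for the B+Tree leaf level. All rows, addresses, and helper names are hypothetical. Both indexes have exactly the same shape; only the uniqueness of their keys differs.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: address -> (Col1, Col2, Col3), stored in insertion order.
data_file = {
    0x07: (15, 34, "Bob"),
    0x56: (18, 77, "Alice"),
    0x6A: (20, 5, "Jim"),
    0xF3: (30, 91, "Eric"),
    0x90: (49, 22, "Tom"),
    0x77: (50, 22, "Rose"),
}


def build_index(column):
    """Entries are (key, record address) sorted by key, like B+Tree leaves."""
    entries = sorted((row[column], addr) for addr, row in data_file.items())
    return [k for k, _ in entries], [a for _, a in entries]


def index_lookup(index, key):
    """Step 1: find the key in the index; step 2: read each record by its address."""
    keys, addrs = index
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [data_file[a] for a in addrs[lo:hi]]


primary = build_index(0)    # Col1: keys must be unique
secondary = build_index(1)  # Col2: duplicate keys are allowed
print(index_lookup(primary, 30))    # [(30, 91, 'Eric')]
print(index_lookup(secondary, 22))  # [(50, 22, 'Rose'), (49, 22, 'Tom')]
```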

MyISAM's indexing scheme is also called "non-clustered", a name used to distinguish it from InnoDB's clustered index.

3.2 InnoDB index implementation

Although InnoDB also uses B+Tree as its index structure, its implementation is quite different from MyISAM's.

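The first major difference is that InnoDB's data file is itself the index file. As described above, in MyISAM the index file and the data file are separate, and the index file only stores the addresses of data records. In InnoDB, the table's data file is itself an index structure organized as a B+Tree, and the data fields of its leaf nodes hold complete data records. The key of this index is the table's primary key, so the InnoDB table data file is itself the primary index.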

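Figure 10 is a schematic of the InnoDB primary index (which is also the data file); the leaf nodes contain complete data records. This kind of index is called a clustered index. Because InnoDB's data file must be clustered by primary key, InnoDB requires every table to have a primary key (MyISAM does not). If none is explicitly specified, MySQL automatically selects a column that uniquely identifies data records as the primary key. If no such column exists, MySQL automatically generates a hidden field as the primary key of the InnoDB table; this field is 6 bytes long and of long integer type.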

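The second difference from MyISAM is that the data field of an InnoDB secondary index stores the primary key value of the corresponding record rather than its address. In other words, every InnoDB secondary index references the primary key as its data field. For example, Figure 11 shows a secondary index defined on Col3: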

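Here the ASCII codes of the English characters are used as the comparison criterion. The clustered index implementation makes searches by primary key very efficient, but a secondary index search has to traverse two indexes: first the secondary index to obtain the primary key, then the primary index with that primary key to obtain the record.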
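The difference in lookup paths can be sketched in Python with sorted lists standing in for the two B+Trees. The clustered index holds full rows ordered by the primary key Col1, and the secondary index on Col3 holds (Col3 value, primary key) pairs. Python compares strings by code point, which matches ASCII order for English letters. The rows and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical clustered index: leaves hold full rows, sorted by primary key Col1.
clustered_keys = [15, 18, 20, 30, 49, 50]
clustered_rows = [(15, 34, "Bob"), (18, 77, "Alice"), (20, 5, "Jim"),
                  (30, 91, "Eric"), (49, 22, "Tom"), (50, 89, "Rose")]

# Secondary index on Col3: entries are (Col3 value, primary key), sorted by Col3.
secondary = sorted((row[2], row[0]) for row in clustered_rows)
sec_keys = [k for k, _ in secondary]


def by_primary_key(pk):
    """One traversal: the clustered index leaf already holds the whole row."""
    i = bisect_left(clustered_keys, pk)
    if i < len(clustered_keys) and clustered_keys[i] == pk:
        return clustered_rows[i]
    return None


def by_col3(value):
    """Two traversals: secondary index -> primary key, then clustered index -> row."""
    i = bisect_left(sec_keys, value)
    if i < len(sec_keys) and sec_keys[i] == value:
        return by_primary_key(secondary[i][1])
    return None


print(by_col3("Eric"))  # (30, 91, 'Eric')
```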

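Understanding how different storage engines implement indexes is very helpful for using and optimizing indexes correctly. For example, once you know InnoDB's index implementation, it is easy to see why overly long fields are not recommended as primary keys: all secondary indexes reference the primary index, so an overly long primary key makes every secondary index too large. Similarly, using a non-monotonic field as the primary key is a bad idea in InnoDB. Because the InnoDB data file is itself a B+Tree, non-monotonic primary keys force the data file to split and rebalance frequently during inserts to maintain the B+Tree properties, which is very inefficient. Using an auto-increment field as the primary key is a good choice.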
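A toy simulation illustrates the effect of monotonic versus random keys on leaf pages. The model is an assumption, not InnoDB's actual algorithm. Pages are sorted lists kept in key order, and an overflowing page is split in half. The one exception is a new key that lands at the very end of the last page, in which case a fresh page is started instead. It counts mid-page splits and the pages used.

```python
import bisect
import random


def simulate_inserts(keys, capacity):
    """Toy model of B+Tree leaf pages; returns (mid-page splits, pages used)."""
    pages = []
    splits = 0
    for k in keys:
        if not pages:
            pages.append([k])
            continue
        # Pick the last page whose smallest key is <= k (page 0 if none).
        firsts = [p[0] for p in pages]
        idx = max(bisect.bisect_right(firsts, k) - 1, 0)
        page = pages[idx]
        bisect.insort(page, k)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == k:
            pages.append([page.pop()])        # append-only growth, no split
        else:
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])  # split: upper half moves to a new page
            del page[mid:]
            splits += 1
    return splits, len(pages)


seq = simulate_inserts(range(1000), 10)
rnd = simulate_inserts(random.Random(42).sample(range(1000), 1000), 10)
print(seq)  # (0, 100)
print(rnd[0] > 0, rnd[1] > seq[1])  # True True
```

Sequential keys fill each page completely and never split one. Random keys trigger repeated half-splits that leave pages partly empty, so the same 1,000 keys occupy more pages.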


Origin blog.csdn.net/fedorafrog/article/details/104247870