BTree properties
A BTree, also known as a multiway balanced search tree, of order m has the following properties:
- Each node in the tree has at most m children.
- Except for the root node and leaf nodes, each node has at least ceil(m / 2) children.
- If the root is not a leaf node, it has at least two children.
- All leaf nodes are at the same level.
- Each non-leaf node holds n keys and n + 1 pointers, where ceil(m / 2) - 1 <= n <= m - 1 (the root is the exception and may hold as few as one key).
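As a quick check of the key-count bound in the last property, here is a minimal Python sketch that computes the smallest and largest number of keys a non-root node may hold for a given order m. The function name key_bounds is hypothetical, chosen only for this illustration.

```python
import math


def key_bounds(m: int) -> tuple[int, int]:
    """Return (min_keys, max_keys) for a non-root node of an order-m BTree."""
    if m < 3:
        raise ValueError("order m must be at least 3")
    # ceil(m / 2) - 1 <= n <= m - 1
    return math.ceil(m / 2) - 1, m - 1


print(key_bounds(5))  # (2, 4)
```

For m = 5 this gives between 2 and 4 keys per node.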
BTree insert
Take a BTree of order 5 as an example. From the formula ceil(m / 2) - 1 <= n <= m - 1, the number of keys per node satisfies 2 <= n <= 4. When an insertion makes n > 4, the node splits: its middle key moves up to the parent node, and the keys on either side of it become two separate nodes.
- Take inserting the letters C N G A H E K Q M F W L T Z D P R X Y S, in that order, as an example. The first four letters need no comment: they simply fill the root node as A C G N.
- Insert H: the node now has n > 4, so the middle key G splits up into a new root node, leaving A C and H N as its children.
- Insert E, K and Q: no split is needed.
- Insert M: the node H K M N Q has n > 4, so the middle key M splits up into the parent node, next to G.
- Insert F, W, L and T: no split is needed.
- Insert Z: the node N Q T W Z has n > 4, so the middle key T splits up into the parent node.
- Insert D: the node A C D E F has n > 4, so the middle key D splits up into the parent node. Then insert P, R, X and Y, none of which needs a split.
- Finally insert S: the node N P Q R S has n > 4, so the middle key Q splits up into the parent node. The parent D G M Q T now also has n > 4, so its middle key M splits up and becomes the new root. Note that H K L, the third child of the original root, becomes the third child of the D G node.
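The walkthrough above can be replayed in code. The following is a minimal Python sketch, not a production implementation: it inserts a key into a leaf first and splits any node that ends up with more than m - 1 keys, pushing the middle key up. The names Node, BTree and levels are hypothetical, and duplicate keys and deletion are not handled.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list means this is a leaf


class BTree:
    def __init__(self, order):
        self.order = order  # m: maximum number of children per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # The root itself overflowed: grow the tree by one level.
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        i = bisect_left(node.keys, key)
        if not node.children:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is not None:
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        # Overflow check: more than m - 1 keys triggers a split.
        if len(node.keys) > self.order - 1:
            return self._split(node)
        return None

    def _split(self, node):
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right


def levels(tree):
    """Return the keys level by level, one list of node key-lists per level."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


tree = BTree(order=5)
for ch in "CNGAHEKQMFWLTZDPRXYS":
    tree.insert(ch)
for row in levels(tree):
    print(row)
```

Running it prints three levels: the root M, then D G and Q T, then the leaves A C, E F, H K L, N P, R S and W X Y Z, which matches the final step of the walkthrough.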
And with that, the BTree is built. Not so hard, is it? Deletion is slightly more complicated than insertion and, for reasons of length, is not covered here.
B+Tree
A B+Tree is a variant of the BTree. It differs from a BTree as follows:
- A B+Tree node with n subtrees contains n keys, whereas a BTree node with n subtrees contains n - 1 keys.
- The leaf nodes of a B+Tree hold all the key information, with the keys arranged in sorted order.
- All non-leaf nodes can be regarded as an index: each contains only the maximum (or minimum) key of each of its child nodes.
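Because only the leaf nodes of a B+Tree store the key information, looking up any key means walking from the root all the way down to a leaf. The lookup cost of a B+Tree is therefore more stable.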
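B+Tree with sequential pointers
The MySQL index data structure optimizes the classic B+Tree.
Adding to each leaf node of the original B+Tree a pointer to the adjacent leaf node yields a B+Tree with sequential pointers, which improves the performance of range access.
As in the figure above, to access the elements from 18 to 49, you only need to follow the pointers from 18 through to 49.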
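A range scan over linked leaves can be sketched roughly as follows, in Python. This is an illustration under assumptions: the leaf contents are made-up sample keys (the original figure is not reproduced here), the names Leaf, find_leaf and range_scan are hypothetical, and find_leaf is only a stand-in for the real root-to-leaf descent.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf
        self.next = None  # sequential pointer to the adjacent leaf


def find_leaf(leaves, key):
    """Stand-in for the root-to-leaf descent: pick the leaf covering key."""
    first_keys = [leaf.keys[0] for leaf in leaves]
    return leaves[max(bisect_right(first_keys, key) - 1, 0)]


def range_scan(start_leaf, low, high):
    """Collect all keys in [low, high] by following the leaf pointers."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys[bisect_left(leaf.keys, low):]:
            if k > high:
                return result
            result.append(k)
        leaf = leaf.next
    return result


# Hypothetical sample data, linked left to right.
leaves = [Leaf([5, 8, 9]), Leaf([10, 15, 18]), Leaf([20, 26, 27]),
          Leaf([28, 30, 33]), Leaf([35, 38, 49]), Leaf([50, 60, 65])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

print(range_scan(find_leaf(leaves, 18), 18, 49))
# [18, 20, 26, 27, 28, 30, 33, 35, 38, 49]
```

Only one descent is needed to locate the leaf holding 18; everything after that is a walk along the leaf level, with no return trips to the upper nodes.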
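MySQL index data structures
In MySQL, how an index is implemented depends on the storage engine. MySQL supports several index types, such as B+Tree indexes, Hash indexes and full-text indexes. Here we focus only on the B+Tree index data structures of MyISAM and InnoDB.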
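MyISAM's B+Tree index
In MyISAM, the primary key index and the secondary indexes are structurally identical; the primary key index merely requires its keys to be unique. As can be seen, the leaf nodes of a MyISAM index store the physical address of the table row.
MyISAM's indexes are "non-clustered"; the name exists only to distinguish them from InnoDB's clustered index.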
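InnoDB's B+Tree index
InnoDB implements its indexes in a completely different way from MyISAM: the leaf nodes of InnoDB's B+Tree hold the complete row records. This also explains the point made in the previous article that InnoDB's index and data file are the same file.
The figure above shows a B+Tree primary key index; this kind of index is also called a clustered index. InnoDB data must be clustered by primary key, so every InnoDB table must have a primary key. If none is explicitly specified, MySQL automatically picks a column that uniquely identifies rows, or generates a hidden field, to serve as the primary key.
The figure above shows an InnoDB B+Tree secondary index, whose leaf nodes store only the primary key value rather than the row's address. A lookup through a secondary index therefore has to search two indexes: first the secondary index to find the primary key, then the primary key index to find the row.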
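The difference between the two engines' lookups can be sketched with a toy Python model. Plain dicts stand in for the B+Trees here, and all table contents and names (rows, myisam_secondary, innodb_clustered, and so on) are hypothetical; the point is only what each index stores at its leaves.

```python
# MyISAM-style: the data file is a separate list of rows, and every index
# (primary or secondary) maps a key to the row's position ("address").
rows = [(15, "alice"), (18, "bob"), (20, "carol")]
myisam_primary = {15: 0, 18: 1, 20: 2}
myisam_secondary = {"alice": 0, "bob": 1, "carol": 2}

# InnoDB-style: the clustered (primary key) index holds the full rows,
# and a secondary index maps its key to the primary key value.
innodb_clustered = {15: (15, "alice"), 18: (18, "bob"), 20: (20, "carol")}
innodb_secondary = {"alice": 15, "bob": 18, "carol": 20}


def myisam_lookup_by_name(name):
    address = myisam_secondary[name]  # one index search yields the address
    return rows[address]              # then read the row from the data file


def innodb_lookup_by_name(name):
    pk = innodb_secondary[name]       # first search: secondary index
    return innodb_clustered[pk]       # second search: clustered index


print(myisam_lookup_by_name("bob"))  # (18, 'bob')
print(innodb_lookup_by_name("bob"))  # (18, 'bob')
```

Both calls return the same row, but the InnoDB-style path goes through two index structures, which is why the primary key value ends up in every secondary index entry.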
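Consequently, there are two points to note when using InnoDB's B+Tree indexes:
- Use an auto-increment primary key. Because of how a B+Tree works, a non-auto-increment primary key causes frequent B+Tree splits on insertion.
- Avoid long primary key fields. Every secondary index stores the primary key value and finds rows through the primary key index, so an overly long primary key makes the secondary indexes too large.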
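To get a feel for the second point, here is a back-of-the-envelope Python sketch. It assumes each secondary index entry costs the secondary key plus the primary key and ignores all page and record overhead, so the numbers are only a rough lower bound; the table size, the key widths and the function name secondary_index_bytes are hypothetical.

```python
def secondary_index_bytes(rows: int, secondary_key_bytes: int,
                          primary_key_bytes: int) -> int:
    """Rough leaf-level size: each entry holds the secondary key plus the primary key."""
    return rows * (secondary_key_bytes + primary_key_bytes)


ROWS = 10_000_000   # hypothetical table size
SECONDARY_KEY = 4   # hypothetical 4-byte indexed column

short_pk = secondary_index_bytes(ROWS, SECONDARY_KEY, 8)   # 8-byte integer primary key
long_pk = secondary_index_bytes(ROWS, SECONDARY_KEY, 36)   # 36-byte string primary key

print(short_pk, long_pk)  # 120000000 400000000
```

Under these assumptions the same secondary index is more than three times larger with the 36-byte primary key, and that cost is paid again for every additional secondary index on the table.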