MySQL index analysis

In MySQL, an index is a data structure that the storage engine uses to find target records quickly. Common index types include B-tree indexes, hash indexes, spatial (R-tree) indexes, and full-text indexes.

Indexes are implemented at the storage engine layer, and different storage engines implement them differently.

The following focuses on B-tree indexes and on the InnoDB and MyISAM storage engines.

Reasons for choosing a B-tree

The most expensive part of a disk read or write is the seek. Reading a range of data sequentially is fast for two reasons:

  1. Sequential I/O does not require multiple seeks, so it is much faster than random I/O (especially on hard drives).
  2. If the server can read the data in the order required, no additional sort is needed, and GROUP BY queries do not have to sort rows and group them together to compute aggregates.

The index itself can be too large to fit entirely in memory, so it is usually stored on disk in index files. Index lookups therefore incur disk I/O, which is several orders of magnitude more expensive than memory access. An index structure should consequently be organized to minimize the number of disk I/O operations needed during a search.

The B-tree is a balanced search tree similar to the red-black tree, but it is better at reducing the number of disk I/O operations. Because a B-tree is shallow, only a few nodes need to be loaded from disk into memory to find an element, so the data being looked up can be reached quickly.
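To make the depth argument concrete, here is a rough Python sketch that counts how many levels a search tree needs in order to hold a given number of keys. The function name `min_levels`, the fan-out of 100, and the key count of one million are illustrative assumptions, not figures taken from MySQL.

```python
def min_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels a search tree needs to hold num_keys keys,
    assuming every node is full and holds fanout - 1 keys."""
    levels = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_on_level = 1    # the root level has a single node
    while capacity < num_keys:
        capacity += nodes_on_level * (fanout - 1)
        nodes_on_level *= fanout
        levels += 1
    return levels


# A binary tree (fan-out 2) versus a B-tree-like node with 100 children.
print(min_levels(1_000_000, fanout=2))    # 20
print(min_levels(1_000_000, fanout=100))  # 4
```

If each level costs one disk read in the worst case, the wide tree needs about 4 reads where the binary tree needs about 20, which is the motivation for keeping the tree shallow.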

Introduction to B-trees

B-tree

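A B-tree T is a rooted tree (with root T.root) that has the following properties:

  1. Each node x has the following fields:
    • x.n, the number of keys currently stored in node x.
    • The x.n keys themselves, stored in non-descending order, so that $x.key_1 \le x.key_2 \le \cdots \le x.key_{x.n}$.
    • x.leaf, a boolean that is TRUE if x is a leaf node and FALSE if it is an internal node.
  2. Each internal node x also contains x.n+1 pointers to its children, $x.c_1, x.c_2, \ldots, x.c_{x.n+1}$. Leaf nodes have no children, so their child pointer fields are undefined.
  3. The keys $x.key_i$ separate the ranges of keys stored in each subtree: if $k_i$ is any key stored in the subtree rooted at $x.c_i$, then $k_1 \le x.key_1 \le k_2 \le x.key_2 \le \cdots \le x.key_{x.n} \le k_{x.n+1}$.
  4. Every leaf has the same depth, which is the height h of the tree.
  5. The number of keys x.n that a node may contain has a lower and an upper bound, both expressed in terms of a fixed integer t >= 2:
    • Every node other than the root contains at least t-1 keys, so every internal node other than the root has at least t children. If the tree is nonempty, the root contains at least one key.
    • Every node contains at most 2t-1 keys, so an internal node has at most 2t children. A node is said to be full if it contains exactly 2t-1 keys.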

Figure: A B-tree of height 3 containing the minimum possible number of keys. The value shown inside each node x is x.n.
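The properties above translate fairly directly into code. The following Python sketch models a node with its sorted keys and child pointers and searches it top-down. The class and function names are hypothetical, and the small example tree (minimum degree t = 2) is made up for illustration. `min_keys` uses the standard bound that a B-tree of height h holds at least 2t^h - 1 keys.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)               # x.n == len(keys), kept sorted
    children: List["BTreeNode"] = field(default_factory=list)   # x.n + 1 children if internal

    @property
    def leaf(self) -> bool:
        return not self.children


def search(x: BTreeNode, k: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1                      # find the first key that is >= k
    if i < len(x.keys) and x.keys[i] == k:
        return x, i
    if x.leaf:
        return None                 # nowhere left to descend
    return search(x.children[i], k)  # subtree i holds keys between keys[i-1] and keys[i]


def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can contain."""
    return 2 * t ** h - 1


# Hypothetical example tree with t = 2: one root and three leaves.
root = BTreeNode(
    keys=[20, 40],
    children=[BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([50, 60])],
)
print(search(root, 30) is not None)  # True
print(search(root, 35))              # None
print(min_keys(2, 3))                # 15
```

Each recursive call touches one node, so the number of node reads is bounded by the height of the tree plus one.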

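B+ tree

The B+ tree is a variant of the B-tree that is better suited to implementing index structures on external storage, and MySQL storage engines generally use B+ trees for their indexes. Internal nodes hold only key values and pointers to child nodes, while the data is stored in the leaf nodes. All records are kept in leaf nodes on the same level, ordered by key value, and the leaf nodes are connected to one another by pointers (a doubly linked list).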

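Figure: A B+ tree of height 2.

As the figure shows, all records are in the leaf nodes and are stored in order. Traversing sequentially from the leftmost leaf node yields all key values in sorted order: 5, 10, 15, 20, 25, 30, 50, 55, 60, 65, 75, 80, 85, 90.

Internal nodes and leaf nodes of a B+ tree may have different sizes.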
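The linked leaf level is what makes ordered and range scans cheap. As a minimal sketch, the Python below models only the leaf level and follows a forward link from leaf to leaf (the text describes a doubly linked list, so a backward link is omitted for brevity). The `Leaf` class, the helper names, and the way the keys are grouped into leaves are assumptions made for illustration. A real B+ tree would first descend from the root to locate the starting leaf instead of starting at the leftmost one.

```python
from typing import Iterator, List, Optional


class Leaf:
    """A B+ tree leaf: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys: List[int]) -> None:
        self.keys = keys
        self.next: Optional["Leaf"] = None


def link(leaves: List[Leaf]) -> Leaf:
    """Chain the leaves left to right and return the leftmost one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def scan(first: Leaf, low: int, high: int) -> Iterator[int]:
    """Yield every key in [low, high] by walking the leaf chain."""
    leaf: Optional[Leaf] = first
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return            # keys are sorted, so nothing further can match
            if key >= low:
                yield key
        leaf = leaf.next


# Assumed grouping of the key values into four leaves.
first = link([
    Leaf([5, 10, 15, 20]),
    Leaf([25, 30]),
    Leaf([50, 55, 60, 65]),
    Leaf([75, 80, 85, 90]),
])
print(list(scan(first, 20, 55)))  # [20, 25, 30, 50, 55]
```

Once the first matching leaf is found, the scan never goes back up to an internal node, which is why range queries on an indexed column are largely sequential reads.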

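Index implementation

Storage engines use B+ trees in different ways, but in every case the indexed columns are organized in order. A characteristic of B+ tree indexes in a database is their high fan-out, so the B+ tree in a database is generally only 2 to 3 levels high. In other words, finding the row for a given key value takes at most 2 to 3 I/O operations.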
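A back-of-the-envelope calculation shows why high fan-out keeps the tree so shallow. The Python sketch below uses InnoDB's default 16 KB page size. The key size, child-pointer size, and average row size are assumptions chosen for illustration, and page headers and fill factor are ignored, so the results are rough orders of magnitude only.

```python
PAGE_SIZE = 16 * 1024   # InnoDB's default page size, in bytes
KEY_BYTES = 8           # assumed: an 8-byte (BIGINT) primary key
POINTER_BYTES = 6       # assumed size of a child-page pointer
ROW_BYTES = 1024        # assumed average row size


def rows_at_height(height: int) -> int:
    """Approximate number of rows a B+ tree of the given height can index."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # entries per internal page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES              # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf


print(rows_at_height(2))  # 18720
print(rows_at_height(3))  # 21902400
```

Under these assumptions a three-level tree already covers roughly twenty million rows, which is consistent with the claim that a lookup needs only a handful of page reads.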

Figure: An index built on a B-tree structure.

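B+ tree indexes

B+ tree indexes in a database fall into two categories: clustered indexes and secondary (nonclustered) indexes. Whether an index is clustered or not, it is internally a B+ tree: it is height-balanced, and all of its data is stored in the leaf nodes.

What distinguishes a clustered index from a nonclustered index is whether the leaf nodes store the entire row.

Let's look at how InnoDB and MyISAM store the following table: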

CREATE TABLE layout_test (
	col1 INT NOT NULL,
	col2 INT NOT NULL,
	PRIMARY KEY (col1),
	KEY (col2) -- secondary index
);

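The MyISAM engine uses a B+ tree as its index structure. Index files and data files are kept separate, and the index file stores only the addresses of data records: the data field of each leaf entry holds the address of a record. Rows are stored on disk in the order in which they were inserted.

Figure: Data layout of the MyISAM table layout_test.

In MyISAM, the primary key index is structurally no different from any other index. It is simply a unique, non-nullable index named PRIMARY.

Figure: Primary key layout of the MyISAM table layout_test.

Figure: Layout of the col2 index of the MyISAM table layout_test.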

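In InnoDB, by contrast, the primary key is the clustered index. The table's data file is itself an index structure organized as a B+ tree: leaf nodes contain the primary key together with the complete row data, while internal node pages contain only the indexed column, that is, the primary key. The logical order of key values in the clustered index determines the order in which the corresponding rows are stored in the table. The leaf nodes of an InnoDB secondary index store primary key values rather than "row pointers" and use those values as the "pointer" to the row. The clustered index is therefore a B+ tree built on the table's primary key.

Figure: Primary key layout of the InnoDB table layout_test.

Each leaf entry of the clustered index contains the primary key, the transaction ID, the rollback pointer used for transactions and MVCC, and all of the remaining columns.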

Figure: Primary key layout of the InnoDB table layout_test.

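InnoDB's secondary indexes are very different from its clustered index. The leaf nodes of an InnoDB secondary index store primary key values rather than "row pointers" and use those values as the "pointer" to the row. This strategy reduces the work needed to maintain secondary indexes when rows move or when a data page splits. Using primary key values as pointers makes secondary indexes larger, but in return InnoDB can move a row without updating the "pointer" to it in the secondary indexes.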
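The difference between the two engines can be sketched with a toy model of layout_test in Python. Plain dictionaries stand in for the B+ trees, the sample rows are made up, and col2 values are assumed to be unique to keep the model small (the real KEY(col2) allows duplicates). None of the names below correspond to actual MySQL internals.

```python
# Sample (col1, col2) rows, listed in insertion order.
rows = [(3, 30), (1, 10), (2, 20)]

# MyISAM-style: rows live in a data file in insertion order, and BOTH
# indexes map a key to the row's position (its "address") in that file.
myisam_data = list(rows)
myisam_primary = {col1: pos for pos, (col1, _) in enumerate(myisam_data)}
myisam_col2 = {col2: pos for pos, (_, col2) in enumerate(myisam_data)}

# InnoDB-style: the clustered index maps the primary key to the full row,
# and the secondary index maps col2 to the primary key, not to an address.
innodb_clustered = {col1: (col1, col2) for col1, col2 in rows}
innodb_col2 = {col2: col1 for col1, col2 in rows}


def myisam_find_by_col2(value):
    """One index lookup, then a fetch from the data file by address."""
    return myisam_data[myisam_col2[value]]


def innodb_find_by_col2(value):
    """Two index lookups: secondary index first, then the clustered index."""
    primary_key = innodb_col2[value]
    return innodb_clustered[primary_key]


print(myisam_find_by_col2(20))  # (2, 20)
print(innodb_find_by_col2(20))  # (2, 20)
```

In this model, relocating a row changes its position in `myisam_data`, so both MyISAM-style indexes would need updating, whereas `innodb_col2` stays valid as long as the primary key value does not change.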

The figure below makes it easy to see how InnoDB and MyISAM differ in the way they store data and indexes.

Figure: Comparison of clustered and nonclustered table layouts.

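Advantages of clustered indexes

A clustered index is not stored contiguously in the physical sense. It is contiguous only logically: records within a page are contiguous, and pages are linked to one another through a doubly linked list.

Another benefit of the clustered index is that sorted lookups and range lookups on the primary key are very fast.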

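Regarding InnoDB's logical storage structure: all data is logically stored in a single space called the tablespace. A tablespace is in turn made up of segments, extents, and pages. In some documentation a page is also called a block. The logical storage structure of the InnoDB storage engine is roughly as follows:

Figure: InnoDB logical storage structure (tablespace, segments, extents, pages).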

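  • Related data can be kept together. In particular, when the data being accessed is clustered on a single page, fewer I/O operations are needed.
  • Data access is faster. A clustered index holds both the index and the data in the same B-tree, so retrieving data from a clustered index is usually faster than a comparable lookup in a nonclustered index.
  • Queries that use covering index scans can use the primary key values stored in the leaf nodes directly.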

Disadvantages of clustered indexes

  • Clustering gives the largest improvement for I/O-bound workloads. If the data all fits in memory, the order in which it is accessed matters much less, and the clustered index offers little advantage.
  • Insertion speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If the data was not loaded in primary key order, it is best to reorganize the table with OPTIMIZE TABLE after the load is complete.
  • Updating clustered index columns is expensive, because InnoDB is forced to move each updated row to a new location.
  • A table built on a clustered index may suffer "page splits" when a new row is inserted, or when a row's primary key is updated so that the row has to be moved. When a row's primary key value dictates that the row must go into a page that is already full, the storage engine splits that page into two to make room for the row. Page splits cause a table to take up more disk space.
  • Clustered tables can be slower for full table scans, especially if rows are sparsely packed or stored non-contiguously because of page splits.
  • Secondary (nonclustered) indexes may be larger than expected, because their leaf nodes contain the primary key columns of the rows they reference.
  • Access through a secondary index requires two index lookups instead of one.
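The effect of insertion order and page splits can be illustrated with a toy simulation. This is a deliberately simplified model, not InnoDB's actual algorithm: pages are small sorted Python lists, the capacity of 4 keys per page is an assumption chosen so that splits are easy to see, keys are assumed to be unique, and a full last page simply gets a fresh page appended after it when the new key is larger than everything stored so far.

```python
import bisect

PAGE_CAPACITY = 4  # assumed, unrealistically small, so splits show up quickly


def load(keys):
    """Insert keys one at a time into sorted pages; return (pages, split_count)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # The target is the first page whose largest key is >= key, else the last page.
        idx = next(
            (i for i, page in enumerate(pages) if page and page[-1] >= key),
            len(pages) - 1,
        )
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])            # appending in key order: open a new page
        else:
            mid = PAGE_CAPACITY // 2       # page split: move the upper half out
            right = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, right)
            splits += 1
            bisect.insort(page if key < right[0] else right, key)
    return pages, splits


in_order_pages, in_order_splits = load(range(1, 17))
reversed_pages, reversed_splits = load(range(16, 0, -1))
print(len(in_order_pages), in_order_splits)   # 4 0
print(len(reversed_pages), reversed_splits)   # 7 6
```

In this model, loading 16 keys in primary key order fills 4 pages completely with no splits, while loading the same keys out of order (here, in descending order) triggers 6 splits and leaves the data spread over 7 partly filled pages, which mirrors the extra disk space and fragmentation described above.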

References:

High Performance MySQL

The Data Structures and Algorithm Principles Behind MySQL Indexes
