Database Index

In addition to the data itself, a database system maintains data structures designed to support specific search algorithms.
These data structures reference (point to) the data in some way,
so that efficient lookup algorithms can be run against them.

- Such a data structure is an index.


A database index is an ordered data structure in a database management system that helps queries and updates locate rows in a table quickly.

An index on a table comes at a price.
First, it increases the storage space used by the database.
Second, inserting and modifying data takes longer (because the index must be updated as well).

__________________________________________________________________________________


Index Implementation: B-Tree and Its Variant, the B+ Tree


1. B-Tree (Balanced Tree)

In computer science, a B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children.

Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. B-trees are a good example of a data structure for external memory, and they are commonly used in databases and file systems.



The figure above shows one possible indexing scheme:
the data table on the left has seven records in two columns.
The leftmost value is the physical address of each data record (note that logically adjacent records are not necessarily physically adjacent on disk).
To speed up lookups on Col2, a binary search tree like the one on the right can be maintained. Each node contains an index key value and a pointer to the physical address of the corresponding data record. As long as the tree stays balanced, a record can then be found in O(log2 n) comparisons by binary search.
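The scheme described above can be sketched in a few lines of Python. This is only an illustration: the seven key values and addresses below are made up (the figure itself is not reproduced here), and the names Node, insert and search are hypothetical helpers, not part of any database engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int                       # value of the indexed column (Col2)
    address: int                   # hypothetical physical address of the row
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, address):
    """Insert a key into an (unbalanced) binary search tree and return the root."""
    if root is None:
        return Node(key, address)
    if key < root.key:
        root.left = insert(root.left, key, address)
    else:
        root.right = insert(root.right, key, address)
    return root


def search(root, key):
    """Return (address, nodes_visited); address is None when the key is absent."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.address, visited
        node = node.left if key < node.key else node.right
    return None, visited


# Made-up (Col2 value, address) pairs, inserted in an order that happens
# to produce a balanced tree; a real index would rebalance itself.
entries = [(34, 0x07), (22, 0x90), (89, 0x77), (5, 0x6A),
           (23, 0xD1), (77, 0x56), (91, 0xF3)]

root = None
for key, address in entries:
    root = insert(root, key, address)

address, visited = search(root, 77)
print(hex(address), visited)
```

With seven keys in a balanced tree, no lookup visits more than three nodes, which is the O(log2 n) behaviour the text refers to.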




A third-order B-Tree







2. B+ Tree (Balanced+ Tree)

Features of the B+ tree:
      1. All data is stored in the leaf nodes, which form a linked list, and the data in that list is kept in key order;
      2. Leaf nodes are connected to one another by pointers;
      3. A search can never hit at a non-leaf node;
      4. The non-leaf nodes act as an index over the leaf nodes, and the leaf nodes act as the data layer that stores the data;
      5. It is better suited to file index systems.

B+ tree performance: equivalent to performing a binary search over the whole key set.

A second-order B+ Tree


The difference between a B+ tree and a B-tree:
      A B+ tree search hits only when it reaches a leaf node.
      A B-tree search can hit at a non-leaf node.
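To make the "hits only at a leaf" rule concrete, here is a minimal read-only sketch of a B+ tree in Python. It is hand-built and static (no insertion, splitting or balancing), and the class and function names (Leaf, Internal, find_leaf, lookup, range_scan) as well as the sample keys are assumptions made for illustration.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys          # sorted keys stored in this leaf
        self.values = values      # one value per key
        self.next = None          # pointer to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no values
        self.children = children  # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf that may contain key."""
    while isinstance(node, Internal):
        # Even if key equals a separator, keep descending: data lives in leaves.
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    leaf = find_leaf(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None


def range_scan(root, low, high):
    """Collect all (key, value) pairs with low <= key <= high via the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return result
            if k >= low:
                result.append((k, v))
        leaf = leaf.next
    return result


# Hand-built two-level tree over made-up keys.
leaf_a = Leaf([5, 22], ["row-5", "row-22"])
leaf_b = Leaf([23, 34], ["row-23", "row-34"])
leaf_c = Leaf([77, 89, 91], ["row-77", "row-89", "row-91"])
leaf_a.next, leaf_b.next = leaf_b, leaf_c
root = Internal([23, 77], [leaf_a, leaf_b, leaf_c])

print(lookup(root, 23))
print(range_scan(root, 22, 77))
```

Note that the key 23 appears in the root as a separator, yet the lookup still continues down to leaf_b to fetch the value. The range scan walks the linked leaves instead of going back to the root for every key.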





A B+ tree is an n-ary tree with a variable but often large number of children per node.
A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.


A B+ tree can be viewed as a B-tree in which each internal node contains only keys (not key-value pairs), and to which an additional level of linked leaves is added at the bottom.


The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context, in particular filesystems.
This is primarily because, unlike binary search trees, B+ trees have very high fanout (the number of pointers to child nodes in a node, typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.
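A rough calculation shows why fanout matters. The sketch below assumes a simplified model in which every level of the tree costs one I/O and every node is completely full; levels_needed is a hypothetical helper, and real trees will be somewhat taller because nodes are rarely full.

```python
def levels_needed(n_keys, fanout):
    """Smallest number of levels h such that fanout**h >= n_keys."""
    levels, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


n = 1_000_000
print(levels_needed(n, 2))    # binary tree: 20 levels
print(levels_needed(n, 100))  # fanout 100: 3 levels
```

Under this model, finding one key among a million takes about 20 node reads in a binary tree but only 3 in a tree with fanout 100.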


The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata indexing;
BFS also uses B+ trees for storing directories.
NTFS uses B+ trees for directory indexing.
EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.
Relational database management systems such as IBM DB2, Informix, Microsoft SQL Server, Oracle 8, Sybase ASE, and SQLite support this type of tree for table indices.

Key-value database management systems such as CouchDB and Tokyo Cabinet support this type of tree for data access.

B+ tree indexes are widely used in databases, file systems and similar settings. Incidentally, one reason given for XFS performing better than ext3/ext4 under heavy I/O pressure is that its file and directory index structures all use B+ trees, whereas ext3/ext4 use a mix of index structures (linked lists, hashed B-trees, extents/bitmaps) for files and directories, so their IOPS capability under heavy load is not as good as that of XFS.

For details, see:

https://en.wikipedia.org/wiki/Ext4
https://en.wikipedia.org/wiki/XFS



_________________________________________________________________________________

Creating an index can greatly improve the performance of the system:

First, it can significantly improve the retrieval speed of data, which is the main reason for creating an index.
Second, by creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.
Third, it can speed up table-to-table joins, especially in terms of achieving referential integrity of data.
Fourth, when using the grouping and sorting clauses for data retrieval, the time for grouping and sorting in the query can also be significantly reduced.
Fifth, with indexes available, the query optimizer can choose better execution plans during query processing, which improves system performance.
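The first two points can be tried out with Python's built-in sqlite3 module. SQLite is used here only because it needs no server; the table, column and index names are made up for the demonstration, and other databases report their query plans in a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " aid INTEGER PRIMARY KEY,"
    " userid INTEGER NOT NULL,"
    " username TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (userid, username) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1000)],
)


def plan(sql):
    """Return the textual query plan SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the description


query = "SELECT * FROM users WHERE userid = 42"

plan_before = plan(query)          # no index yet: full table scan
conn.execute("CREATE UNIQUE INDEX idx_users_userid ON users (userid)")
plan_after = plan(query)           # the optimizer now picks the index

# A unique index also enforces uniqueness of the indexed column.
try:
    conn.execute("INSERT INTO users (userid, username) VALUES (42, 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```

Before the index exists the plan is a scan of the whole table. Afterwards the plan names idx_users_userid, and the attempt to insert a second row with userid 42 is rejected.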


Adding indexes also has a number of downsides:

First, it takes time to create and maintain indexes. This time increases with the amount of data.
Second, indexes need to occupy physical space. In addition to the data space occupied by the data table, each index also occupies a certain amount of physical space. If you want to create a clustered index, then the space required will be greater.
Third, when adding, deleting and modifying the data in the table, the index should also be dynamically maintained, which reduces the speed of data maintenance.


Depending on the capabilities of the database, three types of indexes can be created in the database designer: unique indexes, primary key indexes, and clustered indexes.





Index Implementation: Hash Index




A hash index applies a hash algorithm to convert the key value into a hash value. Unlike a B+ tree, it does not need to search level by level from the root node down to a leaf node: a single hash computation locates the corresponding position immediately, so lookups are very fast.


From the figure above, the obvious differences between a B+ tree index and a hash index are:

1. For equality queries, the hash index has a clear advantage,
because a single hash computation locates the matching key. This assumes, of course, that key values are unique. If they are not, the position of the key must be found first, and then the linked list is scanned from there until the matching data is found;

2. For range queries, the hash index is of no use.
The original key values are ordered, but after hashing they may no longer be contiguous, so the index cannot be used to answer a range query. Likewise, a hash index cannot be used for sorting, or for LIKE 'xxx%' (a prefix match of this kind is really a range query);

3. The hash index does not support the leftmost-prefix matching rule of multi-column composite indexes;

4. Lookup cost in a B+ tree index is fairly uniform across keys, unlike in a B-tree, where it varies widely;

5. When there are many duplicate key values, the hash index also becomes very inefficient because of hash collisions.
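The first two differences can be imitated in Python, using a dict as a stand-in for a hash index and a sorted list as a stand-in for the ordered leaf level of a B+ tree. This is an analogy rather than how a storage engine is implemented, and the rows and function names are made up.

```python
from bisect import bisect_left, bisect_right

# Made-up (key, value) rows.
rows = [(34, "a"), (77, "b"), (5, "c"), (91, "d"), (22, "e"), (89, "f"), (23, "g")]

hash_index = {key: value for key, value in rows}   # unordered, hash-based
ordered_index = sorted(rows)                       # kept in key order
ordered_keys = [key for key, _ in ordered_index]


def hash_lookup(key):
    """Equality query: one hash computation, no tree descent."""
    return hash_index.get(key)


def ordered_range(low, high):
    """Range query on the ordered index: two binary searches, then a slice."""
    start = bisect_left(ordered_keys, low)
    end = bisect_right(ordered_keys, high)
    return ordered_index[start:end]


def hash_range(low, high):
    """Range query on the hash index: order is lost, so every entry is checked."""
    return sorted((k, v) for k, v in hash_index.items() if low <= k <= high)


print(hash_lookup(77))
print(ordered_range(20, 40))
print(hash_range(20, 40))
```

Both range functions return the same rows, but hash_range has to inspect every entry, whereas ordered_range touches only the matching slice.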










Appendix: Binary Search

In computer science, binary search, also known as half-interval search or logarithmic search, is a search algorithm for finding a specific element in a sorted array.

Each comparison made by this algorithm cuts the search range in half.
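A minimal iterative version in Python might look like this (the function name and the sample list are chosen just for the example):

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return -1


sorted_keys = [5, 22, 23, 34, 77, 89, 91]
print(binary_search(sorted_keys, 77))  # 4
print(binary_search(sorted_keys, 50))  # -1
```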






Quote

Time complexity analysis of ternary search

I searched the Internet for time complexity analyses of ternary search. Some people say it is O(3log3(n)), but since ternary search turns out slower than binary search in experiments, they conclude that time complexity should not be trusted blindly. Let me correct this here (personal analysis only, for readers' reference):
   1. Time complexity of binary search: because the range is halved each time, a recursion tree with log2(n) levels can be constructed, and each level needs only O(1) time. The total is therefore O(1)*log2(n)=O(log2(n)).
   2. For ternary search, a recursion tree can be constructed in the same way, with log3(n) levels, and each level needs 2 comparisons, so the time complexity is O(2log3(n)).

   Since 2log3(n) = (2/log2(3))*log2(n), which is about 1.26*log2(n), 2log3(n)>log2(n) holds for every n>1.

   Therefore ternary search performs worse than binary search, although only by a constant factor: both are O(log n).
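The argument can be checked empirically by counting key comparisons. The sketch below is one possible way to instrument the two searches; it treats each probe of an array element as one comparison, which is an assumption of this example rather than something stated in the quoted analysis.

```python
def binary_search_count(items, target):
    """Return (index, comparisons), probing one midpoint per level."""
    low, high, comparisons = 0, len(items) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons


def ternary_search_count(items, target):
    """Return (index, comparisons), probing up to two cut points per level."""
    low, high, comparisons = 0, len(items) - 1, 0
    while low <= high:
        third = (high - low) // 3
        m1, m2 = low + third, high - third
        comparisons += 1
        if items[m1] == target:
            return m1, comparisons
        if target < items[m1]:
            high = m1 - 1              # left third, second probe not needed
            continue
        comparisons += 1
        if items[m2] == target:
            return m2, comparisons
        if target > items[m2]:
            low = m2 + 1               # right third
        else:
            low, high = m1 + 1, m2 - 1  # middle third
    return -1, comparisons


data = list(range(1000))
worst_binary = max(binary_search_count(data, t)[1] for t in data)
worst_ternary = max(ternary_search_count(data, t)[1] for t in data)
print(worst_binary, worst_ternary)
```

On this input the worst-case count for ternary search comes out higher than for binary search, in line with the analysis above.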




Appendix: Index data structures commonly used in MySQL

There are two types of index data structures commonly used in MySQL: the B+ tree index and the hash index.

In the MySQL documentation, the B+ tree index is actually written as BTREE, for example, as follows:

CREATE TABLE t(
    aid int unsigned not null auto_increment,
    userid int unsigned not null default 0,
    username varchar(20) not null default '',
    detail varchar(255) not null default '',
    primary key(aid),
    unique key(userid) USING BTREE,
    key (username(12)) USING BTREE -- here the username column only gets a prefix index on its leftmost 12 characters
)engine=InnoDB;






References:

The Implementation Principle of Database Index
http://blog.csdn.net/kennyrose/article/details/7532032

B-tree, B-tree, B+ tree, B* tree
http://www.cnblogs.com/oldhorse/archive/2009/11/16/1604009.html

B-tree
https://en.wikipedia.org/wiki/B-tree

B+Tree
https://en.wikipedia.org/wiki/B%2B_tree

The difference between B+ tree index and hash index
http://imysql.com/2016/01/06/mysql-faq-different-between-btree-and-hash-index.shtml

Binary search algorithm
https://zh.wikipedia.org/wiki/%E4%BA%8C%E5%88%86%E6%90%9C%E7%B4%A2%E7%AE%97%E6%B3%95

Time complexity analysis of ternary search
http://www.programgo.com/article/79762375640




_______________________________________________________________________________



HashMap series of articles (1):
Java's equals() and hashCode() HashMap


HashMap series of articles (2):
Java's HashMap deep learning


HashMap series of articles (3):
Database index (Index)


HashMap series of articles (4):
Java's HashMap vs. Hashtable






