Index notes summary

 

  Today I reviewed the material on indexing and wrote down the points that seemed important. If anything here is wrong, please feel free to point it out.

 

  1. The original data file is the main file; an index file is a file made up of index entries. One main file may have several index files. An index built on a secondary key can locate multiple records that share the same secondary key value.

  2. A dense index has one index entry for every record, so the data does not need to be sorted. A sparse index is built on records that are already sorted: one index entry can cover a group of records, and its pointer holds the starting address of that group on disk.

  3. An example of a two-level index: there are 10,000 records, each index entry is 8 bytes, and each disk block is 1024 bytes, so a dense index needs ceil(8 * 10000 / 1024) = 79 disk blocks. A second-level index can then be built over these 79 index blocks. The blocks are ordered among themselves, so the second-level index only has to store the smallest key of each block. The point of the second-level index is to reduce the number of disk I/O operations: with it, we first determine which index block to look in and then read that block directly; without it, we have to read the index blocks one by one.
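
  The arithmetic in this example can be checked with a short sketch. The helper name `blocks_needed` is made up for illustration, and the sketch assumes that second-level entries are also 8 bytes, which the note does not state.

```python
import math

def blocks_needed(entries: int, entry_size: int, block_size: int) -> int:
    """Number of disk blocks needed to hold `entries` index entries."""
    per_block = block_size // entry_size      # whole entries that fit in one block
    return math.ceil(entries / per_block)

RECORDS = 10_000      # records in the main file
ENTRY_BYTES = 8       # size of one index entry
BLOCK_BYTES = 1024    # size of one disk block

# Dense first-level index: one entry per record.
first_level = blocks_needed(RECORDS, ENTRY_BYTES, BLOCK_BYTES)

# Second-level index: one entry per first-level block
# (assumed to use the same 8-byte entry size).
second_level = blocks_needed(first_level, ENTRY_BYTES, BLOCK_BYTES)

print(first_level, second_level)
```

  With the second-level index a lookup reads one second-level block and one first-level block, instead of scanning up to 79 first-level blocks.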

  4. Inverted index: an index file built on attribute values. It supports efficient retrieval by attribute, but the inverted table costs extra space and makes updates slower. There are also inverted files for text (word index / full-text index).
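
  As a minimal sketch of the idea, the following builds an inverted index over one attribute of some in-memory records. The record layout and the function name `build_inverted_index` are assumptions for illustration only; a real inverted file would live on disk.

```python
from collections import defaultdict

def build_inverted_index(records, attribute):
    """Map each value of `attribute` to the list of record ids that have it."""
    index = defaultdict(list)
    for record_id, record in enumerate(records):
        index[record[attribute]].append(record_id)
    return index

# Hypothetical sample data.
records = [
    {"name": "Ann", "city": "Beijing"},
    {"name": "Bob", "city": "Shanghai"},
    {"name": "Cid", "city": "Beijing"},
]

city_index = build_inverted_index(records, "city")
beijing_ids = city_index["Beijing"]   # ids of all records with city == "Beijing"
```

  The extra table is the space cost mentioned above, and every insert or update of a record also has to touch the index, which is the update cost.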

  5. Dynamic indexes: B-tree / B+ tree

  6. Definition of a B-tree: a B-tree of order m is an m-way search tree in which each node may hold several keys, and a node with k - 1 keys has k children. The root has at least two subtrees (unless it is a leaf), every other non-leaf node has at least ceil(m / 2) subtrees, all leaves are on the same level, and every non-root node holds between ceil(m / 2) - 1 and m - 1 keys. A B-tree satisfies the ordering property of a binary search tree, generalized to several keys per node. The general form of a node is (p0, k1, p1, k2, ..., pn-1, kn, pn): n keys with n + 1 pointers to n + 1 children.

  7. B-tree search: start at the root and look for the key among k1 ~ kn. If it is there, the search succeeds. Otherwise follow the corresponding pointer to the next node and repeat recursively. If the pointer finally leads to an external (null) node, the lookup fails.
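
  The search procedure can be sketched as follows. The class `BTreeNode` and the function `btree_search` are names chosen for this illustration, nodes are kept in memory rather than on disk, and the binary search inside a node is one possible choice.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys k1..kn
        self.children = children or []    # pointers p0..pn; empty list for a leaf

def btree_search(node, key):
    """Return (node, position) if `key` is found, otherwise None."""
    while node is not None:
        i = bisect_left(node.keys, key)   # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i                # found in this node
        if not node.children:
            return None                   # next pointer is external: lookup fails
        node = node.children[i]           # descend through pointer p_i
    return None

# Hypothetical two-level tree: root [20] with children [5, 10] and [30, 40].
root = BTreeNode([20], [BTreeNode([5, 10]), BTreeNode([30, 40])])
hit = btree_search(root, 30)
miss = btree_search(root, 25)
```

  Each loop iteration corresponds to reading one node, so the number of iterations is the number of external accesses when every node is one disk block.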

  8. Insertion must preserve the B-tree structure, so a full node has to be split, and the split may propagate upward. If it reaches the root and the root splits too, the height of the tree grows by 1.
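
  A minimal in-memory sketch of insertion with upward splitting is given below. It inserts first and splits any node that ends up with m keys, which is one common formulation and not necessarily the exact procedure the course material uses; the names `Node`, `insert` and `height` are illustrative.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty for a leaf

def _insert(node, key, m):
    """Insert below `node`; return (median, right_node) if `node` split, else None."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                       # duplicates are ignored in this sketch
    if node.children:
        split = _insert(node.children[i], key, m)
        if split is None:
            return None
        median, right = split             # child split: absorb its median key
        node.keys.insert(i, median)
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)
    if len(node.keys) <= m - 1:
        return None                       # no overflow, nothing more to do
    mid = len(node.keys) // 2             # overflow: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key, m):
    """Insert `key` into an order-m B-tree and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])  # root split: height grows by 1

def height(node):
    h = 1
    while node.children:
        node = node.children[0]
        h += 1
    return h

# Order-3 tree: insert the keys 1..7 in order and record the height after each.
root = Node()
heights = []
for key in range(1, 8):
    root = insert(root, key, 3)
    heights.append(height(root))
```

  In this run the third insertion splits the root for the first time, and the seventh insertion causes a split that propagates from a leaf all the way up to the root.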

  9. Deletion from a B-tree: if the key is in a leaf and removing it does not cause the leaf to underflow, simply delete it. Otherwise, first try to borrow a key from a neighboring sibling that has keys to spare (the borrow goes through the parent); if no sibling can lend one, merge with a sibling, pulling the separating key down from the parent. If the key is in an internal node, swap it with its successor first and then delete it from the leaf.

  10. Convention for counting external (disk) accesses in a B-tree: memory is assumed to be large enough that the nodes read on the way down during retrieval do not have to be read again when splitting back up. Splitting an internal (non-root) node takes two disk writes, and splitting the root takes three. Inserting into a B-tree of height h therefore takes at most h + 2(h - 1) + 3 = 3h + 1 external accesses.
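
  The worst-case count can be written out term by term as a small check. The function name is made up for this sketch, and it follows the convention stated in the note: h reads on the way down, two writes for each of the h - 1 non-root splits, and three writes for the root split.

```python
def worst_case_insert_accesses(h: int) -> int:
    """External accesses for an insertion that splits every node on the path."""
    reads_on_the_way_down = h          # one read per level
    non_root_split_writes = 2 * (h - 1)  # two writes per non-root split
    root_split_writes = 3              # two halves plus the new root
    return reads_on_the_way_down + non_root_split_writes + root_split_writes

accesses_h3 = worst_case_insert_accesses(3)   # equals 3 * 3 + 1
```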

  11. A B+ tree is a variant of the B-tree: all keys are stored in the leaf nodes, and each internal node holds a copy of the largest (or smallest) key of each of its children. A node with n keys has n children. Otherwise it is similar to a B-tree.

  12. A B-tree with N keys has N + 1 external null pointers. Proof by induction:

         When N = 1 the proposition clearly holds: one key has two null pointers.

         Suppose the proposition holds for all N <= k, and consider N = k + 1. Suppose the root has s keys, so it has s + 1 children. Let x_i be the number of keys in the i-th subtree. The subtrees together hold x_1 + x_2 + ... + x_{s+1} = k + 1 - s keys. Each subtree is itself a B-tree with at most k keys, so by the induction hypothesis the i-th subtree has 1 + x_i external null pointers. The total is sum_{i=1}^{s+1} (1 + x_i) = (k + 1 - s) + (s + 1) = k + 2, so the proposition also holds for N = k + 1. (If the root is itself a leaf, its k + 1 keys are separated by k + 2 null pointers directly.)
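
  The claim can also be checked on a concrete tree. In this sketch a node is written as a plain `(keys, children)` tuple, which is a representation chosen only for this check; a leaf has an empty child list and contributes one null pointer more than it has keys.

```python
def count_keys(node):
    """Total number of keys in the tree rooted at `node`."""
    keys, children = node
    return len(keys) + sum(count_keys(child) for child in children)

def count_external(node):
    """Number of external (null) pointers in the tree rooted at `node`."""
    keys, children = node
    if not children:
        return len(keys) + 1              # a leaf with n keys has n + 1 null pointers
    return sum(count_external(child) for child in children)

# Hypothetical order-3 B-tree holding the keys 1..7.
tree = ([4], [
    ([2], [([1], []), ([3], [])]),
    ([6], [([5], []), ([7], [])]),
])

n_keys = count_keys(tree)
n_external = count_external(tree)
```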

  13. Using the conclusion of note 12, we can estimate the maximum number of external accesses, k, needed for a lookup in a B-tree. This k can then be used in place of the height h in the estimates.
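
  One way to obtain such an estimate, assuming the standard argument rather than quoting the note's own formula: the external nodes sit on level k + 1, level 2 has at least 2 nodes, and every level after that has at least ceil(m / 2) times as many, so N + 1 >= 2 * ceil(m / 2)^(k - 1), which gives k <= 1 + log_{ceil(m / 2)}((N + 1) / 2). The sketch below evaluates this bound with integer arithmetic; the function name and the sample values of N and m are illustrative.

```python
def max_lookup_accesses(n_keys: int, m: int) -> int:
    """Largest k with 2 * ceil(m/2)**(k - 1) <= n_keys + 1 (needs m >= 3, n_keys >= 1)."""
    c = -(-m // 2)                     # ceil(m / 2) without floating point
    k = 1
    while 2 * c ** k <= n_keys + 1:    # would k + 1 levels still be possible?
        k += 1
    return k

# Sample values: about two million keys in a B-tree of order 199.
k_large = max_lookup_accesses(1_999_998, 199)
# Seven keys in a B-tree of order 3.
k_small = max_lookup_accesses(7, 3)
```

  A larger order m shrinks k quickly, which is why B-tree nodes are usually sized to fill a whole disk block.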

  14. Now let us estimate the number of node splits caused by inserting a key (splits may cascade). For a B-tree of order m with N keys and p nodes, taking the minimum number of keys in every node gives N >= 1 + (p - 1) * (ceil(m / 2) - 1): the root provides at least one key and every other node provides at least ceil(m / 2) - 1 keys. Apart from the first node, every node was produced by a split, so the average number of splits per insertion is s = (p - 1) / (N - 1) <= 1 / (ceil(m / 2) - 1).
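
  To get a feel for the size of this bound, the sketch below evaluates 1 / (ceil(m / 2) - 1) exactly for a given order. It assumes the usual minimum of ceil(m / 2) - 1 keys per non-root node; the function name and the sample orders are chosen for illustration.

```python
from fractions import Fraction

def split_bound(m: int) -> Fraction:
    """Upper bound on the average number of splits per insertion (needs m >= 3)."""
    min_keys = -(-m // 2) - 1          # ceil(m / 2) - 1 keys in a non-root node
    return Fraction(1, min_keys)

bound_small = split_bound(3)      # order 3: at most one split per insertion on average
bound_large = split_bound(200)    # order 200: roughly one split per 99 insertions
```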

  15. To minimize the number of disk reads, each B-tree node should hold as many keys as possible. The upper bound on the number of disk reads can then be estimated with the earlier estimate for k.

  16. Red-black trees are left for later, to be written up together with AVL trees.

Origin www.cnblogs.com/zyna/p/12082976.html