Indexing

Introduction

  • Indexing is a process of associating a key with the location of a corresponding data record(类似于查字典). Each key is associated with a pointer to a complete record
  • Might have several associated index files(一个指针可以指向多条记录)
    在这里插入图片描述
  • Index Types
    • Primary Key
      • A unique identifier for records
      • Not meaningful
    • Secondary Key
      • Alternate search key
        • People may be more interested in it
        • e.g. Score
      • Often not unique for each record
      • Frequently use

Linear Index

  • Index file organized as a simple sequence of key/record pointer pairs with key values are in sorted order
    在这里插入图片描述
  • Second-level index can be applied to secondary key search (二级检索)
    二级索引:叶子节点中存储主键值,每次查找数据时,根据索引找到叶子节点中的主键值,根据主键值再到聚簇索引中得到完整的一行记录。
    在这里插入图片描述
    • Drawbacks
      • The index update due to record inserting/deleting is expensive
        • 删除一个后面的都要往前移
      • Records with the same (secondary) key will waste memory space意思是第二行两个“Guangzhou”没必要
        • Solution1
          • A drawback to this approach is that the array must be of fixed size (Memory Wasting)在这里插入图片描述
        • Solution2(Inverted Index) : each secondary key has its own array list
          在这里插入图片描述
        • Solution3 : Combining the lists into one array based list
          在这里插入图片描述

Tree Index

2-3 Tree

  • Characters
    • A node contains one or two keys (一个key–两个小孩,两个key–三个小孩)
    • Every internal node has either
      • Two children (if it contains one key (left))
      • Three children (if it contains two keys)
    • All leaves are at the same level in the tree, so the tree is always height balanced
    • 左边的孩子比root小,中间的值在区间内,右边的值比root
  • ADT
    • template
    	template <class Elem>
    	class TTNode {
          
           // 2-3 tree node structure
    	public:
    		Elem lkey; // The node's left key
    		Elem rkey; // The node's right key
    		TTNode* left; // Pointer to left child
    		TTNode* center; // Pointer to middle child
    		TTNode* right; // Pointer to right child
    		TTNode() {
          
           
    			center = left = right = NULL; 
    			lkey = rkey = EMPTY; }
    		~TTNode() {
          
           }
    		bool isLeaf() {
          
           
    			return left == NULL; 
    		}
    	};
    
    • Find
    template <class Key, class Elem, class KEComp, class EEComp>
    bool TTTree<Key, Elem, KEComp, EEComp>::
    findhelp(TTNode<Elem>* subroot, const Key& K, Elem& e) const {
          
          
    	//检查是否为空
    	if (subroot == NULL) // val not found
    		return false; 
    	//检查左右键值是否相等
    	if (KEComp::eq(K, subroot->lkey)) {
          
           
    		e = subroot->lkey; //用来承接返回值
    		return true;
    	}
    	if ((subroot->rkey != EMPTY) && (KEComp::eq(K, subroot->rkey))) {
          
           
    		e = subroot->rkey;
    		return true; 
    	} 
    	//递归检查子节点
    	if (KEComp::lt(K, subroot->lkey)) // Search left
    		return findhelp(subroot->left, K, e);
    	else if (subroot->rkey == EMPTY) // Search center
    		return findhelp(subroot->center, K, e);
    	else if (KEComp::lt(K, subroot->rkey)) // Search center
    		return findhelp(subroot->center, K, e);
    	else 
    		return findhelp(subroot->right, K, e); // Right		
    }
    
    • insert
      • 注意这里有一个promotion的操作,当要插入的地方已经满了的时候,就把几个孩子按顺序排列后的中间那个放到父节点中去,剩下两边分裂成两个孩子,如果父节点满了也是相同的操作,直至插入为止。
    bool insert(const Elem& e) {
          
           // Insert node with value val
    	Elem retval; // Smallest value in newly created node
    	TTNode<Elem>* retptr = NULL; // Newly created node
    	bool isSuccess = inserthelp(root, e, retval, retptr);
    	if (retptr != NULL) {
          
           // Root overflowed: make new root
    		TTNode<Elem>* temp = new TTNode<Elem>; 
    		temp->lval = retval;
    		temp->left = root; 
    		temp->center = retptr;
    		root = temp; root->keycount = 1;
    	}
    	return isSuccess;
    }
    
    template <class Key, class Elem, class KEComp, class EEComp>
    bool TTTree<Key, Elem, KEComp, EEComp>::
    inserthelp(TTNode<Elem>*& subroot, const Elem& e, Elem& retval, TTNode<Elem>*& retptr) {
          
          
    	Elem myretv;
    	TTNode<Elem>* myretp = NULL;
    	//对于一个空树 新建的节点放左边
    	if (subroot == NULL) {
          
           // Empty tree: make new node
    		subroot = new TTNode<Elem>();
    		subroot->lkey = e; 
    	}
    	
    	else 
    		//插入位置是叶子结点
    		if (subroot->isLeaf()) // At leaf node: insert here
    			//不是满的(右边节点是空)
    			if (subroot->rkey == EMPTY) {
          
           // Easy case: not full
    				if (EEComp::gt(e, subroot->lkey)) 
    					subroot->rkey = e;
    				else {
          
          
    					subroot->rkey = subroot->lkey;
    					subroot->lkey = e; 
    				}
    			}
    			//是满的
    			else // if full
    				splitnode(subroot, e, NULL, retval, retptr); 
    		//插入位置是中间结点 就要一直找到叶子结点再插入
    		else 
    			if (EEComp::lt(e, subroot->lkey)) // Find a correct position
    				inserthelp(subroot->left, e, myretv, myretp);
    			else 
    				if ((subroot->rkey == EMPTY) || (EEComp::lt(e, subroot->rkey)))
    					inserthelp(subroot->center, e, myretv, myretp);
    				else inserthelp(subroot->right, e, myretv, myretp);
    /*				
    inptr : input value
    tval : return value
    retptr : return pointer
    */
    
    template <class Key, class Elem, class KEComp, class EEComp>
    bool TTTree<Key, Elem, KEComp, EEComp>::
    splitnode(TTNode<Elem>* subroot, const Elem& inval,TTNode<Elem>* inptr, Elem& retval, TTNode<Elem>*& retptr) {
          
          
    	retptr = new TTNode<Elem>(); // Node created by split
    	//插在最左边
    	if (EEComp::lt(inval, subroot->lkey)) {
          
           // Add at left
    		retval = subroot->lkey;  //这是被promote节点的值 作为返回值
    		subroot->lkey = inval;  //把插入值当成左孩子
    		
    		retptr->lkey = subroot->rkey;  //把右值作为promote节点的右孩子
    		retptr->left = subroot->center;
    		retptr->center = subroot->right;
    		subroot->center = inptr; 
    	}
    	//插在中间
    	else if (EEComp::lt(inval, subroot->rkey)) {
          
           // Center
    		retval = inval;
    		
    		retptr->lkey = subroot->rkey;
    		retptr->left = inptr;
    		retptr->center = subroot->right; 
    	}
    	//插在右边
    	else {
          
           // Add at right
    		retval = subroot->rkey;
    		
    		retptr->lkey = inval;
    		retptr->left = subroot->right;
    		retptr->center = inptr; 
    	}
    	subroot->rkey = EMPTY;
    }
    

在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

B-Tree

  • A generalization of the 2-3 Tree (order three)
  • A B-Tree of order b has these properties:
    • The root is either a leaf or has at least two children
    • Each internal node has between ceil(b/2) and b children 注意这里是对于中间结点的
    • All leaves are at the same level in the tree, so the tree is always height balanced
    • 这里观察到order是一个root可以有的最多孩子数 实际上的格子数要比order小一个
  • Example
    在这里插入图片描述
    在这里插入图片描述

B±Tree

  • Characteristic

    • Internal nodes do not store record 中间结点只是一个索引分类的作用 实际上储存的值都在leaf
      • Only key values to guild the search
    • Leaf nodes store records or pointers to records
    • Leaf node, except the root, should store between ceil(t/2) and t values, t is the maximum number of values can be stored 注意这里和上面的不同 上面的是对于结点的孩子数而言的 而这个是对结点储存的值而言的 叶子结点能储存的最多数量和order+1数量一样 最少储存order的一半

    B+ tree of order four
    Internal Node: 1 – 4
    Leaf Node: Max 5 indexes

    • Leaf node may store more or less records than an internal node stores keys
  • Insert

    • 当要做promote操作但叶子节点为偶数时,选择中间偏右的结点promote.
    • 相同的数放在该节点的右边.
    • 对于internal node 和 leaf node插入的原则不一样。internal node同上,而leaf node 要copy一份数据值留在下面。
  • B+ Trees nodes are always at least half full.

  • Space Analysis

Example: Consider a B±Tree of order 100 with leaf nodes containing 100 records.
就是要算一个叶子节点能储存100个值的B+树 最少/最多能储存多少值
Tips : Min. number of records in N level must be smaller than or equal to Max. number of records in N-1 level.

  • 1st level B+tree :
    Min: 1
    Max: 100
  • 2nd level B+tree :
    Min : 2 leaves of 50 for 100 records
    Max: 100 leaves with 100 for 10,000 records
  • 3rd level B+tree :
    Min : 100 leaves of 50 for 5000 records
    Max: 10000 leaves with 100 for 1000,000 records
  • 4th level B+tree :
    Min : 100 leaves of 50 for 5000 records
    Max: 10000 leaves with 100 for 1000,000 records
    最小的一直是按50个(最大数量的一半)算,最大的一直是100个

猜你喜欢

转载自blog.csdn.net/yxyxxxyyyy/article/details/121477962