MIT 6.830 Lab 5: Implementing a Simple Database


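Preface

This lab implements a B+ tree index. A BTreeFile consists of four kinds of pages.

Tree nodes come in two page types: BTreeInternalPage and BTreeLeafPage.

In addition, BTreeHeaderPage pages keep track of which pages in the file are in use.

Finally, there is BTreeRootPtrPage, a single page at the start of the file that points to the root page of the B+ tree.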



I. About Lab 5

In this lab you will implement a B+ tree index for efficient lookups and range scans. We supply you with all of the low-level code you will need to implement the tree structure. You will implement searching, splitting pages, redistributing tuples between pages, and merging pages.

You may find it helpful to review sections 10.3–10.7 in the textbook, which provide detailed information about the structure of B+ trees as well as pseudocode for searches, inserts and deletes.

As described by the textbook and discussed in class, the internal nodes in B+ trees contain multiple entries, each consisting of a key value and a left and a right child pointer. Adjacent keys share a child pointer, so internal nodes containing m keys have m+1 child pointers. Leaf nodes can either contain data entries or pointers to data entries in other database files. For simplicity, we will implement a B+tree in which the leaf pages actually contain the data entries. Adjacent leaf pages are linked together with right and left sibling pointers, so range scans only require one initial search through the root and internal nodes to find the first leaf page. Subsequent leaf pages are found by following right (or left) sibling pointers.
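
To make the sibling-pointer idea concrete, here is a minimal sketch of a range scan over linked leaves. It is a toy model and not SimpleDB code: the `RangeScanSketch` and `Leaf` names, the int keys, and the `scan` helper are all assumptions made for illustration, and the sketch presumes the initial tree search has already returned the first leaf.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API):
// each leaf holds sorted int keys and a right-sibling link.
class RangeScanSketch {
    static final class Leaf {
        final int[] keys;
        Leaf right; // right-sibling pointer; null for the last leaf
        Leaf(int... keys) { this.keys = keys; }
    }

    // Collects every key in [lo, hi], starting from the leaf found by the tree search.
    static List<Integer> scan(Leaf first, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = first; leaf != null; leaf = leaf.right) {
            for (int k : leaf.keys) {
                if (k > hi) return out; // keys are sorted, so the scan can stop here
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, 3), b = new Leaf(5, 7), c = new Leaf(9, 11);
        a.right = b;
        b.right = c;
        System.out.println(scan(a, 3, 9)); // prints [3, 5, 7, 9]
    }
}
```

Only the first leaf needs a descent from the root; every later leaf is reached through `right`.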


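II. Lab 5

1. Exercise 1

Implement findLeafPage() in BTreeFile, which finds the BTreeLeafPage that should contain a given field. A recursive search does the job. As a special case, when the supplied field is null, recurse into the leftmost child at every level. Pages are fetched with BTreeFile.getPage(), which the lab provides; it takes an extra argument that tracks the dirty pages, and the later exercises rely on that argument.

Note that iterating over a BTreeInternalPage yields BTreeEntry objects. Each BTreeEntry holds one key of the internal node together with the BTreePageId of its left child and the BTreePageId of its right child.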

	private BTreeLeafPage findLeafPage(TransactionId tid, Map<PageId, Page> dirtypages, BTreePageId pid, Permissions perm,
			Field f)
					throws DbException, TransactionAbortedException {
		// some code goes here

		// Base case: pid refers to a leaf, so fetch it with the caller's permissions.
		int type = pid.pgcateg();
		if(type == BTreePageId.LEAF){
			return (BTreeLeafPage) getPage(tid,dirtypages,pid,perm);
		}
		// Internal pages are only read on the way down, so READ_ONLY is enough.
		BTreeInternalPage internalPage = (BTreeInternalPage) getPage(tid,dirtypages,pid,Permissions.READ_ONLY);
		Iterator<BTreeEntry> it = internalPage.iterator();
		BTreeEntry entry = null;
		while (it.hasNext()){
			entry = it.next();
			// A null field means "find the leftmost leaf".
			if(f == null){
				return findLeafPage(tid,dirtypages,entry.getLeftChild(),perm,f);
			}
			// First key >= f: the target lies in this entry's left subtree.
			if(entry.getKey().compare(Op.GREATER_THAN_OR_EQ,f)){
				return findLeafPage(tid,dirtypages, entry.getLeftChild(), perm,f);
			}
		}
		// f is greater than every key: descend into the rightmost child.
		return findLeafPage(tid,dirtypages, entry.getRightChild(), perm,f);
	}
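
The descent rule can also be tried outside SimpleDB. The following is a rough sketch on a toy in-memory tree; `FindLeafSketch`, `Node`, and the int keys are made-up names for illustration and have nothing to do with the lab's classes. It follows the same rule as the method above: take the child to the left of the first key that is greater than or equal to the search key, otherwise the rightmost child.

```java
// Toy model (hypothetical names, not the SimpleDB API).
class FindLeafSketch {
    // A leaf has children == null; an internal node with m keys has m + 1 children.
    static final class Node {
        final int[] keys;
        final Node[] children;
        Node(int[] keys, Node[] children) { this.keys = keys; this.children = children; }
        boolean isLeaf() { return children == null; }
    }

    // Passing f == null asks for the leftmost leaf.
    static Node findLeaf(Node node, Integer f) {
        if (node.isLeaf()) return node;
        if (f == null) return findLeaf(node.children[0], null);
        for (int i = 0; i < node.keys.length; i++) {
            // first key >= f: go into the child on its left
            if (node.keys[i] >= f) return findLeaf(node.children[i], f);
        }
        // f is larger than every key: go into the rightmost child
        return findLeaf(node.children[node.keys.length], f);
    }

    public static void main(String[] args) {
        Node l0 = new Node(new int[] {1, 4}, null);
        Node l1 = new Node(new int[] {5, 9}, null);
        Node l2 = new Node(new int[] {10, 15}, null);
        Node root = new Node(new int[] {5, 10}, new Node[] {l0, l1, l2});
        System.out.println(findLeaf(root, 9) == l1);    // prints true
        System.out.println(findLeaf(root, 10) == l1);   // prints true: a tie goes left
        System.out.println(findLeaf(root, 20) == l2);   // prints true
        System.out.println(findLeaf(root, null) == l0); // prints true
    }
}
```

Because a tie goes left, the result is the leftmost leaf that could contain the key; a scan that starts there reaches any later matches through the right-sibling pointers.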

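2. Exercise 2

Implement splitLeafPage() and splitInternalPage(). findLeafPage() can be used to find the leaf page into which a tuple should be inserted. Each page has a limited number of slots, though, and we must still be able to insert a tuple when the target leaf page is full. In that case, the attempt to insert into the full leaf page causes it to split, so that its tuples are distributed evenly between the two resulting pages.

Each time a leaf page splits, a new entry corresponding to the first tuple of the second page must be added to the parent node. Occasionally the internal node is full as well and cannot accept the new entry. In that case the parent splits and adds a new entry to its own parent, which can cause recursive splits and eventually the creation of a new root node.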

First, splitting a leaf page:

	public BTreeLeafPage splitLeafPage(TransactionId tid, Map<PageId, Page> dirtypages, BTreeLeafPage page, Field field)
			throws DbException, IOException, TransactionAbortedException {
		// some code goes here
		//
		// Split the leaf page by adding a new page on the right of the existing
		// page and moving half of the tuples to the new page.  Copy the middle key up
		// into the parent page, and recursively split the parent as needed to accommodate
		// the new entry.  getParentWithEmptySlots() will be useful here.  Don't forget to update
		// the sibling pointers of all the affected leaf pages.  Return the page into which a
		// tuple with the given key field should be inserted.

		// 1. Create newRightPage with getEmptyPage.
		BTreeLeafPage newRightPage = (BTreeLeafPage) getEmptyPage(tid,dirtypages,BTreePageId.LEAF);

		// 2. Move half of the tuples in page to newRightPage. Delete each tuple from page BEFORE
		// inserting it into newRightPage: insertTuple assigns the tuple a new RecordId, and deleteTuple
		// looks the tuple up by its RecordId, so after the insert page could no longer find and delete it.
		int tuplesNum = page.getNumTuples();
		Iterator<Tuple> it = page.reverseIterator();
		for (int i=0; i<tuplesNum / 2; i++){
			Tuple tuple = it.next();
			page.deleteTuple(tuple);
			newRightPage.insertTuple(tuple);
		}

		// 3. If page has a right sibling oldRightPage, point oldRightPage's left-sibling pointer at
		// newRightPage and newRightPage's right-sibling pointer at oldRightPage, and add oldRightPage
		// to dirtypages. The page is modified, so it is fetched with READ_WRITE rather than READ_ONLY.
		BTreePageId oldRightPageId = page.getRightSiblingId();
		BTreeLeafPage oldRightPage = oldRightPageId == null ? null : (BTreeLeafPage) getPage(tid,dirtypages,oldRightPageId,Permissions.READ_WRITE);
		if(oldRightPage != null){
			oldRightPage.setLeftSiblingId(newRightPage.getId());
			newRightPage.setRightSiblingId(oldRightPageId);
			dirtypages.put(oldRightPageId,oldRightPage);
		}

		// 4. Point page's right-sibling pointer at newRightPage and newRightPage's left-sibling pointer
		// at page. Add page and newRightPage to dirtypages.
		page.setRightSiblingId(newRightPage.getId());
		newRightPage.setLeftSiblingId(page.getId());
		dirtypages.put(page.getId(),page);
		dirtypages.put(newRightPage.getId(),newRightPage);

		// 5. Get the internal node that points to page and insert a new entry pointing to page and
		// newRightPage. The key is copied up from the first tuple of newRightPage. Add the parent to dirtypages.
		BTreeInternalPage parent = getParentWithEmptySlots(tid,dirtypages,page.getParentId(),field);
		Field mid = newRightPage.iterator().next().getField(keyField);
		BTreeEntry entry = new BTreeEntry(mid,page.getId(),newRightPage.getId());
		parent.insertEntry(entry);
		dirtypages.put(parent.getId(),parent);

		// 6. Update the parent pointers of page and newRightPage.
		updateParentPointers(tid, dirtypages, parent);

		// 7. Return the page that field belongs to (page or newRightPage).
		if(field.compare(Op.GREATER_THAN_OR_EQ,mid)){
			return newRightPage;
		}
		return page;
	}
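
The "copy up" behaviour of a leaf split is easy to see on plain lists. Below is a minimal sketch under simplifying assumptions: a leaf is just a sorted `List<Integer>`, and `LeafSplitSketch`, `splitLeaf`, and `chooseLeaf` are hypothetical helper names, not part of SimpleDB. As in the method above, the upper half (size / 2 keys) moves to the new right leaf, and the first key of the right leaf becomes the separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API).
class LeafSplitSketch {
    // Moves the upper half of a sorted leaf into a new right leaf and returns that leaf.
    static List<Integer> splitLeaf(List<Integer> leaf) {
        int move = leaf.size() / 2;
        List<Integer> tail = leaf.subList(leaf.size() - move, leaf.size());
        List<Integer> right = new ArrayList<>(tail);
        tail.clear(); // clearing the view removes the moved keys from the original leaf
        return right;
    }

    // Picks the leaf that should receive `key`, given the separator copied up to the parent.
    static List<Integer> chooseLeaf(int key, int separator, List<Integer> left, List<Integer> right) {
        return key >= separator ? right : left;
    }

    public static void main(String[] args) {
        List<Integer> left = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> right = splitLeaf(left);
        int separator = right.get(0); // copied up: the key also stays in the right leaf
        System.out.println(left + " | " + separator + " | " + right); // prints [1, 2, 3] | 4 | [4, 5]
    }
}
```

The separator remains in the right leaf after the split, which is what distinguishes a leaf split from an internal split.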

Next, splitting an internal page:

	public BTreeInternalPage splitInternalPage(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage page, Field field)
					throws DbException, IOException, TransactionAbortedException {
		// some code goes here
		//
		// Split the internal page by adding a new page on the right of the existing
		// page and moving half of the entries to the new page.  Push the middle key up
		// into the parent page, and recursively split the parent as needed to accommodate
		// the new entry.  getParentWithEmptySlots() will be useful here.  Don't forget to update
		// the parent pointers of all the children moving to the new page.  updateParentPointers()
		// will be useful here.  Return the page into which an entry with the given key field
		// should be inserted.

		// 1. Create newRightPage with getEmptyPage.
		BTreeInternalPage newRightPage = (BTreeInternalPage) getEmptyPage(tid,dirtypages,BTreePageId.INTERNAL);
		Iterator<BTreeEntry> it = page.reverseIterator();

		// 2. Move half of the entries in page to newRightPage. As with leaves, delete the entry from
		// page first and insert it into newRightPage afterwards: the insert changes the entry's
		// RecordId, and page could then no longer find the entry to delete it.
		int tuplesNum = page.getNumEntries();
		for(int i=0; i<tuplesNum / 2; i++){
			BTreeEntry entry = it.next();
			page.deleteKeyAndRightChild(entry);
			newRightPage.insertEntry(entry);
		}

		// 3. Take the largest entry still in page, delete it from page, point its left child at page
		// and its right child at newRightPage, then get the parent and insert the entry there.
		// This "pushes" the middle key up into the parent.
		BTreeEntry mid = it.next();
		page.deleteKeyAndRightChild(mid);
		mid.setLeftChild(page.getId());
		mid.setRightChild(newRightPage.getId());
		BTreeInternalPage parent = getParentWithEmptySlots(tid,dirtypages,page.getParentId(),mid.getKey());
		parent.insertEntry(mid);

		// 4. Add parent, page and newRightPage to dirtypages and update the parent pointers of their children.
		dirtypages.put(parent.getId(), parent);
		dirtypages.put(page.getId(), page);
		dirtypages.put(newRightPage.getId(),newRightPage);
		updateParentPointers(tid, dirtypages, parent);
		updateParentPointers(tid,dirtypages,page);
		updateParentPointers(tid,dirtypages,newRightPage);

		// 5. Return the page that field belongs to (page or newRightPage).
		if(field.compare(Op.GREATER_THAN_OR_EQ, mid.getKey())){
			return newRightPage;
		}
		return page;
	}
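
For contrast with the leaf case, here is a small sketch of "push up" on a toy internal node. Everything in it is assumed for illustration: `InternalSplitSketch`, `Node`, the int keys, and the string child labels are not SimpleDB types. The node keeps its keys and child pointers in two parallel lists, with one more child than keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API).
class InternalSplitSketch {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<String> children = new ArrayList<>(); // always keys.size() + 1 labels
    }

    // Splits `node` in place, fills the empty node `right`,
    // and returns the key that is pushed up to the parent.
    static int split(Node node, Node right) {
        int n = node.keys.size();
        int move = n / 2; // number of keys that end up in the right node
        right.keys.addAll(node.keys.subList(n - move, n));
        right.children.addAll(node.children.subList(n - move, n + 1));
        int pushedUp = node.keys.get(n - move - 1); // middle key: kept in neither node
        node.keys.subList(n - move - 1, n).clear();
        node.children.subList(n - move, n + 1).clear();
        return pushedUp;
    }

    public static void main(String[] args) {
        Node page = new Node();
        page.keys.addAll(Arrays.asList(10, 20, 30, 40, 50));
        page.children.addAll(Arrays.asList("c0", "c1", "c2", "c3", "c4", "c5"));
        Node right = new Node();
        int up = split(page, right);
        System.out.println(page.keys + " " + page.children);   // prints [10, 20] [c0, c1, c2]
        System.out.println(up);                                 // prints 30
        System.out.println(right.keys + " " + right.children); // prints [40, 50] [c3, c4, c5]
    }
}
```

Unlike a leaf split, the middle key leaves both halves, while every child pointer survives in exactly one of them.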

3. Exercise 3

To keep the tree balanced and avoid wasting space, deletions in a B+ tree may cause pages to redistribute tuples or, eventually, to merge.

[Figure: redistributing pages]
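Attempting to delete a tuple from a leaf page that is less than half full should cause that page either to steal tuples from one of its siblings or to merge with one of its siblings. If one of the siblings has tuples to spare, the tuples should be distributed evenly between the two pages, and the parent's entry should be updated accordingly.

However, if the sibling is also at minimum occupancy, the two pages should merge and the entry should be deleted from the parent. In turn, deleting an entry from the parent may leave the parent less than half full. In that case the parent should steal entries from its siblings or merge with a sibling. This can cause recursive merges, or even deletion of the root node if the last entry is deleted from it.

Implementation of stealFromLeafPage: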

	public void stealFromLeafPage(BTreeLeafPage page, BTreeLeafPage sibling,
			BTreeInternalPage parent, BTreeEntry entry, boolean isRightSibling) throws DbException {
		// some code goes here
		//
		// Move some of the tuples from the sibling to the page so
		// that the tuples are evenly distributed. Be sure to update
		// the corresponding parent entry.

		// 1. isRightSibling decides the direction: a right sibling is walked from its smallest
		// tuple, a left sibling from its largest.
		Iterator<Tuple> it = isRightSibling ? sibling.iterator() : sibling.reverseIterator();

		// 2. Steal tuples until page holds half of the combined total.
		int curTuplesNum = page.getNumTuples();
		int siblingTuplesNum = sibling.getNumTuples();
		int targetTuplesNum = (curTuplesNum + siblingTuplesNum) / 2;
		while(curTuplesNum < targetTuplesNum){
			Tuple tuple = it.next();
			sibling.deleteTuple(tuple);
			page.insertTuple(tuple);
			curTuplesNum++;
		}

		// 3. entry is the parent entry that points to page and its sibling. Set its key to the key of
		// the next tuple the iterator yields, i.e. the tuple in the sibling that now sits at the
		// boundary between the two pages.
		Tuple mid = it.next();
		entry.setKey(mid.getField(keyField));
		parent.updateEntry(entry);
	}
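
The redistribution arithmetic can be checked with a few lines on plain lists. This is only a sketch for the right-sibling case, with leaves modelled as sorted `List<Integer>` values; `LeafStealSketch` and `stealFromRight` are hypothetical names, not SimpleDB methods. The left-sibling case is symmetric.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API).
class LeafStealSketch {
    // Moves keys from the front of `right` to the end of `left` until `left` holds half
    // of the combined total, then returns the new separator key for the parent entry
    // (the first key still in `right`).
    static int stealFromRight(List<Integer> left, List<Integer> right) {
        int target = (left.size() + right.size()) / 2;
        while (left.size() < target) {
            left.add(right.remove(0));
        }
        return right.get(0);
    }

    public static void main(String[] args) {
        List<Integer> page = new ArrayList<>(Arrays.asList(1, 2));
        List<Integer> sibling = new ArrayList<>(Arrays.asList(3, 4, 5, 6, 7, 8));
        int separator = stealFromRight(page, sibling);
        System.out.println(page + " | " + separator + " | " + sibling); // prints [1, 2, 3, 4] | 5 | [5, 6, 7, 8]
    }
}
```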

Implementation of stealing for internal pages:

	public void stealFromLeftInternalPage(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage page, BTreeInternalPage leftSibling, BTreeInternalPage parent,
			BTreeEntry parentEntry) throws DbException, TransactionAbortedException {
		// some code goes here
		// Move some of the entries from the left sibling to the page so
		// that the entries are evenly distributed. Be sure to update
		// the corresponding parent entry. Be sure to update the parent
		// pointers of all children in the entries that were moved.

		// 1. From the number of keys in page and its left sibling, work out how many keys to steal.
		Iterator<BTreeEntry> it = leftSibling.reverseIterator();
		int curEntriesNum = page.getNumEntries();
		int siblingEntriesNum = leftSibling.getNumEntries();
		int targetEntriesNum = (curEntriesNum + siblingEntriesNum) / 2;

		// 2. An internal node does not share keys with its parent, so moving keys across also means
		// pulling the parent's key down into page. The new entry links the sibling's last child
		// to page's first child.
		BTreeEntry entry = it.next();
		BTreeEntry mid = new BTreeEntry(parentEntry.getKey(),entry.getRightChild(),page.iterator().next().getLeftChild());
		page.insertEntry(mid);
		curEntriesNum++;

		// 3. Move entries from the end of the left sibling until page holds its share.
		while(curEntriesNum < targetEntriesNum){
			leftSibling.deleteKeyAndRightChild(entry);
			page.insertEntry(entry);
			curEntriesNum++;
			entry = it.next();
		}

		// 4. Push the largest key still in the left sibling up into the parent.
		leftSibling.deleteKeyAndRightChild(entry);
		parentEntry.setKey(entry.getKey());
		parent.updateEntry(parentEntry);

		// 5. Mark page, its left sibling and the parent dirty, and update the parent pointers
		// of the children that moved to page.
		dirtypages.put(page.getId(),page);
		dirtypages.put(leftSibling.getId(),leftSibling);
		dirtypages.put(parent.getId(),parent);
		updateParentPointers(tid,dirtypages,page);
	}

	public void stealFromRightInternalPage(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage page, BTreeInternalPage rightSibling, BTreeInternalPage parent,
			BTreeEntry parentEntry) throws DbException, TransactionAbortedException {
		// some code goes here
		// Move some of the entries from the right sibling to the page so
		// that the entries are evenly distributed. Be sure to update
		// the corresponding parent entry. Be sure to update the parent
		// pointers of all children in the entries that were moved.

		// 1. From the number of keys in page and its right sibling, work out how many keys to steal.
		Iterator<BTreeEntry> it = rightSibling.iterator();
		int curEntriesNum = page.getNumEntries();
		int siblingEntriesNum = rightSibling.getNumEntries();
		int targetEntriesNum = (curEntriesNum + siblingEntriesNum) / 2;

		// 2. Pull the parent's key down into page. The new entry links page's last child
		// to the sibling's first child.
		BTreeEntry entry = it.next();
		BTreeEntry mid = new BTreeEntry(parentEntry.getKey(), page.reverseIterator().next().getRightChild(), entry.getLeftChild());
		page.insertEntry(mid);
		curEntriesNum++;

		// 3. Move entries from the front of the right sibling until page holds its share.
		while(curEntriesNum < targetEntriesNum){
			rightSibling.deleteKeyAndLeftChild(entry);
			page.insertEntry(entry);
			entry = it.next();
			curEntriesNum++;
		}

		// 4. Push the smallest key still in the right sibling up into the parent.
		rightSibling.deleteKeyAndLeftChild(entry);
		parentEntry.setKey(entry.getKey());
		parent.updateEntry(parentEntry);

		// 5. Mark page, its right sibling and the parent dirty, and update the parent pointers
		// of the children that moved to page.
		dirtypages.put(page.getId(),page);
		dirtypages.put(rightSibling.getId(),rightSibling);
		dirtypages.put(parent.getId(),parent);
		updateParentPointers(tid,dirtypages,page);
	}
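
The way keys rotate through the parent is the part that is easiest to get wrong, so here is a rough sketch of the left-sibling case on a toy node. The names (`InternalStealSketch`, `Node`, `stealFromLeft`) and the string child labels are assumptions for illustration only. Each step pulls the key in transit down into the page together with the sibling's last child, and the sibling's last key becomes the next key in transit; the final key in transit is the new parent key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API).
class InternalStealSketch {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<String> children = new ArrayList<>(); // always keys.size() + 1 labels
    }

    // Moves entries from `left` into `page` through the parent key; returns the new parent key.
    static int stealFromLeft(Node page, Node left, int parentKey) {
        int target = (page.keys.size() + left.keys.size()) / 2;
        int down = parentKey; // the key currently "in transit"
        while (page.keys.size() < target) {
            // the key in transit becomes page's first key, paired with left's last child
            page.keys.add(0, down);
            page.children.add(0, left.children.remove(left.children.size() - 1));
            // left's last key goes into transit next
            down = left.keys.remove(left.keys.size() - 1);
        }
        return down;
    }

    public static void main(String[] args) {
        Node left = new Node();
        left.keys.addAll(Arrays.asList(10, 20, 30, 40));
        left.children.addAll(Arrays.asList("a", "b", "c", "d", "e"));
        Node page = new Node();
        page.keys.add(60);
        page.children.addAll(Arrays.asList("f", "g"));
        int newParentKey = stealFromLeft(page, left, 50);
        System.out.println(left.keys + " " + left.children); // prints [10, 20, 30] [a, b, c, d]
        System.out.println(newParentKey);                     // prints 40
        System.out.println(page.keys + " " + page.children); // prints [50, 60] [e, f, g]
    }
}
```

No key is duplicated anywhere: the old parent key 50 now lives in the page, and 40 has taken its place in the parent.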

4. Exercise 4

Implement mergeLeafPages() and mergeInternalPages().

	public void mergeLeafPages(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeLeafPage leftPage, BTreeLeafPage rightPage, BTreeInternalPage parent, BTreeEntry parentEntry)
					throws DbException, IOException, TransactionAbortedException {
		// some code goes here
		//
		// Move all the tuples from the right page to the left page, update
		// the sibling pointers, and make the right page available for reuse.
		// Delete the entry in the parent corresponding to the two pages that are merging -
		// deleteParentEntry() will be useful here

		// 1. Move every tuple of rightPage into leftPage (delete first, then insert).
		Iterator<Tuple> it = rightPage.iterator();
		while(it.hasNext()){
			Tuple tuple = it.next();
			rightPage.deleteTuple(tuple);
			leftPage.insertTuple(tuple);
		}

		// 2. If rightPage has no right sibling, leftPage's right sibling becomes null. Otherwise
		// leftPage's right sibling becomes rightPage's right sibling, whose left-sibling pointer
		// is set back to leftPage.
		BTreePageId rightPageRightSiblingId = rightPage.getRightSiblingId();
		if(rightPageRightSiblingId == null){
			leftPage.setRightSiblingId(null);
		}
		else{
			leftPage.setRightSiblingId(rightPageRightSiblingId);
			BTreeLeafPage rightPageRightSibling = (BTreeLeafPage) getPage(tid,dirtypages,rightPageRightSiblingId,Permissions.READ_WRITE);
			rightPageRightSibling.setLeftSiblingId(leftPage.getId());
		}

		// 3. Call setEmptyPage to mark rightPage as empty in the header.
		setEmptyPage(tid, dirtypages, rightPage.getId().getPageNumber());

		// 4. Call deleteParentEntry to remove from the parent the entry whose left and right
		// child pointers point to leftPage and rightPage.
		deleteParentEntry(tid, dirtypages, leftPage, parent, parentEntry);

		// 5. Add leftPage and parent to dirtypages.
		dirtypages.put(leftPage.getId(),leftPage);
		dirtypages.put(parent.getId(),parent);
	}
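
The sibling-pointer bookkeeping in a leaf merge is small but easy to miss, so here is a minimal sketch of it on a toy doubly linked leaf. `LeafMergeSketch` and `Leaf` are hypothetical names with no connection to the lab's classes, and the sketch leaves out the header and parent updates that setEmptyPage and deleteParentEntry perform in SimpleDB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API).
class LeafMergeSketch {
    static final class Leaf {
        final List<Integer> keys = new ArrayList<>();
        Leaf left, right; // sibling pointers
        Leaf(Integer... ks) { keys.addAll(Arrays.asList(ks)); }
    }

    // Moves everything from r into l and unlinks r from the sibling chain.
    static void merge(Leaf l, Leaf r) {
        l.keys.addAll(r.keys);
        r.keys.clear();
        l.right = r.right;                    // skip over r
        if (r.right != null) r.right.left = l; // the far neighbour now points back at l
        r.left = null;
        r.right = null;                       // r is free for reuse
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, 2), b = new Leaf(3), c = new Leaf(7, 8);
        a.right = b; b.left = a; b.right = c; c.left = b;
        merge(a, b);
        System.out.println(a.keys);       // prints [1, 2, 3]
        System.out.println(a.right == c); // prints true
        System.out.println(c.left == a);  // prints true
    }
}
```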

	public void mergeInternalPages(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage leftPage, BTreeInternalPage rightPage, BTreeInternalPage parent, BTreeEntry parentEntry)
					throws DbException, IOException, TransactionAbortedException {
		// some code goes here
		//
		// Move all the entries from the right page to the left page, update
		// the parent pointers of the children in the entries that were moved,
		// and make the right page available for reuse
		// Delete the entry in the parent corresponding to the two pages that are merging -
		// deleteParentEntry() will be useful here

		// 1. Pull the parent key that separates leftPage and rightPage down into leftPage. The new
		// entry links leftPage's last child to rightPage's first child.
		BTreeEntry mid = new BTreeEntry(parentEntry.getKey(),leftPage.reverseIterator().next().getRightChild(),rightPage.iterator().next().getLeftChild());
		leftPage.insertEntry(mid);

		// 2. Move every entry of rightPage into leftPage (delete first, then insert).
		Iterator<BTreeEntry> rightIt = rightPage.iterator();
		while (rightIt.hasNext()){
			BTreeEntry entry = rightIt.next();
			rightPage.deleteKeyAndLeftChild(entry);
			leftPage.insertEntry(entry);
		}

		// 3. Update the parent pointers of leftPage's children, so that the children that used to
		// belong to rightPage now point to leftPage.
		updateParentPointers(tid,dirtypages,leftPage);

		// 4. Call setEmptyPage to mark rightPage as empty in the header.
		setEmptyPage(tid,dirtypages,rightPage.getId().getPageNumber());

		// 5. Call deleteParentEntry to remove from the parent the entry whose left and right
		// child pointers point to leftPage and rightPage.
		deleteParentEntry(tid,dirtypages,leftPage,parent,parentEntry);

		// 6. Add leftPage and parent to dirtypages.
		dirtypages.put(leftPage.getId(),leftPage);
		dirtypages.put(parent.getId(),parent);
	}
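
An internal merge differs from a leaf merge in one respect: the separating key comes down from the parent. A minimal sketch on a toy node shows the resulting layout; `InternalMergeSketch`, `Node`, and the string child labels are assumed names for illustration, not SimpleDB types, and the removal of the parent entry itself is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names, not the SimpleDB API).
class InternalMergeSketch {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<String> children = new ArrayList<>(); // always keys.size() + 1 labels
    }

    // Merges `right` into `left`, pulling the separating parent key down between them.
    static void merge(Node left, Node right, int parentKey) {
        left.keys.add(parentKey); // sits between left's last child and right's first child
        left.keys.addAll(right.keys);
        left.children.addAll(right.children);
        right.keys.clear();
        right.children.clear();  // right is now empty and can be reused
    }

    public static void main(String[] args) {
        Node left = new Node();
        left.keys.add(10);
        left.children.addAll(Arrays.asList("a", "b"));
        Node right = new Node();
        right.keys.add(30);
        right.children.addAll(Arrays.asList("c", "d"));
        merge(left, right, 20);
        System.out.println(left.keys + " " + left.children); // prints [10, 20, 30] [a, b, c, d]
    }
}
```

The merged node again has exactly one more child than it has keys, which is the invariant that pulling the parent key down preserves.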

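Summary

The Lab 5 handout really matters: it explains much of the logic clearly and also provides many methods that can be called directly. If I had to start from scratch, I think Lab 5 might be the hardest of the labs, but once the structure of a B+ tree is clear it becomes much less painful. Only the write-up for the last lab is left. Keep going!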


Reposted from blog.csdn.net/weixin_44153131/article/details/128864477