MIT 6.830 Lab 5: A Simple Database Implementation


Foreword

This lab implements a B+ tree index. A BTreeFile consists of four different kinds of pages.

First of all, the nodes of the tree are stored in two kinds of pages: BTreeInternalPage and BTreeLeafPage.

There is also the header page, BTreeHeaderPage, which tracks which pages in the file are in use.

Finally, there is BTreeRootPtrPage, a single page at the beginning of the file that points to the root page of the B+ tree.



1. About Lab 5

In this lab you will implement a B+ tree index for efficient lookups and range scans. We supply you with all of the low-level code you will need to implement the tree structure. You will implement searching, splitting pages, redistributing tuples between pages, and merging pages.

You may find it helpful to review sections 10.3–10.7 in the textbook, which provide detailed information about the structure of B+ trees as well as pseudocode for searches, inserts and deletes.

As described by the textbook and discussed in class, the internal nodes in B+ trees contain multiple entries, each consisting of a key value and a left and a right child pointer. Adjacent keys share a child pointer, so internal nodes containing m keys have m+1 child pointers. Leaf nodes can either contain data entries or pointers to data entries in other database files. For simplicity, we will implement a B+ tree in which the leaf pages actually contain the data entries. Adjacent leaf pages are linked together with right and left sibling pointers, so range scans only require one initial search through the root and internal nodes to find the first leaf page. Subsequent leaf pages are found by following right (or left) sibling pointers.
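To make the role of the sibling pointers concrete, here is a minimal in-memory sketch of a range scan over linked leaves. The names Leaf and rangeScan are hypothetical and not part of SimpleDB, where leaves are pages on disk; the sketch also assumes the starting leaf has already been located by a search from the root.

```java
import java.util.ArrayList;
import java.util.List;

final class LeafChainDemo {
    /** Hypothetical in-memory leaf: sorted keys plus a right-sibling link. */
    static final class Leaf {
        final int[] keys;
        Leaf right;
        Leaf(int... keys) { this.keys = keys; }
    }

    /** Collects all keys in [lo, hi], starting from the leaf that would hold lo. */
    static List<Integer> rangeScan(Leaf start, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.right) {
            for (int k : leaf.keys) {
                if (k > hi) return out;   // keys are sorted, so the scan can stop early
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, 3), b = new Leaf(5, 7), c = new Leaf(9, 11);
        a.right = b;
        b.right = c;
        System.out.println(rangeScan(a, 3, 9)); // prints [3, 5, 7, 9]
    }
}
```

Only the first leaf needs a root-to-leaf search; every later leaf is reached through the right pointer.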


2. Lab 5

1. Exercise 1

Implement the findLeafPage() method in BTreeFile, which finds the leaf page (BTreeLeafPage) that may contain the given field. The search is recursive: starting from the given page, it descends through internal pages until it reaches a leaf. As a special case, if the provided value is null, it recurses on the leftmost child every time and so returns the leftmost leaf. Pages are fetched with the BTreeFile.getPage() method provided by the lab, which takes an additional parameter that tracks the list of dirty pages; the following exercises rely on this parameter. Internal pages on the way down are fetched with READ_ONLY permission, and only the leaf is fetched with the requested permission.

Note that iterating over a BTreeInternalPage yields BTreeEntry objects. Each BTreeEntry contains a key of the internal node, the BTreePageId of its left child, and the BTreePageId of its right child.

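	private BTreeLeafPage findLeafPage(TransactionId tid, Map<PageId, Page> dirtypages, BTreePageId pid, Permissions perm,
                                       Field f)
					throws DbException, TransactionAbortedException {
		// Base case: the page is a leaf, so fetch it with the requested permission.
		int type = pid.pgcateg();
		if(type == BTreePageId.LEAF){
			return (BTreeLeafPage) getPage(tid,dirtypages,pid,perm);
		}
		// Internal pages are only read on the way down.
		BTreeInternalPage internalPage = (BTreeInternalPage) getPage(tid,dirtypages,pid,Permissions.READ_ONLY);
		Iterator<BTreeEntry> it = internalPage.iterator();
		BTreeEntry entry = null;
		while (it.hasNext()){
			entry = it.next();
			// A null field means "find the leftmost leaf".
			if(f == null){
				return findLeafPage(tid,dirtypages,entry.getLeftChild(),perm,f);
			}
			// Descend left at the first entry whose key is >= f.
			if(entry.getKey().compare(Op.GREATER_THAN_OR_EQ,f)){
				return findLeafPage(tid,dirtypages, entry.getLeftChild(), perm,f);
			}
		}
		// Guard against an internal page without entries, which would otherwise cause a NullPointerException.
		if(entry == null){
			throw new DbException("internal page has no entries");
		}
		// f is larger than every key: follow the right child of the last entry.
		return findLeafPage(tid,dirtypages, entry.getRightChild(), perm,f);
	}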
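The descent rule used in findLeafPage() can be tried in isolation. The following is a sketch on plain int arrays with a hypothetical helper childIndex (not SimpleDB code): it takes the left child of the first entry whose key is >= f, and otherwise the right child of the last entry.

```java
final class ChildIndexDemo {
    /**
     * For an internal node with sorted keys k[0..m-1] and m+1 children,
     * returns the index of the child to descend into when searching for f.
     * A null f means "go to the leftmost leaf".
     */
    static int childIndex(int[] keys, Integer f) {
        if (f == null) return 0;
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] >= f) return i;   // left child of entry i
        }
        return keys.length;               // right child of the last entry
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30};
        System.out.println(childIndex(keys, 5));    // prints 0
        System.out.println(childIndex(keys, 20));   // prints 1
        System.out.println(childIndex(keys, 25));   // prints 2
        System.out.println(childIndex(keys, 99));   // prints 3
        System.out.println(childIndex(keys, null)); // prints 0
    }
}
```

Descending left when the key equals f means that, with duplicate keys, the search lands on the leftmost leaf that may contain f.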

2. Exercise 2

Implement the splitLeafPage() and splitInternalPage() methods. findLeafPage() finds the leaf page into which a tuple should be inserted. However, each page has a limited number of slots, and we must be able to insert tuples even when that leaf page is full. Attempting to insert a tuple into a full leaf page causes the page to split so that its tuples are evenly distributed between the two resulting pages.

Every time a leaf page splits, a new entry corresponding to the first tuple in the second page must be added to the parent node. Occasionally the parent internal node is also full and cannot accept the new entry. In that case the parent splits and adds a new entry to its own parent, which may cause recursive splits and eventually the creation of a new root node.
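The difference between the two kinds of split is easiest to see on plain key lists. This is a minimal sketch with hypothetical names (SplitDemo, Split, splitLeaf, splitInternal), ignoring child pointers and pages: a leaf split copies the first key of the new right half up to the parent, while an internal split pushes the middle key up so that it stays in neither half. It assumes at least two keys per list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitDemo {
    /** What stays left, what moves right, and the key handed to the parent. */
    static final class Split {
        final List<Integer> left, right;
        final int parentKey;
        Split(List<Integer> left, List<Integer> right, int parentKey) {
            this.left = left;
            this.right = right;
            this.parentKey = parentKey;
        }
    }

    /** Leaf split: the upper half moves right; the first right key is COPIED up. */
    static Split splitLeaf(List<Integer> keys) {
        int keep = keys.size() - keys.size() / 2;
        List<Integer> left = new ArrayList<>(keys.subList(0, keep));
        List<Integer> right = new ArrayList<>(keys.subList(keep, keys.size()));
        return new Split(left, right, right.get(0));
    }

    /** Internal split: the upper half moves right; the middle key is PUSHED up. */
    static Split splitInternal(List<Integer> keys) {
        int mid = keys.size() - keys.size() / 2 - 1;   // index of the key that moves up
        List<Integer> left = new ArrayList<>(keys.subList(0, mid));
        List<Integer> right = new ArrayList<>(keys.subList(mid + 1, keys.size()));
        return new Split(left, right, keys.get(mid));
    }

    public static void main(String[] args) {
        Split leaf = splitLeaf(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(leaf.left + " | " + leaf.right + " up=" + leaf.parentKey);
        // prints [1, 2, 3] | [4, 5] up=4
        Split internal = splitInternal(Arrays.asList(10, 20, 30, 40, 50));
        System.out.println(internal.left + " | " + internal.right + " up=" + internal.parentKey);
        // prints [10, 20] | [40, 50] up=30
    }
}
```

After the leaf split the key 4 exists both in the right leaf and in the parent; after the internal split the key 30 exists only in the parent.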

First, implement splitting a leaf page:

	public BTreeLeafPage splitLeafPage(TransactionId tid, Map<PageId, Page> dirtypages, BTreeLeafPage page, Field field)
			throws DbException, IOException, TransactionAbortedException {
		// Split the leaf page by adding a new page on the right of the existing
		// page and moving half of the tuples to the new page.  Copy the middle key up
		// into the parent page, and recursively split the parent as needed to accommodate
		// the new entry.  getParentWithEmptySlots() will be useful here.  Don't forget to update
		// the sibling pointers of all the affected leaf pages.  Return the page into which a
		// tuple with the given key field should be inserted.

		// 1. Create newRightPage with getEmptyPage().
		BTreeLeafPage newRightPage = (BTreeLeafPage) getEmptyPage(tid,dirtypages,BTreePageId.LEAF);

		// 2. Move half of the tuples from page to newRightPage. Each tuple must be deleted from page
		// before it is inserted into newRightPage: insertion assigns the tuple a new RecordId, and
		// deleteTuple() locates the tuple by its RecordId, so page could no longer find and delete
		// a tuple that had already been re-inserted elsewhere.
		int tuplesNum = page.getNumTuples();
		Iterator<Tuple> it = page.reverseIterator();
		for (int i=0; i<tuplesNum / 2; i++){
			Tuple tuple = it.next();
			page.deleteTuple(tuple);
			newRightPage.insertTuple(tuple);
		}

		// 3. If page has a right sibling oldRightPage, point oldRightPage's left-sibling pointer at
		// newRightPage and newRightPage's right-sibling pointer at oldRightPage, and add oldRightPage
		// to dirtypages. The page is modified, so it is fetched with READ_WRITE permission.
		BTreePageId oldRightPageId = page.getRightSiblingId();
		BTreeLeafPage oldRightPage = oldRightPageId == null ? null : (BTreeLeafPage) getPage(tid,dirtypages,oldRightPageId,Permissions.READ_WRITE);
		if(oldRightPage != null){
			oldRightPage.setLeftSiblingId(newRightPage.getId());
			newRightPage.setRightSiblingId(oldRightPageId);
			dirtypages.put(oldRightPageId,oldRightPage);
		}

		// 4. Point page's right-sibling pointer at newRightPage and newRightPage's left-sibling
		// pointer at page. Add page and newRightPage to dirtypages.
		page.setRightSiblingId(newRightPage.getId());
		newRightPage.setLeftSiblingId(page.getId());
		dirtypages.put(page.getId(),page);
		dirtypages.put(newRightPage.getId(),newRightPage);

		// 5. Get the parent internal page and insert a new entry pointing to page and newRightPage.
		// Add the parent page to dirtypages.
		BTreeInternalPage parent = getParentWithEmptySlots(tid,dirtypages,page.getParentId(),field);
		Field mid = newRightPage.iterator().next().getField(keyField);
		BTreeEntry entry = new BTreeEntry(mid,page.getId(),newRightPage.getId());
		parent.insertEntry(entry);
		dirtypages.put(parent.getId(),parent);

		// 6. Update the parent pointers of page and newRightPage.
		updateParentPointers(tid, dirtypages, parent);

		// 7. Return the page (page or newRightPage) into which field belongs.
		if(field.compare(Op.GREATER_THAN_OR_EQ,mid)){
			return newRightPage;
		}
		return page;
	}

Next, implement splitting an internal page:

	public BTreeInternalPage splitInternalPage(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage page, Field field)
					throws DbException, IOException, TransactionAbortedException {
		// Split the internal page by adding a new page on the right of the existing
		// page and moving half of the entries to the new page.  Push the middle key up
		// into the parent page, and recursively split the parent as needed to accommodate
		// the new entry.  getParentWithEmptySlots() will be useful here.  Don't forget to update
		// the parent pointers of all the children moving to the new page.  updateParentPointers()
		// will be useful here.  Return the page into which an entry with the given key field
		// should be inserted.

		// 1. Create newRightPage with getEmptyPage().
		BTreeInternalPage newRightPage = (BTreeInternalPage) getEmptyPage(tid,dirtypages,BTreePageId.INTERNAL);
		Iterator<BTreeEntry> it = page.reverseIterator();

		// 2. Move half of the entries from page to newRightPage. As with leaves, delete each entry
		// from page first and then insert it into newRightPage: once an entry has been inserted into
		// newRightPage its RecordId changes, and page could no longer find it in order to delete it.
		int entriesNum = page.getNumEntries();
		for(int i=0; i<entriesNum / 2; i++){
			BTreeEntry entry = it.next();
			page.deleteKeyAndRightChild(entry);
			newRightPage.insertEntry(entry);
		}

		// 3. After the move, take the largest entry remaining in page and delete it from page.
		// Point its left child at page and its right child at newRightPage, get the parent, and
		// insert the entry into the parent (this "pushes" the middle key up into the parent).
		BTreeEntry mid = it.next();
		page.deleteKeyAndRightChild(mid);
		mid.setLeftChild(page.getId());
		mid.setRightChild(newRightPage.getId());
		BTreeInternalPage parent = getParentWithEmptySlots(tid,dirtypages,page.getParentId(),mid.getKey());
		parent.insertEntry(mid);

		// 4. Add parent, page and newRightPage to dirtypages and update the parent pointers
		// of their children.
		dirtypages.put(parent.getId(), parent);
		dirtypages.put(page.getId(), page);
		dirtypages.put(newRightPage.getId(),newRightPage);
		updateParentPointers(tid, dirtypages, parent);
		updateParentPointers(tid,dirtypages,page);
		updateParentPointers(tid,dirtypages,newRightPage);

		// 5. Return the page (page or newRightPage) into which field belongs.
		if(field.compare(Op.GREATER_THAN_OR_EQ, mid.getKey())){
			return newRightPage;
		}
		return page;
	}

3. Exercise 3

To keep the tree balanced and avoid wasting space, deletions in a B+ tree may cause pages to redistribute tuples or, eventually, to merge.

[Figure: redistributing pages]
Attempting to delete a tuple from a leaf page that is less than half full should cause that page to either steal tuples from one of its siblings or merge with one of its siblings. If one of the page's siblings has tuples to spare, the tuples should be evenly distributed between the two pages, and the parent's entry should be updated accordingly.

However, if the sibling is also at minimum occupancy, the two pages should be merged and the entry removed from the parent page. In turn, removing an entry from the parent may cause the parent to become less than half full. In this case the parent should either steal entries from one of its siblings or merge with a sibling. This can lead to recursive merges and even deletion of the root node if the last entry is removed from the root node.
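The choice between stealing and merging comes down to simple arithmetic on tuple counts. The sketch below uses hypothetical names (RebalanceDemo, afterDelete, tuplesToSteal) and assumes, as a simplification, that minimum occupancy is capacity / 2 rounded down; the thresholds in the lab's own helper code may be expressed differently.

```java
final class RebalanceDemo {
    enum Action { NONE, STEAL, MERGE }

    /** Decides what a page holding `count` tuples must do after a delete. */
    static Action afterDelete(int count, int siblingCount, int capacity) {
        int min = capacity / 2;                  // assumed minimum occupancy
        if (count >= min) return Action.NONE;    // still at least half full
        // A sibling above the minimum has tuples to spare; otherwise both pages fit in one.
        return siblingCount > min ? Action.STEAL : Action.MERGE;
    }

    /** Number of tuples to move so that both pages end up as even as possible. */
    static int tuplesToSteal(int count, int siblingCount) {
        return (count + siblingCount) / 2 - count;
    }

    public static void main(String[] args) {
        System.out.println(afterDelete(1, 4, 4));   // prints STEAL
        System.out.println(tuplesToSteal(1, 4));    // prints 1
        System.out.println(afterDelete(1, 2, 4));   // prints MERGE
        System.out.println(afterDelete(2, 2, 4));   // prints NONE
    }
}
```

In the merge case the page is below the minimum and the sibling is at most at the minimum, so their combined tuples always fit into a single page.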

Implementation of stealFromLeafPage:

	public void stealFromLeafPage(BTreeLeafPage page, BTreeLeafPage sibling,
			BTreeInternalPage parent, BTreeEntry entry, boolean isRightSibling) throws DbException {
		// Move some of the tuples from the sibling to the page so
		// that the tuples are evenly distributed. Be sure to update
		// the corresponding parent entry.

		// 1. Use isRightSibling to decide whether to steal from the left or the right sibling.
		Iterator<Tuple> it = isRightSibling ? sibling.iterator() : sibling.reverseIterator();

		// 2. Work out how many tuples to steal from the tuple counts of the two pages.
		int curTuplesNum = page.getNumTuples();
		int siblingTuplesNum = sibling.getNumTuples();
		int targetTuplesNum = (curTuplesNum + siblingTuplesNum) / 2;
		while(curTuplesNum < targetTuplesNum){
			Tuple tuple = it.next();
			sibling.deleteTuple(tuple);
			page.insertTuple(tuple);
			curTuplesNum++;
		}

		// 3. entry is the parent entry that points to page and its sibling. Set its key to a value
		// that separates the tuples of page from those of its sibling.
		Tuple mid = it.next();
		entry.setKey(mid.getField(keyField));
		parent.updateEntry(entry);
	}
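The redistribution in stealFromLeafPage() can be mimicked on sorted lists. This is a sketch with a hypothetical helper steal (not SimpleDB code). For the left-sibling case it uses the first key of page as the new separator, whereas the code above takes the largest key remaining in the sibling; both values lie between the keys of the two pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeafStealDemo {
    /**
     * Moves keys from sibling into page until page holds half of the combined total,
     * then returns the new separator key (the first key of the right-hand page).
     */
    static int steal(List<Integer> page, List<Integer> sibling, boolean isRightSibling) {
        int target = (page.size() + sibling.size()) / 2;
        while (page.size() < target) {
            if (isRightSibling) {
                page.add(sibling.remove(0));                      // smallest key of the right sibling
            } else {
                page.add(0, sibling.remove(sibling.size() - 1));  // largest key of the left sibling
            }
        }
        return isRightSibling ? sibling.get(0) : page.get(0);
    }

    public static void main(String[] args) {
        List<Integer> page = new ArrayList<>(Arrays.asList(20));
        List<Integer> left = new ArrayList<>(Arrays.asList(1, 5, 9, 12, 15));
        int separator = steal(page, left, false);
        System.out.println(left + " | " + page + " separator=" + separator);
        // prints [1, 5, 9] | [12, 15, 20] separator=12
    }
}
```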

Implementation of stealFromLeftInternalPage and stealFromRightInternalPage:

	public void stealFromLeftInternalPage(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage page, BTreeInternalPage leftSibling, BTreeInternalPage parent,
			BTreeEntry parentEntry) throws DbException, TransactionAbortedException {
		// Move some of the entries from the left sibling to the page so
		// that the entries are evenly distributed. Be sure to update
		// the corresponding parent entry. Be sure to update the parent
		// pointers of all children in the entries that were moved.

		// 1. From the number of keys in page and its left sibling, work out how many keys
		// to steal from the left sibling.
		Iterator<BTreeEntry> it = leftSibling.reverseIterator();
		int curEntriesNum = page.getNumEntries();
		int siblingEntriesNum = leftSibling.getNumEntries();
		int targetEntriesNum = (curEntriesNum + siblingEntriesNum) / 2;

		// 2. Keys in an internal page are not duplicated in its parent, so moving keys
		// also means moving the parent's key down into page.
		BTreeEntry entry = it.next();
		BTreeEntry mid = new BTreeEntry(parentEntry.getKey(),entry.getRightChild(),page.iterator().next().getLeftChild());
		page.insertEntry(mid);
		curEntriesNum++;

		// 3. Move entries from the left sibling until the two pages are evenly filled.
		while(curEntriesNum < targetEntriesNum){
			leftSibling.deleteKeyAndRightChild(entry);
			page.insertEntry(entry);
			curEntriesNum++;
			entry = it.next();
		}

		// 4. Then push the largest key remaining in the left sibling up into the parent.
		leftSibling.deleteKeyAndRightChild(entry);
		parentEntry.setKey(entry.getKey());
		parent.updateEntry(parentEntry);

		// 5. Mark page, its left sibling and the parent dirty, and update the parent
		// pointers of page's children.
		dirtypages.put(page.getId(),page);
		dirtypages.put(leftSibling.getId(),leftSibling);
		dirtypages.put(parent.getId(),parent);
		updateParentPointers(tid,dirtypages,page);
	}

	public void stealFromRightInternalPage(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage page, BTreeInternalPage rightSibling, BTreeInternalPage parent,
			BTreeEntry parentEntry) throws DbException, TransactionAbortedException {
		// Move some of the entries from the right sibling to the page so
		// that the entries are evenly distributed. Be sure to update
		// the corresponding parent entry. Be sure to update the parent
		// pointers of all children in the entries that were moved.

		// 1. Work out how many keys to steal from the right sibling.
		Iterator<BTreeEntry> it = rightSibling.iterator();
		int curEntriesNum = page.getNumEntries();
		int siblingEntriesNum = rightSibling.getNumEntries();
		int targetEntriesNum = (curEntriesNum + siblingEntriesNum) / 2;

		// 2. Move the parent's key down into page.
		BTreeEntry entry = it.next();
		BTreeEntry mid = new BTreeEntry(parentEntry.getKey(), page.reverseIterator().next().getRightChild(), entry.getLeftChild());
		page.insertEntry(mid);
		curEntriesNum++;

		// 3. Move entries from the right sibling until the two pages are evenly filled.
		while(curEntriesNum < targetEntriesNum){
			rightSibling.deleteKeyAndLeftChild(entry);
			page.insertEntry(entry);
			entry = it.next();
			curEntriesNum++;
		}

		// 4. Push the smallest key remaining in the right sibling up into the parent.
		rightSibling.deleteKeyAndLeftChild(entry);
		parentEntry.setKey(entry.getKey());
		parent.updateEntry(parentEntry);

		// 5. Mark the pages dirty and update the parent pointers of page's children.
		dirtypages.put(page.getId(),page);
		dirtypages.put(rightSibling.getId(),rightSibling);
		dirtypages.put(parent.getId(),parent);
		updateParentPointers(tid,dirtypages,page);
	}
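For internal pages, keys rotate through the parent: the separator comes down into page and a key from the sibling goes up to replace it. The sketch below follows the same steps as stealFromLeftInternalPage() on plain key lists; the name stealFromLeft is hypothetical, and child pointers are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InternalStealDemo {
    /**
     * Steals keys from the left sibling by rotating them through the parent.
     * Returns the key that becomes the new separator in the parent.
     */
    static int stealFromLeft(List<Integer> page, List<Integer> left, int parentKey) {
        int target = (page.size() + left.size()) / 2;
        page.add(0, parentKey);                          // pull the old separator down
        while (page.size() < target) {
            page.add(0, left.remove(left.size() - 1));   // move the largest key of the sibling
        }
        return left.remove(left.size() - 1);             // push the next largest key up
    }

    public static void main(String[] args) {
        List<Integer> left = new ArrayList<>(Arrays.asList(10, 20, 30, 40, 50));
        List<Integer> page = new ArrayList<>(Arrays.asList(70));
        int separator = stealFromLeft(page, left, 60);
        System.out.println(left + " <" + separator + "> " + page);
        // prints [10, 20, 30] <40> [50, 60, 70]
    }
}
```

The total number of keys across the two pages and the separator is unchanged (seven before and after), which is a quick sanity check for this kind of rotation.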

4. Exercise 4

Implement mergeLeafPages() and mergeInternalPages():

	public void mergeLeafPages(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeLeafPage leftPage, BTreeLeafPage rightPage, BTreeInternalPage parent, BTreeEntry parentEntry)
					throws DbException, IOException, TransactionAbortedException {
		// Move all the tuples from the right page to the left page, update
		// the sibling pointers, and make the right page available for reuse.
		// Delete the entry in the parent corresponding to the two pages that are merging -
		// deleteParentEntry() will be useful here

		// 1. Move all tuples from rightPage to leftPage.
		Iterator<Tuple> it = rightPage.iterator();
		while(it.hasNext()){
			Tuple tuple = it.next();
			rightPage.deleteTuple(tuple);
			leftPage.insertTuple(tuple);
		}

		// 2. Fix the sibling pointers. If rightPage has no right sibling, leftPage's right sibling
		// becomes null; otherwise leftPage's right sibling becomes rightPage's right sibling,
		// whose left-sibling pointer is set to leftPage.
		BTreePageId rightPageRightSiblingId = rightPage.getRightSiblingId();
		if(rightPageRightSiblingId == null){
			leftPage.setRightSiblingId(null);
		}
		else{
			leftPage.setRightSiblingId(rightPageRightSiblingId);
			BTreeLeafPage rightPageRightSibling = (BTreeLeafPage) getPage(tid,dirtypages,rightPageRightSiblingId,Permissions.READ_WRITE);
			rightPageRightSibling.setLeftSiblingId(leftPage.getId());
		}

		// 3. Call setEmptyPage() to mark rightPage as free in the header.
		setEmptyPage(tid, dirtypages, rightPage.getId().getPageNumber());

		// 4. Call deleteParentEntry() to remove from the parent the entry whose child pointers
		// refer to leftPage and rightPage.
		deleteParentEntry(tid, dirtypages, leftPage, parent, parentEntry);

		// 5. Add leftPage and parent to dirtypages.
		dirtypages.put(leftPage.getId(),leftPage);
		dirtypages.put(parent.getId(),parent);
	}

	public void mergeInternalPages(TransactionId tid, Map<PageId, Page> dirtypages,
			BTreeInternalPage leftPage, BTreeInternalPage rightPage, BTreeInternalPage parent, BTreeEntry parentEntry)
					throws DbException, IOException, TransactionAbortedException {
		// Move all the entries from the right page to the left page, update
		// the parent pointers of the children in the entries that were moved,
		// and make the right page available for reuse
		// Delete the entry in the parent corresponding to the two pages that are merging -
		// deleteParentEntry() will be useful here

		// 1. First pull the parent entry that points to leftPage and rightPage down into leftPage.
		BTreeEntry mid = new BTreeEntry(parentEntry.getKey(),leftPage.reverseIterator().next().getRightChild(),rightPage.iterator().next().getLeftChild());
		leftPage.insertEntry(mid);

		// 2. Move the entries of rightPage into leftPage.
		Iterator<BTreeEntry> rightIt = rightPage.iterator();
		while (rightIt.hasNext()){
			BTreeEntry entry = rightIt.next();
			rightPage.deleteKeyAndLeftChild(entry);
			leftPage.insertEntry(entry);
		}

		// 3. Update the parent pointers of leftPage's children (children that used to belong
		// to rightPage now point to leftPage).
		updateParentPointers(tid,dirtypages,leftPage);

		// 4. Call setEmptyPage() to mark rightPage as free in the header.
		setEmptyPage(tid,dirtypages,rightPage.getId().getPageNumber());

		// 5. Call deleteParentEntry() to remove from the parent the entry whose child pointers
		// refer to leftPage and rightPage.
		deleteParentEntry(tid,dirtypages,leftPage,parent,parentEntry);

		// 6. Add leftPage and parent to dirtypages.
		dirtypages.put(leftPage.getId(),leftPage);
		dirtypages.put(parent.getId(),parent);
	}
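The two merges differ in what happens to the parent's separator key. As a sketch on plain key lists (hypothetical names mergeLeaves and mergeInternal, no pages or child pointers): a leaf merge simply drops the separator, because it was only a copy, while an internal merge pulls it down between the two halves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeDemo {
    /** Leaf merge: concatenate the tuples; the parent's separator is just deleted. */
    static List<Integer> mergeLeaves(List<Integer> left, List<Integer> right) {
        List<Integer> merged = new ArrayList<>(left);
        merged.addAll(right);
        return merged;
    }

    /** Internal merge: the parent's separator is pulled down between the two halves. */
    static List<Integer> mergeInternal(List<Integer> left, int parentKey, List<Integer> right) {
        List<Integer> merged = new ArrayList<>(left);
        merged.add(parentKey);
        merged.addAll(right);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(mergeLeaves(Arrays.asList(1, 2), Arrays.asList(3)));
        // prints [1, 2, 3]
        System.out.println(mergeInternal(Arrays.asList(10), 20, Arrays.asList(30, 40)));
        // prints [10, 20, 30, 40]
    }
}
```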

Summary

The Lab 5 handout is really important. It explains much of the logic clearly and also provides many helper methods that can be called directly. Starting from scratch, I think Lab 5 may be the most difficult lab, but once you understand the structure of the B+ tree it is not that painful. There is still the last lab report left, so keep going!


Origin blog.csdn.net/weixin_44153131/article/details/128864477