
  Project Phase #3 - B+ Tree
due date: Friday 23 May 2003


1 Introduction

In this assignment, you will implement a B+ tree in which leaf-level pages contain entries of the form [key, rid of a data record] (Alternative 2 for data entries, in the textbook's terms). You must implement the full search and insert algorithms. In particular, your insert routine must be capable of dealing with overflows (at any level of the tree) by splitting pages. Deletes will be handled by simply marking the corresponding leaf entry as 'deleted'. You do not need to implement merging of nodes on deletes or sibling redistribution on inserts.

You will be given HFPage and SortedPage. SortedPage is derived from HFPage, and it augments the insertRecord method of HFPage by storing records on the HFPage in sorted order by a specified key value. The key value must be included as the initial part of each inserted record, to enable easy comparison of the key value of a new record with the key values of existing records on a page. The documentation available in the header files is sufficient to understand what operation each function performs.

You need to implement two page-level classes, BTIndexPage and BTLeafPage, both of which are derived from SortedPage. These page classes are used to build the B+ tree index. You will write code to create, destroy, open and close a B+ tree index. You will also write code that will open a scan on the B+ tree, allowing its caller to iterate through all of the data entries (from the leaf pages) that satisfy some search criterion.

We will also be running the MOSS plagiarism-checking software against each group's code. We will compare each group's submission against all other groups. Don't copy someone else's code!!

2 Getting Started

Please download the Phase 3 archive, phase3.tar.gz, into your working directory, then unpack it:

  1. cd project
  2. tar -zxvf phase3.tar.gz
Now you will, as usual, see 3 generated directories:
  1. lib/
  2. include/
  3. src/
src/ contains the files you will be working on. If you cd src/ and then make the project, it will create an executable named hfpage. Right now, it does not work; you will need to fill in the bodies of the HFPage class methods. The methods are defined (empty) in file hfpage.C.

Sample output of a correct implementation is available in sample_output.

In src/ you can find the following files:

  • Makefile: A sample Makefile for you to compile your project. Set up any dependencies (as needed) by editing this file.

  • Partial templates for btfile.h, btindex_page.h, btleaf_page.h, and btreefilescan.h: You should complete all these .h files (as needed) and also implement the methods in the corresponding .C files (which you will write from scratch).

  • main.C, btree_driver.C, keys: B+ tree test driver program and the ASCII key data that will be used by the testing program.

  • results: correct test output
As in the previous phases, you can find other useful include files in include/: bt.h, hfpage.h, sorted_page.h, index.h, test_driver.h, btree_driver.h, minirel.h and new_error.h.

Notice: The given code does not compile as is, because the following files are not implemented yet:
btfile.C
btindex_page.C
btleaf_page.C
btreefilescan.C

Therefore, you should see the following output on your first compile:

cs179gaa@hill $ make
g++-2.95.3 -DUNIX -Wall -g -Iproject/phase3-bplus/include -I. -c main.C
g++-2.95.3 -DUNIX -Wall -g -Iproject/phase3-bplus/include -I. -c btree_driver.C
make: *** No rule to make target `btfile.C', needed by `btfile.o'. Stop.

Also make sure that you edit the Makefile to set the MINIBASE variable, and that you run make depend.

3 Design Overview

You should begin by (re-)reading the chapter Tree Structured Indexing of the textbook to get an overview of the B+ tree layer. There is also information about the B+ tree layer in the HTML documentation.

3.1 A Note on Keys for this Assignment

You should note that key values are passed to functions using void * pointers (pointing to the key values). The contents of a key should be interpreted using the AttrType variable. The key can be either a string (attrString) or an integer (attrInteger), as per the definition of AttrType in minirel.h. Only these two kinds of keys are implemented in this assignment. If the key is a string, it has a fixed maximum length, MAX_KEY_SIZE1, defined in bt.h.

Although the specifications for some methods (e.g., the constructor of BTreeFile) suggest that keys can be of (the more general enumerated) type AttrType, you can return an error message if the keys are not of type attrString or attrInteger.
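
The interpretation of a void * key through its AttrType can be sketched as follows. This is only an illustration, not part of the provided code: compareKeys is a hypothetical helper name, the AttrType enum below is a reduced stand-in for the one in minirel.h, and string keys are assumed to be NUL-terminated.

```cpp
#include <cassert>
#include <cstring>

// Reduced stand-in for the AttrType enum in minirel.h (assumption: only the
// two values used in this assignment are reproduced here).
enum AttrType { attrString, attrInteger };

// Hypothetical helper: returns a value <0, ==0, or >0 when key1 is less than,
// equal to, or greater than key2. Both pointers must refer to keys of type t.
int compareKeys(const void* key1, const void* key2, AttrType t) {
    assert(key1 != 0 && key2 != 0);
    if (t == attrInteger) {
        int a, b;
        // Copy out with memcpy: keys stored inside a page may be unaligned.
        std::memcpy(&a, key1, sizeof(int));
        std::memcpy(&b, key2, sizeof(int));
        return (a < b) ? -1 : (a > b) ? 1 : 0;
    }
    // attrString: assumed to be NUL-terminated C strings.
    return std::strcmp(static_cast<const char*>(key1),
                       static_cast<const char*>(key2));
}
```

A single comparison routine of this kind lets the search, insert and scan code stay independent of the key type.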

The SortedPage class, which augments the insertRecord method of HFPage by storing records on a page in sorted order according to a specified key value, assumes that the key value is included as the initial part of each record, to enable easy comparison of the key value of a new record with the key values of existing records on a page.

3.2 B+ Tree Page-Level Classes

There are four separate page classes, of which you will implement two. HFPage is the base class (given), and from it is derived SortedPage. You will derive BTIndexPage and BTLeafPage from SortedPage. Note that, as in the HFPage assignment, you must not add any private data members to BTIndexPage or BTLeafPage.

  • HFPage: This is the base class, you can look at hfpage.h to get more details.

  • SortedPage: This class is derived from the class HFPage. Its only function is to maintain records on an HFPage in sorted order. Only the slot directory is rearranged. The data records remain in the same positions on the page. This exploits the fact that the rids of index entries are not important: index entries (unlike data records) are never 'pointed to' directly, and are only accessed by searching the index page.

  • BTIndexPage: This class is derived from SortedPage. It inserts records of the type [key, pageNo] on the SortedPage. The records are sorted by the key.

  • BTLeafPage: This class is derived from SortedPage. It inserts records of the type [key, dataRid] on the SortedPage. dataRid is the rid of the data record. The records are sorted by the key. Further, leaf pages must be maintained in a doubly-linked list.
For further details about the individual methods in these classes, look at the header files for the classes.
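
Since SortedPage expects the key at the start of each record, an index entry can be laid out as the key bytes followed by the page number. The sketch below shows one possible layout for integer keys; packIndexEntry and unpackIndexEntry are hypothetical names, PageId is a stand-in for the type in minirel.h, and the actual layout expected by the header files may differ.

```cpp
#include <cassert>
#include <cstring>

typedef int PageId;  // stand-in for the PageId type in minirel.h (assumption)

// Hypothetical helper: writes [key, pageNo] into buf with the key first,
// and returns the number of bytes written.
int packIndexEntry(char* buf, int key, PageId pageNo) {
    assert(buf != 0);
    std::memcpy(buf, &key, sizeof(key));
    std::memcpy(buf + sizeof(key), &pageNo, sizeof(pageNo));
    return static_cast<int>(sizeof(key) + sizeof(pageNo));
}

// Hypothetical helper: reads back an entry written by packIndexEntry.
void unpackIndexEntry(const char* buf, int& key, PageId& pageNo) {
    assert(buf != 0);
    std::memcpy(&key, buf, sizeof(key));
    std::memcpy(&pageNo, buf + sizeof(key), sizeof(pageNo));
}
```

A leaf entry [key, dataRid] can follow the same pattern, with the rid copied after the key instead of a page number.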

Lastly, you will need to create a structure to represent the header page of the B+ tree. Despite its name, the data structure used to represent the header page need not be derived from a Page object. It can be implemented simply as a C++ struct, with a field for each piece of information that must be stored in the header page. Just remember to cast pointers to this struct as (Page *) pointers when making calls to functions such as pinPage().
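
A minimal sketch of such a header struct is shown below. The struct name, its fields, and the 1024-byte page size are assumptions made for illustration; Page here is only an opaque stand-in for the Minibase class, and the real pinPage() call is not reproduced.

```cpp
#include <cassert>
#include <cstddef>

typedef int PageId;                         // stand-in for minirel.h
enum AttrType { attrString, attrInteger };  // reduced stand-in for minirel.h

const std::size_t ASSUMED_PAGE_SIZE = 1024; // assumption, check minirel.h

// Opaque page-sized buffer standing in for the Minibase Page class.
struct Page { char data[ASSUMED_PAGE_SIZE]; };

// Hypothetical header layout: one field per piece of tree-wide information.
struct BTreeHeaderPage {
    PageId   root;     // page id of the root page
    AttrType keyType;  // type of the search key
    int      keySize;  // maximum length of the key field
};

// View a pinned page frame as the header struct, and back again.
BTreeHeaderPage* asHeader(Page* p) {
    assert(p != 0);
    return reinterpret_cast<BTreeHeaderPage*>(p);
}
Page* asPage(BTreeHeaderPage* h) {
    assert(h != 0);
    return reinterpret_cast<Page*>(h);
}
```

The struct must fit inside one page, so it is worth checking sizeof(BTreeHeaderPage) against the page size once, for example in the BTreeFile constructor.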

3.3 Other B+ Tree Classes

We will assume here that everyone understands the concept of B+ trees, and the basic algorithms, and concentrate on explaining the design of the C++ classes that you will implement.

A BTreeFile will contain a header page and a number of BTIndexPages and BTLeafPages. The header page is used to hold information about the tree as a whole, such as the page id of the root page, the type of the search key, the length of the key field(s) (which has a fixed maximum size in this assignment), etc. When a B+ tree index is opened, you should read the header page first, and keep it pinned until the file is closed. Given the name of the B+ tree index file, how can you locate the header page? The DB class has a method

Status add_file_entry(const char* fname, PageId header_page_num);

that lets you register this information when a file fname is created. There are similar methods for deleting and reading these 'file entries' ([file name, header page] pairs) as well, which can be used when the file is destroyed or opened. The header page contains the page id of the root of the tree, and every other page in the tree is accessed through the root page.
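
The open-or-create logic built on these file entries can be sketched with an in-memory stand-in for the DB class. Only add_file_entry is taken from the text above; FakeDB, openOrCreate, the Status values, and the exact signatures of get_file_entry and delete_file_entry are assumptions, so check the DB class header for the real declarations.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef int PageId;        // stand-in for minirel.h
enum Status { OK, FAIL };  // reduced stand-in for the Minibase Status type

// In-memory stand-in for the file-entry directory kept by the DB class.
class FakeDB {
public:
    Status add_file_entry(const char* fname, PageId header_page_num) {
        std::string name(fname);
        if (entries_.count(name) != 0) return FAIL;  // already registered
        entries_[name] = header_page_num;
        return OK;
    }
    // Assumed signature: looks up the header page registered for fname.
    Status get_file_entry(const char* fname, PageId& header_page_num) const {
        std::map<std::string, PageId>::const_iterator it =
            entries_.find(std::string(fname));
        if (it == entries_.end()) return FAIL;
        header_page_num = it->second;
        return OK;
    }
    // Assumed signature: removes the entry when the index is destroyed.
    Status delete_file_entry(const char* fname) {
        return entries_.erase(std::string(fname)) == 1 ? OK : FAIL;
    }
private:
    std::map<std::string, PageId> entries_;
};

// Hypothetical helper: returns the header page id for fname, registering
// newHeader (a freshly allocated page) only if no entry exists yet.
PageId openOrCreate(FakeDB& db, const char* fname, PageId newHeader,
                    bool& created) {
    PageId header = 0;
    if (db.get_file_entry(fname, header) == OK) {
        created = false;
        return header;
    }
    Status s = db.add_file_entry(fname, newHeader);
    assert(s == OK);
    (void)s;
    created = true;
    return newHeader;
}
```

In the real code, the page id returned here is the one you would pin as the header page and keep pinned until the file is closed.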

The following two figures show examples of how a valid B+ Tree might look.

Figure 1 shows what a BTreeFile with only one BTLeafPage looks like; the single leaf page is also the root. Note that there is no BTIndexPage in this case.

Figure 2 shows a tree with a few BTLeafPages, and this can easily be extended to contain multiple levels of BTIndexPages as well.

3.3.1 IndexFile and IndexFileScan

A BTree is one particular type of index. There are other types, for example a Hash index. However, all index types have some basic functionality in common. We've taken this basic index functionality and created a virtual base class called IndexFile. You won't write any code for IndexFile. However, any class derived from an IndexFile should support IndexFile(), Delete(), and insert(). (IndexFile and IndexFileScan are defined in include/index.h).

Likewise, an IndexFileScan is a virtual base class that contains the basic functionality all index file scans should support.

3.3.2 BTreeFile

The main class to be implemented for this assignment is BTreeFile. BTreeFile is a derived class of the IndexFile class, which means a BTreeFile is a kind of IndexFile. However, since IndexFile is a virtual base class, all of the methods associated with IndexFile must be implemented for BTreeFile. You should have copied btfile.h into your directory, as per the instructions in Section 2.

The methods to be implemented include:

BTreeFile::BTreeFile There are two constructors for BTreeFile (as defined in btfile.h): one that will only open an index, and another that will create a new index on disk, with a given type and key size. Observe that the key type is passed as a value of type AttrType. For this assignment, you only need to handle keys of type attrString and attrInteger. If there is a call with a key whose type is not one of the two, return an error.

BTreeFile::insert The BTreeFile::insert method takes two arguments: (a pointer to) a key and the rid of a data record. The data entry to be inserted (i.e., a 'record' in the leaf pages of the BTreeFile) consists of the pair [key, rid of data record].

If a page overflows (i.e., no space for the new entry), you should split the page. You may have to insert additional entries of the form [key, id of child page] into the higher level index pages as part of a split. Note that this could recursively go all the way up to the root, possibly resulting in a split of the root node of the B+ tree.
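
The leaf-level part of a split can be sketched on a plain sorted vector, leaving out pages, pinning and the buffer manager. Everything named below (Entry, CAPACITY, insertWithSplit) is hypothetical, and a fixed entry count stands in for the real 'no space left on the page' test; copying the first key of the new right sibling up to the parent is the usual textbook choice for leaf splits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified data entry: integer key plus an integer standing in for the rid.
struct Entry { int key; int rid; };

const std::size_t CAPACITY = 4;  // hypothetical per-leaf entry limit

// Inserts e into the sorted leaf. If the leaf overflows, moves the upper half
// into right, sets separator to the first key of right, and returns true so
// the caller can insert [separator, id of right page] into the parent.
bool insertWithSplit(std::vector<Entry>& leaf, const Entry& e,
                     std::vector<Entry>& right, int& separator) {
    // Find the insert position that keeps the leaf sorted by key.
    std::size_t pos = 0;
    while (pos < leaf.size() && leaf[pos].key <= e.key) ++pos;
    leaf.insert(leaf.begin() + pos, e);

    if (leaf.size() <= CAPACITY) return false;  // no overflow, nothing to do

    // Overflow: keep the lower half here, move the upper half to the sibling.
    std::size_t mid = leaf.size() / 2;
    right.assign(leaf.begin() + mid, leaf.end());
    leaf.erase(leaf.begin() + mid, leaf.end());
    assert(!right.empty());
    separator = right.front().key;
    return true;
}
```

In the full tree, the parent insert triggered by a true return value may itself overflow, which is how a split propagates up to the root.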

BTreeFile::Delete The BTreeFile::Delete routine simply removes the entry from the appropriate BTLeafPage. You are not required to implement redistribution or page merging when the number of entries falls below the threshold.

3.3.3 BTreeFileScan

Finally, you will implement scans that return data entries from the leaf pages of the tree. You will create the scan through a member function of BTreeFile (BTreeFile::new_scan(...) as defined in btfile.h). The parameters passed to new_scan() specify the range of keys that will be scanned in the B+ tree. They are explained in detail in btfile.h.
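
A range scan over linked leaves can be sketched as below. The convention that a null bound means 'unbounded on that side' is an assumption made for this sketch; btfile.h is the authority on what the new_scan() parameters mean. Leaf and scanRange are hypothetical names, and the sketch starts at the leftmost leaf instead of descending the tree to the first qualifying leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified leaf: sorted integer keys and a link to the next leaf.
struct Leaf {
    std::vector<int> keys;
    Leaf* next;
};

// Collects every key k with lo <= k <= hi, walking the leaf chain in order.
// A null lo or hi leaves that side of the range open (assumed convention).
std::vector<int> scanRange(const Leaf* first, const int* lo, const int* hi) {
    std::vector<int> out;
    for (const Leaf* p = first; p != 0; p = p->next) {
        for (std::size_t i = 0; i < p->keys.size(); ++i) {
            int k = p->keys[i];
            if (lo != 0 && k < *lo) continue;    // before the range: skip
            if (hi != 0 && k > *hi) return out;  // past the range: stop early
            out.push_back(k);
        }
    }
    return out;
}
```

Because the leaves are sorted and linked, the scan can stop as soon as it sees a key above the upper bound.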

3.4 Errors

In the Buffer Manager assignment, you learned how to use the Minibase error protocol. Reviewing it now would be a good idea. In that assignment, all the errors you returned belonged to one of the categories in new_error.h, namely BUFMGR. In this assignment, you will need to use BTREE, BTLEAFPAGE, and BTINDEXPAGE.

4 What to Turn In

You are required to turn in your copy of all source files through the online secure site https://www.cs.ucr.edu. This includes all the files needed to make this phase and run the test program. The TAs should be able to go to your handin directory, type make, and run the program.

Please remember late submissions will not be accepted. Make sure to start early!

4.1 A Note on the Due Date

This assignment is significantly more difficult than the first three; not only does it build upon your understanding of the previous two coding assignments (as you must make use of both the Page classes and the Buffer Manager), you are also called upon to implement a rather intricate data structure. Accordingly, we have given you extra time to complete this part of the project. You have roughly three weeks, and you will need them. If you don't start working on this assignment today, you're already behind.





sitemaster: Demetris Zeinalipour