You are to implement an in-memory extensible hash table class in C++. The hash table is to store integers. This class is obviously (I hope) not incredibly useful but is intended for you to learn and demonstrate your knowledge of the index structure discussed in class.
In practice, if the structure was used as an index the table would store key-address pairs where the address would be the disk page that the record with the matching key was contained. Since we are not going to connect our hash table to data we will just store the key value (an integer).
The hash table should be implemented as described in class. This means that the hash table class must have a directory that doubles in size at the appropriate times and buckets of a fixed size. Global depth and local depth must be recorded. Read the next two sections carefully for further details. Your hash table should allow duplicate values to be inserted.
You should implement two classes, an ExtensibleHash Table class and a Bucket class. Each class should be separated into .h and a .cpp files. The ExtensibleHash Table class must contain a directory structure that is an array of pointers to Buckets (not an array of Buckets). The Bucket class must contain an array of integers.
Both classes will require other attributes some of which are referenced in the descriptions of the methods and the implementation (the next section).
You must implement the following public methods:
I expect you to implement additional helper methods which should be made private. Given that this is a container class written in C++ you should also implement a (public) destructor, overloaded assignment operator and copy constructor.
You must use the names exactly as I've set out above and failure do to so will result in a significant mark penalty.
Hash Function
Keys should be assigned to buckets based on the key's bit value. The hash table should use a family of hash functions as described in class. Initially, only the right most bit of the key value should be used to determine which directory entry to look up and which bucket to insert the key in. As the directory size (and global depth) increases more bits should be used.
For example, if the key value was 273 (100101011) and the global depth was 1 the bucket pointer in index one of the directory should be followed. Conversely if the global depth was 5 then the bucket pointer in index eleven of the directory should be followed using the last five digits of the bit value of the key (01011). Note that you can find the bit value of the key by using C++ bit operations - but there is no need to do this. There is a much simpler way using powers of 2 and the modulo operation.
Structure
The hash table must have a directory which should be an array of size 2 global depth of bucket pointers. The bucket objects should contain the local depth and an array of keys. The keys in buckets do not have to be stored in order and may be searched using linear search. Note that this is an inefficient process for an inmemory hash table but recall that we are simulating a table where buckets represent disk pages and we are less concerned with in-memory search costs within a bucket.
Method Implementation Notes
Find
Use the hash function to find the directory index. Follow the pointer in the directory to a bucket and search the bucket for the key. Returns true if key is found, false otherwise.
Insertion
Use the hash function to find the directory index. Follow the pointer in the directory to a bucket. Insert the key if the bucket has room.
Otherwise, if the bucket is full, compare the local depth of the bucket to the global depth. If the global depth is greater then make a new bucket and assign a pointer to it in the appropriate index of the directory and distribute the values in the original bucket between the original and the new bucket as appropriate.
If global depth and local depth are the same, the directory size must be doubled before creating the new bucket. This also entails setting the pointers in the new half of the directory to existing buckets. Note that these processes may need to be repeated in the situation where a bucket splits, but all the values end up in one of the buckets.
For more details see the class presentation
If it is not possible to insert a value into the hash table, the insertion method should throw a runtime_error exception. This can occur where a bucket is filled with identical search key values and an attempt is made to insert another such value. I would suggest testing for this circumstance when the bucket splits.
Removal
Use the hash function to find the directory index. Follow the pointer in the directory to a bucket. Remove any incidences of the key from the bucket. You will probably need to reorganize the contents of the bucket to keep all the empty space in the "top" half of the underlying array. If the bucket is empty after the removal do not do anything else. That is, leave the bucket empty rather than deleting the bucket and possibly decreasing the size of the directory
Print the hash table. The directory contents and the buckets it points to should be printed. Each line of the output should show the directory index and pointer value, the latter displayed in hex, followed by the contents of the bucket pointed to and its local depth. Each bucket (its keys and local depth) should only be printed once which means that higher index directory entries that point to the same bucket as a lower index should not show the bucket again; see below for an example.
0: 000456D8 --> [64,200,-] (2)
1: 0004AFE --> [153,-,-] (3)
2: 000453D9 --> [66,218,-] (2)
3: 00045410 --> [67,-,-] (2)
4: 000456D8 -->
5: 9994B020 --> [13,253,109] (3)
6: 000453D0 -->
7: 00045410 -->
This example uses the same keys as the class example from the presentation. The addresses are expected to be different for different executions of the same program.
Use of STL Classes and Other C++ Features
You may not use STL container classes, such as vectors, for the directory or buckets; the directory and bucket containers should therefore be basic C++ arrays in dynamic memory. You may use smart pointers to reference these structures if you wish. Use of exception classes is allowed (and essential in at least one case).