it2051229 CS251 Hashing and Heaps

Overview

In this project you will code slightly different but practical implementations of hashing and heaps:

Cuckoo Hashing and sorting using Binary Heaps. Both implementations must be coded in the same project according to the specifications below. Read the handout thoroughly since details for each implementation are given. As on previous projects, it must be coded in C++.

1. Cuckoo Hashing

Cuckoo Hashing is a variation of the regular hashing concept. It was presented by R. Pagh and F. Rodler on the Proceedings of European Symposium on Algorithms in 2001 (original paper here). We are interested in this hashing mechanism since it is straightforward to implement and it has worst case constant lookup time. The name is received from the European cuckoo who throws out the eggs from other bird's nests in order to make room for its own eggs!

This method uses two hash tables T1 and T2 each of length r, and two hash functions h1(x) and h2(x). Every key x is stored in cell h1(x) of T1 or h2(x) of T2 but never in both. The lookup function is defined as follows:

bool functionlookup(x)
return T1[h1(x)] == x or T2[h2(x)] == x
end

For insertion, as described in the original paper, it turns out that the "cuckoo approach" (kicking other keys away until every key has its own nest) works very well. Specifically, if x is to be inserted we first see if cell h1 (x) of T1 is occupied. If not occupied, we set T1[h1(x)] = x and we are done. Otherwise, we set y = T1 [h1 (x)] and then set T1 [h1 (x)] = x anyway, thus making y "nestless". The key y is then inserted in T2. If its nest in T2 is occupied, we proceed in a similar way and so forth iteratively. Check the example in Figure 1a. It may happen that this process loops (see Figure 1b). Therefore the number of iterations is bounded by a MaxLoop value. If this number of iterations is reached, everything is rehashed (which is described later) and we try to accommodate the nestless key. see image.

The insert procedure is defined as follows (notation x y means the values of variables x and y are swapped):

procedureinsert(x)
iflookup(x) then return
loopMaxLoop times
if T1[h1(x)] = empty then{T1[h1(x)] = x; return}
x <-> T1[h1(x)]
if T2[h2(x)] = empty then{T2[h2(x)] = x; return}
x <-> T2[h2(x)]
end loop
rehash(); insert(x)
end

Removing a key is of course simple to perform. If lookup(x) is true then remove x from T1 [h1(x)] or T2[h2(x)]. Remember an element in T1 cannot be in T2 and vice versa. For rehashing, we will perform two operations: 1) double the size of both tables, and 2) add the elements of T1 and T2 into the doubledsize new tables. Keep in mind that you need to add such values in the same way as inserting any x values (i.e., use the insert operation).

In http://www.lkozma.net/cuckoo_hashing_visualization, you can visualize how Cuckoo Hashing works by inserting the values of your choosing. Here is a suggestion: delete TEST1 and TEST2 and insert numbers in increasing order, one by one. Observe how the keys are swapped between tables. Keep doing this until you fall in a loop. You will implement the Cuckoo Hashing mechanism as described before. The hashing functions to be implemented are:

h1(x) = x mod r and h2(x) x mod (r 1)

The input for the program is:

program < inputfile.txt > outputfile.txt

The format of the inputfile.txthas the following structure:

1
r m
a b
a b
a b
a b
...

Where 1 indicates that the Cuckoo Hashing part will be executed, ris a positive integer that indicates the length of the tables and mis a positive integer that indicates the maximum number of loops. For each tuple (a b), ais the instruction to perform and bis the number for the required instruction. The possible values for a are "i" for insert, "r" for remove and "l" for lookup. The values of b are positive integer numbers.

The program has to write the following outputs:

Every time an element x is insertedin Ti [hi(x)] (during insertion, rehashing, and element swapping) then write the element and where it is in which table i.e., if x = 2 and it was inserted in table T1 at position h1 (2) = 11 then the output is "2 in T1[11]". If x was inserted into an empty cell then add an exclamation mark "!" at the end of the output line. If x is already on either T1 or T2 then the output is "2 already in T1[11]".
When x is removedusing the remove function then write "2 out T1[11]" i.e., if x = 2 and is was removed from table T1 and position h1 (2) = 11. If x is not removed because it isn't on neither T1 or T2 then write "no out 2".
When x is looked up and it was found then write where i.e., if x = 2 and it is in T1 at position h1 (2) = 11 then the output is "2 at T1[11]". If x is not found then write "no 2". i.e., if x = 2 is neither in T1 nor T2 then write "no 2".
If the MaxLoop value is reached then write "maxloop reached".

The following is an example of the output given a set of tuples with instructions and values: see image.

2. Binary Heaps and Sorting

In this section you need to implement a Binary Heap. Then, using this heap we are going to implement a sorting algorithm. First you are going to implement a binary heap. One should be able to use it as either a maxheap or a minheap it should be configurable during heap initialization. Once the heap is initialized, it continues to behave the same way until its deletion. Your implementation should use a binary tree based approach, not an array based approach. Refer to your lecture slides to understand how heaps can be implemented. The elements in the heap should follow this

structure:
struct element {
int key;
int value;
};

When you execute the insert() or extract() functions on the heap, the comparison should be on 'key', not 'value'. You can also use a class to define the above structure, or use your own names.

Your heap should implement the following functionalities:

1. Peek:basically findmax or findmin. If it is configured as a maxheap, find the element with maximum 'key'. [If it is configured as a minheap, find the element with minimum 'key'.]

2. Insert:add a new element to the heap.

3. Extract/delete:return the element with minimum 'key' if the heap is a minheap [ or maximum 'key' if the heap is a maxheap] after removing it from the heap.

4. Create:create an empty heap.

5. Heapify:create the heap out of a given array of elements. Assume here that an empty heap is already initialized (as maxheap or minheap).

6. Size:return the number of items in the heap.

For the sorting implementation, given a list of < key, value > pairs you are going to sort the list based on the keys. This should be straightforward. You just need to call heapify() and then extract one by one the elements on the heap. The sort order will be specified (ascending or descending key values), and depending on that you should use maxheap or minheap.

The input for the program is:

program < inputfile.txt > outputfile.txt

The first line of the inputfile.txt could be the numbers 2 or 3 declaring which part of the project is being tested (2 for the heap part and 3 for sorting using heaps).

For heap part the following lines of the input file are as follows:

An integer n that represents the number of lines to follow
n lines of the following format:
- The character 'c' (for create) followed by an integer. If the integer value is '1', create an empty maxheap. If the integer value is '2' create an empty minheap.
- The character 'i' (for insert) followed by 2 integers. Insert these two values into the heap as pair.

For the sorting part the following lines of the input file are as follows:

An integer that represents the order of the sorting. The value 1 represents that you have to sort in increasing order. The value 2 represents decreasing order.
An integer n representing the number of elements that you have to sort.
n lines each containing two integers. First integer is the key and the second one is the value.
The character 'q' represents that now you need to sort the elements and print the sorted list. While printing the sorted list you will print each pair at each line. The key and value will be separated by one space character.

Sample test cases: see image.

Academic Honesty!

It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.

Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.