This is a programming assignment to conduct experimentation on hash tables. The program must be written in Oracle Standard Edition compliant Java or ISO/ANSI standard compliant C++.
Part 1 Implement the following linear congruential pseudo-random number generator:
xn+1 = (a * xn) mod M
where a = 75 = 16807 and M = 231 - 1 = 2147483647.The seed value X0 is determined appropriately. Write a function "pseudoRandom()" that returns Xn + 1 from Xn. Use 64-bit integer type for the variables and constants (long type in Java, long long type in C++). Despite its simplicity, it is known to generate good pseudo-random numbers for many purposes, including this assignment.
The program will have to compute the mean (average) as well as the standard deviation of data values. Let X1, ..., Xn be data values. The mean (average) of the Xi is u = (E1 <= i <= n Xi) / n. The standard deviation is s = [(E1 <= i <= n (Xi - u)2) / (n - 1)]1/2. The standard deviation is a fundamental statistical quantity that measures the degree of dispersion of data values. The larger the standard deviation is, the more dispersed the data values are. A basic law of standard deviation tells us that there should be a large concentration of values in the range [s, +s].
In the following, m is the array size, n is the # of key values inserted into the table, and f(k, m) = floor(m(fractional part of kA)) = floor(m(kA mod 1)), A = (sqrt(5) 1)/2 = 0.6180339887498949.
Write a program that performs the following experiments and displays the required statistical data legibly on the screen. Use 64-bit double type for floating-point numbers.
Part 2 This part is experimentation on chaining method. Set the array size m to 1000003, which is a prime number.
a.Using the above random number generator, generate n integer key values starting from the seed value x0 = 98760053, and insert them into the hash table using the hash function:
h(k) = k mod m, where k is the generated key value
Then compute and display the following data:
Repeat this experiment for n = 800000, 1000000, 2000000, 3000000, each of the four starting from the same seed value 98760053.
b.Repeat the experiments in part (a) using the hash function:
h(k) = f(k, m)
Part 3 This part is experimentation on open addressing. A cluster is a maximal contiguous sequence of occupied positions, an empty cluster is a maximal contiguous sequence of unoccupied positions. As discussed in class, linear probing suffers from primary clustering - generation of long clusters. Quadratic probing improves linear probing by eliminating primary clustering but still generates milder secondary clustering. Double hashing improves quadratic probing by eliminating secondary clustering. Experimental results should confirm this.
Set the array size m to 220 = 1048576.
a.Using the above random number generator, generate n key values starting from the seed value x0 = 98760053, and insert them into the hash table using the linear probing sequence:
h(k, i) = (h'(k) + i) mod m,
h'(k) = f(k, m)
Then compute and display the following data:
Repeat this experiment for n = 500000, 800000, 1000000, 10485761=1048575, each of the four starting from the same seed value 98760053.
b.Repeat the experiments in part (a) using the quadratic probing sequence:
h(k, i) = (h'(k) + i(i+1)/2) mod m,
h'(k) = f(k, m)
c.Repeat the experiments in part (a) using the double-hashing probing sequence:
h(k, i) = (h1(k) + i h2(k)) mod m,
h1(k) = f(k, m),
h2(k) = k mod m if k mod m is odd, (k mod m)+1 if k mod m is even
You should conduct experiments using different seed values. Optionally you might also conduct experiments using other pseudo-random number generators, e.g., ones available in Java/C++ libraries. As far as the elementary statistical data collected in this assignment are concerned, different seed values and random number generators should yield similar experimental results for fairly large values of m and n like those used in this assignment.