In this assignment, you will write and evaluate 5 different hash functions with whose input keys are names. Your evaluation should be based on popular American first names. You can create a person’s name from any two “first names” in the 2000 census link supplied below
http://www.ssa.gov/OACT/babynames/decades/names2000s.html
To evaluate each hash function, you should determine the value each input key maps to. A counter representing this value should be incremented. For a good hash function, a histogram representing the count for each value should be uniformly distributed. Also your program should run a χ2 test on the resulting output to determine the randomness of the distribution of the hash function.
Assignment submission should include your hash functions, a histogram representing the distribution of each function’s output, and an analysis of the χ2 test results for each function. Determine which one of your custom hash functions should be used to hash American first names.
Your submission should include 4 different hashing functions, using techniques such as we demonstrated in class (folding, adding, squaring, etc…). You will have a hash table array of a prime number of slots (101, for example) If you run a couple five hundred names through a hash function, you should get a fairly uniform distribution in the table, no too many with 0 or 1, and not too many with 10 or more keys hashing to a location. You will run each hash function 10 times, and perform the χ2 test which measures if it is uniform. 8 to 9 “yes” results will verify the hashing function is good. 6-7 or fewer indicates the function is probably not uniform.
A successful program may have 2 or 3 good functions and 1 or 2 poor functions. χ2 is computed by the following formula See image.
If χ2 is in the range of r ± 2 r , we conclude that distribution is indeed random. Otherwise it may not be. If you will generate 500 numbers in the 0-100 range. Then N=500 and r=101.
For example if r=7 and N = 21, your array of counters might look something like this:
4 2 2 2 3 6 2
Since N/r = 3, our calculation would be:
[(4-3)2 + (2-3)2 + (2-3)2 + (2-3)2 + (3-3)2 + (6-3)2 + (2-3)2] / 3 =
(1 + 1 + 1 + 1 + 0 + 9 + 1 ) / 3 = 14/3 = 4.67
r ± 2 r is the range 7 ± 2.65, [ 4.35, 9.65 ]