This project will be about building a hash table. You will read words from a given file into your program. Each word consists strictly of the letters A-Z and a-z. Any whitespace, punctuation, digit, etc. will separate words. An example is "this!is A&Word." This text would represent the 4 words this, is, a, and word. Each word should be converted to lowercase so that only lowercase letters remain. As you read the words, you should insert each word into a hash table. The word itself is the key value, and each entry in the hash table also needs to store a list of the line numbers where the word appears. When inserting a word that already appears in the hash table, simply add the current line number to the list for the word. You will also count the number of collisions that occur while inserting words into the hash table. Every probe to a cell that contains a word other than the one being inserted or searched should count as an additional collision, so that inserting a word may have multiple collisions. Probes to empty cells or cells already containing the word do not count as collisions.
Once you have read all of the words, you should output the total number of words read, the number of distinct words, and the total number of collisions that occurred in the hash table.
Finally, you will read words from the query file and report the line numbers where the word occurs, and the number of collisions while searching for the word in the table.
Please carefully read the following requirements:
$ cat sample1.txt
Cryptography is both the practice and study of the techniques used to communicate and/or store information or data privately and securely, without being intercepted by third parties. This can include processes such as encryption, hashing, and steganography. Until the modern era, cryptography almost exclusively referred to encryption, but now cryptography is a broad field with applications in many critical areas of our lives.
$cat q1.txt
cryptography
of
lives
hash
$ g++ -std=C++11 p3.cpp
$./a.out sample1.txt 91.txt 79 lp
The number of words found in the file was 64
The number of unique words found in the file was 52
The number of collisions was 30
cryptography appears on lines (1,5,6)
The search had 0 collisions
of appears on lines (1,7)
The search had 0 collisions
lives appears on lines [7]
The search had 1 collisions
hash appears on lines[]
The search had 2 collisions
$./a.out sample1.txt q1.txt 79 qp
The number of words found in the file was 64
The number of unique words found in the file was 52
The number of collisions was 26
cryptography appears on lines (1,5,6]
The search had 0 collisions
of appears on lines (1,7)
The search had 0 collisions
lives appears on lines [7]
The search had 1 collisions