You will develop a set of routines for creating and looking up a dictionary. Your code will be graded on correctness, efficiency, and documentation.
You will be provided with a header file main.h, a main function file main.c, a file assignment-5.c defining the API (application program interface) for your code, and three input files: small-dict.txt,medium-dict.txt,large-dict.txt. File assignment-5.c contains stub functions with short descriptions.
The main.c file is used for testing your design.You can make changes on test cases to make sure that your program works correctly for each possible input. Your program will be tested with a different main.c file, which will include different test cases.
Your will deliver a complete program implementing the functionalities described below. All of your functionality is in the form of functions that take parameters and return values, i.e., there is no terminal or file I/O.
Given an input file containing a long string with words on every new line, create a function that puts these words in an array of strings and sorts the array. The function to read the string and puts the words in an array is already provided to you in main.c.
The sort order should be dictionary order, i.e., the ordering given by strcmp. The array of strings will be a global 2D array, whose size we select - you do not have to worry about the array being too short or the words being too long.
Example input: “datenapplenelderberryncantaloupenblackberryn”
Write a lookup function that checks to see if a word is in the dictionary. Implement via binary search to find if a string is present in the dictionary. If it is present, return 1, otherwise 0.
Write a lookup function that determines where in the dictionary a given word would lie.
For example if the dictionary consists of
apple blackberry cantaloupe date elderberry
then a range search for batuan should return (0,1), for cantaloupe should return (1,3), for apple should return (-1,1), for elderberry should return (3,5), for ackee should return (INT_MIN, 0), for fig should return (4,INT_MAX).
Since a function can return only one int, and we have not covered structures, you will return the range by setting two caller integer variables which you are passed pointers to.
Given a string prefix, return strings that starts with that prefix.
Your function will set these strings in a global array of 10 strings, and return the number of strings set. The array should be in sorted order.
For example, for the dictionary above, if prefix is “a”, you should set the first string in the global array to “apple” and return 1.
If there was more than one match, you should return all of them up to 10 matches. E.g., if the dictionary included apricot, and prefix is “ap” return 2, and set the global array’s first and second elements to “apple” and “apricot”.
Given a string w, return 10 words in the dictionary that are closest to w and the smallest edit distance found. The distance between strings, D(s,t) is defined below:
Here |s| is the length of string s. Furthermore, by c1.s we refer to the string consisting of the character c1 followed by the string s. (So the . here is not the . used in matching expressions.)
Here d(x,y) is the distance on a QWERTY keyboard from the keys corresponding to c1 and c2 (and its 0 if x = y), which will be computed by a function we provide in main.c.
You are required to write a program that implements the functions described above: Sort dictionary, Plain lookup, Range search, Prefix lookup and Approoximate lookup.
Inputs for your functions will come from tests cases written in main.c file. You can assume that input format will always be correct. You do not need to check the validity of input in your functions. (For example, you do not need to worry about words that have blanks or characters outsize of the range ‘a’-’z’, or dictionary size overflowing the global array of strings).
However, your code should handle other legal input cases e.g., empty string, sorted dictionary, words that appear before any word in the dictionary, more than 10 matches, etc.
For example if you are given the word “null” and it is in the dictionary, your function should return 0. If “null” is not in the dictionary and its closest match is “nullify” which has edit distance of 3 compared to “null”, your function should return 3.
For the sorting function use the standard C library functions qsort and strcmp.
Use the provided array of strings to store the dictionary, and the provided global integer size variable for keeping track of the number of entries in the dictionary.
Use the provided array of strings to store results, and the provided global integer size variable for keeping track of the number of entries in the results array.
Your code will be tested for speed among other things (see grading). Thus, it’s important that your code executes in the most efficient way. You should consider using a 2D array to cache calls to the string distance function. Similarly, you might find that you can speed up the prefix matching by keeping some additional data in global variables (that you define).