This assignment will require you to write a number of functions (in C) which conform to a very precise specification.
Your program (named "index") will analyze input read from stdin and determine how many times certain words occur. More precisely, your program will:
1. open a file named "keywords.txt" in the current directory, and read, convert to upper-case, and save a list of keywords (one word per line);
2. read input from stdin, ignoring anything other than whitespace and letters (a-z and A-Z);
3. break the input into words, where a "word" is any string of letters (a-z or A-Z) separated from following words by any amount of whitespace (or followed by EOF);
4. for each word that is read, convert it to upper-case, and if that word is found in keywords.txt, add 1 to a running tally of occurrences for that word; and
5. at the end of all input, print the list of words (one per line) from keywords.txt with the number of occurrences next to each word.
The indicated functions must each appear in a .c file named as specified, and must be prototyped exactly as indicated.
readKeyWords.c
int readKeyWords(char *filename, char keyWords[100][32]);
This function opens the given filename, reads it one line at a time (each line containing a single word), converts the word to uppercase, and saves each word in the next location in the keyWords array. The first word is stored in keyWords[0], the next in keyWords[1], and so on. The input file should be closed once the end of file is reached.
The function returns the number of words read. If the file cannot be opened, the function returns -1.
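A minimal sketch of readKeyWords follows. The specification does not say what happens with over-long lines, more than 100 keywords, or blank lines, so this version assumes lines are shorter than 256 characters, stores at most 100 keywords (extra lines are ignored), truncates words to 31 characters so they fit in keyWords[n], and skips blank lines. Treat these as assumptions, not requirements.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int readKeyWords(char *filename, char keyWords[100][32])
{
    FILE *fp = fopen(filename, "r");
    char line[256];            /* assumption: lines are shorter than 256 chars */
    int n = 0;

    if (fp == NULL)
        return -1;             /* file could not be opened */

    /* Assumption: at most 100 keywords; any extra lines are ignored. */
    while (n < 100 && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the line terminator */
        if (line[0] == '\0')
            continue;                         /* assumption: skip blank lines */

        /* Copy at most 31 characters, converting to uppercase as we go. */
        int i;
        for (i = 0; i < 31 && line[i] != '\0'; i++)
            keyWords[n][i] = (char)toupper((unsigned char)line[i]);
        keyWords[n][i] = '\0';
        n++;
    }

    fclose(fp);                /* close the file once we are done reading */
    return n;
}
```

For example, a keywords.txt containing the lines hello, World and fun yields a return value of 3 with keyWords holding HELLO, WORLD and FUN.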
tally.c
void tally(char *word,char keyWords[100][32],int count[100],int numWords);
This function takes in a single word (already converted to uppercase) and compares it to the words in the keyWords array. If it matches the nth element in that array, then count[n] is incremented. Only the first numWords elements in the keyWords array should be considered.
If the word is not found, no action is taken.
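With at most 100 keywords, a linear scan is enough for tally. This sketch stops at the first match. That choice is an assumption, and it only matters if keywords.txt happens to list the same word twice.

```c
#include <string.h>

void tally(char *word, char keyWords[100][32], int count[100], int numWords)
{
    /* Only the first numWords entries hold valid keywords. */
    for (int n = 0; n < numWords; n++) {
        if (strcmp(word, keyWords[n]) == 0) {
            count[n]++;        /* assumption: the first match wins */
            return;
        }
    }
    /* Word not found: no action is taken. */
}
```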
getWord.c
char *getWord(char *word);
This function returns the next "word" from stdin. A word is a run of consecutive non-whitespace characters that is followed in the input stream by whitespace or EOF. Note that words never contain non-letters (as recognized by the isalpha() function). When reading from stdin, any non-letter, non-whitespace character should be ignored: do not add it to the word, and do not count it as a word separator. Lowercase letters must be converted to uppercase. See the Sample Execution section for more details.
If a word is successfully read from stdin, it is copied to the "word" argument, which is also returned as the function value. If EOF is reached, this function should return NULL.
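The following is a sketch of getWord under stated assumptions. The character filtering is inferred from the program description and the sample execution: letters are upper-cased, each whitespace character becomes a space, and everything else is skipped. It is written as a local helper, nextChar, so the block compiles on its own. The word logic itself sits in a hypothetical getWordFrom(FILE *, char *) so that it can be exercised on a temporary file. getWord simply passes stdin. The caller's buffer is also assumed to hold 32 characters, and longer words are truncated.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD 31   /* assumption: caller's buffer holds 32 chars */

/* Hypothetical filter: letters -> uppercase, whitespace -> ' ', rest skipped. */
static int nextChar(FILE *in)
{
    int c;
    while ((c = getc(in)) != EOF) {
        if (isalpha(c))
            return toupper(c);
        if (isspace(c))
            return ' ';
    }
    return EOF;
}

/* Hypothetical helper: getWord's logic, parameterised on the input stream. */
char *getWordFrom(FILE *in, char *word)
{
    int c, len = 0;

    /* Skip any run of separators in front of the word. */
    do {
        c = nextChar(in);
    } while (c == ' ');
    if (c == EOF)
        return NULL;            /* no more words */

    /* Collect letters until whitespace or EOF ends the word. */
    while (c != EOF && c != ' ') {
        if (len < MAX_WORD)     /* silently truncate over-long words */
            word[len++] = (char)c;
        c = nextChar(in);
    }
    word[len] = '\0';
    return word;
}

/* Required prototype: reads from stdin. */
char *getWord(char *word)
{
    return getWordFrom(stdin, word);
}
```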
myGetChar.c
int myGetChar();
This function operates like getchar(), returning the next character from stdin (or EOF at end-of-file), but with the following modifications:
displayResults.c
void displayResults(char keyWords[100][32],int tally[100],int numWords);
This function prints numWords lines of output. The nth line should be keyWords[n] followed by a colon (':') followed by tally[n].
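Here is a sketch of displayResults, assuming there are no spaces around the colon (for example HELLO:1). The printing lives in a hypothetical displayResultsTo(FILE *, ...) so that it can be checked against a temporary file. The required function writes to stdout.

```c
#include <stdio.h>

/* Hypothetical helper: writes the results to any stream. */
void displayResultsTo(FILE *out, char keyWords[100][32], int tally[100], int numWords)
{
    /* One line per keyword: WORD:COUNT */
    for (int n = 0; n < numWords; n++)
        fprintf(out, "%s:%d\n", keyWords[n], tally[n]);
}

/* Required prototype: prints to stdout. */
void displayResults(char keyWords[100][32], int tally[100], int numWords)
{
    displayResultsTo(stdout, keyWords, tally, numWords);
}
```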
The most confusing part of this assignment is the getWord function (which uses myGetChar to simplify its work). Suppose the input stream is:
He-llo. Th..Is i
s FUN!
Each successive call to myGetChar would return (I'm separating these by commas for clarity):
H,E,L,L,O, ,T,H,I,S, ,I, ,S, ,F,U,N, ,EOF
Notice that the '-', '.', and '!' characters are ignored, so the two dots in "Th..Is" do not split THIS into two words. A space separates the end of HELLO from the start of THIS, so those are separate words. The i on the first line is followed by a newline, which is whitespace, so I is the third word. The s at the beginning of the second line is followed by a space, so S is the fourth word. FUN is the final word, being followed by a newline (whitespace) and EOF.
The corresponding calls to getWord would thus return the following words (separated by commas):
HELLO,THIS,I,S,FUN,NULL (the value NULL, not the word "NULL")
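The sample execution suggests the following behaviour for myGetChar. It is sketched here as an assumption, not as the required rule set: letters are returned upper-cased, each whitespace character is returned as a single space, and every other character is skipped. The sketch does not collapse runs of whitespace, so getWord must skip repeated separators. The filtering is written in a hypothetical filteredGetc(FILE *) helper so that it can be tested on a temporary file. myGetChar simply applies it to stdin.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not part of the required interface): reads from an
   arbitrary stream so the filtering can be tested without stdin. */
int filteredGetc(FILE *in)
{
    int c;
    while ((c = getc(in)) != EOF) {
        if (isalpha(c))
            return toupper(c);  /* letters come back upper-cased */
        if (isspace(c))
            return ' ';         /* any whitespace character becomes a space */
        /* anything else (punctuation, digits, ...) is silently skipped */
    }
    return EOF;
}

/* Required prototype: same behaviour, always reading stdin. */
int myGetChar()
{
    return filteredGetc(stdin);
}
```

When fed the sample input, successive calls return H, E, L, L, O, space, T, H, I, S, space, I, space, S, space, F, U, N, space, and then EOF, matching the sequence listed for the sample execution.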