For this exercise you will produce two spellchecking programs. The first will check a single word input by the user and the second will allow the user to select a text file to load and spell check the entire file. All checking is case insensitive, as usual.
The first program should ask the user for a dictionary file. Two are supplied with the project. The words.txt file is the one we have used before, with about half a million words. The other, Dict.txt, is the Hunspell US English dictionary used by many Linux applications; it contains about 49,000 words. You may wonder why you would use a dictionary that is a tenth the size of the other. The advantage of the Hunspell dictionary is that it leaves out many of the obscure words found in the larger file. If a user mistypes a word and happens to produce one of those obscure words, the Hunspell dictionary flags it as misspelled but the larger dictionary does not. Hence, for the average user, the Hunspell file might be more accurate.
After the program processes a word it should ask the user if they would like to check another word and continue until they select to quit.
When the program checks a word it should state whether the word is in the dictionary. If it is not, the program should give the user suggestions on what the correct word might be. Spell checkers usually have algorithms to determine the "distance" between words, i.e. quantify the difference, and they suggest words that are close to the one that is not in the dictionary. Here we will make lists of suggestions from common errors in spelling.
Several of these may produce the same suggestions. For example, when berr was processed it gave a suggestion of BERRY both when adding one letter and during the word continuation. Make sure that your program has only one occurrence of any suggestion. A run of the program is below.
Input dictionary filename: Dict.txt
Input a single word to spell check: berr
BERR is not in the dictionary.
Suggestions
-----------
HERR
KERR
TERR
BARR
BURR
BEAR
BEER
BERG
BERK
BERM
BERN
BERT
BERRA
BERRY
ERR
BRR
BERRYLIKE
Check another word? Y/N: y
Input a single word to spell check: weaj
WEAJ is not in the dictionary.
Suggestions
-----------
WEAK
WEAL
WEAN
WEAR
Check another word? Y/N: y
Input a single word to spell check: were
WERE is in the dictionary.
Check another word? Y/N: y
Input a single word to spell check: sawn
SAWN is not in the dictionary.
Suggestions
-----------
DAWN
FAWN
LAWN
PAWN
YAWN
SEWN
SOWN
SWAN
SHAWN
SPAWN
AWN
SAN
SAW
Check another word? Y/N: n
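The suggestion step, including the removal of duplicates, could be sketched roughly as follows. This is only an illustration in Python, not the required solution. The five kinds of error used here (replace one letter, swap two adjacent letters, add one letter, remove one letter, word continuation) and their order are assumptions read off the sample run above, and "word continuation" is assumed to mean dictionary words that begin with the input word. The names suggestions and consider are hypothetical, and the dictionary is assumed to be a set of upper-case words purely to keep the sketch short.

```python
import string


def suggestions(word, dictionary):
    """Return unique suggestions for word, in the order they are found.

    dictionary is assumed to be a set of upper-case words.
    """
    word = word.upper()
    letters = string.ascii_uppercase
    found = []    # suggestions in the order they were discovered
    seen = set()  # used to keep each suggestion only once

    def consider(candidate):
        # Keep a candidate only if it is a real word, differs from the
        # input, and has not been suggested already.
        if candidate != word and candidate in dictionary and candidate not in seen:
            seen.add(candidate)
            found.append(candidate)

    # Replace one letter (assumed error type).
    for i in range(len(word)):
        for c in letters:
            consider(word[:i] + c + word[i + 1:])

    # Swap two adjacent letters (assumed error type).
    for i in range(len(word) - 1):
        consider(word[:i] + word[i + 1] + word[i] + word[i + 2:])

    # Add one letter at any position, including the end.
    for i in range(len(word) + 1):
        for c in letters:
            consider(word[:i] + c + word[i:])

    # Remove one letter.
    for i in range(len(word)):
        consider(word[:i] + word[i + 1:])

    # Word continuation: assumed to mean dictionary words starting with word.
    for entry in sorted(dictionary):
        if entry.startswith(word):
            consider(entry)

    return found
```

Because every candidate passes through consider, a word such as BERRY that is reached by two different routes is reported only once, at the point where it was first found.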
For the second program you will have the user input the dictionary file and the text file to be spellchecked. You will also give the user the option of sending the output to the screen, a file, or both. If the selection is a file or both, you will ask the user for a filename to store the output. The program will then go through the document to be checked and report all misspelled words (according to the given dictionary) along with suggestions for each of them. Since the documents being checked will contain punctuation, you will need to remove it. For example, if the phrase "He went to the game." appears in a document, you would need to remove the " before He and the . after game. On the other hand, if you hit the contraction I've you would not want to remove the ', and if you hit a hyphenated word like high-speed you would not want to remove the hyphen. So when you are removing punctuation, leave the hyphens and apostrophes in place. This will create some problems with single quotes, but those will be far less common than contractions.
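One possible way to handle the punctuation rule is sketched below in Python. It is an illustration under a few assumptions: only the ASCII punctuation in string.punctuation is handled (curly quotes are not), punctuation is replaced by a space rather than deleted so that text like "game,then" does not fuse into one word, and tokens with no letters at all (such as a lone hyphen) are dropped. The names extract_words, STRIP_CHARS and TO_SPACE are hypothetical.

```python
import string

# Every ASCII punctuation character except the hyphen and the apostrophe.
STRIP_CHARS = "".join(c for c in string.punctuation if c not in "-'")

# Translation table that turns each of those characters into a space.
TO_SPACE = str.maketrans(STRIP_CHARS, " " * len(STRIP_CHARS))


def extract_words(text):
    """Split text into upper-case words, keeping hyphens and apostrophes."""
    cleaned = text.translate(TO_SPACE)
    words = []
    for token in cleaned.split():
        # Skip tokens such as a lone "-" that contain no letters.
        if any(ch.isalpha() for ch in token):
            words.append(token.upper())
    return words
```

With this sketch the phrase "He went to the game." yields HE, WENT, TO, THE, GAME, while I've and high-speed come through as I'VE and HIGH-SPEED.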
Input dictionary filename: Dict.txt
Input document filename: TestDoc001.txt
Send output to the screen, a file, or both? S/F/B: b
Input document filename: out.txt
REFERS is not in the dictionary.
Suggestions
-----------
REVERS
REFER
METHODS is not in the dictionary.
Suggestions
-----------
METHOD
CURRENTLY is not in the dictionary.
Suggestions
-----------
None
PROFESSIONALS is not in the dictionary.
Suggestions
-----------
PROFESSIONAL
ATTACKS is not in the dictionary.
Suggestions
-----------
ATTUCKS
ATTACK
HUMANS is not in the dictionary.
Suggestions
-----------
HUMANE
HUMAN
HIGH-SPEED is not in the dictionary.
Suggestions
-----------
None
COMPUTERS is not in the dictionary.
Suggestions
-----------
COMPUTER
If there are no misspelled words the program should display a message to indicate this.
Input dictionary filename: words.txt
Input document filename: TestDoc001.txt
Send output to the screen, a file, or both? S/F/B: s
No misspelled words.
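The reporting side of the second program might be organized along these lines. This Python sketch assumes the report is first collected as a list of text lines and then sent to the screen, a file, or both according to the S/F/B choice. The function names format_entry and write_report are hypothetical, and the layout simply copies the sample runs above.

```python
def format_entry(word, suggestion_list):
    """Build the report lines for one misspelled word."""
    lines = [word + " is not in the dictionary.", "Suggestions", "-----------"]
    if suggestion_list:
        lines.extend(suggestion_list)
    else:
        lines.append("None")  # matches the sample run when nothing is found
    return lines


def write_report(lines, choice, out_filename=None):
    """Send lines to the screen (S), a file (F), or both (B).

    Returns the lines that were actually reported.
    """
    choice = choice.upper()
    if not lines:
        # Nothing was flagged, so report that instead of an empty output.
        lines = ["No misspelled words."]
    if choice in ("S", "B"):
        for line in lines:
            print(line)
    if choice in ("F", "B"):
        with open(out_filename, "w") as out:
            for line in lines:
                out.write(line + "\n")
    return lines
```

Collecting the lines first keeps the screen and file output identical when the user selects both.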
Some things to keep in mind when writing this. You are clearly doing a lot of searching in large data sets to complete this. As you know from class, the binary search is much faster than the linear search, but it requires that the data being searched is sorted. Neither of the two dictionary files is sorted. They are close, but during some testing I found that they are not fully sorted. The sorts we will be going over in class are fairly slow, but if you would like to investigate a faster sort, the text does go into the Quick Sort in Chapter 19. You will need to adapt it to serve your purpose, but it will run much faster.
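As a rough illustration of the sort-then-search idea, here is a Python sketch. It assumes the dictionary file holds one word per line, and it uses Python's built-in sorted as a stand-in for whichever sort you write yourself (it is not the Quick Sort from the text). The names load_dictionary and binary_search are hypothetical.

```python
def load_dictionary(filename):
    """Read one word per line, upper-case it, and return a sorted list."""
    with open(filename) as f:
        words = [line.strip().upper() for line in f if line.strip()]
    # Built-in sort used as a stand-in for a hand-written sort.
    # set() drops any repeated entries before sorting.
    return sorted(set(words))


def binary_search(sorted_words, target):
    """Return True if target is in sorted_words (which must be sorted)."""
    low = 0
    high = len(sorted_words) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_words[mid] == target:
            return True
        if sorted_words[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return False
```

Each pass through the loop halves the range still to be searched, so even the half-million-word file needs only about 20 comparisons per lookup, where a linear search could need hundreds of thousands.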