Do writings by individual authors have statistical signatures? They certainly do, and while such signatures may say little about the quality of an author's art, they can say something about the literary styles of an era, and can even help clarify historical controversies about authorship. Statistical studies, for example, have suggested that the Iliad and the Odyssey were not written by a single individual.
For this assignment you are to create a program that analyzes samples of text -- novels perhaps, or newspaper articles -- and produces two statistics about these texts: word size frequency, and average word length.
The program consists of three classes: FileAccessor, WordPercentagesDriver and WordPercentages. For this project you will write the WordPercentages class, which must compile and work with the FileAccessor and WordPercentagesDriver classes provided. The FileAccessor class provides basic file I/O functionality. The driver class reads in the name of a file that contains the text to be analyzed, creates an instance of WordPercentages, obtains the statistics and prints them to the console.
Here is a sample run with output:
Enter a text file name to analyze:
> AliceInWonderland.txt
Analyzed text: AliceInWonderland.txt
words of length 1: 6.98%
words of length 2: 14.30%
words of length 3: 24.41%
words of length 4: 20.86%
words of length 5: 12.96%
words of length 6: 7.81%
words of length 7: 6.01%
words of length 8: 2.79%
words of length 9: 1.85%
words of length 10: 0.80%
words of length 11: 0.48%
words of length 12: 0.21%
words of length 13: 0.17%
words of length 14: 0.32%
words of length 15 or greater: 0.06%
average word length: 4.08
Notice that the output formatting is NOT produced by the WordPercentages code. It is done by the printWordSizePercentages method in the driver class.
Your job, then, is to code a solution to this problem and provide these two statistics: word size percentage, for word lengths from 1 to 15 and greater, and average word length. In the example given, 6.98 percent of the words are of length 1, 14.30 percent are of length 2, and so forth, and the average word length is 4.08.
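To make the two statistics concrete, here is a minimal sketch of the arithmetic only. It is not the required WordPercentages class: the class name WordStatsSketch and the methods splitWords, percentages and averageLength are hypothetical, and it assumes that a "word" is any unbroken run of letters, which may differ from the definition given in the project requirements. Words of length 15 or more share a single bucket for the percentages, while the average uses each word's true length.

```java
import java.util.Locale;

// Hypothetical illustration only; not the WordPercentages class required by the project.
class WordStatsSketch {
    static final int MAX_BUCKET = 15; // lengths of 15 or greater share one bucket

    // Assumption: a word is a maximal run of letters; anything else is a separator.
    static String[] splitWords(String text) {
        // Strip leading separators so split() does not yield an empty first element.
        String trimmed = text.replaceAll("^[^\\p{L}]+", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("[^\\p{L}]+");
    }

    // Returns an array where index i (1..15) holds the percentage of words
    // of length i; index 15 covers length 15 or greater; index 0 is unused.
    static double[] percentages(String text) {
        String[] words = splitWords(text);
        double[] result = new double[MAX_BUCKET + 1];
        if (words.length == 0) {
            return result; // no words: every percentage stays 0
        }
        for (String w : words) {
            result[Math.min(w.length(), MAX_BUCKET)]++; // count per bucket
        }
        for (int i = 1; i <= MAX_BUCKET; i++) {
            result[i] = 100.0 * result[i] / words.length; // count -> percent
        }
        return result;
    }

    // Average uses the real length of every word, not the capped bucket.
    static double averageLength(String text) {
        String[] words = splitWords(text);
        if (words.length == 0) {
            return 0.0;
        }
        long letters = 0;
        for (String w : words) {
            letters += w.length();
        }
        return (double) letters / words.length;
    }

    public static void main(String[] args) {
        String sample = "I am the cat that sat";
        double[] pct = percentages(sample);
        for (int i = 1; i <= MAX_BUCKET; i++) {
            if (pct[i] > 0) {
                System.out.printf(Locale.ROOT, "words of length %d: %.2f%%%n", i, pct[i]);
            }
        }
        System.out.printf(Locale.ROOT, "average word length: %.2f%n", averageLength(sample));
    }
}
```

In your own solution the text comes from the provided FileAccessor class rather than a string literal, and the driver does the printing, so only the counting and division carry over.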
The source code files provided are WordPercentagesDriver.java and FileAccessor.java. Remove the package statements before using them.
/JavaCS1/src/scrabble/wordpercentages/FileAccessor.java
/JavaCS1/src/scrabble/wordpercentages/WordPercentagesDriver.java
Here are some files to use for testing:
You can also obtain interesting sample texts by visiting the Project Gutenberg website (Gutenberg.org) and downloading books from there.
PROJECT REQUIREMENTS: