it2051229 Words Histogram

The program accepts one command-line argument, which is the name of the text file to be read; for example: java histogram <filename>

where <filename> is the name of the input text file. The program then reads this file, calculates the histogram of each letter, and displays the histogram in the following order:

the total number of characters counted (only the letters A-Z are included);
the letters in upper case;
the number of occurrences of each letter and its percentage of the total; and
a series of asterisks showing the relative frequency of each letter. The program outputs a vertical bar after every 10 asterisks.
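
The counting step described above can be sketched as follows. This is a minimal illustration, not the reference solution: the class name LetterCounter and its method names are hypothetical, and it assumes that lower-case and upper-case letters are counted together (the brief only says that letters are displayed in upper case) and that the file is read as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the name is not taken from the assignment.
final class LetterCounter {

    /** Returns a 26-element array: index 0 is the count for A, index 25 for Z. */
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            } else if (c >= 'a' && c <= 'z') {
                // Assumption: lower-case letters count towards their upper-case letter.
                counts[c - 'a']++;
            }
            // Digits, punctuation, whitespace and non-ASCII characters are ignored.
        }
        return counts;
    }

    /** Sums all letter counts, giving the "Total characters" figure. */
    static int total(int[] counts) {
        int sum = 0;
        for (int n : counts) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java LetterCounter <filename>");
            return;
        }
        // Assumption: the input file is small enough to read into memory at once.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int[] counts = countLetters(text);
        System.out.println("Total characters " + String.format("%,d", total(counts)));
    }
}
```

Keeping the counting logic in a method that takes a String, separate from the file reading, makes it straightforward to exercise from a unit test without touching the file system.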

An example of the output is given here:

Total characters 5,542

A [  300]  5.41% **********|*****
B [   20]  0.36% *
C [   50]  0.90% **
D [   40]  0.72% **
E [1,000] 18.04% **********|**********|**********|**********|**********|
F [  142]  2.56% *******
G [   60]  1.08% ***
H [  100]  1.80% *****
I [  600] 10.83% **********|**********|**********|
J [  400]  7.22% **********|**********|
K [  234]  4.22% **********|*
L [  200]  3.61% **********|
M [  500]  9.02% **********|**********|*****
N [  322]  5.81% **********|******
O [  555] 10.01% **********|**********|*******
P [   23]  0.42% *
Q [   10]  0.18%
R [    3]  0.05%
S [    4]  0.07%
T [  322]  5.81% **********|******
U [   32]  0.58% *
V [  555] 10.01% **********|**********|*******
X [    3]  0.05%
Y [   66]  1.19% ***
Z [    1]  0.02%

The letter with the largest occurrence should have 50 asterisks and the number of asterisks for the rest of the letters should be proportionally scaled to the letter with the highest count. The next section explains how to calculate the number of asterisks to be displayed.

The letter E has the highest occurrence with a count of 1,000, which makes up 18.04% of the characters, so it has 50 asterisks. The letter A has a count of 300 (5.41% of the characters), so it has 15 asterisks because 300/1,000*50 = 15 (equivalently, 5.41/18.04*50 ≈ 15). The program should round the number of asterisks down to the nearest integer value.
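
The scaling rule can be sketched in a few lines. The class name HistogramBar and its constants are hypothetical. The sketch works from the raw counts rather than the two-decimal percentages, on the assumption that this is what the example intends: 5.41/18.04*50 is slightly below 15 and would be rounded down to 14, whereas 300*50/1,000 is exactly 15.

```java
// Hypothetical helper class; the name is not taken from the assignment.
final class HistogramBar {

    static final int MAX_STARS = 50;   // asterisks for the most frequent letter
    static final int GROUP_SIZE = 10;  // a vertical bar follows every 10 asterisks

    /** Scales a count against the largest count, rounding down. */
    static int starCount(int count, int maxCount) {
        if (maxCount <= 0) {
            return 0; // no letters at all, so nothing to draw
        }
        // Integer division truncates, which is the required rounding down.
        // The cast to long guards against overflow for very large counts.
        return (int) ((long) count * MAX_STARS / maxCount);
    }

    /** Builds the bar text, inserting '|' after each complete group of 10. */
    static String bar(int stars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= stars; i++) {
            sb.append('*');
            if (i % GROUP_SIZE == 0) {
                sb.append('|');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bar(starCount(300, 1000)));
        System.out.println(bar(starCount(1000, 1000)));
    }
}
```

With the counts from the example output, starCount(300, 1000) gives 15 and starCount(20, 1000) gives 1, matching the rows shown for A and B.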

The program should also conserve the output space by carefully formatting the count for each letter and the percentage of each letter so these outputs only use a minimum amount of space but still appear in neat columns. For example, if the largest number of characters is in the hundreds, then the output should look like this:

A [ 30]  5.41% **********|*****
B [  2]  0.36% *
C [  5]  0.90% **
D [  4]  0.72% **
E [100] 18.04% **********|**********|**********|**********|**********|
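
One possible way to get columns that are only as wide as they need to be is to format the largest count first and use its length as the column width. The sketch below assumes this approach; the class name RowFormatter and its methods are hypothetical, and Locale.US is fixed so that the thousands separator is a comma and the decimal mark is a point, as in the examples.

```java
import java.util.Locale;

// Hypothetical helper class; the name is not taken from the assignment.
final class RowFormatter {

    /** Formats a count with thousands separators, e.g. 1000 -> "1,000". */
    static String formatCount(int n) {
        return String.format(Locale.US, "%,d", n);
    }

    /** Formats count/total as a percentage with two decimals, e.g. "5.41%". */
    static String formatPercent(int count, int total) {
        double pct = total == 0 ? 0.0 : 100.0 * count / total;
        return String.format(Locale.US, "%.2f%%", pct);
    }

    /** Width of the count column: the length of the largest formatted count. */
    static int countWidth(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return formatCount(max).length();
    }

    /** Width of the percentage column: the longest formatted percentage. */
    static int percentWidth(int[] counts, int total) {
        int width = 0;
        for (int c : counts) {
            width = Math.max(width, formatPercent(c, total).length());
        }
        return width;
    }

    /** Builds one output row, right-aligning the count and the percentage. */
    static String formatRow(char letter, int count, int total,
                            int countWidth, int percentWidth, String bar) {
        String row = String.format(Locale.US,
                "%c [%" + countWidth + "s] %" + percentWidth + "s",
                letter, formatCount(count), formatPercent(count, total));
        // Avoid a trailing space on rows that have no asterisks.
        return bar.isEmpty() ? row : row + " " + bar;
    }

    public static void main(String[] args) {
        int[] counts = {300, 20, 1000};
        int total = 5542;
        System.out.println(formatRow('A', 300, total,
                countWidth(counts), percentWidth(counts, total), "**********|*****"));
    }
}
```

Because both widths are derived from the data, the same code yields a five-character count column when the largest count is 1,000 and a three-character column when it is 100.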

You may like to start developing the program using small text files, and then use larger text files to test the robustness of the program. You may download text files from Project Gutenberg: www.gutenberg.org. You should also write JUnit code to test the program.
