In this project, we will work on text processing and analysis. Text analyzers could be used to identify the language in which a text has been written (language detection), to identify keywords in the text (keyword extraction) or to summarize and categorize a text. You will calculate the letter (character) frequency in a text. Letter frequency measurements can be used to identify languages as well as in cryptanalysis. You will also explore the concept of n-grams in Natural Language Processing. N-grams are sequential patterns of n-words that appear in a document. In this project, we are just considering uni-grams and bi-grams. Uni-grams are the unique words that appear in a text whereas bi-grams are patterns of two-word sequences that appear together in a document.
Write a Java application that implements a basic Text Analyzer. The Java application will analyze text stored in a text file. The user should be able to select a file to analyze and the application should produce the following text metrics:
Test your program in the file TheGoldBug1.txt, which is provided.