A concordance is an alphabetical list of the principal words used in a book or body of work, with their immediate contexts. Because of the time and difficulty and expense involved in creating a concordance in the pre-computer era, only works of special importance, such as the Vedas, Bible, Qur'an or the works of Shakespeare, had concordances prepared for them. (From Wikipedia)
In this assignment, you will design and implement an application that performs the basic tasks of a concordance.
More specifically, the application will read in a book stored in a text file and then let the user search for a word and display the sentences in which the word appears, if any.
To define a sentence, we need to define a paragraph. A paragraph is a sequence of characters from the text that
The newline character will not be considered to be part of the paragraph.
A sentence will be defined as a sequence of characters from a paragraph (in the sense defined above) that
In addition, the period character, the question mark, and the exclamation mark will not be considered to be part of the sentence.
Note that this is an operational definition. It approximates, but is not completely in agreement with, the usual concept of a sentence. It even has ‘strange’ consequences. For example, the sentence (from Pride and Prejudice)
Mr. Bennet replied that he had not.
will be extracted as two sentences according to our definition.
You must use this definition so that everyone will extract the same list of sentences from a given text file.
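The effect of the operational definition can be sketched in Java. This is a minimal sketch, not a reference solution: it assumes that a sentence ends at a period, question mark, or exclamation mark, or at the end of the paragraph, and that surrounding whitespace is trimmed and empty pieces are dropped. Those details, and the class name SentenceSplitter, are assumptions made for illustration and should be checked against the full definition.

```java
import java.util.ArrayList;

class SentenceSplitter {
    // Splits one paragraph into sentences at '.', '?' and '!'.
    // Assumption: terminators are dropped, pieces are trimmed, empty pieces are skipped.
    static ArrayList<String> split(String paragraph) {
        ArrayList<String> sentences = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < paragraph.length(); i++) {
            char c = paragraph.charAt(i);
            if (c == '.' || c == '?' || c == '!') {
                addIfNotEmpty(sentences, current);
                current.setLength(0); // start collecting the next sentence
            } else {
                current.append(c);
            }
        }
        addIfNotEmpty(sentences, current); // text after the last terminator, if any
        return sentences;
    }

    private static void addIfNotEmpty(ArrayList<String> sentences, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            sentences.add(s);
        }
    }

    public static void main(String[] args) {
        // Prints [Mr, Bennet replied that he had not]
        System.out.println(split("Mr. Bennet replied that he had not."));
    }
}
```

Under these assumptions the period after "Mr" ends a sentence, which is why the example yields two sentences.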
A word is defined to be a sequence of letters that appears in a sentence (in the sense defined above), separated by non-letter characters.
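The word definition can be illustrated with a short scan over a sentence. The sketch below is one possible approach: it uses Character.isLetter to decide what counts as a letter (an assumption, since that method also accepts non-English letters) and converts each word to upper case. The class name WordExtractor is hypothetical.

```java
import java.util.ArrayList;

class WordExtractor {
    // Returns the words of a sentence in upper case, in order of appearance.
    static ArrayList<String> extract(String sentence) {
        ArrayList<String> words = new ArrayList<String>();
        int start = -1; // index where the current word began, or -1 when outside a word
        // Run one step past the end so that a word touching the end is closed too.
        for (int i = 0; i <= sentence.length(); i++) {
            boolean letter = i < sentence.length() && Character.isLetter(sentence.charAt(i));
            if (letter && start < 0) {
                start = i; // a new word begins here
            } else if (!letter && start >= 0) {
                words.add(sentence.substring(start, i).toUpperCase());
                start = -1;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [MY, DEAR, MR]
        System.out.println(extract("\"My dear Mr"));
    }
}
```

Note that an apostrophe or hyphen is a non-letter character under this definition, so a token such as "one's" splits into ONE and S.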
Write a Java program called Concordance.java, which provides a menu with the following options:
The menu must accept the letters ‘R’, ‘S’, ‘L’, ‘W’ or ‘Q’ and be case insensitive.
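Case-insensitive handling of the menu letters could look like the sketch below. It assumes the set of letters listed in the sentence above and treats anything else as invalid; the class name MenuChoice and the use of '\0' as an "invalid" marker are illustrative choices, not part of the specification.

```java
class MenuChoice {
    // Letters accepted by the menu, taken from the sentence above.
    static final String VALID = "RSLWQ";

    // Returns the upper-case option letter, or '\0' if the input is not a valid choice.
    static char parse(String input) {
        String s = input.trim().toUpperCase();
        if (s.length() == 1 && VALID.indexOf(s.charAt(0)) >= 0) {
            return s.charAt(0);
        }
        return '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("q"));          // prints Q
        System.out.println(parse("x") == '\0');  // prints true
    }
}
```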
When the user selects option ‘R’ from the menu, the program should prompt the user for the name of a text file to open.
If there is any error in reading the file, the program must warn the user and return to the menu. Otherwise, the program reads the text file and constructs all the required data structures about the words and sentences in the input text file.
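The "warn and return to the menu" behaviour can be sketched as a reading method that reports failure to its caller instead of terminating. The sketch assumes that each non-blank line of the file is one paragraph, which is an assumption about the paragraph definition; the class name BookReader and the wording of the warning are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

class BookReader {
    // Returns the non-blank lines of the file as paragraphs,
    // or null if the file cannot be opened or read.
    static ArrayList<String> readParagraphs(String fileName) {
        ArrayList<String> paragraphs = new ArrayList<String>();
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) { // readLine strips the newline
                if (!line.trim().isEmpty()) {
                    paragraphs.add(line);
                }
            }
        } catch (IOException e) {
            System.out.println("Warning: could not read " + fileName + " (" + e.getMessage() + ")");
            return null; // the caller goes back to the menu
        }
        return paragraphs;
    }
}
```

A caller would test the result for null, keep the previously loaded data (if any) on failure, and build the word and sentence structures only on success.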
All the words are to be stored in upper case. The sentences are stored with their original mix of upper and lower case, exactly as they appear in the text.
When the user selects option ‘M’, the program will ask for a non-empty string to be entered from the keyboard, and display on the screen all the words in the concordance that begin with that string.
For example, if the text file is the one listed in the Appendix, and the string entered by the user is ‘Th’, the following output will be displayed:
Looking for words starting with: Th
Number of words matched: 4
THAT, THE, THERE, THIS
The matching is not case-sensitive (that is, we get the same output when we enter “th” or “TH” or “Th”, etc.). The words are listed in alphabetical order and separated by commas.
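If the distinct words are kept in a sorted ArrayList, all words with a given prefix sit next to each other, so a binary search for the first candidate followed by a short forward scan finds them. The following is a sketch under that assumption; the class and method names are hypothetical, and the word list in the test data is made up rather than taken from the Appendix.

```java
import java.util.ArrayList;

class PrefixSearch {
    // Assumes 'sortedWords' holds distinct upper-case words in ascending order.
    static ArrayList<String> startingWith(ArrayList<String> sortedWords, String prefix) {
        String key = prefix.toUpperCase(); // makes the match case-insensitive
        int lo = 0, hi = sortedWords.size();
        while (lo < hi) { // binary search for the first word that is >= key
            int mid = (lo + hi) >>> 1;
            if (sortedWords.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        ArrayList<String> matches = new ArrayList<String>();
        for (int i = lo; i < sortedWords.size() && sortedWords.get(i).startsWith(key); i++) {
            matches.add(sortedWords.get(i));
        }
        return matches;
    }

    // Builds the three output lines in the layout of the example above.
    static String report(ArrayList<String> sortedWords, String prefix) {
        ArrayList<String> matches = startingWith(sortedWords, prefix);
        return "Looking for words starting with: " + prefix + "\n"
             + "Number of words matched: " + matches.size() + "\n"
             + String.join(", ", matches);
    }
}
```

The search costs O(log n) comparisons plus one step per matching word, instead of a scan over the whole word list.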
When the user selects option ‘S’, the program will ask for a word to be entered from the keyboard, and display on the screen the list of sentences in which the word appears.
As an example, if the text file is the one listed in the Appendix, and the search word is ‘truth’, the following output will be displayed:
Search word: truth
Number of sentences: 2
(1)[4]: It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife
(2)[5]: However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters
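The layout of this output can be produced from a list of sentence numbers. The sketch below reads the sample as "(k)[n]", where k is the running number of the match and n is the position of the sentence in the text, counted from 1; that reading is inferred from the example. The class name SearchReport is hypothetical, and the sketch assumes the matching sentence numbers have already been looked up.

```java
import java.util.ArrayList;

class SearchReport {
    // 'sentences' holds all sentences in text order; 'hits' holds the 1-based
    // numbers of the sentences that contain the search word.
    static ArrayList<String> build(String word, ArrayList<String> sentences, ArrayList<Integer> hits) {
        ArrayList<String> lines = new ArrayList<String>();
        lines.add("Search word: " + word);
        lines.add("Number of sentences: " + hits.size());
        for (int k = 0; k < hits.size(); k++) {
            int n = hits.get(k);
            // (k + 1) is the running match number, n the sentence number in the text
            lines.add("(" + (k + 1) + ")[" + n + "]: " + sentences.get(n - 1));
        }
        return lines;
    }
}
```

Returning the lines rather than printing them directly lets the same result be shown on the screen and saved to a file.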
It is required that
When the user selects option ‘W’, the program will ask for the search word, and display the search result on the screen as described for option ‘S’. It will then write the search result to a text file. If, for example, the search word is ‘truth’, the text file will have the name
ContextOf-TRUTH.txt
Note that the search word appears in uppercase in the filename.
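Building the file name and saving the result might be sketched as follows. The class name ContextWriter, the boolean return value, and the warning text are assumptions for illustration; the specification only fixes the file name pattern.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;

class ContextWriter {
    // "truth" -> "ContextOf-TRUTH.txt"
    static String fileNameFor(String word) {
        return "ContextOf-" + word.toUpperCase() + ".txt";
    }

    // Writes one line per list element; returns false if the file could not be written.
    static boolean write(String fileName, ArrayList<String> lines) {
        try (PrintWriter out = new PrintWriter(fileName)) {
            for (String line : lines) {
                out.println(line);
            }
            return !out.checkError(); // PrintWriter does not throw on write errors
        } catch (FileNotFoundException e) {
            System.out.println("Warning: could not write " + fileName);
            return false;
        }
    }
}
```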
You may use only the classes ArrayList and LinkedList. Other classes, such as TreeMap, are not allowed.
The efficiency with which the program performs various operations is a major concern. The files to be read in can be quite long. You will need to carefully consider which algorithms and data structures you use.
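One data structure that stays within ArrayList is a sorted list of words with a parallel list of sentence numbers, searched by binary search. The sketch below is one option among several, not a recommended or required design, and all names in it are hypothetical. Lookups take O(log n) comparisons; inserting a new word uses ArrayList.add(index, element), which shifts later elements and so costs O(n) per new distinct word.

```java
import java.util.ArrayList;

class WordIndex {
    // words is sorted, distinct and upper case; hits.get(i) belongs to words.get(i).
    private final ArrayList<String> words = new ArrayList<String>();
    private final ArrayList<ArrayList<Integer>> hits = new ArrayList<ArrayList<Integer>>();

    // Binary search: index of 'word' if present, otherwise -(insertion point) - 1.
    private int find(String word) {
        int lo = 0, hi = words.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = words.get(mid).compareTo(word);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -lo - 1;
    }

    // Records that 'word' occurs in sentence number n.
    // Assumes calls arrive in non-decreasing order of n.
    void add(String word, int n) {
        String key = word.toUpperCase();
        int i = find(key);
        if (i < 0) {
            i = -i - 1; // decode the insertion point
            words.add(i, key);
            hits.add(i, new ArrayList<Integer>());
        }
        ArrayList<Integer> list = hits.get(i);
        // Skip the sentence if it was already recorded for this word.
        if (list.isEmpty() || list.get(list.size() - 1) != n) {
            list.add(n);
        }
    }

    // Returns the sentence numbers for 'word' (case-insensitive), or an empty list.
    ArrayList<Integer> lookup(String word) {
        int i = find(word.toUpperCase());
        return i >= 0 ? hits.get(i) : new ArrayList<Integer>();
    }
}
```

Timing this kind of structure against a plain linear scan on a long text is one way to gather the efficiency evidence the assignment asks about.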
You can use any text file as input to this program. A good source of long text files is Project Gutenberg (www.gutenberg.org), which aims to put older literary works that are in the public domain into electronic form. The extract from Jane Austen’s book Pride and Prejudice used as the sample text file above was sourced from this web site. You should choose files whose lengths are suitable for assessing the efficiency of your program.
The program will be marked on correctness, style and efficiency.
Style will be worth 10 marks. This will include commenting, layout, choice of identifier names, choice of methods and/or classes, etc. Correctness will be allocated 25 marks. Efficiency will be allocated 35 marks. Programs that are highly inefficient may receive no marks for efficiency.
You are required to submit electronically a written report describing your solution and implementation.
The report should describe:
The report does not have to be long but it should discuss all necessary material. Point form and tables are acceptable where appropriate.
It should be submitted electronically as a file with the name “report” (acceptable formats are: plain text, PDF or Word). Make sure your name and student number are on every page of your report. Putting them into a header or footer would be a good idea.
Also include in the report for Task 2 your answer for Task 3 described below.
An extract from Pride and Prejudice
Pride and Prejudice
by Jane Austen
Chapter 1
It is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife.
However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.
"My dear Mr. Bennet," said his lady to him one day, "have you heard that Netherfield Park is let at last?"
Mr. Bennet replied that he had not.
"But it is," returned she; "for Mrs. Long has just been here, and she told me all about it." Mr. Bennet made no answer.
"Do you not want to know who has taken it?" cried his wife impatiently.
"YOU want to tell me, and I have no objection to hearing it."
This was invitation enough.