it2051229 CSE2 CSE5ALG Word Lexicons Phase 2

Background

As described in the handout for Part 1, the overall aim of the assignment is to develop a program to build a lexicon and to find the words that match certain patterns.

Whereas for Part 1 we are only concerned with the correctness of the program, for Part 2 we are concerned with the efficiency. More specifically, you are required to do the tasks described below. Besides the information given in the tasks below, please refer to Part 1 of the Assignment for any other information you need.

Task 1

Write a Java program called WordMatch.java. This program takes four command-line arguments. For example:

java WordMatch in1.txt out1.txt in2.txt out2.txt

1. The first is the name of a text file that contains the names of the text files from which the words are to be read to build the lexicon (The aim of this argument is to specify the input files)

2. The second is the name of the text file to which the words in the lexicon are to be written (The argument specifies the file that contains the words and their neighbors in the lexicon)

3. The third is the name of a text file that contains a number of matching patterns, one per line (The aim of this argument is to provide the matching patterns)

4. The fourth is the name of the text file that contains the result of the matching for the given patterns (The argument specifies the file that contains the result)
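As an illustration of the four arguments listed above, the following is a minimal sketch of how they might be validated and named before any file is opened. The class name WordMatchArgs and its field names are hypothetical and are not prescribed by this handout; only the argument order comes from the list above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: the handout only names WordMatch.java, not this class.
final class WordMatchArgs {
    final Path inputListFile;   // 1st argument: file listing the text files to read
    final Path lexiconOutFile;  // 2nd argument: file the lexicon is written to
    final Path patternFile;     // 3rd argument: file of matching patterns, one per line
    final Path resultOutFile;   // 4th argument: file the matching results are written to

    private WordMatchArgs(String[] args) {
        inputListFile = Paths.get(args[0]);
        lexiconOutFile = Paths.get(args[1]);
        patternFile = Paths.get(args[2]);
        resultOutFile = Paths.get(args[3]);
    }

    // Rejects any argument count other than four with a usage message.
    static WordMatchArgs parse(String[] args) {
        if (args == null || args.length != 4) {
            throw new IllegalArgumentException(
                "Usage: java WordMatch in1.txt out1.txt in2.txt out2.txt");
        }
        return new WordMatchArgs(args);
    }
}
```

A main method in WordMatch could call WordMatchArgs.parse(args) first and print the exception message if the argument count is wrong.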

For this version, the efficiency with which the program performs various operations is a major concern.

For example, the files read in can be quite long and the lexicon of words can grow to be quite lengthy. Time to insert the words will be critical here and you will need to carefully consider which algorithms and data structures you use.
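To make the insertion cost concrete, here is a minimal sketch of one possible approach, not a required design. It assumes the lexicon keeps each distinct word once together with an occurrence count (the exact contents of a lexicon entry are defined in Part 1 and are not repeated here). Words go into a hash table, which gives expected constant-time insertion, and are sorted only once when output is needed. The class name LexiconSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hash-based insertion, with a single sort at output time.
final class LexiconSketch {
    // Assumption: an entry is a word plus how many times it was seen.
    private final Map<String, Integer> counts = new HashMap<>();

    // Expected O(1) per word, regardless of how large the lexicon has grown.
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // One O(n log n) sort over the n distinct words, done once rather than per insertion.
    List<String> sortedWords() {
        List<String> words = new ArrayList<>(counts.keySet());
        Collections.sort(words);
        return words;
    }
}
```

An alternative is java.util.TreeMap, which keeps the words sorted at all times at a cost of O(log n) per insertion. Which option is faster on your test files is something to measure rather than assume.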

You can use any text files for input to this program. A good source of long text files is the Gutenberg project (www.gutenberg.com), which aims to put older literary works that are in the public domain into electronic form. The extract from Jane Austen's book Pride and Prejudice used as the sample text file above was sourced from this web site. You should choose files of lengths suitable for providing good information about the efficiency of your program.

A selection of test files has been posted on LMS for your efficiency testing. You can consider additional test files if you wish.

As expected, the definition of a word, the content of a query's result, and the display of this result are exactly the same as described in Part 1.

Task 2 - Report

Write a report about the WordMatch program and the classes that support it. The report has the following sections:

Section 1: In this section, describe the classes that you implement for Task 1. For each class, list and briefly describe its attributes and methods.
Section 2: In this section,
- Identify what you consider to be the main issues in the effort to make the program run efficiently; for example, what subtasks you think could be costly in terms of execution time
- Point out the techniques/features that you have used to address those issues
Section 3: In this section, explain how you test the functional correctness of your program.
Section 4: In this section, explain how you test the efficiency of your program. Make sure you present quantitative information about its efficiency.
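
For the quantitative information asked for in Section 4, one simple option is to time each phase with System.nanoTime and report the best of several runs. The sketch below assumes the phase to be measured can be wrapped in a Runnable; the class name TimingSketch and its methods are hypothetical.

```java
// Hypothetical timing helper for efficiency experiments.
final class TimingSketch {
    // Elapsed wall-clock time of a single run, in milliseconds.
    static double millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    // Smallest time over several runs, which reduces noise from JIT warm-up
    // and other processes. Requires runs >= 1.
    static double bestOfMillis(int runs, Runnable task) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        double best = Double.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, millis(task));
        }
        return best;
    }
}
```

Reporting such timings for input files of several different lengths, for example in a small table, shows how the running time grows with the size of the input.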

Task 3

Consider the B-trees of order M. Assume that we have the following result, which we will refer to as Lemma 1.

Lemma 1: The barest B-tree of height H contains N = 2K^H - 1 elements, where K = ⌈M / 2⌉.

Determine the upper bound for the height of a B-tree of order 21 which has 1,000,000 = 10⁶ elements.

You must give an integer value as the upper bound of the B-tree's height.

You are not allowed to use the result given in the lecture regarding the upper bound for a B-tree's height. Instead, you must work out the answer using Lemma 1 above.
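
One way to check a hand-derived answer is sketched below. It assumes the square brackets in Lemma 1 denote rounding up, so that K = 11 for M = 21, and it uses the height convention of Lemma 1. Since a tree of height H holding N elements cannot have fewer elements than the barest tree of that height, 2K^H - 1 ≤ N must hold, so the bound is the largest integer H that satisfies this inequality. The class and method names are hypothetical.

```java
// Hypothetical check of the height bound implied by Lemma 1.
final class BTreeHeightBound {
    // Largest H such that 2 * K^H - 1 <= elements, with K = ceil(order / 2).
    // Assumes order >= 3, elements >= 1, and inputs small enough
    // that 2 * K^(H+1) fits in a long.
    static int maxHeight(int order, long elements) {
        long k = (order + 1) / 2;   // integer form of ceil(order / 2)
        int height = 0;
        long power = 1;             // invariant: power == k^height
        // Move up one level while the barest tree of the next height still fits.
        while (2 * power * k - 1 <= elements) {
            power *= k;
            height++;
        }
        return height;
    }
}
```

Under these assumptions, BTreeHeightBound.maxHeight(21, 1_000_000L) returns 5, because 2·11^5 - 1 = 322,101 ≤ 10⁶ while 2·11^6 - 1 = 3,543,121 > 10⁶. Your written answer should still show the derivation from Lemma 1 rather than rely on a program.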
