Introduction
In this assignment, you will work with introductory Object-Oriented Programming (OOP) to create classes with associated atributes and methods. This handout explains the problem you are to solve, and the tasks you need to complete for the assignment. Please read it carefully.
There are two parts of this assignment:
- Code Component
- Written Component
Goals of this Assignment
- Write Python classes with attributes and methods
- Interpret more complex pre-written code to interface with your own work
- Synthesize previous course concepts (dictionaries, lists)
- Apply runtime analysis to identify program complexity
Files to Download
Please download the Assignment 4 files and extract the zip archive.
- Starter code:
- dictionary_class.py and word.py You will complete a few methods in these files. The classes defined in these files are used in dictionary_program.py .
- Data: english_dictionary.p You should not open or modify this file. This is a special binary file that Python can read much faster than a regular .txt file. The lexical dictionary was pre-built for you and saved in this format.
- The data for this assignment was taken from the deprecated Wordset Dictionary (https://github.com/wordset/wordset-dictionary) , which is in turn based off of WordNet (https://wordnet.princeton.edu/) . The license is retained at the bottom of this page. Please note that there is no guarantee on the accuracy of the definitions in the dataset, and we (instructors and the University of Toronto) are not liable for any content, ie. entries and/or definitions, in the dataset which you may find offensive. If you are uncomfortable with the contents of the dataset, please reach out to us for an alternative dataset to use for testing.
- Main program: dictionary_program.py You should not modify this file. This is the interactive dictionary program where you can explore the dictionary we have provided. This file will only work after you have completed the tasks in word.py and dictionary_class.py . If you try to look up a word that does not exist in the dictionary, the dictionary will (by default) suggest the closest word to what you have entered. If you find that this is running too slowly, you may modify the constant SUGGEST_WORD and set it to False .
Data Structures
For each class, a __repr__() method is provided for you.
Definition class
The Definition class represents a definition, with the attributes definition ( str ) and part_of_speech ( str ). The part_of_speech attribute indicates whether the definition is attributed to a verb, noun, adjective, etc.
The Entry class represents a single entry in the dictionary. It has the attributes word ( str ) and definitions ( List[Definition] ).
MyDictionary class
The MyDictionary class represents our lexical dictionary. It has the attributes entries and __num_entries__ .
The __num_entries__ attribute is an int representing the number of entries in the dictionary. The entries attribute is a dictionary ( Dict[str, List[Entry]] ) with:
- key: an alphabet in the English language ( str ), lowercase
- value: a List of instances of class Entry , containing all dictionary entries that start with the letter indicated by the key.
The following is an example of how these classes are used:
>>> word = "illuminate"
>>> def1 = Definition("make lighter or brighter", "verb")
>>> def2 = Definition("make free from confusion or ambiguity", "verb")
>>> def3 = Definition("add embellishments and paintings to (medieval manuscripts)", "verb")
>>> entry = Entry(word, [def1, def2, def3])
...
>>> my_dictionary = MyDictionary()
>>> my_dictionary.add_entries([entry])
...
>>> print(my_dictionary)
illuminate:
1. VERB. make lighter or brighter
2. VERB. make free from confusion or ambiguity
3. VERB. add embellishments and paintings to (medieval manuscripts)
>>> my_dictionary.get_num_entries()
1
>>> my_dictionary.entries
{'i': [illuminate:
1. VERB. make lighter or brighter
2. VERB. make free from confusion or ambiguity
3. VERB. add embellishments and paintings to (medieval manuscripts)
]}
Code Component
Complete the following methods according to their descriptions:
dictionary_class.py
To test the MyDictionary class without completing suggest_closest_word , set the constant SUGGEST_WORD to False in dictionary_program.py .
- MyDictionary.__init__() -> None Initialize attributes entries as an empty Python dictionary and __num_entries__ as 0 .
- MyDictionary.get_num_entries() -> int Return the number of entries in the lexical dictionary.
- MyDictionary.suggest_closest_word(str) -> str The parameter represents a word that is not present in entries as a lexical word. Using the Levenshtein distance as a measure of similarity, return the closest word to the word passed in. You should use the MyDictionary.__levenshtein_dist__() method here.
- Smaller levenshtein distance means a higher degree of similarity. For two arbitrary unique words, word1 and word2 (unique means word1 is not the same as word2 ), the minimum levenshtein distance is 1 and the maximum levenshtein distance is max(len(word1), len(word2)) . You may read more about Levenshtein distance here (https://en.wikipedia.org/wiki/Levenshtein_distance) .
- Carefully consider when this function should terminate. Is the closest word unique?
Try running the following example:
>>> my_dictionary = MyDictionary()
>>> my_dictionary.__levenshtein_dist__('capybara', 'iguana')
?
>>> my_dictionary.__levenshtein_dist__('apple', 'bapple')
?
word.py
- Definition.__init__(str, str) -> None The first parameter represents the definition and the second parameter represents the part of speech. Initialize attributes definition as str and part_of_speech as str .
- Entry.__init__(str, List[Definition]) -> None The first parameter represents the word and the second parameter represents a list of definitions association with the given word. Initialize attributes word as str and definitions as a list of Definition .
Written Component
Choose two of the following questions to answer. Reference any sources using the IEEE citation format. Save your answers in a single .pdf document.
1. In the MyDictionary Class, the method get_word_index uses a search algorithm to locate entries in the lexical dictionary. In your own words, answer the following:
- What is a search algorithm?
- Identify both the best-case and worst-case runtime complexity for the algorithm used for get_entry_by_word . Give examples for both cases.
- Give a visual representation (e.g. draw a picture) to illustrate how this function locates the entry for the given word.
2. In the MyDictionary Class, the method __levenshtein_dist__ uses the levenshtein distance metric to find words that are lexically similar to each other.
- Run the __levenshtein_dist__ method on the words "canary" and cannery, with display=True . Include the result in your answer. Given the resultant distance value, indicate the shortest way to transform canary into cannery.
- The method suggest_closest_word takes noticably long to run, regardless of implementation. Using the concept of runtime complexity, identify why this function takes so long to run. Refer to the __levenshtein_dist__ method in your answer.
- Identify the best- and worst- case runtime complexity for the suggest_closest_word method. Give examples for both cases.
3. In the MyDictionary Class, there is a __repr__ method provided. In your own words, answer the following:
- What is the purpose of the __repr__ method?
- What should you consider prior to calling print() on an instance of MyDictionary ?
- As per a lexical dictionary, the dictionary entries are sorted lexically within their subsection (refer to add_entries ). Explain how this affects the runtime complexity of the program. (Hint you may wish to consider the methods add_entries and get_entry_by_word , as well as the attribute __num_entries__ )