it2051229 SER222 Determining Optimal Character Memorization Order

Summary: In this assignment, you will implement a topological sort algorithm as a way to reorder kanji (; Japanese logographic characters) into an order optimal for memorization.

1 Background

In this assignment, you will practice applying your knowledge of graph data structures and algorithms to determine an optimal order to learn logographic characters.

Consider the following scenario: as a native speaker of the English language, you wish to learn a language like Chinese () or Japanese (), which uses a special character set. Unlike the English language, which uses the Roman alphabet to encode phonetics, many east Asian languages use Hanzi derived characters, which are symbols with semantics bound to them (to use the technical term: logographic character). Often times, English speakers learning a language using logographic characters find themselves stumped by the apparently insurmountable problem of memorizing thousands of unique characters. They all look so different and yet so similar - can we hope to tell them apart or write them? Wouldn't it be nice if we could use our knowledge of algorithms and data structures to somehow address this problem so that English speakers would have a better shot at learning or ...?

One interesting dependency graph that can be constructed is a "component" graph for Hanzi characters, or Hanzi derived characters (e.g., Kanji). You see, complex characters are often built from simpler characters as components. This sub-structure is very useful! Components not only define a common appearance and stroke order but can indicate phonetics and/or semantics. Furthermore, there are considerably fewer components (hundreds) than actual characters (thousands). (Please note that we use the term component very generally here - it does not map to the traditional notion of a radical.) This sub-structure is particularly useful for people memorizing characters - instead of looking at each character as a monolithic block, one can memorize the individual components, and then reuse that knowledge for more complex characters. The following graph is an example of this for the character ("method"):

Figure 1: Dependency graph for . See image

Reading this graph, we can see that is made from ("gone") and ("water") ( is written as here, a standard transformation). Recursively, is made from ("soil") and . Based on these dependencies, we would want to learn these characters in the order: . If we do that, then instead of learning each stroke in , we just have to remember "method=water+gone", which tells us to write and . (For more information on these ideas, see Remembering the Kanji by James Heisig.) That said, in order to make use of this recursive structure, we would have to learn characters in an order such that we also see simpler characters before we see the more complex characters that are built out of them. To solve this problem, we can produce a topological sort of a graph. A topological sort is an ordered list of vertices in graph such that all dependencies are listed before their dependent.

2 Requirements

In this assignment, you will implement an editable graph data structure and a new algorithm for finding the topological sort of a graph. Our goal will be to load two data files, and print out a topological order for the characters that they list. Attached to this post are a base file and two interfaces. Further down the BlackBoard page, you'll find a PDF describing the Java implementation of hashtables (you may find it useful), and a visualization of the dependencies in the data files. The data is formatted as follows:

data-kanji.txt (UTF-8 formatted) stores nodes:

Tab separated.
# prefixes comment lines.
Lines look like < characterID1> < character1>, e.g., "120 ", which indicates that character1 can be represented as the number characterID1. IDs are just integers.

data-components.txt (ASCII formatted) stores edges:

Tab separated.
# prefixes comment lines
Lines look like < character1ID> < character2ID>, e.g., "92 73", which indicates that character1 is a component of character2.

In terms of programming, you will need to:

Create a new class called BetterDiGraph that implements the EditableDiGraph interface. See the interface file for details.
Create a new class called IntuitiveTopological that implements the TopologicalSort interface. Use BetterDiGraph to store the graph.
- Instead of using DFS to find a topological sort, implement the following algorithm: "IntuitiveTopological". This algorithm works as follows: look at your graph, pick out a node with in-degree zero, add it to the topological ordering, and remove it from the graph. This process repeats until the graph is empty.
- Make sure to check for cycles before trying to generate a topological sort of the graph!
Complete the main method in LastNameMain.java. It should:
- Load data-kanji.txt, use it to populate a hashtable that maps IDs to characters, and add the IDs as nodes in the graph.
- Load data-components.txt, and use it to add edges to the graph being built.
- Create an IntuitiveTopological object, and use it to sort the graph.
- Display the characters in the ordering. Note that topological sort will produce a list of a IDs - you'll need to take the IDs and uses them to look up the correct character in the hashtable you populated earlier.
Extra Credit: add support for visualizing the graph that you generate. Most likely this will take the form of using a graph library such as GraphViz to render an image for the data you load. An example might look like the image below. see image.
If you find yourself adding import packages other than java.util.LinkedList, java.util.HashMap, java.util.NoSuchElementException, or java.io.*, please double check with your instructor that they may be used.

3 Testing

Below is a sample output for your program that contains the characters in the default order from the file, and the order resulting from a topological sort. Note that your topological sort may be in a slightly different order than is shown below. The key is that simpler characters are shown before the more complicated characters that are built from them.

Figure 2: Sample output. see image.

Academic Honesty!

It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.

Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.