This assessment is designed to test your ability to architect and create a relatively substantial java application. The core of the assessment focusses on Object Orientation, I/O, Exceptions, and the selection and use of appropriate data structures. The later parts of the assessment focus on user interface design, JDBC database manipulation, and the use of third party libraries.
The UK's Land Registry makes available its records of the price paid by the buyers of residential property [1]. The data is updated monthly, and is available to download as a CSV file containing the data for either an individual month, or going back as far as January 1995. You will be using the most recent full CSV file, the link to click is depicted in Figure 1, below. Details of what the columns in the CSV file are, and what the data inside them means, is available at [4].
Figure 1: Download Link For Land Registry Price Data see image.
Note: the file very large, so I would advise you to download this using a download manager (e.g. Flashget etc.) capable of resuming broken downloads. You will need to allow sufficient time for the download, too. Typically, it will be far faster to download the file at University, using our high-speed network. You will not submit the file with your work. I have my own copy!
Your application will allow users to input the start of a postcode, and view the date, price and address of each matching property sale recorded in the file. As you progress through the assignment, you will see first hand the difference in performance that a well-chosen and sensibly designed database can make to the speed of searching through larger quantities of data.
The assessment specification is broken down into a number of sections. The first section, the core app, accounts for 40% of the marks on the assessment, and is compulsory. The later sections are weighted 10% or 20%, and you are free to choose which combination of them you would like to work on. I would recommend considering and planning the combination of features you intend to complete before starting work on the implementation to reduce the amount of code that needs reworking as you add later features. For reassessment students whose unit marks are to be capped at 40%, I would recommend working on features totalling at least 60% of the assignment, as it is normal to lose a small number of marks in each section for minor errors. You are free to choose which of the additional features to work on.
The user should be able to enter the first few letters of a postcode, and the app should display the date, address, and purchase price of each matching sale in the file. If you have opted to complete the Swing interface section, then the results should be displayed in an appropriate swing component (e.g. a JTable). If you have chosen to stick to the command line, print the date, address and sale price of each matching entry to stdout.
You might be tempted to write the application such that the file is loaded once at startup, and that repeated searches can be made against an in-memory representation of the data. This would indeed yield quicker searches, but the data file is so large that it may not fit into memory using the standard JVM configuration on most machines. This means that you'll need to use a BufferedReader to process each line of the file individually for each search.
Comma Separated Value (CSV) files are relatively easy to parse programmatically, requiring you to tokenize each line of the file using the comma character as a delimiter and identify the column (and thus token) numbers you require, which will contain the desired data. The java.util.StringTokenizer class, or the String.split() method are two common ways of handling the tokenization. I would recommend the StringTokenizer class for performance reasons [2], despite the official JavaDoc documentation discouraging its use in new code [3]. In the Land Registry's data file, each token is surrounded by quotation marks, which you will need to strip, using the String.replace() method. Remember to escape \" the quote character in your code when passing it to the replace() method, or the compiler will become very confused as to where the String starts and ends!
The app's code should be structured using appropriate classes and methods so as to avoid any significant code duplication, and to maximise ease of reading and maintenance. You should pay some consideration to the performance of the file reading, tokenizing and parsing, given the large size of the files and resultant number of iterations that will be required. The code should be commented appropriately, using the JavaDoc standard.
SQLite is an extremely popular embedded database engine, that unlike client-server databases like MySQL or Firebird, stores databases in simple files. SQLite is available for, or integrated into, most programming languages, and works on most operating systems. The database files can be read or manipulated by any program using the SQLite libraries, across operating systems. There are also many utilities to help create and manipulate SQLite databases.
vCreating an SQLite database from the file is a relatively simple task, using the command line sqlite tool available from [5]. Use the SQL snippet depicted in Figure 2, adjusting the path to the CSV file as necessary. The .import will likely take several minutes to complete. Creating an index of the postcode column will help speed up our queries.
Figure 2: SQL to create appropriate database and index for price paid data see image.
Your application should use the xerial SQLite JDBC connector [6], as you did in lab sessions, to search the database you have created instead of accessing the data file directly. This will make a significant difference to the time taken to perform searches. If you complete the assignment using an SQLite database, it is not necessary to produce a second version that loads the data directly from the CSV file: consider this feature an "instead of" rather than an and.
Command line interfaces are relatively simple to write, but many non-technical users find them awkward and fiddly to use. Write a Swing user interface, in which the user can search by entering their partial postcode in a JTextField, and clicking a search JButton. The date, full Figure 2: SQL to create appropriate database and index for price paid data address and sale price of each matching results should be displayed in a JTable. Your interface must be written by hand, without the aid of any visual form designers.
This section is intended to be challenging, and depends on you having completed the previous Swing UI section. JFreeChart [7] is a popular open source library for displaying graphs in java applications. Your application should use the JFreeChart library to draw a scatter plot, with the date of sale plotted on the x axis, and the sale price plotted on the y axis. The best solutions will use Jfreechart's built-in regression functions to draw a "line of best fit" using the plotted data. Figure 3, overleaf, depicts an example of the sort of chart that you should create.
Figure 3: Example Scatter Plot, with Line of Best Fit see image.