This project requires you to compare the performance of two distinct sorting algorithms to gain an appreciation of the parameters to consider when selecting an appropriate sort. Write a Quicksort and a Heap Sort. Both must be iterative, so that the overhead of recursion is not a factor in your comparisons. In your write-up, consider how your code would have differed if you had made the sorts recursive.
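To illustrate the difference the write-up asks about, here is a minimal Python sketch of the same first-item-pivot Quicksort written both ways. The document names no language, so Python is an assumption, and the function names `partition`, `quicksort_recursive`, and `quicksort_iterative` are hypothetical. In the recursive version the call stack remembers which partitions are still pending; the iterative version must do that bookkeeping itself with an explicit stack of (lo, hi) index pairs.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the first item, a[lo].

    Returns the pivot's final index p: a[lo..p-1] are smaller than the
    pivot and a[p+1..hi] are greater than or equal to it.
    """
    pivot = a[lo]
    i = lo  # a[lo+1..i] holds the items found so far that are < pivot
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]  # place the pivot between the two partitions
    return i


def quicksort_recursive(a, lo=0, hi=None):
    """Recursive form: the call stack implicitly remembers pending partitions."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort_recursive(a, lo, p - 1)
        quicksort_recursive(a, p + 1, hi)


def quicksort_iterative(a):
    """Iterative form: an explicit stack of (lo, hi) pairs replaces the calls."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue  # zero or one item: already sorted
        p = partition(a, lo, hi)
        left, right = (lo, p - 1), (p + 1, hi)
        # Push the larger partition first so the smaller one is popped next;
        # this keeps the stack at O(log n) entries.
        if p - lo > hi - p:
            stack.extend([left, right])
        else:
            stack.extend([right, left])
```

On ascending input the first-item pivot always leaves one partition empty, so the recursive version nests roughly n calls deep. In CPython, whose default recursion limit is 1000, sorting the 5000-integer ascending file recursively would typically raise `RecursionError`, while the iterative version's explicit stack never holds more than two entries for that input.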
A Quicksort (or Partition Exchange Sort) divides the data into two partitions separated by a pivot. The first partition contains exactly the items that are smaller than the pivot; all other items are in the second partition. Be sure to comment on the benefits of an iterative versus a recursive version. Select the first item of the partition as the pivot, and treat partitions of size one or two as special cases. Write a second version that stops partitioning once it reaches a partition of size k or smaller, and finish those partitions with an insertion sort. Try k values of 100 and 50. Finally, create a third version of Quicksort that uses the median-of-three as the pivot; in this version, only partitions of size one or two are special cases.
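The variants above can share one iterative driver. The following is a minimal Python sketch under that assumption; the parameter names `k` and `median_of_three` and all function names are hypothetical. The default `k=2` gives the plain version, in which only partitions of size one or two are special cases; `k=100` and `k=50` hand small partitions to insertion sort; and `median_of_three=True` swaps the median of the first, middle, and last items into the first slot so the same first-item partition routine uses it as the pivot.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place by straight insertion."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]  # shift larger items one slot right
            j -= 1
        a[j + 1] = key


def median_of_three_to_front(a, lo, hi):
    """Move the median of a[lo], a[mid], a[hi] into a[lo] (needs hi - lo >= 2)."""
    mid = (lo + hi) // 2
    # Order the three samples so that a[lo] <= a[mid] <= a[hi].
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    a[lo], a[mid] = a[mid], a[lo]  # the median becomes the first item


def partition(a, lo, hi):
    """Partition a[lo..hi] around the first item; return its final index."""
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def quicksort(a, k=2, median_of_three=False):
    """Iterative Quicksort.

    k: partitions of size <= k are finished by insertion sort; the default
       k=2 means only partitions of size one or two are special cases.
    median_of_three: if True, pivot on the median-of-three, not the first item.
    """
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        size = hi - lo + 1
        if size <= 2:
            # Special cases: size 0 or 1 is sorted; size 2 needs one compare.
            if size == 2 and a[hi] < a[lo]:
                a[lo], a[hi] = a[hi], a[lo]
            continue
        if size <= k:
            insertion_sort(a, lo, hi)  # small partition: stop partitioning
            continue
        if median_of_three:
            median_of_three_to_front(a, lo, hi)
        p = partition(a, lo, hi)
        left, right = (lo, p - 1), (p + 1, hi)
        # Push the larger side first so the smaller side is processed next.
        if p - lo > hi - p:
            stack.extend([left, right])
        else:
            stack.extend([right, left])
```

The four Quicksort runs would then be `quicksort(a)`, `quicksort(a, k=100)`, `quicksort(a, k=50)`, and `quicksort(a, median_of_three=True)`, each on a fresh copy of the input.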
Heap Sort is a practical sort to know and is based on the concept of a heap. It has two phases: build the heap, then extract the elements from the heap in sorted order.
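Both phases can be written without recursion by using an iterative sift-down on an array-based max-heap. This is a minimal Python sketch under that assumption; `sift_down` and `heap_sort` are hypothetical names. The children of index i are stored at 2i + 1 and 2i + 2, so the heap needs no storage beyond the array being sorted.

```python
def sift_down(a, start, end):
    """Restore the max-heap property below index `start`, using only a[0..end-1]."""
    root = start
    while True:
        child = 2 * root + 1  # left child
        if child >= end:
            return  # root is a leaf
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1  # pick the larger child
        if a[root] >= a[child]:
            return  # heap property holds
        a[root], a[child] = a[child], a[root]
        root = child  # continue one level down (a loop, not a recursive call)


def heap_sort(a):
    n = len(a)
    # Phase 1: build a max-heap bottom-up, starting at the last internal node.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(a, start, n)
    # Phase 2: move the current maximum to the end, shrink the heap, repeat.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
```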
Create input files of five sizes: 50, 500, 1000, 2000, and 5000 integers. For each size, make three versions: the first uses a randomly ordered data set, the second uses the integers in reverse order, and the third uses the integers in ascending order. (You may use a random number generator to create the randomly ordered file, but it is important to limit duplicates to less than 1%. Alternatively, you may write a shuffle function to randomize one of your ordered files.) This gives an input set of 15 files, plus any others you deem necessary and reasonable. The size-50 files are to be turned in; the other sizes are for timing purposes. Your program must print out the sorted values; it does not need to print the times.
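The shuffle option is the simplest way to meet the duplicate limit: shuffling 1..n yields no duplicates at all. Below is a minimal Python sketch under that approach; the file-naming scheme (`ran50.txt`, `rev50.txt`, `asc50.txt`, and so on), the one-integer-per-line format, and the names `make_datasets` and `write_all` are all hypothetical choices.

```python
import os
import random

SIZES = [50, 500, 1000, 2000, 5000]


def make_datasets(n, seed=None):
    """Return the three orderings of the integers 1..n for one file size.

    The random version is a shuffle of the ascending list, so it contains
    no duplicates at all (well under the 1% limit).
    """
    ascending = list(range(1, n + 1))
    shuffled = ascending[:]
    random.Random(seed).shuffle(shuffled)
    return {"ran": shuffled, "rev": ascending[::-1], "asc": ascending}


def write_all(directory="."):
    """Write the 15 files, one integer per line, e.g. ran50.txt, rev50.txt."""
    for n in SIZES:
        for order, values in make_datasets(n, seed=n).items():
            path = os.path.join(directory, f"{order}{n}.txt")
            with open(path, "w") as f:
                f.write("\n".join(map(str, values)) + "\n")
```

Calling `write_all("data")` on an existing `data` directory would produce the 15 required files; fixing the seed per size makes the random files reproducible between runs.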
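Each sort must be run against all the input files. Five sort versions (the first-item-pivot Quicksort, the k = 100 and k = 50 insertion-sort versions, the median-of-three Quicksort, and Heap Sort) times 15 files gives a minimum of 75 runs.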
Your program should read the system clock to obtain time values for the different runs. Place the clock calls as close as possible to the beginning and end of each sort. If other code is included, it may add a large, albeit fixed, cost that would tend to drown out any differences between the runs. Why take the chance? If you get too many zero time values, or any negative time values, you must fix the problem. One way is to use larger files than those specified. Another is to perform the sort N times in a loop and calculate the average time; be careful to start over with unsorted data each time through the loop.
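The looping approach might look like the following minimal Python sketch, which uses `time.perf_counter()`, Python's high-resolution clock for measuring short durations. The name `time_sort` and the `repeats` parameter are hypothetical. The fresh copy of the input is made before the clock is read, so only the sort itself falls between the two clock calls.

```python
import time


def time_sort(sort_fn, data, repeats=10):
    """Return (average seconds per sort, last sorted copy).

    Each pass sorts a fresh copy of `data`, so every run starts from the
    original unsorted order; `data` itself is never modified.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    work = list(data)
    for _ in range(repeats):
        work = list(data)                      # fresh unsorted copy (not timed)
        start = time.perf_counter()            # clock read just before the sort
        sort_fn(work)
        total += time.perf_counter() - start   # and just after it
    return total / repeats, work
```

Calling this in a nested loop over the five sort versions and the 15 input files yields the 75 required runs; the returned sorted copy is what the program would print.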
Write an analysis comparing the two sorts and their performance. Be sure to comment on the relative runtimes of the various runs, the effect of the order of the data, the effect of different file sizes, and, for Quicksort, the effect of different partition sizes and pivot selection methods. Which factor has the most effect on efficiency? Be sure to consider both time and space efficiency, and justify your choice of data structures. As time permits, consider examining alternate methods of selecting the pivot for Quicksort. Also consider files of size 10,000, or additional random files, perhaps with 15-20% duplicates. Your write-up must include a table of the times obtained.
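For the optional duplicate experiments, the duplicate rate can be controlled exactly. This minimal Python sketch assumes "duplicates" means entries that repeat a value already in the file, so the count is `n - len(set(values))`; the function name `random_with_duplicates` and the value range 1..10n are hypothetical choices.

```python
import random


def random_with_duplicates(n, dup_fraction, seed=None):
    """Return n shuffled integers in which round(n * dup_fraction) entries
    repeat a value that already appears (capped at n - 1 so at least one
    distinct value exists)."""
    rng = random.Random(seed)
    dups = min(round(n * dup_fraction), max(n - 1, 0))
    distinct = rng.sample(range(1, 10 * n + 1), n - dups)  # all different
    extra = [rng.choice(distinct) for _ in range(dups)]     # repeats of them
    values = distinct + extra
    rng.shuffle(values)  # spread the repeats through the file
    return values
```

For example, `random_with_duplicates(5000, 0.2)` would give a 5000-integer file containing 4000 distinct values, i.e. 20% duplicates.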
The complete project must include: the analysis (as a Word file), the source code, the compiled code, copies of the input data sets (as text files), and copies of all output for every required input case (as text files).