This project is designed to help you develop the skills nescessary to write and analyze algorithms. We will be using search algorithms which are relatively simple to understand and apply what we have learned about measuring time and measuring comparisons.
Your goal is to validate the hypothesis that binary search is more efficient than sequential search both in terms of runtime and number of comparisons. Your validation will be purely analytical and based on data you will generate and report on.
You will write all the code needed for this lab from scratch. That said, feel free to reference code and helper functions that have been demonstrated in class, and re-write these functions into your own project code base.
1. Write a sequential search algorithm
Implement your own sequential search algorithm that operates on a std::vector< int> and finds an int in the list. This algorithms should have three parameters: a begin iterator referencing the first element in the range to search through, an end iterator such that end-- would reference the last element in the range to search through, and a callable function that will be used to check for equality. The algorithm should return an iterator referencing the element in the container if the element is found in the container, otherwise it should return the end iterator that was passed into the function.
This will be very similar to the function we wrote in class, except for the third parameter. Rather than taking an int, you should take a callable function. This function should take a const int& as a parameter, and return true if the parameter matches the item you are looking for, and false otherwise. Rather than using the == to check if an item in the list matches the item you are looking for, you will invoke the function and pass the item in the list that you want to check. This behavior is similar to how the std::find_if algorithm works and is discussed in the class lectures.
2. Write a binary search algorithm
Implement your own binary search algorithm that operates on std::vector< int> and finds an int in the list. This algorithm should have the same signature as the above sequential search algorithm - that is, it should take three parameters: a begin iterator referencing the first item in the search range, an end iterator such that end-- references the last element in the search range, and a callable function. The binary search algorithm should also return an iterator referencing the item in the list if found, and the end iterator passed in to the function otherwise.
The callable function behaves almost the same as the above sequential search algorithm. It should be invoked on each element in the list to check for to the item you are searching for. However, this function has a slightly different behavior becuase you will need to check not only for equality, but also for less-than and greater-than. The function should take a const int& as a parameter, this will be the item in the list to check. The function should return an int as a result with the following characteristics: the return value is -1 if the item in the list is the item you are looking for, the return value is 0 if the item in the list is to the item you are looking for, and the return value is 1 if the item in the list is the item you are looking for. You will be able to use this callable function to determine which sub-list to narrow your search down to, just like we did in the class lecture.
3. Measure runtime
Write code inside of main that measures runtime of the following four algorithms:
1. Your own sequential search algorithm
2. Your own binary search algorithm
3. The standard library std::find_if algorithm which is a sequential search
4. The standard library std::binary_search algorithm. Note that this function takes four parameters, the third parameter is the value to search for, and the fourth parameter is a comparison function. You can look up the requirements in CPPReference.
You will need to generate various sized lists of random integers as explained below and also pick a random integer value to search for in the list before calling the functions.
Please carefully follow these instructions for generating the lists:
All of these requirements can be written using a few nested for-loops. Please don't try to do this manually where you change a value, then re-compile and re-run the code and record the results. That would end up taking an extremely long time. Write a program that does all of this for you, and then just let the program run. The program will likely take several minutes to run.
4. Measure comparisons
When you run all of these algorithms, you will take advantage of the fact that the third parameter is a callable function that you provide. You can use this to keep track of the number of times that function is called.
For each of the above list sizes in addition to measuring the runtime, measure the number of comparisons that the search algorithm uses by creating a counter variable and incrementing it on each comparison.
5. Produce a graph of your data
For all of the data you generate (runtime data and comparison count data) keep track of the results. You can print out all of the results to a file in order to have access to it when the program is finished running.
Here is an example of what the raw data looks like printed to the console:
Size MyFindTime MyFindIter StdFindT
1000 5.26e-05 1000 1.638e-05
2000 0.0001002 2000 3.136e-05
3000 0.00014832 3000 4.468e-05
I recommend that instead of printing to the console, you print to a file. You can print using the CSV format. This will make it easy to import the data into a spreadsheet and create a graph of the data.
Here is an example of the data I generated printed to a file in CSV format:
Size,MyFindTime,MyFindIter,StdFindTime,StdFindIter,MyBinTime,MyBinIter,S
1000,5.302e-05,1000,1.662e-05,1000,3.22e-06,9,2.5e-06,11
2000,0.00011298,2000,3.39e-05,2000,6.94e-06,10,3.48e-06,12
3000,0.0002073,3000,6.712e-05,3000,4.86e-06,12,3.18e-06,11
4000,0.00025662,4000,8.664e-05,4000,4.8e-06,12,3.3e-06,11
5000,0.0003302,5000,0.00012046,5000,5.5e-06,13,3.32e-06,12
6000,0.00048316,6000,0.00014118,6000,6.84e-06,13,4.34e-06,12
Once you have the data, you need to create two graphs of the results. One graph for runtime, and one graph for comparisons. The hould represent the list size, and the should represent the number of seconds for the first graph, and the number of comparisons for the second graph. On the graph, there should be a different line for each algorithm.
You can open the CSV data in a spreadsheet program. You can use Excel, or Google docs, or any of several other free spreadsheet programs. If you have the CSV data formatted as shown above, then it should work well.