Merge Sort Lab
This is a lab activity, not an assignment. Therefore, it's not required that you complete it or hand it in. However, the concepts in this lab will be useful for completing the homework assignments (and will be useful for exams down the road), so I'd encourage you to put in a serious effort on it in class, and to consider finishing any remaining parts outside of class.
Goals: Understand merge sort by implementing it. Compare the efficiency of different sorting algorithms.
Setup
Log into a lab computer, mount COURSES, and navigate to your STUWORK directory. Look back at the lab from the first day of class or get help from Anna or Rachel if you need a reminder on how to complete these steps.
Create a folder in your STUWORK directory called mergeSortLab. Open VSCode, and drag the mergeSortLab folder
into the VSCode window. Download the
starter code
and move the files into your new folder.
The download should include two Python files, mergeSort.py and sortTiming.py.
Part 1: Implement the merge function
There is starter code for a merge function in mergeSort.py. This function will take in two sorted lists, and
return a new list that is the merged (sorted) version of both lists.
First, write a
whileloop that implements the following pseudocode. You should use the variables that I created for you.# while there are still items in both the left and right halves # check if the next left item is less than or equal to the next right item # if so, append the next left item to the merged list # otherwise, append the next right item to the merged listYour
mergefunction isn't quite done: it'll stop merging as soon as you run out of items in one of the lists. But this is a good time to check whether it works, so far. Uncomment the lines of code towards the end of the file that are labeled as test cases for Part 1.Run the code (
python3 mergeSort.py), and you should get the following output:[0, 1, 2, 3, 4, 5, 6, 7] [0, 1, 7, 8] ['apple', 'avocado', 'banana', 'cucumber', 'fig', 'grape']
Finish writing the
mergefunction by implementing the following pseudocode after the while loop:# if we're not done with the left list, append all remaining items to the new list # otherwise, append all remaining items from the right list to the new list
Now, when you run the code, you should get the following output:
[0, 1, 2, 3, 4, 5, 6, 7, 8] [0, 1, 7, 8, 20, 26] ['apple', 'avocado', 'banana', 'cucumber', 'fig', 'grape', 'zucchini']
Part 2: Implement the mergeSort function
Now we'll implement mergeSort! The empty function is already in mergeSort.py, you should fill it in as follows:
- The base case is when the length of the list is 0 or 1
- In the recursive case, you should make recursive calls on the first half and second half of the list. Remember
that you can use
lst[:midpoint]to get the first half of the list andlst[midpoint:]to get the second half of the list if you've definedmidpointto be the index of the middle item. Remember that this midpoint needs to be an integer! - Then, use the
mergefunction to merge the sorted sublists, and return the merged result.
Once you're implemented mergeSort, uncomment the test cases that I've written. You should get the following
results:
[1, 2, 3, 4, 5] ['apple', 'avocado', 'banana', 'cucumber', 'fig', 'grape', 'zucchini'] [1, 1, 2, 2, 2, 3] [-10, -5, -1, 0, 2] [(1, 'b'), (2, 'a'), (2, 'c')]
Part 3: Explore timing
In the file sortTiming.py, I have written code that will run trials of several sorting algorithms, and then
print the average runtime of each algorithm. You have a few tasks to do:
- At the top of the file, fill in the
mergeandmergeSortfunctions by copying over your code frommergeSort.py. Run the code. You should see something like this: (the times may be smaller, I'm not sure how much faster the lab computers are than my laptop!)
Working on input size: 10 ----------------------------------- Avg. time for bubble sort with input size 10: 0.00312 ms Avg. time for selection sort with input size 10: 0.00223 ms Avg. time for insertion sort with input size 10: 0.00187 ms Avg. time for merge sort with input size 10: 0.00494 ms Avg. time for built-in sort with input size 10: 0.00021 ms
- See how the results vary as the length of the list grows (change the variables in
main) and record the results on your worksheet. You should also decrease the number of trials as the input size grows. For the largest trials, you can also skip running the slowest sorting algorithm by commenting out those lines of code inrunTrials. - Plot the data on the Desmos calculator or another plotting software of your
choice to see how the results compare with the Big O times we've discussed for these algorithms. In Desmos, you can
click the "+" in the upper left corner, and then select "table" to enter in your data.
- You may want to adjust the zoom to see all your data more clearly. The best way to do this is to click the zoom magnifying class icon, located just to the left of the table where you're entering the data.
- Answer the questions on your worksheet about which algorithm performs best in different settings.
Extra time, or want extra practice?
Merge sort in place
It is expensive (both in terms of additional runtime, and additional memory usage) to create new sublists like we
do in the current version of mergeSort. You can fix this by creating a single temporary list and using index
pointers to mark where the current start and end of the list is.
Define a function
mergeInplace(lst, leftStart, leftEnd, rightEnd, tmp)which takes the following parameters:- The full list (
lst) - An index pointer (integer) to mark the start of the left sublist
- An index pointer to mark the end of the left sublist (note that this pointer will also be the start of the right sublist)
- An index pointer to mark the end of the right sublist
- A temporary list (
tmp) that is the same size as the full list
Your function should do the same process as your original merge, but do the merging into
tmpand then copy section oftmpback intolst. For example, ifleftStartis 3 andrightEndis 7, the merged list should live in indices 3 through 6 (inclusive) in bothtmpandlst.Check that this function works correctly with the same tests we used in the lab. (You'll have to modify the test cases to pass in all necessary parameters.)
- The full list (
- Now, define a function
mergeSortInplace(lst, first, last, tmp)that implements merge sort, but instead of creating new sublists, adjusts thefirstandlastindex pointers to indicate where the sublists are.
Insertion sort optimization
Your experiments showed that merge sort isn't the fastest algorithm with small lists (insertion sort is faster).
This is because merge sort has extra work of splitting the list and making recursive calls. Because of this,
built-in sort algorithms like Python's sort() method typically use a hybrid approach where insertion sort is used
in place of merge sort once the list is small enough.
- Make a copy of your merge sort function and call it
hybridSort. Then switch it to use insertion sort instead of recursive merge sort when the sublist size is smaller than 10. - Re-run the timing experiments to include
hybridSort. Is there a noticable different in the speed?
Anna's acknowledgements
This lab was adapted from a lab used by Anya Vostinar. Thanks for sharing!