blog.humaneguitarist.org
Dr. Quack: comparing two ways of approaching the same sort algorithm
[Wed, 16 Sep 2015 13:25:46 +0000]
Wow. These past 10 months or so have gone by in a weird way. I was surprised to learn that it's been almost a year since I wrote this [http://blog.humaneguitarist.org/2014/11/25/a-quackish-approach-to-sorting-federerated-search-results/] post on a way of sorting two sets of search results in way I'd been thinking about for a while. I called it a "duck sort". Anyway, when I wrote the function to do the sort I was comparing two lists, specifically the first item of each. When I decided which list's item "won" the comparison, I would remove the item from the winning list and continue to make the comparisons until one of the lists became empty. I'd always wondered if it would be better to traverse the lists instead of continually removing the first item of each list. In terms of clarity in my head, I went with the "remove" approach instead of the "traverse" approach because it was easier for me while just trying to figure out the basic logic of the sort method. But now I have this Python timer thing I just posted about here [http://blog.humaneguitarist.org/2015/09/15/timing-stuff-in-python-is-bad-for-your-knees/] ... So, I made a "traverse" version of the sort method and made a little test script to take an increasingly larger data set and time the two approaches to the algorithm and see what happened. Not surprisingly, traversing the lists was increasingly more efficient and faster than my original "remove" approach, i.e. altering/re-drawing the lists. So, here is the new "traverse" approach to the "duck sort" and the original "remove" approach.
""" This code is in Python. """
### duck sort via traversing lists.
def duck_sort_TRAVERSE(x,y):
output = []
yi, xi = 0, 0
while yi < len(y) and xi < len(x):
if y[yi][1] >= x[xi][1]:
output.append(y[yi])
yi += 1
else:
output.append(x[xi])
xi += 1
output = output + y[yi:]
output = output + x[xi:]
return output
### duck sort via removing item from lists.
def duck_sort_REMOVE(x,y):
output = []
while len(y) > 0 and len(x) > 0:
first_x = x[0]
first_y = y[0]
if first_y[1] >= first_x[1]:
output.append(first_y)
y.remove(first_y)
else:
output.append(first_x)
x.remove(first_x)
output = output + y
output = output + x
return output
I ran the test on the same "Hansel and Gretel" data from my original post on the "duck sort". But this time, I simply multiplied the data set by a number starting from "0" and ending in "10,000" in increments of 5,000. This way I could compare the speed of the two approaches on a larger and larger data set. The results are interesting. Here's a table with the time (seconds) of the "traverse" and "remove" approaches. The "data_length" column is simply the size of the multiplier used to augment the data set.
data_length traverse remove
0 0 0
5000 0.0130000114 0.0939998627
10000 0.0279998779 0.3059999943
15000 0.0360000134 0.5829999447
20000 0.0479998589 1.0199999809
25000 0.0610001087 1.7070000172
30000 0.0789999962 2.3399999142
35000 0.0850000381 3.2030000687
40000 0.1010000706 4.1009998322
45000 0.111000061 5.2359998226
50000 0.1240000725 7.9670000076
55000 0.132999897 8.3550000191
60000 0.1679999828 12.876999855
65000 0.1769998074 16.1730000973
70000 0.1919999123 16.2030000687
75000 0.2049999237 15.4270000458
80000 0.2190001011 17.5679998398
85000 0.2230000496 19.131000042
90000 0.2339999676 21.3770000935
95000 0.243999958 24.8929998875
100000 0.254999876 25.8870000839
Perhaps more interesting is a graph of the two approaches, below. With the graph, it's hard to even see that the "traverse" approach is trending upward. IMAGE: "comparison of traverse and remove approaches to duck sort"[http://blog.humaneguitarist.org/uploads/traverse_vs_remove.png]