Dr. Quack: comparing two ways of approaching the same sort algorithm

[Wed, 16 Sep 2015 13:25:46 +0000]
Wow. These past 10 months or so have gone by in a weird way. I was surprised to learn that it's been almost a year since I wrote this [] post on a way of sorting two sets of search results in way I'd been thinking about for a while. I called it a "duck sort". Anyway, when I wrote the function to do the sort I was comparing two lists, specifically the first item of each. When I decided which list's item "won" the comparison, I would remove the item from the winning list and continue to make the comparisons until one of the lists became empty. I'd always wondered if it would be better to traverse the lists instead of continually removing the first item of each list. In terms of clarity in my head, I went with the "remove" approach instead of the "traverse" approach because it was easier for me while just trying to figure out the basic logic of the sort method. But now I have this Python timer thing I just posted about here [] ... So, I made a "traverse" version of the sort method and made a little test script to take an increasingly larger data set and time the two approaches to the algorithm and see what happened. Not surprisingly, traversing the lists was increasingly more efficient and faster than my original "remove" approach, i.e. altering/re-drawing the lists. So, here is the new "traverse" approach to the "duck sort" and the original "remove" approach. """ This code is in Python. """ ### duck sort via traversing lists. def duck_sort_TRAVERSE(x,y): output = [] yi, xi = 0, 0 while yi < len(y) and xi < len(x): if y[yi][1] >= x[xi][1]: output.append(y[yi]) yi += 1 else: output.append(x[xi]) xi += 1 output = output + y[yi:] output = output + x[xi:] return output ### duck sort via removing item from lists. def duck_sort_REMOVE(x,y): output = [] while len(y) > 0 and len(x) > 0: first_x = x[0] first_y = y[0] if first_y[1] >= first_x[1]: output.append(first_y) y.remove(first_y) else: output.append(first_x) x.remove(first_x) output = output + y output = output + x return output I ran the test on the same "Hansel and Gretel" data from my original post on the "duck sort". But this time, I simply multiplied the data set by a number starting from "0" and ending in "10,000" in increments of 5,000. This way I could compare the speed of the two approaches on a larger and larger data set. The results are interesting. Here's a table with the time (seconds) of the "traverse" and "remove" approaches. The "data_length" column is simply the size of the multiplier used to augment the data set. data_length traverse remove 0 0 0 5000 0.0130000114 0.0939998627 10000 0.0279998779 0.3059999943 15000 0.0360000134 0.5829999447 20000 0.0479998589 1.0199999809 25000 0.0610001087 1.7070000172 30000 0.0789999962 2.3399999142 35000 0.0850000381 3.2030000687 40000 0.1010000706 4.1009998322 45000 0.111000061 5.2359998226 50000 0.1240000725 7.9670000076 55000 0.132999897 8.3550000191 60000 0.1679999828 12.876999855 65000 0.1769998074 16.1730000973 70000 0.1919999123 16.2030000687 75000 0.2049999237 15.4270000458 80000 0.2190001011 17.5679998398 85000 0.2230000496 19.131000042 90000 0.2339999676 21.3770000935 95000 0.243999958 24.8929998875 100000 0.254999876 25.8870000839 Perhaps more interesting is a graph of the two approaches, below. With the graph, it's hard to even see that the "traverse" approach is trending upward. IMAGE: "comparison of traverse and remove approaches to duck sort"[]