Friday, December 31, 2010

How to make Quicksort suck: use one-handed partitioning

While I was away (back in late 2009), somebody posted a Dual Pivot Quicksort that advertised (and delivered) a 20% reduction in the number of exchanges compared to Quicksort as defined in Wikipedia's article (http://en.wikipedia.org/wiki/Quicksort). That sounds good, doesn't it? But...

Wikipedia insists on posting a quicksort with a partitioning routine that is downright lousy. The number of exchanges required during a quicksort of a randomly ordered array of n distinct elements shouldn't be ln(n)*n on average. It should be ln(n)*n/3. In other words, even Dual Pivot Quicksort, at 0.8*ln(n)*n exchanges, still does 0.8/(1/3) = 12/5 = 2.4 times as many as are actually necessary.

Here's the pseudo-code for partition() in the Wiki article (verbatim), which I think of as a "partition with one hand tied behind its back", or a one-handed partition:

    function partition(array, left, right, pivotIndex)
        pivotValue := array[pivotIndex]
        swap array[pivotIndex] and array[right] // Move pivot to end
        storeIndex := left
        for i from left to right - 1 // left ≤ i < right
            if array[i] ≤ pivotValue
                swap array[i] and array[storeIndex]
                storeIndex := storeIndex + 1
        swap array[storeIndex] and array[right] // Move pivot to its final place
        return storeIndex
That pseudocode is very pretty, very short, and very stupid. Better partitioning routines have been around for decades (google for "Sedgewick partition"; they date from the 1970s at the latest), and I wouldn't mind betting that the Quicksorts that beat Mergesort (for sorting arrays of value types rather than reference types, anyway) actually use them. The pseudocode for the partition routine should look more like this (this is a "two-handed" version):
    function partition(array, left, right, pivotIndex)
        pivotValue := array[pivotIndex]
        swap array[pivotIndex] and array[left] // Park pivot at the left end
        i := left + 1
        j := right
        loop
            while i ≤ right and array[i] < pivotValue // search from left
                i := i + 1
            while pivotValue < array[j] // search from right
                j := j - 1
            if i ≥ j
                break
            swap array[i] and array[j]
            i := i + 1
            j := j - 1
        if left < j
            swap array[left] and array[j] // Move pivot to its final place
        return j
The bound check (i ≤ right) on the search from the left is there to handle the special case that no element in the remainder of the array is greater than or equal to the pivot value. No such check is needed on the search from the right, because the pivot itself, parked at array[left], acts as a sentinel. Equal elements get exchanged, which ensures that inputs containing large numbers of duplicates still produce balanced partitions rather than causing problems. The break is needed because the two searches can cross. The last if statement is necessary because there might be no other element greater than or equal to pivotValue, in which case j comes to rest at left and the pivot is already in its final place.

It's more complicated, yes, but the extra complexity is worth it! The second algorithm "burns the candle at both ends", searching from the left for values larger than (or equal to) the pivot, and from the right for values smaller than (or equal to) the pivot, and exchanges each such pair that it finds. The number of exchanges is equal to the number of values that should be, but aren't yet, in the smaller of the two partitions.

Contrast this with the algorithm from Wikipedia's article, which performs one exchange for each value that should be in the left-hand partition. On average, to partition N elements, the "two-handed" version needs N/6 exchanges rather than the N/2 required by the "one-handed" version: three times fewer.

Switching to "two-handed" partitioning reduces the expected number of exchanges for the entire quicksort by that factor of 3, from ln(n)*n down to ln(n)*n/3.

If you're ready to "invest" in two more index variables, you can switch to a routine that does "rotations" rather than exchanges, and will partition an array of N elements in, on average, N/3 moves, rather than the N/2 (three moves per exchange) that the "two-handed" version performs. It's theoretically better (it reduces the number of moves by a third, on average), but it's much messier and only marginally faster.

Dzmitry Huba's parallel mergesort

Wow! No post in nearly eighteen months. My daughter took over the laptop. And the dog ate my homework.

Recently I spotted an interesting post, with C# code, at...

http://dzmitryhuba.blogspot.com/2010/10/parallel-merge-sort.html

...and the code was pretty damned good (it's nice how C# delegates make it all a lot cleaner; in my C++ code I have had to write a bunch of hideously ugly template functions to fake delegates; perhaps I should be coding in C# instead). A few minor, minor things:
  • The main reasons that Dzmitry's parallel mergesort outperformed the parallel quicksort at http://msdn.microsoft.com/en-us/library/ff963551.aspx were probably that (a) the Partition() routine is an inefficient "one-handed" implementation [see next post], and (b) the parallel quicksort did not parallelize the partitioning step. I've got some code that does that (badly) lying around somewhere which I'll paste into a later post (if I can find it). If the partitioning is parallelized, and you're using "two-handed" partitioning, and you're sorting integers, parallel quicksort should win.

  • The load balancing problem that Dzmitry mentioned is largely caused by the use of BinarySearch in his ParallelMerge(). The problem is that using the median of one of the ranges and binary searching for that value in the other range doesn't divide the merge evenly enough, particularly if the input was ordered or mostly ordered. If you are merging two ranges of size n, you need to search, instead, for an index, i, such that the first i elements of the left subrange will be merged with the first n-i elements of the right subrange. That will always handle an (n,n) merge by scheduling two same-sized merges: one of (i,n-i), one of (n-i,i). There's no risk of it scheduling an (n/2,n) and an (n/2,0) merge.

    Unfortunately I was lazy, and the code I've got that does that is wired in to the merge routines themselves (and furthermore it is generalized for splitting the load across P cores, where P is not necessarily a power of two, and takes P as a parameter, which makes it much harder to see what it is doing). I'll have to do a rewrite.