Friday, December 31, 2010

Dzmitry Huba's parallel mergesort

Wow! No post in nearly eighteen month. My daughter took over the laptop. And the dog ate my homework.

Recently I spotted an interesting post, with C# code, at...

http://dzmitryhuba.blogspot.com/2010/10/parallel-merge-sort.html

...and the code was pretty damned good (nice how C# delegates make it all a lot cleaner; in my C++ code I have had to write a bunch of hideously ugly template functions to fake delegates; perhaps I should be coding in C# instead). A few minor, minor things:
  • The main reasons that Dzmitry's parallel mergesort outperformed the parallel quicksort at http://msdn.microsoft.com/en-us/library/ff963551.aspx were probably that (a) the Partition() routine is an inefficient "one-handed" implementation [see next post], and (b) the parallel quicksort did not parallelize the partitioning step. I've got some code that does that (badly) lying around somewhere which I'll paste into a later post (if I can find it). If the partitioning is parallelized, and you're using "two-handed" partitioning, and you're sorting integers, parallel quicksort should win.

  • The load balancing problem that Dzmitry mentioned is largely caused by the use of BinarySearch in his ParallelMerge(). The problem is that using the median of one of the ranges and binary searching for that value in the other range doesn't divide the merge evenly enough, particularly if the input was ordered or mostly ordered. If you are merging two ranges of size n, you need to search, instead, for an index, i, such that the first i elements of the left subrange will be merged with the first n-i elements of the right subrange. That will always handle an (n,n) merge by scheduling two same-sized merges: one of (i,n-i), one of (n-i,i). There's no risk of it scheduling an (n/2,n) and an (n/2,0) merge.

    Unfortunately I was lazy, and the code I've got that does that is wired in to the merge routines themselves (and furthermore it is generalized for splitting the load across P cores, where P is not necessarily a power of two, and takes P as a parameter, which makes it much harder to see what it is doing). I'll have to do a rewrite.

No comments:

Post a Comment