Monday, June 15, 2009

Multithreading Quicksort - Teaser

Yesterday I was drafting a post about multi-threading quicksort (as it is easier to efficiently multi-thread Quicksort than Mergesort, but a lot of the principles are the same). But it got out of hand! The code to go with the examples ballooned out: 170 lines for the single-threaded version, 210 for the first two naive multi-threaded versions, that create too many threads, 340 for a count-down latch class, 60 for an example using latches, 480 for a thread-safe stack and a dispatcher running one worker thread per core, 50 for an example using them, 110 for test data generation and a routine to actually run the examples.

I think I'll chop that draft post into three pieces (one for the first two examples, one for the example using a latch, one for the example using the dispatcher). For now, I'll just post a "teaser", talking about examples I haven't provided yet.

The first two examples create threads per sub array and wait on them. The overhead for creating the threads is pretty high, and if small sub-arrays are being divvied up between multiple threads, unworkably so. The third example uses a latch to reduce the amount of waiting, but because it too creates too many threads, it is even slower than the first two (and it can easily run out of threads, too, resulting in the program crashing). The fourth example reuses the latch, and also uses a dispatcher (to reduce the number of worker threads to 4, on my 4-core machine). The fifth example does the same, but uses CriticalSection API calls in place of mutexes.

It's sort of mean to graph the relative performance of routines I haven't posted yet, but I'm going to do it anyway. This graph shows how well the five routines performed, sorting an array of 10 million integers on my Core 2 Quad, relative to a stock-standard single-threaded quicksort. The X axis is the multithreading threshold (the sub-array-size threshold below which additional threads are not created).



A 3.5:1 speed-up using 4 cores isn't ideal (a 4:1 speed-up would be nice!). But a 3.5:1 speed-up is respectable.

No comments:

Post a Comment