Sunday, May 31, 2009

A (very slightly) shorter version of Heapsort

I regard Heapsort as the most beautiful of the O(n log n) sorting routines.
It also has very low space requirements. But in practice it isn't a very
good sorting routine if the array it is sorting doesn't fit in the cache
(on an Intel machine, it does very well indeed if the array fits in the L1 cache,
and still does pretty well if the array fits in the L2 cache, but for larger
arrays it is a dog).

The routine presented here isn't noticeably faster than a standard
binary Heapsort (it does roughly N fewer record moves), but it's (very slightly) shorter. Standard Heapsort treats the first element of the array as a special
case. This version does not. Instead of the code inside the loop of selection phase of the sort having
the pseudo-code:

  • Exchange the element that is to be removed with the apex of the heap

  • Reestablish the heap property for the remaining elements

it has the pseudo-code:

  • Pretend that the element to be removed is the apex of the heap

  • Reestablish the heap property



template<class T> void heapSortTopless(T *a, int count)
{
  int i; 
  int j; 
  T v;
  //Build the heap. The apex is a virtual element at index -1, so the
  //children of element i are at 2i+2 and 2i+3, and elements 0 and 1 are
  //both children of the virtual apex.
  for (int h=count/2; h>=0; h--)  
  {
    v=a[i=h];
    j = i + i + 2;  //standard heapsort has +1 instead of +2
    while (j<count)
    {
      if (j<count-1) 
        j+= (a[j]<a[j+1]) ? 1 : 0;
      if (a[j]<=v) 
        break;
      a[i]=a[j];
      i = j;
      j = i + i + 2;  //standard heapsort has +1 instead of +2
    }
    a[i]=v;
  }
  //Selection phase: pretend the element being removed is the apex and sift
  //it down, so the largest remaining element ends up at a[count].
  for (--count;count>=0;count--)
  {
    v=a[count];  //standard heapsort must also set a[count]=a[0]  
    i = count;  //standard heapsort sets i=0
    j = 0;      //standard heapsort sets j=1

    while (j<count)
    {
      if (j<count-1) 
        j+= (a[j]<a[j+1]) ? 1 : 0;
      if (a[j]<=v) 
        break;
      a[i]=a[j];
      i = j;
      j = i + i + 2;  //standard heapsort has +1 instead of +2
    }
    a[i] = v;
  }
}
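
A minimal driver for it, assuming the template above is in scope, would look something like this:

#include <cstdio>

int main()
{
  int data[] = { 5, 3, 8, 1, 9, 2, 7, 4, 6, 0 };
  heapSortTopless(data, 10);
  for (int k = 0; k < 10; k++)
    printf("%d ", data[k]);  //should print 0 1 2 3 4 5 6 7 8 9
  printf("\n");
  return 0;
}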

When I first came up with this change, I thought it would have a cache advantage over standard Heapsort for some array types (my reasoning was that, for record size r, if the cache line size is a multiple of 2*r, and the array starts on a cache line boundary, slightly fewer cache lines would be accessed on average), but in practice this doesn't seem to make much of a difference, even if the cache line size is exactly 2*r.

In practice, however, Heapsort is usually my second choice of sorting routine (Combsort is my first choice, because it is easier to code and easier to get right), and even when I use Heapsort I tend to use the standard binary Heapsort. Although ternary (and higher radix) heapsorts work better in practice, I don't want to waste the time of maintenance programmers who mightn't be familiar (let alone comfortable) with them.

Saturday, May 30, 2009

An in-place Quicksort routine

Quicksort routines for sorting arrays are not usually in-place. Although the partitioning subroutine usually is in-place, the partitions themselves are sorted via recursion. Even if only the smaller partition is sorted recursively, and we use an explicit stack rather than recursive calls, the maximum space overhead for the stack is 2*log2(n) pointers or indexes.
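
For comparison, here is a rough sketch (written just for this post, with a plain Lomuto partition standing in for whatever partitioning scheme you prefer) of that conventional arrangement: the bounds of the larger partition wait on a small explicit stack while the smaller partition is dealt with first, which is what keeps the stack down to about 2*log2(n) indexes.

template <class T> void quickSortExplicitStack(T *a, int count)
{
  int stackLo[64];  //ample for any 32-bit count
  int stackHi[64];
  int top = 0;
  int lo = 0;
  int hi = count - 1;
  for (;;)
  {
    while (lo < hi)
    {
      //partition a[lo..hi] around the last element (illustration only)
      T pivot = a[hi];
      int p = lo;
      for (int i = lo; i < hi; i++)
        if (a[i] < pivot) { T t = a[i]; a[i] = a[p]; a[p] = t; p++; }
      { T t = a[p]; a[p] = a[hi]; a[hi] = t; }
      //push the larger side; keep working on the smaller side
      if (p - lo < hi - p)
      {
        stackLo[top] = p + 1; stackHi[top] = hi; top++;
        hi = p - 1;
      }
      else
      {
        stackLo[top] = lo; stackHi[top] = p - 1; top++;
        lo = p + 1;
      }
    }
    if (top == 0) break;
    top--;
    lo = stackLo[top];
    hi = stackHi[top];
  }
}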

The problem boils down to this: when we have partitioned a sub-array into two partitions, of different sizes, while we are sorting one of those partitions (typically the larger partition) we need to "remember" how big the other partition was, and where it began.

If we knew in advance where the partition boundaries were going to be, that wouldn't be a problem. There are in-place routines that can move the kth element of the current subarray into its final position, for a k chosen in advance, so that the sizes of the two partitions are known before partitioning begins. Supposing that we have such a routine, partitionExactly say, it is easy to formulate an in-place Quicksort. It will look like this...

template <class T> void partitionExactly(T *arr, int lowerIndex, int upperIndex,
                                               int desiredPartitionIndex);  //defined below

template <class T> int quickSortInPlace(T *base, int count)
{
  int step=1;
  while (step*2<count) step=step*2;
  while (step>=1)  
  {
    for (int blockStart=0;blockStart<count-step;blockStart+=2*step)
    {
      int blockFinish = blockStart + 2*step - 1;
      int desiredPartition = blockStart + step;
      if (count<=blockFinish) blockFinish=count-1;
      partitionExactly(base, blockStart, blockFinish, desiredPartition);
    }
    step = (step >> 1); 
  }
  return 0;
}

The most obvious way to find the exact median, in a way that fits in with the "flavour" of Quicksort, is to use the quickSelect algorithm (see http://en.wikipedia.org/wiki/Selection_algorithm). But in practice, that sucks. The problem is that quickSelect is geared to work on arrays that are in random order, and as the sort proceeds, parts of the array become more and more ordered. Indeed, when sorting the subarray A[i...j], A[(i+j)/2] eventually becomes a good "guess" for the median, and should be chosen as the pivot. But moving the pivot "out of the way" becomes a very bad idea indeed (since we'll just have to move it back later, and moving it out of the way forces us to shuffle other elements around too). It should be left where it is until we know it has to be moved.
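
For contrast, here is a minimal sketch (again written just for this post, not code I'd actually use) of the usual quickSelect step, with the pivot swapped out to the end of the range before partitioning and swapped back afterwards; this is exactly the "move it out of the way" behaviour complained about above:

template <class T> void quickSelectNaive(T *arr, int lowerIndex, int upperIndex, int k)
{
  while (lowerIndex < upperIndex)
  {
    int mid = lowerIndex + (upperIndex - lowerIndex) / 2;
    { T t = arr[mid]; arr[mid] = arr[upperIndex]; arr[upperIndex] = t; }  //pivot out of the way
    T pivot = arr[upperIndex];
    int p = lowerIndex;
    for (int i = lowerIndex; i < upperIndex; i++)
      if (arr[i] < pivot) { T t = arr[i]; arr[i] = arr[p]; arr[p] = t; p++; }
    { T t = arr[p]; arr[p] = arr[upperIndex]; arr[upperIndex] = t; }      //and back again
    if (k < p) upperIndex = p - 1;
    else if (p < k) lowerIndex = p + 1;
    else return;  //arr[k] now holds the kth element of the range
  }
}

The three routines below avoid that initial swap: partitionFromCentre scans towards the pivot from both ends, and only hands over to partitionFromLeft or partitionFromRight once one of the scans has actually reached it.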

//Partitions arr[lowerIndex..upperIndex] around the pivot arr[lowerIndex]
//and returns the pivot's final position.
template <class T> inline int partitionFromLeft(T *arr, int lowerIndex, int upperIndex)
{
  int lhs=lowerIndex;
  int rhs=upperIndex+1;
  T pivot=arr[lowerIndex];
  T tmp;
  for (;;) 
  {
    do { lhs++; } 
      while (lhs<rhs && arr[lhs]<pivot);  //lhs<rhs, not <=, so we never read past arr[upperIndex]
    do { rhs--; } 
      while (pivot<arr[rhs]);  //arr[lowerIndex]==pivot acts as a sentinel
    if (rhs<=lhs) break;
    tmp=arr[lhs]; arr[lhs]=arr[rhs]; arr[rhs]=tmp;
  }
  if (rhs!=lowerIndex)
  {
    tmp=arr[lowerIndex]; arr[lowerIndex]=arr[rhs]; arr[rhs]=tmp;
  }
  return rhs;  
}


//Mirror image of partitionFromLeft: partitions arr[lowerIndex..upperIndex]
//around the pivot arr[upperIndex] and returns the pivot's final position.
template <class T> inline int partitionFromRight(T *arr, int lowerIndex, int upperIndex)
{
  int lhs=lowerIndex-1;
  int rhs=upperIndex;
  T pivot=arr[upperIndex];
  T tmp;
  for (;;) 
  {
    do { rhs--; } 
      while (lhs<rhs && pivot<arr[rhs]);  //lhs<rhs, not <=, so we never read before arr[lowerIndex]
    do { lhs++; } 
      while (arr[lhs]<pivot);  //arr[upperIndex]==pivot acts as a sentinel
    if (rhs<=lhs) break;
    tmp=arr[lhs]; arr[lhs]=arr[rhs]; arr[rhs]=tmp;
  }
  if (lhs!=upperIndex)
  {
    tmp=arr[lhs]; arr[lhs]=arr[upperIndex]; arr[upperIndex]=tmp;
  }
  return lhs;  
}

//Partitions arr[lowerIndex..upperIndex] around the pivot arr[centre],
//leaving the pivot where it is for as long as possible: scan towards the
//centre from both ends, and only once one of the scans reaches the centre
//hand the remaining range to one of the two routines above.
template <class T> inline int partitionFromCentre(T *arr, int lowerIndex, int upperIndex, int centre)
{
  int lhs=lowerIndex-1;
  int rhs=upperIndex+1;
  T pivot=arr[centre];
  T tmp;
  for (;;) 
  {
    do { lhs++; } 
      while (lhs<centre && arr[lhs]<pivot);  
    do { rhs--; } 
      while (centre<rhs && pivot<arr[rhs]);  
    if (lhs==centre || rhs==centre) break;
    tmp = arr[lhs]; arr[lhs] = arr[rhs]; arr[rhs]=tmp;
  }
  if (lhs!=rhs)
  {
    if (lhs==centre) 
      return partitionFromLeft(arr, centre, rhs);
    else
      return partitionFromRight(arr, lhs, centre);
  }
  return rhs;  
}

//Repeatedly partitions, narrowing to whichever side still contains the
//desired index, until the partition point lands exactly on it.
template <class T> void partitionExactly(T *arr, int lowerIndex, int upperIndex, 
                                               int desiredPartitionIndex)
{
  int actualPartitionIndex;
  do
  {
    actualPartitionIndex = partitionFromCentre(arr, lowerIndex, upperIndex, desiredPartitionIndex);
    if (actualPartitionIndex<desiredPartitionIndex)
      lowerIndex=actualPartitionIndex+1;
    else if (desiredPartitionIndex<actualPartitionIndex)
      upperIndex=actualPartitionIndex-1;
  }
  while (actualPartitionIndex!=desiredPartitionIndex);  
}
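
And a quick (hypothetical) end-to-end check, assuming the routines above are pasted in, forward declaration and all, in the order given:

#include <cstdio>

int main()
{
  int data[] = { 9, 3, 7, 1, 8, 2, 6, 0, 5, 4, 11, 10 };
  const int n = sizeof(data) / sizeof(data[0]);
  quickSortInPlace(data, n);
  for (int k = 0; k < n; k++)
    printf("%d ", data[k]);  //should print 0 1 2 ... 11
  printf("\n");
  return 0;
}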

After all that, the routine is considerably slower than standard Quicksort. It does about the same amount of record movement, but has to do rather more comparisons than Quicksort (asymptotically, something close to twice as many). Like standard Quicksort, it has a quadratic worst case.

Why I'm writing a blog

I'm a 39-year-old C++ computer programmer who likes to play chess (badly). My true love is my wife, but my obsession is sorting algorithms and implementations. Their inner workings fascinate me: it amazes me that there are so many different ways to "skin the cat". I collect sorting routines, compare them, test them, tweak them, and I like to write my own. I have collected literally hundreds of sorting routines, over the last ten years or so, and have written dozens more of my own.

I've decided to do a blog about the theory and practice of writing sorting routines (which will mostly focus on C++ implementations, because C++ is the language I know best, and because C++ gives me "generality" - via the use of features like templates - but lets me get "closer" to the hardware than do byte-code languages like C# and Java). And I want my own blog where "I'm in charge". If it's my blog I will get the last word. Sometimes the last word will have to be an admission I've been wrong all along, but I'm okay with that.

You'll find that my opinions about the relative performance of sorting routines will tend to be boringly conventional (Quicksort beats Mergesort beats Heapsort, duh!) but my opinions about the reasons will sometimes be a bit more controversial (e.g. Quicksort comfortably beats Mergesort when the order of merge is 3 or less, but it does so because it requires less record movement, not because it makes more efficient use of the cache).

I'll include snippets of working C++ code in my posts, and I'll cover sorting algorithms and data structures that I don't like (for example, I think lists and trees both suck for sorting, but I'll still cover them). When I think an algorithm is a dog with fleas it's always because I couldn't find a way to tune it up. But, even if I start out thinking the algorithm is no good, I'll try to tune it. There are way too many comparison articles that say "X is faster than Y" and then "illustrate the point" by contrasting a tuned-up version of X with a tuned-down version of Y. I'll try to avoid doing that!

And, no, I have no idea what order I'm going to present things in! To get a coherent version of my blog you will have to... sort my posts. That will be "left as an exercise for the reader."