Saturday, May 30, 2009

An in-place Quicksort routine

Quicksort routines for sorting arrays are not usually in-place. Although the partitioning subroutine usually is in-place, the partitions themselves are sorted via recursion. Even if only the smaller partition is sorted recursively, and we use an explicit stack rather than recursive calls, the maximum space overhead for the stack is 2*log2(n) pointers or indexes.

The problem boils down to this: when we have partitioned a sub-array into two partitions, of different sizes, while we are sorting one of those partitions (typically the larger partition) we need to "remember" how big the other partition was, and where it began.

If we knew in advance where the partition boundaries were going to be, that wouldn't be a problem. There are in-place routines that can find the kth element, for some k chosen in advance, of the current subarray (since then the sizes of the two partitions will be known in advance). Supposing that we have such a routine, partitionExactly say, it is easy to formulate an in-place Quicksort. It will look like this...

template <class T> int quickSortInPlace(T *base, int count)
{
  int step=1;
  while (step*2<count) step=step*2;
  while (step>=1)  
  {
    for (int blockStart=0;blockStart<count-step;blockStart+=2*step)
    {
      int blockFinish = blockStart + 2*step - 1;
      int desiredpartition = blockStart + step;
      if (count<=blockFinish) blockFinish=count-1;
      partitionExactly(base, blockStart, blockFinish, desiredpartition);
    }
    step = (step >> 1); 
  }
  return 0;
}

The most obvious way to find the exact median, that fits in with the "flavour" of QuickSort, is to the quickSelect algorithm (see http://en.wikipedia.org/wiki/Selection_algorithm). But in practice, that sucks. The problem is that quickSelect is geared to work on arrays that are in random order, and as the sort proceeds, parts of the array become more and more ordered. Indeed, when sorting the subarray A[i...j], A[ (i+j)/2] eventually becomes a good "guess" for the median, and should be chosen as the pivot. But moving the pivot "out of the way" becomes a very bad idea indeed (since we'll just have to move it back later, and moving it out of the way results in us having to shuffle around other elements too) . It should be left where it is until we know it has to be moved.

template <class T> inline int partitionFromLeft(T *arr, int lowerIndex, int upperIndex)
{
  int lhs=lowerIndex;
  int rhs=upperIndex+1;
  T pivot=arr[lowerIndex];
  T tmp;
  for (;;) 
  {
    do { lhs++; } 
      while (lhs<=rhs && arr[lhs]<pivot);
    do { rhs--;  } 
      while (pivot<arr[rhs]);
    if (rhs<=lhs) break;
    { tmp=arr[lhs]; arr[lhs]=arr[rhs]; arr[rhs]=tmp; }
  }
  if (rhs!=lowerIndex)
  {
    { tmp=arr[lowerIndex]; arr[lowerIndex]= arr[rhs];
    arr[rhs]=tmp; }
  }
  return rhs;  
}


template <class T> inline int partitionFromRight(T *arr, int lowerIndex, int upperIndex)
{
  int lhs=lowerIndex-1;
  int rhs=upperIndex;
  T pivot=arr[upperIndex];
  T tmp;
  for (;;) 
  {
    do { rhs--;  } 
      while (lhs<=rhs && pivot<arr[rhs]);
    do { lhs++;  } 
      while (arr[lhs]<pivot);
    if (rhs<=lhs) break;
    tmp = arr[lhs]; arr[lhs] = arr[rhs]; arr[rhs]=tmp;
  }
  if (lhs!=upperIndex)
  {
    tmp =arr[lhs]; arr[lhs] = arr[upperIndex]; arr[upperIndex]=tmp;
  }
  return lhs;  
}

template <class T> inline int partitionFromCentre(T *arr, int lowerIndex, int upperIndex, int centre)
{
  int lhs=lowerIndex-1;
  int rhs=upperIndex+1;
  T pivot=arr[centre];
  T tmp;
  for (;;) 
  {
    do { lhs++; } 
      while (lhs<centre && arr[lhs]<pivot);  
    do { rhs--; } 
      while (centre<rhs && pivot<arr[rhs]);  
    if (lhs==centre || rhs==centre) break;
    tmp = arr[lhs]; arr[lhs] = arr[rhs]; arr[rhs]=tmp;
  }
  if (lhs!=rhs)
  {
    if (lhs==centre) 
      return partitionFromLeft(arr, centre, rhs);
    else
      return partitionFromRight(arr, lhs, centre);
  }
  return rhs;  
}

template <class T> void partitionExactly(T *arr, int lowerIndex, int upperIndex, 
                                               int desiredPartitionIndex)
{
  int actualPartitionIndex ;
  do
  {
    actualPartitionIndex = partitionFromCentre(arr, lowerIndex, upperIndex, desiredPartitionIndex);
    if (actualPartitionIndex <desiredPartitionIndex)
      lowerIndex=actualPartitionIndex+1;
    else if (desiredPartitionIndex<actualPartitionIndex )
      upperIndex=actualPartitionIndex-1;
  }
  while (actualPartitionIndex !=desiredPartitionIndex);  
}

After all that, the routine is considerably slower than standard Quicksort. It does about the same amount of record movement, but has to do rather more comparisons than Quicksort (asymptotically, something close to twice as many). Like standard Quicksort, it has a quadratic worst case.

No comments:

Post a Comment