Monday, June 8, 2009

An Introsort that uses heapsort more

I got a bit bogged down on the mergesort generator routine (I had a major software release at work last Tuesday night... and I bought two new books on Friday). I've still got some work work to do today, but at least I've read those books. The mergesort generator routine is about done, but I have to run a few checks to ensure that the routines it is generating actually work. In terms of performance, the 3-way forecasting switchback mergesort that it generates pips Quicksort at the post (which I had expected), but the 4-way version is markedly slower (which I had not expected).

Today I'm going to take a look at IntroSort (which is a hybrid between Quicksort and Heapsort, which uses Quicksort most of the time and resorts to Heapsort if Quicksort's depth of recursion gets too high). You might have noticed, in the graph for my last post, that radix 2 heapsort actually beat Quicksort for small N. That suggests a slightly different version of Introsort might be worth trying: with Heapsort also used whenever the size of a partition gets significantly below the size of the data cache. The modified introsort will look something like this...

template <class T> void introSort2(T *arr, int count, int depthAllowed)
{
  while (count>4096 && --depthAllowed > 0)
  {
    int p = partitionFromCentre(arr, 0, count-1, count/2);
    if (p+p<count)
    {
      introSort2(arr, p, depthAllowed);
      count -= p + 1;
      arr   += p + 1 ;
    }
    else
    {
      introSort2(arr+p+1, count-p-1, depthAllowed);
      count = p;
    }
  }
  heapSortTopless(arr,count);
}

template <class T> void introSort(T *arr, int count)
{
  int d=0;
  for (int m=1; m<count; m+=m) d+=2;
  return introSort2(arr, count, d);
}

The magic number 4096 should probably be replaced with a value calculated at compile-time from the type and the size of the data cache. partitionFromCentre is a template function declared in a previous post in this blog, as is heapSortTopless. But does it perform better than a standard Quicksort? For integers, anyway?



Yes, it does, on my PC at least. The improvement is significant for small N, but becomes less important as N increases. It's good to see that a technique normally used only for avoiding Quicksort's quadratic worst case might also be used to improve its running time in the average case (depending on your compiler and your hardware).

No comments:

Post a Comment