next up previous
Next: About this document ...

Quicksort sorts an array by first choosing a ``pivot'' element x, and then ``partitioning'' the other elements into those less than x (which end up toward the left of the array) and those greater than x (which end up toward the right), and then x goes in between the two groups. Then we make two recursive calls, on the left subarray and on the right subarray. Once the pivot has been put in between the left and right sections, that is its final correct place. (Of course, each recursive call creates a new pivot for that left or right part.)


Quicksort ($A, l, r$)
1     if $l \geq r$ then return
2     $p \leftarrow$ Partition ($A, l, r$)
3     Quicksort($A, l, p - 1$)
4     Quicksort($A, p + 1, r$)

Here is how Partition works: We partition on the first element. Take it out. Look for the rightmost element that is smaller and place it in the first position (which is the newly opened position on the left side). Look for the leftmost element that is larger and place it in the newly opened position on the right side. Starting from there look for the rightmost element that is smaller and place it in the the newly opened position on the left side. Starting from there look for the left mostelement that is larger and place it in the newly opened position on the right side. Continue in this fashion until the pointers cross. Finally, put the partition element into the hole, which is its final position in the sorted array.

There are many versions of Partition; the one described above (and coded below) is especially nice in that it leads to a relatively easy mathematical analysis for the average-case number of comparisons (using probabilities, calculus, summations, and strong constructive induction!)


Partition (A, start, end)
 x := A[start]
 left := start
 right := end + 1
 while left < right
    do right := right - 1 until {left >= right or A[right] <= x}
    if left < right then do A[left] := A[right] else end_while
    do left := left + 1 until {left >= right or A[left] >= x}
    if left < right then do A[right] := A[left]
 A[left] := x
 return left


The worst-case runtime for quicksort is not good: $O(n^2)$. And this occurs, surprisingly, when the data is already sorted! For then the pivot is the smallest value and Partition does little good except to split off that one value before the next (recursive) call. Thus Partition has to run $n$ times, and each time it makes $i-1$ comparisons (where $i$ runs from $n$ down to 1). The average case more than makes up for this defect, however, as we shall see below.

Further, there is a technique that can help avoid the worst case, namely randomizing the array before each call to Partition. In fact, all we really need is to choose a pivot value that is likely to be near the median; and for this we could simply exchange A[1] with a randomly-chosen array value. An even better approach is to choose (say) three array values at random and use the second largest (ie the mid-sized one).

Such modified versions of algorithms are called randomized algorithms.


Consider quicksort applied to $n$ values. Let $T(n)$ be the average number of comparisons the algorithm uses. We know $T(0) = T(1) = 0$. The partition operation is done with exactly $n-1$ comparisons. Assume that the partition value turns out to have sorted rank (order position) $q$. Then we have its location, but we still have to recursively quicksort the $q-1$ smallest values and the $n-q$ largest values. So, for a fixed $q$, the remaining steps of quicksort take $T(q-1) + T(n-q)$ comparisons on average.

But the value of $q$ is equally likely to be any integer from $1$ to $n$. So, averaging over all possible values of $q$, the expected number of remaining steps of quicksort takes is $\sum_{q=1}^n \frac{1}{n} [T(q-1) +
T(n-q)]$ comparisons.

We claim that $T(n) \le a n \ln n$ for some constant $a$ and $n \ge 1$. Proof by constructive induction.

Base case: $n=1$: $T(1) = 0$ and $a \cdot 1 \cdot \ln 1 = 0$.

Induction step: Assume it holds for all positive integers less than $n$. Then

\begin{eqnarray*}
T(n) & = & \sum_{q=1}^n \frac{1}{n} [T(q-1) + T(n-q)] + n - 1 ...
...
& \le & a n \ln n ~~~~~\mbox{{\bf if} the induction is to hold}
\end{eqnarray*}



Some straightfoward algebra now shows that $a\geq 2$, so $a=2$ is the tightest upper bound we can get with this argument, and

\begin{displaymath}
T(n) \le 2n \ln n ~=~ 2 (\ln 2) n \lg n ~\approx~ 1.39 n \lg n
\end{displaymath}




next up previous
Next: About this document ...
Don Perlis 2003-10-24