next up previous
Next: About this document ...

In this chapter we investigate results on the best possible runtime a sorting algorithm can have, in the worst case. Of course there are always slower and slower algorithms that one can write, and that is not what we are asking about. Rather, we want to know how fast the very best sorting algorithm could be, even when it encounters its toughest data. The best we have seen so far, in terms of the number of comparisons done in the worst case, is Mergesort, which does $n \lg
n - n + 1$ (if $n$ is a power of $2$). Can one do better?

First we need to step back and develop some terminology. Recall that a problem is the kind of thing that an algorithm solves. Sorting is a problem. So, we are asking how much work (eg, how many comparisons) are required to sort $n$ items, simply because of the nature of the problem.

Here is a simpler example: find the maximum element in an $n$-element array. This is a PROBLEM, not an algorithm. And it can be solved in many ways (ie, by many different algorithms). But we can state outright that NO algorithm can solve this problem in less than $n$ steps. The reason is that finding the maximum requires, no matter what - if it is always going to work on any data, at least looking at (accessing) every array value (since any one of them could be the maximum). That means at least $n$ accesses, hence at least $n$ steps. Of course, if the array happens to be sorted, then the max is at the end and can be found in one access (eg by an algorithm that just picks the item at the right); but that will not work for a different data arrangement. In general, an algorithm that does not have a possible execution that looks at all the data, will in some cases miss the maximum since the maximum value might be where the algorithm did not look.

That is, the best possible runtime for ANY max-finding algorithm is, in the worst case, $\Omega(n)$, and in particular the number of data accesses is $\geq n$. And we figured this out WITHOUT looking at any particular algorithm, we did it by thinking about the PROBLEM and what solving it must require.

We then can say that $n$ accesses is the BEST OF THE WORST-CASE number of accesses required by max-finding. That is, $n$ is the fastest (lowest, best) possible of all the worst-case runtimes for all algorithms that solve the max-finding problem. More simply, we say that max-finding has a lower bound of $n$ accesses. Thus any algorithm at all that solves max-finding (properly, and in general), will have a runtime $T(n) =
\Omega(n)$ in the worst case.

Now we will present a complete proof that the lower bound number of adds that must be done to sum n items (if adds are the only operation allowed), is n-1. This is an example of a lower bound worst-case runtime, that is easier than for sorting, but still interesting.

Proof: We assume each item is added only once, for if any A[i] were added twice (or more), the answer (sum of all items) could not be correct in general. For instance if all items except A[3] are 0, then we'd get a sum of 2A[3] instead of A[3]. (Unless, of course, an item is added multiple times but the results are not used, in which case we simply consider those adds that are actually added into the final sum.)

Now, let the n original items be represented as the n leaves a binary tree, where a parent represents the add of its two children. Clearly every item must have a parent since every item must be added (once); and clearly every parent has two children since every add has a LHS and a RHS (but these need not be original items, they can be results of previous adds).

The root represents the LAST add that is done, giving the final sum. The left subtree and the right subtree each represent the sum of all the leaves of that subtree, hence the sum of less than n items, say L items on the left and R on the right, where L+R = n.

                               +
                             /   \
                           +       +
                         /   \   /   \
                       +      c d     +
                     /   \          /   \
                    a     b        e     f
Above, a and b are added and then c is added to that; e and f are added and then is d added to that; then these two results are added at the root.

Here is another tree for the same items added differently:

                               +
                             /   \
                           +       +
                         /   \    / \
                       +      +  e   f
                      / \    / \
                     a   b  c   d

We must show that NO MATTER WHAT arrangement of adds is done (as long as it is correct) then the number of adds is at least n-1 (as it is in the above two examples: there are 5 adds and 6 items).

We proceed by induction. The base case is n=1, and 0=1-1 adds are needed.

IH: For any $k<n$ items, k-1 adds are needed to sum k items.

So for n items, in the tree above, the LHS has $L<n$ items, and the RHS has $R<n$ items, so by the IH there are at least L-1 adds on the left, and at least R-1 on the right. So the total number of adds, T(n), is at least (including an extra 1 for the root add):


\begin{displaymath}T(n) >= L-1 + R-1 + 1 = L + R -1 = n-1\end{displaymath}

This ends the proof.

The same argument, more or less, is this: any binary tree with n leaves must have n-1 parents. And this is proven by induction just as above.

Note: the same proof shows that EXACTLY n-1 adds are used in summing n items (if adds are the only operation). That is, the lower bound on adds is also the exact number of adds. When a lower bound is actually reachable (ie when there is an algorithm that has that good a worst case runtime) we say the bound is tight (and then of course that particular algorithm is as good as possible in the worst case). So the obvious add-as-you-iterate algorithm is as good as any, for solving the summing problem.

Now at last we return to the sorting problem, and ask about a lower-bound for it. Sorting is a bit more complicated that max-finding, and so it is a bit more complicated to come up with a line of reasoning that shows a good lower bound. What do we mean by a good lower bound? Well, in the case of max-finding, $n$ not only is a lower bound, it is the largest lower bound since in fact one really can find the maximum value in $n$ steps! (How?) More than $n$ are not needed (even though some algorithms will take more than that). Thus $n$ is the best lower bound, the one that tells us the most, that gets us closest to what an algorithm really has to do. We say it is a tight lower bound.

But $n$ is also a lower bound for sorting: one surely has to at least look at all the data in order to sort it! The question then is whether there is a larger lower bound, or is $n$ really enough steps to sort in general? Asking the question this way prompts us to distinguish two cases as to what sorting-in-general may mean:

  1. comparison sorting, ie, sorting that treats data only based on the relative sizes of the items. This means that two different arrays $<a_1,\ldots,a_n>$ and $<b_1,\ldots,b_n>$ will be moved about in exactly the same way as long as $a_i < a_{i+1} \iff b_i < b_{i+1}$ and so on. Thus as long as two arrays have the same internal pattern of relative sizes, then they will be subjected to the very same processing.

    Put differently, comparison-sorting never needs to know anything about the actual data values themselves, all it needs are answers to boolean queries as to which of two values is larger. All the sorting algorithms we have seen so far are examples of comparison sorting; the only facts about the data that are used in Insertion Sort, Mergesort, Quicksort, and Heapsort, are facts as to whether one item is larger than another.

  2. other methods - these we have not yet seen. It may seem surprising that other methods exist - after all, how can one sort if one does not see which values are larger and which are smaller? We will return to this a little later.

For now we restate our question about lower bounds: what is the tightest (best, largest) lower bound on comparison sorting? Can it be done (in the worst case) in $n$ steps (comparisons)? We have not seen an algorithm that can do that. And for good reason: there is none.

Lower-Bound Theorem for Comparison Sorting: Any comparison sorting algorithm must in the worst case, do at least $n \lg n$ comparisons on an $n$-item array.

Proof: Given $n$ items, there are $n!$ possible rearrangements, and any one of them might be the proper ordering. So any algorithm that solves the sorting problem must be able to produce every one of these $n!$ outcomes. [For example, consider an algorithm that cannot arrange (a b c) into each of the six permutations; suppose for instance that it cannot produce (b c a) from (a b c). Then the data (3 1 2) will not be correctly sorted by that algorithm since it has to change into (1 2 3) which is a permution it cannot manage.]

Now at certain moments during the execution of a comparison-based algorithm that sorts $n$ items, it will evaluate a boolean expression (ie, a comparison), and what it does after that depends on whether the boolean is evaluated as T or as F. We can represent this process as a ``decision tree''; the following is an example for $n=3$:

                                 *
                               /   \
                              *     *
                             / \   / \
                            *   - *   -
                           / \   / \
                          -   - -   -
Here each* stands for a comparison, and its two children represent T and F, ie whatever computation occurs after the boolean is evaluated, until the next comparison (*) below that. And a - stands for termination of execution, ie the sorting is complete.

For instance, the above tree shows all possible execution sequences of comparisons for Insertion Sort when $n=3$; eg, the leftmost path from root to leaf corresponds to each boolean being true (ie the ``new'' card is always less than the previous ones (this is the worst case), so it takes three booleans (comparisons) to get them in order. And the rightmost path corresponds to each boolean being false (the new card is always larger than the previous ones (the best case) and only two comparisons are used. Depending on how the data is originally arranged, one or another of the six paths from root to a leaf will be taken. And note that $3!
= 6$, which is no accident: six leaves are needed to represent all possible execution sequences.

So the general situation is this: Any algorithm that comparison-sorts $n$ items must have at least $L \geq n!$ leaves in its decision tree. But this is a binary tree, and the maximum number of leaves in a binary tree of height $h$ is $2^h$. (Why?) So we have $2^h \geq L \geq n!$ and thus $2^h \geq n!$. Taking logs of both sides we get


\begin{displaymath}h \geq \lg n! = \sum_{i=2}^n \lg i \geq \int_1^n \lg x\end{displaymath}


\begin{displaymath}= \lg e [x \ln x - x ] \vert _1^n = \lg e [n \ln n - n + 1]\end{displaymath}


\begin{displaymath}= n \lg n - \lg e (n - 1)\end{displaymath}

So $ h = \Omega(n \lg n)$. But what does $h$ signify? It is the number of edges from the root to the deepest leaves, namely the maximum possible number of comparisons along an execution sequence. That is, $h$ is the worst-case number of comparisons for a comparison-sort of $n$ items, and we have just shown that this must be at least $n \lg n$ in the dominant term.

This compares very nicely with the runtime for Mergesort, which is exactly $n \lg
n - n + 1$, identical in the dominant term, and only slightly worse in the other terms ($-n + 1$ compared to $\lg e [-n +
1]$). That is, Mergesort does only a tiny bit more comparisons than the Lower-Bound Theorem says are needed. And so $n \lg n$ is a tight lower bound (dominant term) for comparison sorting.


What are some other ways of ``measuring'' a sorting algorithm, aside from the number of comparisons it does? Here are two common measures: (i) space, and (ii) locality. Of course one always needs the space of the input array, simply to hold the data. But many of the sorting algorithms we have seen require little or no additional space beyond the data array. For instance, Insertion Sort simply moves items around inside that same array; so do Quicksort (except for storing the pivot in one extra location) and Heapsort. But Mergesort, when it performs the Merge subroutine, requires a separate array to hold the two merged subarrays, and this effectively doubles the amount of space (memory) needed. We say that an algorithm that needs no memory beyond that for the input data (or at most only a constant amount of additional memory, independent of the amount of data) runs in place. So all but Mergesort, of the algorithms we have seen, run in place; and Mergesort is not a horrible offender, in that it needs only twice the space, not (say) the square of the space.

Locality is best understood in terms of huge datesets. Suppose $n$ is so large that only a small section of the array can be held in main memory at one time. Then an algorithm that tends to work on items that are close to one another in the array can do a lot of work before it has to swap out that section of the array and bring in a new section. But an algorithm that tends to work on far-apart items at the same time will need to swap whole sections in and out frequently, and this will slow it down (it is called thrashing) since access to peripheral memeory (eg, magnetic tapes in the extreme case) is very slow compared to main memory (so-called random access). Of the algorithms we have seen, all but one have good locality: they look mostly at data items that are side-by-side in the array. But Heapsort, in comparing parent and children, looks at items that may be very far apart (the left child of item $A[i]$ is item $A[2i]$, which may be halfway across the array).

Since computer main memory is regularly getting cheaper and packed more tightly on a single chip, and thus larger, then these two issues are of less importance than before.

Below is a table summarizing many known results about comparison sort algorithms.

  Worst Case Average Case In place Locality
Insertion Sort $\frac{n^2}{2}$ $\frac{n^2}{4}$ Yes Yes
Merge Sort$^*$ $n \lg n$ $n \lg n$ No$^{**}$ Yes
Heap Sort $2n \lg n$ $2n \lg n ^{***}$ Yes No
Quick Sort $\frac{n^2}{2}$ $\sim 1.39 n \lg n$ Yes Yes


$^{*}$For a Merge procedure that always does exactly $n-1$ comparisons
on each call with $n$ items.

$^{**}$Can be done in place, but not practical.

$^{***}$Hard to prove.



The above expressions for runtime give the highest-order terms only, but with the correct coefficients.

This does not take account of other time-consuming aspects of the algorithms, especially data movement. But as a general rule in comparison-based algorithms, there are at most no more data movements than there are comparisons (since data is generally moved only in response to a comparison).

So (assuming that an assignment statement takes no more time than a comparison) we can multiply the above runtimes by two to get a rough upper estimate that comes closer to actual behavior in those cases that do in fact perform many moves. It happens that on average, Quicksort performs only about 1/3 as many moves as comparisons, while the others perform proportionately more.


We now turn to the topic of non-comparison sorting: sorting without comparing data items to one another. Here is a simple example: we are given 8 data items, and we are told in advance that all of them will be integers between 1 and 10.

We of course could run any of our comparison-sorting algorithms on this data, with - at best - a worst-case runtime of about $8 \lg 8 =
24$ comparisons (actually for Mergesort it will be $8\cdot 3 - 8 + 1 =
17$). But here is another method, that takes advantage of the fact that we know a lot about the data (its range - 1 to 10 - and the fact that the values are whole numbers).

Create an empty array $C[1..10]$; this will be where we keep track of which values are in the input array $A[1..8]$. Now for each data item at a location $i$, namely $A[i]$, simply put a mark (eg a 1) in that value-spot in $C$. That is, if (say) $A[5] = 2$ then in $C[2]$ we put a 1 (or an x, or anything else to indicate that we saw a $2$. The fact that the $2$ occurred in location $i$ is not important here, all we are noting is the value and recording it in place in $C$ we have reserved for twos (ie, $C[2]$). ThUs $C$ has a special place for all values from $1$ to $10$; not all of them will be used, of course, since we have only $8$ values to work with.

In more detail: suppose the data array is $[3,5,7,2,10,4,8,9]$ Then $C$ will look like this, after we have looked at $A[1]=3$:

     -  -  x  -  -  -  -  -  -  -
     1  2  3  4  5  6  7  8  9 10
And after we have passed all the way through $A$, $C$ looks like this:
     -  x  x  x  x  -  x  x  x  x
     1  2  3  4  5  6  7  8  9 10
Now we are almost done: $C$ tells us exactly which values are in $A$ and they are in order! We just read them out from $C$, ie we read out where the x's are, from left to right: 2 3 4 5 7 8 9 10

Notice that the data values are (among) the index values $i$ for $C$, and the value $C[i]$ stored in $C$ just tells us whether that index $i$ is one of the data values (in $A$).

How much work did this take? Well, there are the $8$ access to $A$ and then the $10$ accesses to $C$, for a total of $18$ accesses (and no comparisons!). But Mergesort would only have done $17$ comparisons, so it is not clear that this is progress.

Let us see how many accesses would be needed in general, for $n$ items and a value-range of $1$ to $k$. We'd need to run through the $n$ data items (marking their occurrence in $C$ as we do so) and then run through $C$ ($k$ steps) to read out the sorted values. So it seems that there is a total of $n+k$ basic steps here. To compare this to Mergesort on $n$ items, with its $n \lg
n - n + 1$ comparisons, we need to specify $k$ in terms of $n$. Let us suppose that the range of values is at most a constant times the number of values: $k\leq cn = O(n)$. Then our new kind of sorting seems to take up to $n + cn =
O(n)$ steps, much better than $n \lg n$ for large $n$.

There are two subtleties that we have ignored. The first is this: if the data has repeated values, then writing x's in $C$ will not show that a value has occurred more than once. This is easy to remedy. Instead of writing x in a $C$ location, add 1 (increment it). Then when reading the values out at the end, as long as we are looking at a positive $C$ value, $C[i]$, write out the index $i$, decrement $C[i]$, and repeat on that same index. Details are left for homework. This will slightly increase the runtime, since now as we iterate through $C$ at the end to read out the data, we might have to stay at one index for awhile; but this can add at most $n$ more steps. So the total is still only $n + k + n = 2n + k = O(n)$ if $k=O(n)$.

We will call this algorithm (modified to allow for repeated values) Basic Counting Sort: it counts how many times each possible range value actually occurs in $A$, by incrementing $C$ in the right place, for each $A$ values it finds. So $C[i]$ holds the ``count'', namely how many times the value $i$ occurs in $A$.

The second subtlety has to do with uses that sorting is often out to. Very commonly when we sort data, we have a larger setting in mind. For example, we may have pairs $<name,salary>$ in a huge list, and the company boss may want to have a printout with all salaries listed from lowest to highest. That is easy enough to do, by sorting on the second field (salary). But then the boss sees a particular salary that interests her and she asks you (the programmer who wrote and ran the sorting code) whose salary it is. You stammer and turn red: your code (an implementation of the above algorithm) does not keep the other data field (name) with the sorted salary results, so you have to run through the original array $A$ again, looking for that salary.

The trouble is not just that you did not think of this beforehand; after all, the same might have happened had you used Mergesort or any other sorting algorithm. But all these other algorithms actually move data around when the sorting is done, whereas this new one (Basic Counting Sort) does not move (or copy) data at all! So when the final run through $C$ is done, all we are doing is noting how many ones there were, how many twos, and so on, but the actual ones and twos (in $A$, if there were any) are still in $A$ and they were not even copied to $C$. So any extra so-called ``satellite'' data (other fields) that may be stored along with the sort-field in $A$, is still in $A$ and inaccessible from $C$.

A clever way to remedy this is as follows, in what we will call (Regular or Full) Counting Sort (more sophisticated than basic counting sort): we go through $A$ item by item, incrementing $C$ as before; then we do an odd thing: we add each count stored in $C$, starting at $C[1]$, to the next one: $C[i+1] <-
C[i+1] + C[i]$. When that is done, the number in each $C[i]$ no longer counts how many times $i$ appears in $A$; now it counts how many values $\leq i$ appear in $A$. So in particular, $C[A[n]]$ is the number of data values $\leq A[n]$; this means that the data value $A[n]$ itself must come (when in proper sorted order) after $C[A[n]]-1$ items less than or equal to it. That is, we know that the $C[A[n]]$th item in sorted order is $A[n]$. So we can put $A[n]$ into $C[A[n]]$th position right away; but where, ie in what array? We need a new array, $B$, for the sorted output. So we copy $A[n]$ - along with any satellite data - into $B[C[A[n]]]$. Then we decrement $C[A[n]]$ and keep going on the remaining items: $A[n-1]$ etc. Here is the pseudocode; arrays $B$ and $C$ are assumed initialized to zero.

CountingSort(A[1..n],B[1..n],C[1..k])
1  for i <- 1 to n
2      do C[A[i]] <- C[A[i]] + 1
3  for j <- 2 to k
4      do C[j] <- C[j] + C[j-1]
5  for i <- n downto 1
6      do B[C[A[i]]] <- A[i]
7         C[A[i]] <- C[A[i]] - 1

Consider the data array, $A=[2,9,5,7,4,9,4,3]$, where $n=8$ and $k=10$. Then after lines 1 and 2 are done, $C=[0,1,1,2,1,0,1,0,2,0]$. After lines 3 and 4 $C=[0,1,2,4,5,5,6,6,8,8]$. So there are $C[A[8]]=C[3]=2$ items less than or equal to $A[8]=3$, and line 6 will start by putting $3$ in position $2$ in array $B=[0,3,0,0,0,0,0,0]$. After lines 6 and 7 are done, $B=[2,3,4,4,5,7,9,9]$, the sorted output.

How fast is Counting Sort? It is much like Basic Counting Sort: it accesses all $n$ items in $A$, then it runs through all $k$ items in $C$, then it runs through all $n$ items in $A$ again. This is $O(n+k+n) = O(n)$ if $k=O(n)$.

How general is Counting Sort? Can it replace comparison sorts? No, not when we do not know the range of data values in advance (both how small and how large they get, and also that they are whole numbers). Can we inspect the data to find the range, and THEN run Counting Sort? Yes, but this takes extra time. We'd have to find the min and the max (this is fast, linear-time) and then we'd have to find a common denominator $D$ for all the values (this is slow), so that we can multiply them all by $D$ to get whole number values. Only then could we run Counting Sort, and at the end divide all by $D$ again.

However, in practice there are many situations in which it is known that the data are integers and how large and small they can get.




next up previous
Next: About this document ...
Don Perlis 2003-10-29