First we need to step back and develop some terminology. Recall that
a problem is the
kind of thing that an algorithm solves. Sorting is a problem. So, we
are asking how much work (eg, how many comparisons) are required to sort
items, simply because of the nature of the
problem.
Here is a simpler example: find the maximum element in an
-element array. This
is a PROBLEM, not an algorithm. And it can be solved in many ways
(ie, by many different algorithms). But we can state outright that NO
algorithm can solve this problem in less than
steps. The reason is
that finding the maximum requires, no matter what - if it is always
going to work on any data, at least looking at (accessing) every array value
(since any one of them could be the maximum). That means at least
accesses, hence at least
steps. Of course, if the array happens
to be sorted, then the max is at the end and can be found in one
access (eg by an algorithm that just picks the item at the right); but that will
not work for a different data arrangement. In general, an algorithm
that does not have a possible execution that looks at all the
data, will in some cases miss the maximum since the maximum value
might be where the algorithm did not look.
That is, the best possible runtime for ANY max-finding algorithm is,
in the worst case,
, and in particular the number of data
accesses is
. And we figured this out WITHOUT looking at any
particular algorithm, we did it by thinking about the PROBLEM and what
solving it must require.
We then can say that
accesses is the BEST OF THE WORST-CASE number of accesses
required by max-finding. That is,
is the fastest (lowest, best) possible
of all the worst-case runtimes for all algorithms that solve the
max-finding problem. More simply, we say that max-finding has a lower
bound of
accesses. Thus any algorithm at all that solves
max-finding (properly, and in general), will have a runtime
in the worst case.
Now we will present a complete proof that the lower bound number of adds that must be done to sum n items (if adds are the only operation allowed), is n-1. This is an example of a lower bound worst-case runtime, that is easier than for sorting, but still interesting.
Proof: We assume each item is added only once, for if any A[i] were added twice (or more), the answer (sum of all items) could not be correct in general. For instance if all items except A[3] are 0, then we'd get a sum of 2A[3] instead of A[3]. (Unless, of course, an item is added multiple times but the results are not used, in which case we simply consider those adds that are actually added into the final sum.)
Now, let the n original items be represented as the n leaves a binary tree, where a parent represents the add of its two children. Clearly every item must have a parent since every item must be added (once); and clearly every parent has two children since every add has a LHS and a RHS (but these need not be original items, they can be results of previous adds).
The root represents the LAST add that is done, giving the final sum. The left subtree and the right subtree each represent the sum of all the leaves of that subtree, hence the sum of less than n items, say L items on the left and R on the right, where L+R = n.
+
/ \
+ +
/ \ / \
+ c d +
/ \ / \
a b e f
Above, a and b are added and then c is added to that; e and f are added
and then is d added to that; then these two results are added at the root.
Here is another tree for the same items added differently:
+
/ \
+ +
/ \ / \
+ + e f
/ \ / \
a b c d
We must show that NO MATTER WHAT arrangement of adds is done (as long as it is correct) then the number of adds is at least n-1 (as it is in the above two examples: there are 5 adds and 6 items).
We proceed by induction. The base case is n=1, and 0=1-1 adds are needed.
IH: For any
items, k-1 adds are needed to sum k items.
So for n items, in the tree above, the LHS has
items, and the RHS
has
items, so by the IH there are at least L-1 adds on the left,
and at least R-1 on the right. So the total number of adds, T(n), is at
least (including an extra 1 for the root add):
This ends the proof.
The same argument, more or less, is this: any binary tree with n leaves must have n-1 parents. And this is proven by induction just as above.
Note: the same proof shows that EXACTLY n-1 adds are used in summing n items (if adds are the only operation). That is, the lower bound on adds is also the exact number of adds. When a lower bound is actually reachable (ie when there is an algorithm that has that good a worst case runtime) we say the bound is tight (and then of course that particular algorithm is as good as possible in the worst case). So the obvious add-as-you-iterate algorithm is as good as any, for solving the summing problem.
Now at last we return to the sorting problem, and ask about a lower-bound for
it. Sorting is a bit more complicated that max-finding, and so it is
a bit more complicated to come up with a line of reasoning that shows
a good lower bound. What do we mean by a good lower bound? Well, in
the case of max-finding,
not only is a lower bound, it is the
largest lower bound since in fact one really can find the maximum value in
steps! (How?) More than
are not needed (even though some
algorithms will take more than that). Thus
is the best lower
bound, the one that tells us the most, that gets us closest to what an
algorithm really has to do. We say it is a tight lower bound.
But
is also a lower bound for sorting:
one surely has to at least look at all the data in order to sort
it! The question then is whether there is a larger lower bound, or is
really enough steps to sort in general? Asking the question this
way prompts us to distinguish two cases as to what sorting-in-general
may mean:
Put differently, comparison-sorting never needs to know anything about the actual data values themselves, all it needs are answers to boolean queries as to which of two values is larger. All the sorting algorithms we have seen so far are examples of comparison sorting; the only facts about the data that are used in Insertion Sort, Mergesort, Quicksort, and Heapsort, are facts as to whether one item is larger than another.
For now we restate
our question about lower bounds: what is the tightest (best, largest) lower
bound on comparison sorting? Can it be done (in the worst case) in
steps (comparisons)? We have not seen an algorithm that can do
that. And for good reason: there is none.
Lower-Bound Theorem for Comparison Sorting: Any comparison sorting
algorithm must in the worst case, do at least
comparisons on
an
-item array.
Proof: Given
items, there are
possible rearrangements, and
any one of them might be the proper ordering. So any algorithm that
solves the sorting problem must be able to produce every one of these
outcomes. [For example, consider an algorithm that cannot arrange (a b
c) into each of the six permutations; suppose for instance that it
cannot produce (b c a) from (a b c). Then the data (3 1 2) will not be correctly
sorted by that algorithm since it has to change into (1 2 3) which is
a permution it cannot manage.]
Now at certain moments during the execution of a comparison-based
algorithm that sorts
items, it will evaluate a boolean expression
(ie, a comparison), and what it does after that depends on whether the
boolean is evaluated as T or as F. We can represent this process as a
``decision tree''; the following is an example for
:
*
/ \
* *
/ \ / \
* - * -
/ \ / \
- - - -
Here each* stands for a comparison, and its two children represent T
and F, ie whatever computation occurs after the boolean is evaluated,
until the next comparison (*) below that. And a - stands for
termination of execution, ie the sorting is complete.
For instance, the above tree shows all possible execution sequences of
comparisons for
Insertion Sort when
; eg, the leftmost path from root to leaf
corresponds to each boolean being true (ie the ``new'' card
is always less than the previous ones (this is the worst case), so it takes
three booleans (comparisons) to get them in order. And the rightmost path
corresponds to each boolean being false (the new card is always larger
than the previous ones (the best case) and only two comparisons are
used. Depending on how the data is originally arranged, one or another
of the six paths from root to a leaf will be taken. And note that
, which is no accident: six leaves are needed to represent all
possible execution sequences.
So the general situation is this: Any algorithm that comparison-sorts
items
must have at least
leaves in its decision tree. But this is a
binary tree, and the maximum number of leaves in a binary tree of
height
is
. (Why?) So we have
and thus
. Taking logs of both sides we get
So
. But what does
signify? It is the
number of edges from the root to the deepest leaves, namely the
maximum possible number
of comparisons along an execution sequence. That is,
is the
worst-case number of comparisons for a comparison-sort of
items,
and we have just shown that this must be at least
in the
dominant term.
This compares very nicely with the runtime for Mergesort, which is
exactly
, identical in the dominant term, and only
slightly worse in the other terms (
compared to
). That is, Mergesort does only a tiny bit more comparisons than
the Lower-Bound Theorem says are needed. And so
is a tight
lower bound (dominant term) for comparison sorting.
What are some other ways of ``measuring'' a sorting algorithm, aside from the number of comparisons it does? Here are two common measures: (i) space, and (ii) locality. Of course one always needs the space of the input array, simply to hold the data. But many of the sorting algorithms we have seen require little or no additional space beyond the data array. For instance, Insertion Sort simply moves items around inside that same array; so do Quicksort (except for storing the pivot in one extra location) and Heapsort. But Mergesort, when it performs the Merge subroutine, requires a separate array to hold the two merged subarrays, and this effectively doubles the amount of space (memory) needed. We say that an algorithm that needs no memory beyond that for the input data (or at most only a constant amount of additional memory, independent of the amount of data) runs in place. So all but Mergesort, of the algorithms we have seen, run in place; and Mergesort is not a horrible offender, in that it needs only twice the space, not (say) the square of the space.
Locality is best understood in terms of huge datesets. Suppose
is so
large that only a small section of the array can be held in main memory at
one time. Then an algorithm that tends to work on items that
are close to one another in the array can do a lot of work before it has to
swap out that section of the array and bring in a new section. But an
algorithm that tends to work on far-apart items at the same time will
need to swap whole sections in and out frequently, and this will slow
it down (it is called thrashing) since access to peripheral memeory
(eg, magnetic tapes in the extreme case) is very slow compared to main
memory (so-called random access). Of the algorithms we have seen,
all but one have good locality: they look mostly at data items that are
side-by-side in the array. But Heapsort, in comparing parent and
children, looks at items that may be very far apart (the left child of
item
is item
, which may be halfway across the array).
Since computer main memory is regularly getting cheaper and packed more tightly on a single chip, and thus larger, then these two issues are of less importance than before.
Below is a table summarizing many known results about comparison sort algorithms.
| Worst Case | Average Case | In place | Locality | |
| Insertion Sort | Yes | Yes | ||
| Merge Sort |
No |
Yes | ||
| Heap Sort |
|
Yes | No | |
| Quick Sort |
|
Yes | Yes |
For a Merge procedure that always does exactly
comparisons
on each call with
items.
Can be done in place, but not practical.
Hard to prove.
The above expressions for runtime give the highest-order terms only, but with the correct coefficients.
This does not take account of other time-consuming aspects of the algorithms, especially data movement. But as a general rule in comparison-based algorithms, there are at most no more data movements than there are comparisons (since data is generally moved only in response to a comparison).
So (assuming that an assignment statement takes no more time than a comparison) we can multiply the above runtimes by two to get a rough upper estimate that comes closer to actual behavior in those cases that do in fact perform many moves. It happens that on average, Quicksort performs only about 1/3 as many moves as comparisons, while the others perform proportionately more.
We now turn to the topic of non-comparison sorting: sorting without comparing data items to one another. Here is a simple example: we are given 8 data items, and we are told in advance that all of them will be integers between 1 and 10.
We of course could run any of our comparison-sorting algorithms on
this data, with - at best - a worst-case runtime of about
comparisons (actually for Mergesort it will be
). But here is another method, that takes advantage of the fact
that we know a lot about the data (its range - 1 to 10 - and the
fact that the values are whole numbers).
Create an empty array
; this will be where we keep track of
which values are in the input array
. Now for each data item
at a location
, namely
, simply put a mark (eg a 1) in that
value-spot in
. That is, if (say)
then in
we put
a 1 (or an x, or anything else to indicate that we saw a
. The fact
that the
occurred in location
is not important here, all we
are noting is the value and recording it in place in
we have
reserved for twos (ie,
). ThUs
has a special place for all
values from
to
; not all of them will be used, of course,
since we have only
values to work with.
In more detail: suppose the data array is
Then
will look like this, after we have looked at
:
- - x - - - - - - -
1 2 3 4 5 6 7 8 9 10
And after we have passed all the way through
- x x x x - x x x x
1 2 3 4 5 6 7 8 9 10
Now we are almost done:
Notice that the data values are (among) the index values
for
, and the value
stored in
just tells us whether that
index
is one of the data values (in
).
How much work did this take? Well, there are the
access to
and
then the
accesses to
, for a total of
accesses (and no
comparisons!). But Mergesort would only have done
comparisons,
so it is not clear that this is progress.
Let us see how many accesses would be needed in general, for
items
and a value-range of
to
. We'd need to run through the
data items (marking their occurrence in
as we do so) and then run
through
(
steps) to read out the sorted values. So it seems
that there is a total of
basic steps here. To compare this to
Mergesort on
items, with its
comparisons, we need
to specify
in terms of
. Let us suppose that the range of
values is at most a constant times the number of values:
. Then our new kind of sorting seems to take up to
steps, much better than
for large
.
There are two subtleties that we have ignored. The first is this:
if the data has repeated values, then writing x's in
will not show
that a value has occurred more than once. This is easy to remedy.
Instead of writing x in a
location, add 1 (increment it). Then
when reading the values out at the end, as long as we are looking at a
positive
value,
, write out the index
, decrement
,
and repeat on that same index. Details are left for homework. This
will slightly increase the runtime, since now as we iterate through
at the end to read out the data, we might have to stay at one
index for awhile; but this can add at most
more steps. So the
total is still only
if
.
We will call this algorithm (modified to allow for repeated values)
Basic Counting Sort: it counts how many times each possible
range value actually occurs in
, by incrementing
in the right
place, for each
values it finds. So
holds the ``count'',
namely how many times the value
occurs in
.
The second subtlety has to do with uses that sorting is often out to.
Very commonly when we sort data, we have a larger setting in mind.
For example, we may have pairs
in a huge list, and the
company boss may want to have a printout with all salaries listed from
lowest to highest. That is easy enough to do, by sorting on the
second field (salary). But then the boss sees a particular salary
that interests her and she asks you (the programmer who wrote and ran
the sorting code) whose salary it is. You stammer and turn red: your
code (an implementation of the above algorithm) does not keep the
other data field (name) with the sorted salary results, so you have to
run through the original array
again, looking for that salary.
The trouble is not just that you did not think of this beforehand; after
all, the same might have happened had you used Mergesort or any other
sorting algorithm. But all these other algorithms actually move data
around when the sorting is done, whereas this new one (Basic Counting
Sort) does not move (or copy) data at all! So when the final run
through
is done, all we are doing is noting how many ones there
were, how many twos, and so on, but the actual ones and twos (in
,
if there were any) are still in
and they were not even copied to
. So any extra so-called ``satellite'' data (other fields) that may be
stored along with the sort-field in
, is still in
and
inaccessible from
.
A clever way to remedy this is as follows, in what we will call
(Regular or Full) Counting Sort (more sophisticated than basic counting sort):
we go through
item by item,
incrementing
as before; then we do an odd thing: we add each
count stored in
, starting at
, to the next one:
. When that is done, the number in each
no longer
counts how many times
appears in
; now it counts how many
values
appear in
. So in particular,
is the
number of data values
; this means that the data value
itself must come (when in proper sorted order) after
items less than or equal to it. That is, we know that the
th
item in sorted order is
. So we can put
into
th
position right away; but where, ie in what array? We need a new
array,
, for the sorted output. So we copy
- along with
any satellite data - into
. Then we decrement
and keep going on the remaining items:
etc. Here is the
pseudocode; arrays
and
are assumed initialized to zero.
CountingSort(A[1..n],B[1..n],C[1..k]) 1 for i <- 1 to n 2 do C[A[i]] <- C[A[i]] + 1 3 for j <- 2 to k 4 do C[j] <- C[j] + C[j-1] 5 for i <- n downto 1 6 do B[C[A[i]]] <- A[i] 7 C[A[i]] <- C[A[i]] - 1
Consider the data array,
, where
and
. Then after lines 1 and 2 are done,
. After lines 3 and 4
. So there are
items less than or
equal to
, and line 6 will start by putting
in position
in array
. After lines 6 and 7 are done,
, the sorted output.
How fast is Counting Sort? It is much like Basic Counting Sort: it
accesses all
items in
, then it runs through all
items in
, then it runs through all
items in
again. This is
if
.
How general is Counting Sort? Can it replace comparison sorts? No,
not when we do not know the range of data values in advance (both how
small and how large they get, and also that they are whole numbers).
Can we inspect the data to find the range, and THEN run Counting Sort?
Yes, but this takes extra time. We'd have to find the min and the max
(this is fast, linear-time) and then we'd have to find a common
denominator
for all the values (this is slow), so that we
can multiply them all by
to get whole number values. Only then
could we run Counting Sort, and at the end divide all by
again.
However, in practice there are many situations in which it is known that the data are integers and how large and small they can get.