Optimizing compilers reorder iterations of statements to expose parallelism and to improve instruction scheduling, register use and cache utilization. Many different reordering transformations have been developed and studied, such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering [CK92][Wol90][Wol89][Pol88][AK87][PW86]. Each transformation has its own legality checks and transformation rules, which makes it difficult to reason about or predict the legality and effects of compositions of transformations.

Unimodular loop transformations [Ban93][WL91][Ban90] partially solve this problem. They form a unified framework able to describe any transformation that can be obtained by composing loop interchange, loop skewing and loop reversal. Such a transformation is described by a unimodular linear mapping from the original iteration space to a new iteration space. For example, loop interchange in a doubly nested loop maps iteration $[i, j]$ to iteration $[j, i]$.
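As a minimal sketch of this idea, the three elementary transformations on a doubly nested loop can be written as 2x2 integer matrices with determinant +1 or -1, and composition becomes matrix multiplication. The helper names (`matmul`, `apply`, `det2`) and the particular skewing factor are illustrative assumptions, not notation from the cited work.

```python
def matmul(a, b):
    """Multiply two square integer matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(t, point):
    """Map an iteration vector through the transformation matrix t."""
    return tuple(sum(row[k] * point[k] for k in range(len(point))) for row in t)

def det2(t):
    """Determinant of a 2x2 matrix; it is +1 or -1 for a unimodular matrix."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]

# Elementary transformations on iterations [i, j] (assumed examples).
INTERCHANGE = [[0, 1], [1, 0]]     # [i, j] -> [j, i]
REVERSE_INNER = [[1, 0], [0, -1]]  # [i, j] -> [i, -j]
SKEW_INNER = [[1, 0], [1, 1]]      # [i, j] -> [i, i + j]

# Skew first, then interchange: the later transformation multiplies on the left.
SKEW_THEN_INTERCHANGE = matmul(INTERCHANGE, SKEW_INNER)  # [i, j] -> [i + j, i]
```

Because the product of unimodular matrices is again unimodular, any composition of these three transformations is described by a single matrix of the same kind.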

Unfortunately, unimodular transformations are limited in two ways: they can only be applied to perfectly nested loops, and all statements in the loop nest are transformed in the same way. They therefore cannot represent some important transformations such as loop fusion, loop distribution and statement reordering.

The points in the iteration space resulting from a unimodular transformation
will be executed in lexicographic order.
Thus a unimodular transformation implicitly specifies a new order or
*schedule* for the points in the original iteration space.
We use this idea of a schedule as the basis for our unified reordering
transformation framework.
We extend the idea of unimodular transformations by allowing
each statement to have its own schedule and by allowing the
schedule to be an arbitrary one-to-one affine function with symbolic
constant terms; we also
allow some non-affine schedules that specify blocking or interleaving [KP93b][Pug91].
By generalizing in these ways, we can represent
a much broader set of reordering transformations, including
transformations that can be obtained by combinations of:

- loop interchange
- loop reversal
- loop skewing
- statement reordering
- loop interleaving [ST92]
- index set splitting [CK92][Ban79]
- loop coalescing [Pol88]
- loop scaling [LP92]
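To make the notion of per-statement schedules concrete, the following sketch assigns each statement its own affine schedule and derives the execution order by sorting statement instances lexicographically on their schedule tuples. The two-statement loop, the statement names `S1` and `S2`, and the function `execution_order` are hypothetical; enumerating and sorting instances is only a way to illustrate the semantics of a schedule, not a code generation method.

```python
def execution_order(schedules, domains):
    """Return (statement, iteration) pairs sorted by schedule time.

    Python compares tuples lexicographically, which matches the
    lexicographic execution order of the transformed iteration space.
    """
    instances = []
    for name, schedule in schedules.items():
        for point in domains[name]:
            instances.append((schedule(*point), name, point))
    instances.sort(key=lambda inst: inst[0])
    return [(name, point) for _, name, point in instances]

# Assumed example: a loop "for i in 1..n: S1; S2" with n = 2.
n = 2
domains = {"S1": [(i,) for i in range(1, n + 1)],
           "S2": [(i,) for i in range(1, n + 1)]}

# Original order: S1 before S2 within each iteration.
original = {"S1": lambda i: (i, 0), "S2": lambda i: (i, 1)}

# Statement reordering: S2 before S1 within each iteration.
reordered = {"S1": lambda i: (i, 1), "S2": lambda i: (i, 0)}

# Different schedules per statement: all of S1 runs first,
# then S2 runs with its loop reversed.
split_reversed = {"S1": lambda i: (0, i), "S2": lambda i: (1, -i)}
```

The last schedule cannot be expressed as a single unimodular matrix, since the two statements are mapped differently.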

For example, the KIJ loop permutation of Gaussian Elimination (without pivoting) is shown in Figure 2. While this code can be transformed into the KJI, JIK, JKI and IKJ loop permutations reasonably easily, Michael Wolfe notes that generating the IJK permutation requires imperfect triangular loop interchange, loop distribution and index set splitting [Wol91]. Within our transformation framework, this loop permutation is no more difficult to generate than any other (which is not to say it is more desirable than any other). The code we generate is shown in Figure 3.
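The sketch below illustrates the idea on a small matrix. It uses the standard textbook KIJ form of Gaussian elimination without pivoting, with 0-based indices, which may differ in detail from the figure. The per-statement schedules `[k, i] -> [i, k, 1, 0]` and `[k, i, j] -> [i, j, 0, k]` are an assumption: they are one choice that places `i` outermost and preserves the data dependences, and are not necessarily the schedules the framework would select. The function `gauss_scheduled` simply sorts statement instances by schedule time; it is not the generated loop code.

```python
from fractions import Fraction

def gauss_kij(a):
    """KIJ Gaussian elimination without pivoting, in place."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] = a[i][k] / a[k][k]                # S1 at [k, i]
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = a[i][j] - a[i][k] * a[k][j]  # S2 at [k, i, j]
    return a

def gauss_scheduled(a):
    """Run the same statement instances in lexicographic schedule order."""
    n = len(a)
    instances = []
    for k in range(n):
        for i in range(k + 1, n):
            instances.append(((i, k, 1, 0), "S1", (k, i)))         # assumed schedule
            for j in range(k + 1, n):
                instances.append(((i, j, 0, k), "S2", (k, i, j)))  # assumed schedule
    instances.sort(key=lambda inst: inst[0])
    for _, stmt, idx in instances:
        if stmt == "S1":
            k, i = idx
            a[i][k] = a[i][k] / a[k][k]
        else:
            k, i, j = idx
            a[i][j] = a[i][j] - a[i][k] * a[k][j]
    return a

def exact(rows):
    """Copy a matrix into exact rational arithmetic so results compare exactly."""
    return [[Fraction(x) for x in row] for row in rows]

matrix = [[4, 3, 2], [8, 7, 9], [12, 13, 20]]
kij_result = gauss_kij(exact(matrix))
ijk_result = gauss_scheduled(exact(matrix))
```

On this input both orders produce the same factored matrix, which is what a legal reordering must guarantee.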

Although our extension greatly increases the expressiveness of our framework, it also substantially complicates many of the issues involved. We need to use more precise dependence abstractions than dependence distance/direction vectors. It is also more difficult to check the legality of schedules, determine schedules that optimize some criteria and generate code corresponding to the transformed schedule. We describe these problems and solutions to many of them [KP93b].
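As a rough illustration of what checking legality means, the sketch below enumerates the exact dependences of a small loop nest and tests whether a schedule keeps every dependence source ahead of its sink in lexicographic order. The loop body `a[i][j] = a[i-1][j+1] + 1`, the function names, and the brute-force enumeration over a fixed `n` are all assumptions made for the example; a practical framework would reason symbolically rather than enumerate iterations.

```python
def dependences(n):
    """Exact flow dependences of: for i, j in 1..n: a[i][j] = a[i-1][j+1] + 1."""
    pairs = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            src = (i - 1, j + 1)  # iteration that wrote the value read at (i, j)
            if 1 <= src[0] <= n and 1 <= src[1] <= n:
                pairs.append((src, (i, j)))
    return pairs

def is_legal(schedule, deps):
    """A schedule is legal if every source is scheduled before its sink."""
    return all(schedule(*src) < schedule(*dst) for src, dst in deps)

deps = dependences(4)

identity = lambda i, j: (i, j)
interchange = lambda i, j: (j, i)
skew_then_interchange = lambda i, j: (i + j, i)

identity_ok = is_legal(identity, deps)                # original order
interchange_ok = is_legal(interchange, deps)          # reverses each dependence
skewed_ok = is_legal(skew_then_interchange, deps)     # keeps source before sink
```

Here plain interchange is rejected while skewing followed by interchange is accepted, because the skewed schedule places both ends of each dependence at the same outer time and orders them by `i`.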
