Optimizing compilers reorder iterations of statements to expose parallelism and to improve instruction scheduling, register use and cache utilization. Many different reordering transformations have been developed and studied, such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering [CK92][Wol90][Wol89][Pol88][AK87][PW86]. Each transformation has its own legality checks and transformation rules, which makes it difficult to reason about or predict the legality and effects of compositions of transformations.
Unimodular loop transformations [Ban93][WL91][Ban90] partially solve this problem. Unimodular loop transformations are a unified framework able to describe any transformation that can be obtained by composing loop interchange, loop skewing and loop reversal. Such a transformation is described by a unimodular linear mapping from the original iteration space to a new iteration space. For example, loop interchange in a doubly nested loop maps iteration to iteration .
Unfortunately, unimodular transformations are limited in two ways: they can only be applied to perfectly nested loops, and all statements in the loop nest are transformed in the same way. They can therefore not represent some important transformations such as loop fusion, loop distribution and statement reordering.
The points in the iteration space resulting from a unimodular transformation will be executed in lexicographic order. Thus a unimodular transformation implicitly specifies a new order or schedule for the points in the original iteration space. We use this idea of a schedule as the basis for our unified reordering transformation framework. We extend [KP93b][Pug91] the idea of unimodular transformations by allowing each statement to have its own schedule and by allowing the schedule to be an arbitrary 1-1 affine function with symbolic constant terms; we also allow some non-affine schedules that specify blocking or interleaving. By generalizing in these ways, we can represent a much broader set of reordering transformations, including transformations that can be obtained by combinations of:
For example, the KIJ loop permutation of Gaussian Elimination (without pivoting) is shown in Figure 2. While this code can be transformed into the KJI, JIK, JKI and IKJ loop permutations reasonably easily, Michael Wolfe notes that generating the IJK permutation requires imperfect triangular loop interchange, loop distribution and index set splitting [Wol91]. Within our transformation framework, this loop permutation is no more difficult to generate than any other (which is not to say it is more desirable than any other). The code we generate is shown in Figure 3.
Although our extension greatly increases the expressiveness of our framework, it also substantially complicates many of the issues involved. We need to use more precise dependence abstractions than dependence distance/direction vectors. It is also more difficult to check the legality of schedules, determine schedules that optimize some criteria and generate code corresponding to the transformed schedule. We describe these problems and solutions to many of them [KP93b].
[tb] 6.75in [b]3.25in