Related Work

Table 8: Results on wire-code program compression in related work

	% of Gzip'd
Paper	classfiles
Slim Binaries [KF97,KF,Fra97]	59
JShrink, DashO, and Jax	65 - 83
jar.gz format (§2.1)	55 - 85
Clazz format [HC98]	52 - 90
Jazz format [BHV98]	40 - 70
This paper (on programs > 10K bytes)	17 - 41

Many papers have argued and discussed different aspects of machine independent intermediate forms, such as their suitability for run-time optimization. Given the wide-spread use of Java, my goal was to develop a compact wire-code format for Java, without regard to the merits or problems of the Java virtual machine. A comparison of quoted compression results from related work is provided in Table 8.

Ernst et al. [EEF⁺97] discussed two kinds of code compression: wire-code and directly executable. The technique I have proposed is a wire code, so I will limit myself to comparisons with the wire-code described by Ernst et al. [EEF⁺97]. Ernst et al. consider only code segments of full executables, and thus don't deal with significant amounts of symbolic linkage information nor data such as strings and floating point constants. They use the lcc intermediate form [FH95], which is a tree based format. It has been suggested that a tree based intermediate form is more suited to compression [KF97]. Despite these differences, our basic approach is very similar: break out dissimilar objects into different streams which are compressed separately.

Michael Franz has proposed Slim Binaries [KF97,KF,Fra97] as a mechanism for distribution of compact, machine independent programs. It is based on an encoding of the abstract syntax tree and symbol table of a program. The papers on Slim Binaries do not give details of the encoding, but do give limited experimental results (Table 8).

Lars Raeder Clausen et al. [CSCM98] describe a method of factoring Java classfiles by finding custom macros or opcodes, similar to the techniques described in [EEF⁺97]. They focus on embedded systems with small amounts of memory, and focus on reducing the size of bytescodes for loaded classfiles. Their techniques reduce the size of the bytecodes to about 70% - 80% of their original uncompressed size and require modifications to the JVM.

Normally, an entire class file must be transmitted before a class loader will start to process it, and an entire jar file is transfered before it is used by a class loader. Krintz et al. [KCLZ98] describe methods that determine (based on profile information) which classes and methods are likely to be needed first, transmit the data needed for these classes and methods first, and allow execution to start while the remaining information about classes and methods is still being transferred. Invoking a method which hasn't arrived yet blocks until the method arrives. I have incorporated parts of this idea in my provision for eager class loading (§11). Intermingling different classes could change the effectiveness of compression (since there would likely not be as much locality of reference).

Tools such as Jax [LTS], Dash-O and JShrink perform shrinking and obfuscation by renaming classes, methods and fields to have short, meaningless names and stripping out debugging information. The Jax tool (and perhaps the others) performs transformations such as removing methods that are never called and merging a class into its superclass when it can prove that such a transformation doesn't effect the semantics of the program. These tools typically give reductions of 17% - 32% of the classfile size. Surprisingly, the techniques of Section 2, applied after applying DashO, gave an additional reduction of 2% in the size of the resulting compressed jar file. The apparent reason is the DashO doesn't sort the constant pool, leading to poorer compression. Transformations applied by these tools could be usefully combined with the techniques in this paper to provide greater compression than either technique alone. Tools such as Jax are particularlly using when an application uses a small portion of a library that is not installed on most clients. By extracting just the used portion of the library, the potential savings are unbounded.

Jazz compression

The Clazz format described by Horspool and Corless [HC98] was a predecessor of the Jazz format. While there are a number of similarities, the Clazz format is applied to individual classfiles in isolation, and therefore does not achieve a high compression as Jazz or the compression techniques described in this paper.