Next: Basic approaches Up: Compressing Java Class Files Previous: Introduction

Methodologies and Baselines


Table 1: Benchmark programs studied in this paper

                ------- Size in Kbytes ------
Benchmark         sj0r    jar   sjar  sj0r.gz  sjar/sj0r  sjar/jar  sj0r.gz/sjar  Description
rt               8,937  5,726  4,652    2,820        52%       81%           61%  Java 1.2 runtime
swingall         3,265  2,193  1,657      998        51%       76%           60%  Sun's new set of GUI widgets (JFC/Swing 1.1)
tools            1,557    950    737      513        47%       78%           70%  Java 1.2 tools (javadoc, javac, jar, ...)
icebrowserbean     226    125    116       88        52%       93%           76%  HTML browser
jmark20            309    189    173       91        56%       91%           53%  Byte's Java benchmark program
visaj            2,189  1,524  1,157      703        53%       76%           61%  Visual GUI builder
ImageEditor        454    359    257      162        57%       72%           63%  Image editor, distributed with VisaJ
Hanoi               86     57     46       31        54%       80%           67%  Demo applet distributed with Jax
Hanoi_big           56     37     30       20        53%       80%           67%  Hanoi, partially jax'd
Hanoi_jax           38     22     21       16        55%       96%           74%  Hanoi, fully jax'd
javafig            357    198    170      143        48%       86%           84%  Java version of xfig
javafig_dashO      269    136    131      113        49%       96%           86%  javafig, processed by dashO
Programs from SPEC JVM98 (http://www.spec.org/osg/jvm98/)
201_compress        15     11     10        6        64%       85%           59%  Modified Lempel-Ziv method (LZW)
202_jess           270    183    136       64        50%       74%           47%  Java Expert Shell System based on NASA's CLIPS expert shell system
205_raytrace        52     31     24       15        47%       78%           64%  Raytracing a dinosaur (invoked by 227_mtrt)
209_db              10      6      6        5        56%       94%           84%  Performs multiple database functions on a memory-resident database
213_javac          516    274    226      143        44%       82%           63%  Sun's JDK 1.0.2 Java compiler
222_mpegaudio      120     68     62       45        51%       91%           73%  Decompresses MPEG Layer 3 audio
228_jack           115     74     55       36        48%       74%           65%  A Java parser generator based on the Purdue Compiler Construction Tool Set (PCCTS)

sj0r     non-class files excluded, debugging information stripped, no compression
jar      non-class files excluded, class files as distributed (debugging information often not stripped), files compressed individually
sjar     non-class files excluded, debugging information stripped, files compressed individually
sj0r.gz  non-class files excluded, debugging information stripped, individual files not compressed, jar file gzip'd as a whole
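
Each ratio column in Table 1 gives the size of one archive as a percentage of another (for example, sjar/jar is the sjar size as a percentage of the jar size). A minimal Java sketch of that calculation follows; the class name `Ratios` is hypothetical. Applied to the rounded Kbyte figures it reproduces the rt row, but for the smallest benchmarks it can be off by several points (209_db gives 60% rather than 56% for sjar/sj0r), presumably because the published ratios were computed from exact byte counts.

```java
final class Ratios {
    private Ratios() {}

    /** Size of {@code part} as a percentage of {@code whole}, rounded to the nearest integer. */
    static long percent(long part, long whole) {
        return Math.round(100.0 * part / whole);
    }
}
```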

In this paper, I explore wire-formats for collections of Java class files. I assume that bandwidth is the most precious resource. The time required to compress a Java archive is relatively unimportant, while the time required to decompress it must be reasonable (not significantly longer than using gzip). The wire-format is sequential: the class files must be decompressed in order. As they are decompressed, they can be written to disk as a conventional jar file or as separate class files; these are completely standard class files that any JVM can use. Alternatively, each class can be loaded directly into a JVM as it is decompressed, saving the expense of constructing the class file. This requires a custom class loader, but no other changes to the JVM. See Section 11 for a discussion of eager class loading.
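
A custom class loader of the kind described above might look like the following minimal sketch. The container layout (a gzip stream of records, each holding a class name, a length, and the class-file bytes) and the names `WireFormatLoader`, `pack`, and `emptyClass` are hypothetical stand-ins, not the wire-format developed in this paper. The sketch assumes each superclass appears in the stream before its subclasses (or is reachable through the parent loader); otherwise `defineClass` fails when it tries to resolve the superclass.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class WireFormatLoader extends ClassLoader {
    WireFormatLoader(ClassLoader parent) {
        super(parent);
    }

    /** Reads records in order and defines each class as soon as its bytes are available. */
    List<Class<?>> loadAll(InputStream wire) throws IOException {
        List<Class<?>> loaded = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(wire))) {
            while (true) {
                String name;
                try {
                    name = in.readUTF();              // binary name, e.g. "pkg.Foo"
                } catch (EOFException endOfStream) {
                    break;                            // no more records
                }
                byte[] classFile = new byte[in.readInt()];
                in.readFully(classFile);
                loaded.add(defineClass(name, classFile, 0, classFile.length));
            }
        }
        return loaded;
    }

    /** Writes the hypothetical record stream: gzip(name, length, bytes, name, length, bytes, ...). */
    static byte[] pack(List<String> names, List<byte[]> classFiles) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            for (int i = 0; i < names.size(); i++) {
                out.writeUTF(names.get(i));
                out.writeInt(classFiles.get(i).length);
                out.write(classFiles.get(i));
            }
        }
        return bytes.toByteArray();
    }

    /** Builds a minimal class file with no members, for demonstration only. */
    static byte[] emptyClass(String internalName, String superInternalName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0xCAFEBABE);
        out.writeShort(0);                                   // minor_version
        out.writeShort(49);                                  // major_version (Java 5)
        out.writeShort(5);                                   // constant_pool_count = 4 entries + 1
        out.writeByte(1); out.writeUTF(internalName);        // #1 CONSTANT_Utf8
        out.writeByte(7); out.writeShort(1);                 // #2 CONSTANT_Class -> #1
        out.writeByte(1); out.writeUTF(superInternalName);   // #3 CONSTANT_Utf8
        out.writeByte(7); out.writeShort(3);                 // #4 CONSTANT_Class -> #3
        out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                                   // this_class
        out.writeShort(4);                                   // super_class
        out.writeShort(0);                                   // interfaces_count
        out.writeShort(0);                                   // fields_count
        out.writeShort(0);                                   // methods_count
        out.writeShort(0);                                   // attributes_count
        out.flush();
        return bytes.toByteArray();
    }
}
```

Because records are consumed strictly in order, the loader never needs random access to the archive; writing each record to a `.class` file instead of calling `defineClass` would give the conventional on-disk alternative.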

While it would be possible to include debugging information in a wire-format, we would typically prefer to save space by excluding it. I do not encode the LineNumberTable, LocalVariableTable, or SourceFile attributes. Also, because my approach renumbers the entries in the constant pool, I exclude any unrecognized attributes: since their layout is unknown, I could not update the constant pool references they contain.
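
Why unrecognized attributes must be dropped can be seen in a small sketch. Once the constant pool is renumbered (here via a hypothetical array `remap`, where `remap[old]` is the new index), an attribute whose layout is known can have its constant-pool indices rewritten, but an attribute with an unknown layout is just opaque bytes. The sketch handles only two layouts from the JVM specification, `ConstantValue` and `Exceptions`; the class name is hypothetical, and a real encoder would also have to handle `Code` and the other standard attributes.

```java
import java.nio.ByteBuffer;

final class AttributeRewriter {
    private AttributeRewriter() {}

    /**
     * Returns the attribute's info bytes with constant-pool indices renumbered through
     * remap, or null if the attribute is dropped from the wire-format.
     */
    static byte[] rewrite(String name, byte[] info, int[] remap) {
        ByteBuffer in = ByteBuffer.wrap(info);             // class files are big-endian,
        ByteBuffer out = ByteBuffer.allocate(info.length); // as is ByteBuffer by default
        switch (name) {
            case "LineNumberTable":
            case "LocalVariableTable":
            case "SourceFile":
                return null;                               // debugging information: not encoded
            case "ConstantValue":                          // u2 constantvalue_index
                out.putShort((short) remap[in.getShort() & 0xFFFF]);
                return out.array();
            case "Exceptions": {                           // u2 count, then count class indices
                int count = in.getShort() & 0xFFFF;
                out.putShort((short) count);
                for (int i = 0; i < count; i++) {
                    out.putShort((short) remap[in.getShort() & 0xFFFF]);
                }
                return out.array();
            }
            case "Code":
                throw new UnsupportedOperationException("Code is not handled in this sketch");
            default:
                return null;                               // unknown layout: references cannot be found
        }
    }
}
```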

I also exclude any non-class files (e.g., PNG image files) from the archive when calculating sizes. I report compression as the size of the compressed object as a percentage of the size of the original object. To make a consistent and fair comparison between the size of my archive format and that of standard jar files, I performed the following transformations on the benchmarks I studied:

These changes typically give a 20% improvement in jar file size. Sorting the constant pool entries can give an improvement of several percent when the class file is compressed, because it enables zlib to do a better job of finding repeated patterns. In this paper, the sizes I report for original and compressed class files reflect the improvements gained by these transformations. Any improvements I report for the new techniques in this paper are beyond those gained by removing debugging information and garbage collecting the constant pool.
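
Garbage collecting the constant pool amounts to a mark phase followed by compaction. The sketch below assumes a simplified model in which `refs.get(i)` lists the pool indices that entry `i` refers to (a `CONSTANT_Class` entry refers to its `CONSTANT_Utf8` name, a `CONSTANT_Methodref` to a class and a `CONSTANT_NameAndType`, and so on), and `roots` holds the indices referenced from the rest of the class file. It ignores the rule that `CONSTANT_Long` and `CONSTANT_Double` entries occupy two slots; the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class ConstantPoolGc {
    private ConstantPoolGc() {}

    /**
     * Marks every entry reachable from roots and assigns compact new indices.
     * refs.get(0) is a placeholder, since index 0 is unused in a class file.
     * Returns remap, where remap[old] is the new index, or 0 if the entry is dead.
     */
    static int[] collect(List<int[]> refs, int[] roots) {
        int n = refs.size();
        boolean[] live = new boolean[n];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) pending.push(root);
        while (!pending.isEmpty()) {              // mark phase (iterative depth-first search)
            int i = pending.pop();
            if (i <= 0 || live[i]) continue;
            live[i] = true;
            for (int target : refs.get(i)) pending.push(target);
        }
        int[] remap = new int[n];
        int next = 1;
        for (int i = 1; i < n; i++) {            // compaction: live entries keep their order
            if (live[i]) remap[i] = next++;
        }
        return remap;
    }
}
```

For example, in a five-entry pool where entry 3 is an unreferenced `CONSTANT_Utf8`, roots {2, 5} yield `remap = {0, 1, 2, 0, 3, 4}`: entry 3 is discarded and entries 4 and 5 move down one slot. Every surviving reference, in the pool and in the recognized attributes, must then be rewritten through `remap`.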

I will often use "gzip" and "zlib" compression interchangeably. However, in most situations where I apply gzip compression, I do not count the 18 bytes of the gzip header and trailer.
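
The 18 bytes are the fixed gzip framing around the deflate data: a 10-byte header (magic number, method, flags, modification time, extra flags, OS) and an 8-byte trailer (CRC-32 and the uncompressed length modulo 2^32). The following sketch, using only `java.util.zip`, measures a stream produced by `GZIPOutputStream` (which writes no optional header fields), subtracts that framing, and checks that the bytes after the header are a complete raw deflate stream followed by exactly 8 trailer bytes. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

final class GzipFraming {
    static final int HEADER = 10;  // ID1 ID2 CM FLG MTIME(4) XFL OS, no optional fields
    static final int TRAILER = 8;  // CRC32(4) ISIZE(4)

    private GzipFraming() {}

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    /** The gzip'd size without header and trailer. */
    static int sizeWithoutFraming(byte[] data) throws IOException {
        return gzip(data).length - HEADER - TRAILER;
    }

    /** Inflates the body that follows the header and checks that exactly TRAILER bytes remain. */
    static byte[] inflateBody(byte[] gz, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater(true);   // nowrap: raw deflate data, no zlib header
        inflater.setInput(gz, HEADER, gz.length - HEADER);
        byte[] out = new byte[originalLength + 1];
        int n = 0;
        while (!inflater.finished()) {
            int k = inflater.inflate(out, n, out.length - n);
            if (k == 0 && !inflater.finished() && (inflater.needsInput() || n == out.length)) {
                throw new DataFormatException("truncated or oversized deflate data");
            }
            n += k;
        }
        int left = inflater.getRemaining();       // unconsumed input after the deflate stream
        inflater.end();
        if (left != TRAILER) throw new DataFormatException("unexpected trailer length " + left);
        return Arrays.copyOf(out, n);
    }
}
```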



 
   
Gzip'd jar files of uncompressed class files

The compression in normal jar files is done on a file-by-file basis. We can achieve better compression by compressing an entire jar file whose individual entries have not been compressed separately. In tables and text, I refer to these as j0r.gz files (the 0 indicating no compression within the jar file).
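
The effect can be illustrated with raw deflate from `java.util.zip.Deflater`, which, like the sizes reported here, omits the gzip and zlib framing. This sketch compares compressing each file on its own (as in a jar) with compressing the concatenation of all files once (as in a j0r.gz file). It ignores jar directory and per-entry headers, so it shows only the direction of the effect, not the sizes in Table 1; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.Deflater;

final class SolidVsPerFile {
    private SolidVsPerFile() {}

    /** Size of the raw deflate stream for data (no zlib or gzip header or trailer). */
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Jar-style: every file compressed separately, sizes summed. */
    static int perFile(List<byte[]> files) {
        int sum = 0;
        for (byte[] file : files) sum += deflatedSize(file);
        return sum;
    }

    /** j0r.gz-style: files stored uncompressed, then the whole sequence compressed once. */
    static int whole(List<byte[]> files) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] file : files) all.write(file, 0, file.length);
        return deflatedSize(all.toByteArray());
    }
}
```

Only in the second case can matches reach back into earlier files (within deflate's 32 KB window), which is why strings such as `java/lang/Object` that recur across many class files compress better when the archive is compressed as a whole.
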
William Pugh