Introduction

This paper examines techniques for compressing (collections of) Java class files. Java class files are generated by Java compilers, are the standard distribution medium for Java programs, and are the usual way of providing programs to a Java virtual machine. Java class files contain a substantial amount of symbolic information: in the javac benchmark from SPEC JVM98, only 21% of the uncompressed class file size is actually taken up by the method bytecodes. One purpose of this symbolic information is to avoid the need to recompile every Java class that uses a class X whenever X is changed. So long as the functionality those classes depend on doesn't change, previously compiled Java classes will continue to work with the new version of X.
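To make the point about symbolic information concrete, the following Java sketch walks the constant pool of a single class file and reports how many of its bytes are CONSTANT_Utf8 text (class, field, and method names and type descriptors). It follows the class file layout given in the Java Virtual Machine Specification. The names ConstantPoolStats, Summary, and scan are hypothetical, and the sketch ignores everything after the constant pool (fields, methods, attributes), so it is only an illustration, not the measurement behind the 21% figure above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch: how much of a class file's constant pool is UTF-8 symbolic text? */
public class ConstantPoolStats {

    /** Summary of one class file's constant pool. */
    public static final class Summary {
        public final int majorVersion;
        public final int constantPoolCount;  // value stored in the file (number of slots + 1)
        public final long constantPoolBytes; // bytes occupied by the whole constant pool
        public final long utf8Bytes;         // payload bytes of CONSTANT_Utf8 entries

        Summary(int majorVersion, int constantPoolCount, long constantPoolBytes, long utf8Bytes) {
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
            this.constantPoolBytes = constantPoolBytes;
            this.utf8Bytes = utf8Bytes;
        }
    }

    public static Summary scan(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();                 // minor_version (unused here)
        int major = in.readUnsignedShort();     // major_version
        int count = in.readUnsignedShort();     // constant_pool_count
        long poolBytes = 0, utf8 = 0;
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            int payload;                        // bytes that follow the tag
            switch (tag) {
                case 1: {                       // CONSTANT_Utf8: u2 length, then the bytes
                    int len = in.readUnsignedShort();
                    in.readFully(new byte[len]);
                    utf8 += len;
                    poolBytes += 1 + 2 + len;
                    continue;
                }
                case 7: case 8: case 16: case 19: case 20:
                    payload = 2; break;         // Class, String, MethodType, Module, Package
                case 15:
                    payload = 3; break;         // MethodHandle
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    payload = 4; break;         // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                case 5: case 6:
                    payload = 8; i++; break;    // Long and Double occupy two slots
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
            in.readFully(new byte[payload]);
            poolBytes += 1 + payload;
        }
        return new Summary(major, count, poolBytes, utf8);
    }

    public static void main(String[] args) throws IOException {
        Summary s = scan(Files.readAllBytes(Paths.get(args[0])));
        System.out.printf("major %d: constant pool %d bytes, %d bytes of UTF-8 payload%n",
                s.majorVersion, s.constantPoolBytes, s.utf8Bytes);
    }
}
```

Run as `java ConstantPoolStats Foo.class`; comparing the constant pool size against the total file size gives a rough sense of how much of a class file is names and descriptors rather than code.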

Few interesting Java applications consist of a single class; many are composed of hundreds or even thousands of classes. Java class files can be collected in jar files, which are archives of compressed Java class files (and possibly other files, such as images). Jar files are used both on disk and for network transmission.
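Because each jar entry records both its uncompressed and compressed size, the standard java.util.jar API is enough to see how a jar's class files add up. The sketch below is a minimal illustration; the class and method names (JarSizes, classEntrySizes) are hypothetical, and it counts only entries whose names end in `.class`, skipping images and other resources.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical sketch: total up the class-file entries of a jar, before and after compression. */
public class JarSizes {

    /** Returns {classCount, uncompressedBytes, compressedBytes} for the .class entries. */
    public static long[] classEntrySizes(String jarPath) throws IOException {
        long count = 0, raw = 0, packed = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.isDirectory() || !e.getName().endsWith(".class")) continue;
                count++;
                // Sizes come from the jar's central directory, so they are known here.
                raw += e.getSize();
                packed += e.getCompressedSize();
            }
        }
        return new long[] {count, raw, packed};
    }

    public static void main(String[] args) throws IOException {
        long[] s = classEntrySizes(args[0]);
        double factor = s[2] == 0 ? 0.0 : (double) s[1] / s[2];
        System.out.printf("%d classes: %d bytes uncompressed, %d bytes compressed (%.2fx)%n",
                s[0], s[1], s[2], factor);
    }
}
```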

In many applications, Java programs are transmitted across the network. While ample bandwidth is available in some situations, in many others the network path includes slow modem or mobile communication links. The jar format normally uses the gzip compression mechanism to compress each file in a jar file. This typically shrinks standard Java class files by about a factor of 2. However, the compressed jar files for substantial applications can still be quite large (50-200K is not unusual) and can take several minutes to transmit over a slow communication link.
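One quick way to check the compression factor for a particular class file is to compress it in memory with java.util.zip.GZIPOutputStream, which uses the same deflate algorithm that jar entries use. This is a minimal sketch with hypothetical names (GzipRatio, gzip, ratio); the factor it reports depends entirely on the input, and like a jar it compresses one file at a time, so it cannot exploit redundancy shared between classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: measure the gzip compression factor of a single file. */
public class GzipRatio {

    /** Compresses the whole array with gzip (deflate plus a small header and trailer). */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        } // closing the stream finishes the deflate stream and writes the trailer
        return buf.toByteArray();
    }

    /** Original size divided by compressed size; values above 1 mean the data shrank. */
    public static double ratio(byte[] data) throws IOException {
        return (double) data.length / gzip(data).length;
    }

    public static void main(String[] args) throws IOException {
        byte[] cls = Files.readAllBytes(Paths.get(args[0]));
        System.out.printf("%s: %d bytes, gzip factor %.2f%n", args[0], cls.length, ratio(cls));
    }
}
```

Running `java GzipRatio Foo.class` over several class files gives a feel for how close a given program comes to the typical factor of 2.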

I use a number of approaches to creating smaller files that contain the same information as a jar file:

Although this paper focuses solely on the problem of compressing Java class files, many of the techniques described would be generally useful for developing compact object serialization protocols.

William Pugh