Next: Acknowledgments Up: Compressing Java Class Files Previous: Related Work

Conclusion

The Java classfile format is rather fluffy, so it should come as no great surprise that a different format could lead to smaller files, particularly when information duplicated across multiple class files is combined. On the other hand, a good compression algorithm can work wonders, and a more efficient format with less redundant information will often not compress as well. So it was not obvious how much additional compression could be obtained over gzip'd classfiles. As it turns out, we can obtain compression factors of 2 to 5 over individually gzip'd classfiles, which will make an important difference in mobile and other low-bandwidth applications.
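Part of the gap between individually gzip'd classfiles and a combined archive comes simply from letting the compressor see data shared across files. The toy sketch below illustrates only that effect. It uses synthetic byte strings (hypothetical stand-ins, not real class files) that all repeat the same common names, and uses Python's zlib module, which implements DEFLATE, the same algorithm gzip uses. It compares compressing each "file" on its own against compressing all of them as one stream. The packed format goes further than this by merging duplicated entries outright rather than relying on the compressor to find them.

```python
import zlib

# Hypothetical stand-ins for class files: each one repeats the same
# commonly used names (as constant pools often do) plus a short name
# that is unique to that class. These are NOT real class files.
SHARED = b"".join(
    name + b"\x00"
    for name in (
        b"java/lang/Object", b"java/lang/String", b"<init>", b"()V",
        b"java/io/PrintStream", b"println", b"(Ljava/lang/String;)V",
    )
)


def fake_class_file(i: int) -> bytes:
    """Shared names followed by one class-specific name."""
    return SHARED + f"com/example/Class{i}".encode("ascii")


def individual_size(files) -> int:
    """Total bytes when every file is compressed on its own."""
    return sum(len(zlib.compress(f, 9)) for f in files)


def combined_size(files) -> int:
    """Bytes when all files are compressed as one stream, so DEFLATE can
    refer back to names it already saw in an earlier file."""
    return len(zlib.compress(b"".join(files), 9))


files = [fake_class_file(i) for i in range(20)]
print("individually:", individual_size(files), "combined:", combined_size(files))
```

The exact byte counts depend on the zlib version, so none are quoted here; the point is only the direction of the difference.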

We have been assuming that for each kind of data, one particular encoding scheme is optimal. Of course, this isn't the case: different schemes work better on different benchmarks. To achieve even better compression, the compressor could try several encoding methods for each kind of data and select whichever works best. The encoded data would include a description of the encoding mechanism used for each data sequence, and would not be substantially harder to decode than if a fixed policy were used for each kind of data.
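A minimal sketch of this idea follows, assuming a sequence of unsigned integers and three hypothetical candidate encodings: fixed 4-byte big-endian, unsigned variable-length (LEB128-style), and zigzag-coded deltas written as variable-length integers. The names `pack_best`, `unpack`, and `SCHEMES` are illustrative only and are not taken from the paper's implementation. The compressor runs each encoding through zlib, keeps the smallest result, and writes a one-byte tag identifying the scheme; the decoder reads that tag and runs only the matching decoder, which is why adaptive selection costs little at decode time.

```python
import struct
import zlib

# Hypothetical encodings for a sequence of ints in [0, 2**32).


def write_varint(n: int, out: bytearray) -> None:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last."""
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return


def read_varints(data: bytes) -> list:
    """Decode a whole buffer of back-to-back unsigned varints."""
    values, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values


def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return n << 1 if n >= 0 else (-n << 1) - 1


def unzigzag(z: int) -> int:
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def enc_fixed(values):
    return struct.pack(f">{len(values)}I", *values)


def dec_fixed(data):
    return list(struct.unpack(f">{len(data) // 4}I", data))


def enc_varint(values):
    out = bytearray()
    for v in values:
        write_varint(v, out)
    return bytes(out)


def enc_delta(values):
    out, prev = bytearray(), 0
    for v in values:
        write_varint(zigzag(v - prev), out)
        prev = v
    return bytes(out)


def dec_delta(data):
    out, prev = [], 0
    for z in read_varints(data):
        prev += unzigzag(z)
        out.append(prev)
    return out


# The tag byte written before each sequence is its index in this list.
SCHEMES = [(enc_fixed, dec_fixed), (enc_varint, read_varints), (enc_delta, dec_delta)]


def pack_best(values) -> bytes:
    """Try every scheme, keep whichever compresses smallest, prefix its tag."""
    best_tag, best_body = None, None
    for tag, (encode, _) in enumerate(SCHEMES):
        body = zlib.compress(encode(values), 9)
        if best_body is None or len(body) < len(best_body):
            best_tag, best_body = tag, body
    return bytes([best_tag]) + best_body


def unpack(blob: bytes) -> list:
    """Read the tag, then run only the matching decoder."""
    _, decode = SCHEMES[blob[0]]
    return decode(zlib.decompress(blob[1:]))
```

For example, `unpack(pack_best(list(range(1000, 50000, 7))))` returns the original list. By construction the packed result is never more than one tag byte larger than the best single fixed choice among these three schemes.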

There are a number of other approaches that might give minor improvements in compression. The only change I can think of that would likely give non-trivial improvements would be to assume a standard set of preloaded references to frequently used package names, classes, method references, and so on. It isn't actually guaranteed that this would improve compression (preloaded references that are never used would degrade compression), but I expect it would help on small archives. It would also likely increase the size of the decompressor, so in situations where the decompressor is not preinstalled, there would not be any net benefit.
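The sketch below illustrates preloaded references under stated assumptions; it is not the paper's packed format. Each reference is written as a variable-length code, where code 0 introduces a new name (its length and UTF-8 bytes follow, and it is appended to the table) and code k refers to table entry k - 1. Passing a hypothetical `PRELOADED` tuple seeds the table on both sides, so the first use of a common name costs one byte instead of a full literal. It also shows one way unused preloads hurt: every preloaded entry occupies an index, and entries at index 127 or beyond need two-byte codes.

```python
# Hypothetical preloaded names; an illustrative list, not one from the paper.
PRELOADED = (
    "java/lang/Object", "java/lang/String", "<init>", "()V",
    "java/io/PrintStream", "println", "(Ljava/lang/String;)V",
)


def write_varint(n: int, out: bytearray) -> None:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last."""
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return


def read_varint(data: bytes, pos: int):
    """Return (value, position just past the varint)."""
    n = shift = 0
    while True:
        b = data[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            return n, pos
        shift += 7


def encode_refs(names, preloaded=()) -> bytes:
    """Code 0 = new name (length + UTF-8 bytes follow); code k = table[k - 1]."""
    table = {}
    for name in preloaded:
        table.setdefault(name, len(table))
    out = bytearray()
    for name in names:
        idx = table.get(name)
        if idx is not None:
            write_varint(idx + 1, out)
        else:
            raw = name.encode("utf-8")
            write_varint(0, out)
            write_varint(len(raw), out)
            out += raw
            table[name] = len(table)  # next free index
    return bytes(out)


def decode_refs(data: bytes, preloaded=()) -> list:
    """Mirror of encode_refs; must be given the same preloaded table."""
    table = list(dict.fromkeys(preloaded))
    names, pos = [], 0
    while pos < len(data):
        code, pos = read_varint(data, pos)
        if code == 0:
            length, pos = read_varint(data, pos)
            name = data[pos:pos + length].decode("utf-8")
            pos += length
            table.append(name)
        else:
            name = table[code - 1]
        names.append(name)
    return names
```

For the reference sequence `["java/lang/Object", "<init>", "()V", "Foo", "Foo", "java/lang/Object"]`, this sketch produces 38 bytes without preloading and 10 bytes with `PRELOADED`. The decoder needs the same table, which is the extra decompressor size the paragraph above mentions.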

For a research tool, the goal is to get as much compression as possible. However, for a tool that might be widely distributed and reimplemented, a simple and clear specification of the packed format might be more valuable. It may be appropriate to simplify the format by, for example, dropping approximate stack state (§7.1).

I expect that an implementation will be available for download from http://www.cs.umd.edu/~pugh by the date of the conference.

William Pugh