Next: Evaluation Up: Compressing Java Class Files Previous: Compressing Sets of Strings

Other issues

One reason that my packed format is more compact is that multiple class files are combined into a single packed archive that shares information. If each class file were packed separately, the total amount of data to be communicated would increase. Another question is how much of the compression in my packed format is due to gzip, and how much is due to the more compact encoding. On normal classfiles, gzip provides a compression factor of about 2. The effects of combining classfiles and of using gzip are broken out in Table 5. Not using gzip may be appropriate on very lightweight clients where running gzip is impossible or too expensive.


 
Table 5: Effects of separate packing and not gzipping
(sizes as % of size of jar file of gzip'd classfiles)

Option                              javac   mpegaudio
Standard                              22%         37%
Packed Separately                     52%         56%
Not gzip'd                            49%         99%
Packed Separately and not gzip'd      87%        118%
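
The gzip contribution discussed above can be measured directly on any byte stream. The following is a minimal sketch, not the measurement code used for Table 5: the class name GzipFactor and its methods are hypothetical, and it relies only on the standard java.util.zip.GZIPOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: measures how much gzip alone shrinks a byte array.
final class GzipFactor {
    /** Returns the number of bytes produced by gzipping data. */
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Compression factor: original size divided by gzip'd size. */
    static double compressionFactor(byte[] data) {
        return (double) data.length / gzippedSize(data);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GzipFactor SomeClass.class
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.printf("%d bytes, gzip factor %.2f%n",
                data.length, compressionFactor(data));
    }
}
```

Running this over an unpacked archive and over a packed one gives a rough idea of how much of the total saving gzip accounts for in each case.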
 

There is one issue we must be careful about when decompressing an archive. Normally, when we need to create a reference to a constant pool entry in a reconstructed classfile, we can just assign the element referenced to any free slot in the constant pool. However, the bytecode LDC instruction can only encode an index in the range 1-255. These instructions can only reference integer, float and string constants.
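
The size constraint comes from the instruction encoding: LDC carries a one-byte constant pool index, while LDC_W carries a two-byte index. A small sketch of that difference follows; the class name LoadConstantEncoder and its methods are hypothetical, and only the opcode values (0x12 and 0x13) come from the JVM instruction set.

```java
// Hypothetical helper illustrating why LDC is limited to indices 1-255.
final class LoadConstantEncoder {
    static final int LDC = 0x12;   // operand: one unsigned byte
    static final int LDC_W = 0x13; // operand: two bytes, high byte first

    /** True if the index fits the single-byte operand of LDC. */
    static boolean fitsLdc(int index) {
        return index >= 1 && index <= 255;
    }

    /** Encodes a load of the constant at the given pool index. */
    static byte[] encode(int index) {
        if (index < 1 || index > 0xFFFF) {
            throw new IllegalArgumentException("index does not fit in two bytes: " + index);
        }
        if (fitsLdc(index)) {
            return new byte[] {(byte) LDC, (byte) index};            // 2 bytes
        }
        return new byte[] {(byte) LDC_W, (byte) (index >>> 8), (byte) index}; // 3 bytes
    }

    public static void main(String[] args) {
        System.out.println("index 255 -> " + encode(255).length + " bytes");
        System.out.println("index 256 -> " + encode(256).length + " bytes");
    }
}
```

Because the two encodings differ in length, moving a constant across the 255 boundary changes the size of every instruction that loads it.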

The first fix is to assign integer, float and string constant pool entries the smallest available index. Other constant pool entries are assigned the largest available index; we transmit the total number of constant pool entries required as part of our encoding.
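
This first fix amounts to filling the constant pool from both ends. The sketch below is illustrative only: the class name TwoEndedPoolAllocator is hypothetical, and it assumes for simplicity that every entry occupies exactly one slot (in a real classfile, long and double constants take two).

```java
// Hypothetical allocator: low indices for int/float/string constants,
// high indices for everything else.
final class TwoEndedPoolAllocator {
    private int nextLow = 1; // index 0 is not a usable constant pool slot
    private int nextHigh;    // largest index still free

    /** totalEntries is the transmitted number of constant pool entries. */
    TwoEndedPoolAllocator(int totalEntries) {
        this.nextHigh = totalEntries;
    }

    /** Smallest free index, for integer, float and string constants. */
    int allocateLow() {
        checkFree();
        return nextLow++;
    }

    /** Largest free index, for all other constant pool entries. */
    int allocateHigh() {
        checkFree();
        return nextHigh--;
    }

    private void checkFree() {
        if (nextLow > nextHigh) {
            throw new IllegalStateException("constant pool is full");
        }
    }

    public static void main(String[] args) {
        TwoEndedPoolAllocator pool = new TwoEndedPoolAllocator(300);
        System.out.println("string constant -> " + pool.allocateLow());
        System.out.println("class reference -> " + pool.allocateHigh());
    }
}
```

Knowing the total entry count up front is what lets the decompressor hand out high indices before the low end has been filled.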

This almost fixes the problem. However, if there are more than 255 integer, float and string constants referenced in a classfile, which ones are assigned small indices? We would like to ensure that the same set of constants is assigned small indices as in the original classfile; otherwise, we would have to change some LDC instructions to LDC_W instructions, which are of different sizes. This would then require patching all jump offsets that traversed the changed instruction.

Instead, if an integer, float or string constant is referenced with an LDC_W instruction, it is assigned a high constant pool index; if it is referenced with an LDC instruction, it is assigned a low constant pool index. This assumes that a classfile doesn't reference the same constant pool entry with both an LDC and an LDC_W instruction. Doing so would be inefficient, and if necessary it can be fixed (and made more efficient) when the classfile is encoded.
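
The rule can be sketched as a single pass over the load instructions. This is an illustration under assumptions rather than the actual decoder: the class name LdcPlacement is hypothetical, constants are identified by plain string keys, and a conflicting reference is simply reported instead of being repaired.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical classifier: decides low or high placement from the opcode
// that references each integer, float or string constant.
final class LdcPlacement {
    enum Slot { LOW, HIGH }

    static final int LDC = 0x12;
    static final int LDC_W = 0x13;

    /** opcodes[i] is the instruction that loads constants[i]. */
    static Map<String, Slot> classify(int[] opcodes, String[] constants) {
        if (opcodes.length != constants.length) {
            throw new IllegalArgumentException("mismatched arrays");
        }
        Map<String, Slot> placement = new LinkedHashMap<>();
        for (int i = 0; i < opcodes.length; i++) {
            Slot wanted;
            if (opcodes[i] == LDC) {
                wanted = Slot.LOW;   // must stay within 1-255
            } else if (opcodes[i] == LDC_W) {
                wanted = Slot.HIGH;  // keeps low indices free for LDC users
            } else {
                throw new IllegalArgumentException("not LDC or LDC_W: " + opcodes[i]);
            }
            Slot previous = placement.putIfAbsent(constants[i], wanted);
            if (previous != null && previous != wanted) {
                // The assumption in the text is violated for this constant.
                throw new IllegalStateException(
                        "constant loaded by both LDC and LDC_W: " + constants[i]);
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        System.out.println(classify(
                new int[] {LDC, LDC_W, LDC},
                new String[] {"hello", "rarely used", "hello"}));
    }
}
```

An encoder could catch the conflicting case ahead of time and rewrite the offending instructions so that the decoder never sees it.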

This almost fixes the problem, except that an integer, float or string constant can also be referenced as the constant value of a field. We use an additional bit in the access flags for a field to encode whether a constant value int/float/string should be assigned a high index.
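
A sketch of carrying such a hint in the field access flags is shown below. The text does not say which bit is used, so the choice of 0x8000 is purely an assumption for illustration (it is assumed here not to be a defined field access flag), and the class name FieldFlagHint is hypothetical. The hint must be stripped again before the flags are written into the reconstructed classfile.

```java
// Hypothetical encoding of the "assign a high index" hint in a spare
// bit of the field access flags.
final class FieldFlagHint {
    // Assumption: 0x8000 is not a defined access flag for fields.
    static final int HIGH_INDEX_HINT = 0x8000;

    /** Sets or clears the hint bit when packing a field. */
    static int mark(int accessFlags, boolean highIndex) {
        return highIndex
                ? accessFlags | HIGH_INDEX_HINT
                : accessFlags & ~HIGH_INDEX_HINT;
    }

    /** True if the field's constant value should get a high pool index. */
    static boolean wantsHighIndex(int packedFlags) {
        return (packedFlags & HIGH_INDEX_HINT) != 0;
    }

    /** Recovers the flags to write into the reconstructed classfile. */
    static int originalFlags(int packedFlags) {
        return packedFlags & ~HIGH_INDEX_HINT;
    }

    public static void main(String[] args) {
        int publicStaticFinal = 0x0001 | 0x0008 | 0x0010;
        int packed = mark(publicStaticFinal, true);
        System.out.println(wantsHighIndex(packed));
        System.out.println(Integer.toHexString(originalFlags(packed)));
    }
}
```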

 
Table 6: Compression ratios
                                  Size in KBytes   Size as % of jar format                Size as % of packed format
Benchmark            jar  j0r.gz    Jazz  Packed    j0r.gz    Jazz  Packed   Strings Opcodes    Ints    Refs    Misc
209_db                 6       5       4       3       84%     66%     49%       34%     28%      9%     17%     13%
201_compress          10       6       4       3       59%     41%     29%       29%     32%     14%     17%      8%
Hanoi_jax             21      16      12       7       74%     58%     32%       21%     30%     13%     27%      9%
205_raytrace          24      15      12       7       64%     50%     30%       20%     33%      9%     22%     16%
Hanoi_big             30      20      15       9       67%     52%     29%       25%     27%     14%     26%      8%
Hanoi                 46      31      23      13       67%     49%     29%       22%     29%     12%     29%      8%
228_jack              55      36      30      17       65%     55%     30%       32%     21%     14%     21%     11%
222_mpegaudio         62      45      34      23       73%     54%     37%        9%     24%     37%     12%     18%
icebrowserbean       116      88      80      39       76%     69%     34%       21%     31%     11%     26%     12%
javafig_dashO        131     113     102      53       86%     78%     41%       23%     28%      8%     29%     12%
202_jess             136      64      42      23       47%     31%     17%       23%     28%     12%     26%     11%
javafig              170     143     122      64       84%     71%     38%       28%     26%      8%     27%     11%
jmark20              173      91      86      35       53%     50%     20%       22%     25%     13%     28%     12%
213_javac            226     143      90      50       63%     40%     22%       18%     29%     15%     27%     11%
ImageEditor          257     162     123      64       63%     48%     25%       22%     28%     16%     24%     10%
tools                737     513     477     204       70%     65%     28%       26%     27%     10%     27%     11%
visaj              1,157     703     691     238       61%     60%     21%       23%     26%     12%     31%      8%
swingall           1,657     998     887     338       60%     54%     20%       19%     28%     13%     31%      9%
rt                 4,652   2,820   8,435   1,069       61%    181%     23%       22%     28%     13%     27%     10%

jar     Size of jar file with individual class files stripped of debugging information and compressed
j0r.gz  Size of gzip of jar file with class files stripped of debugging information but not compressed
Jazz    Size of Jazz archive [BHV98] (See Section 13.1)
Packed  Size of archive produced by techniques in this paper

 


 
Figure 2: Graph of compression ratios
[figure: chart.eps]


William Pugh