next up previous
Next: Compressing Bytecodes Up: Compressing Java Class Files Previous: Compressing References

   
Encoding Integers

Both in encoding integers that naturally appear in a classfile (e.g., integer constants in bytecode, maximum stack size for code) and in encoding the indices arising from an encoding of references, we need to consider how to convert them into a bytestream we can hand off to the compressor.

Of course, a sequence of 16 or 32 bit integers can easily be turned into a sequence of 8 bit integers. But this sequence would contain a mixture of high bytes and low bytes, which would likely have different frequency distributions and result in poor compression.

The approach we take for encoded unsigned integers is to encode the lowest seven bits in a byte, with the high bit set if more bits are coming. This works well in cases where we don't know the maximum value or distribution but expect that the distribution is skewed towards small numbers (it works very poorly if most numbers being encoded are in the range 128-255).

In other situations both the encoder and the decoder know the range of possible values (e.g., that the integer to be encoded is in the range 0...4242). In such cases, we use a scheme that takes into account the range of values that need to be transmitted. If we know that values $0\ldots (n-1)$ need to be transmitted ( $n \leq 2^{16}$), we reserve the highest $r = \lfloor \frac{n-2}{255} \rfloor$ bit patterns in the first byte to indicate that this is a two-byte value. If $x \geq 256-r$, x is encoded as

\begin{displaymath}[((x-(256-r)) \mbox{\rm\ mod\ } r)+256-r, \lfloor (x-(256-r))/r \rfloor ]\end{displaymath}

Using variable length encodings as above for signed integers would result in a multi-byte encoding of all negative numbers since their representation is at the high end of the unsigned range. We fix this by essentially moving the sign bit of signed integers into the least significant bit position; x is encoded as $x \geq 0 \ \ ? \ \ 2x : -2x-1$. Thus, $\{-3,-2,-1,0,1,2,3\}$ is encoded as $\{5,3,1,0,2,4,6\}$.


next up previous
Next: Compressing Bytecodes Up: Compressing Java Class Files Previous: Compressing References
William Pugh