Encoding Integers

Of course, a sequence of 16- or 32-bit integers can easily be turned into a sequence of 8-bit integers. But this sequence would contain a mixture of high bytes and low bytes, which would likely have different frequency distributions and result in poor compression.

The approach we take for encoding unsigned integers is to encode the lowest seven bits in a byte, with the high bit set if more bits are coming. This works well in cases where we don't know the maximum value or distribution but expect that the distribution is skewed towards small numbers (it works very poorly if most numbers being encoded are in the range 128-255, where every value costs two bytes).
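A minimal sketch of this continuation-bit scheme (little-endian base-128, the same idea as LEB128 and protocol-buffer varints; the function names here are illustrative, not from the text):

```python
def encode_uint(x: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, lowest bits first.
    The high bit of each byte is set if more bytes follow."""
    if x < 0:
        raise ValueError("expected a non-negative integer")
    out = bytearray()
    while x >= 0x80:
        out.append((x & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        x >>= 7
    out.append(x)  # final byte, high bit clear
    return bytes(out)

def decode_uint(data: bytes) -> int:
    """Inverse of encode_uint (reads one encoded integer)."""
    x = 0
    for shift, b in enumerate(data):
        x |= (b & 0x7F) << (7 * shift)
        if not b & 0x80:
            break
    return x
```

For example, `encode_uint(127)` is a single byte, while `encode_uint(200)` already needs two, illustrating the 128-255 caveat above.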

In other situations both the encoder and the decoder know the range of possible values (e.g., that the integer to be encoded is in the range 0...4242). In such cases, we use a scheme that takes into account the range of values that need to be transmitted. If we know that *n* distinct values need to be transmitted (256 < *n* ≤ 256 + 255·256), we reserve the highest *h* = ⌈(*n* - 256)/255⌉ bit patterns in the first byte to indicate that this is a two-byte value. If *x* < 256 - *h*, *x* is encoded as the single byte *x*; if *x* ≥ 256 - *h*, *x* is encoded as the two bytes 256 - *h* + ⌊(*x* - 256 + *h*)/256⌋ and (*x* - 256 + *h*) mod 256.

Using variable-length encodings as above for signed integers would result in a multi-byte encoding of all negative numbers, since their two's-complement representation is at the high end of the unsigned range. We fix this by essentially moving the sign bit of signed integers into the least significant bit position: *x* is encoded as 2*x* if *x* ≥ 0 and as -2*x* - 1 if *x* < 0. Thus, the sequence 0, -1, 1, -2, 2, ... is encoded as 0, 1, 2, 3, 4, ...
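This sign-to-LSB mapping (widely known as zigzag encoding, e.g. in protocol buffers) can be sketched as follows; composing it with the unsigned variable-length scheme then keeps small-magnitude values short:

```python
def zigzag(x: int) -> int:
    """Map a signed integer to an unsigned one: 0,-1,1,-2,2,... -> 0,1,2,3,4,..."""
    return 2 * x if x >= 0 else -2 * x - 1

def unzigzag(u: int) -> int:
    """Inverse mapping: even codes are non-negative, odd codes negative."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2
```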