Sign Extension

Introduction

By now, you should know that 2C is the representation used for signed int values in the majority of ISAs, and thus it's the underlying representation for ints in programming languages like C.

"C" offers several different sizes for signed int. There's short int, int, long int. The "C" language doesn't specify how many bytes each should be, but does have the following restriction.

 sizeof( short ) <= sizeof( int ) <= sizeof( long )

All sizes can be different, or they can all be the same (or two of them the same, and one different), as long as the above restrictions are met.
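To see what your own compiler chose, you can print the three sizes. The following is a minimal sketch; the helper name check_int_sizes is made up for illustration, and the sizes it prints depend on your platform.

```c
#include <stdio.h>

/* Hypothetical helper (not part of any library): prints the three sizes
   and returns 1 if the ordering restriction holds, 0 otherwise. */
int check_int_sizes(void)
{
    printf("short: %zu bytes\n", sizeof(short));
    printf("int:   %zu bytes\n", sizeof(int));
    printf("long:  %zu bytes\n", sizeof(long));

    return sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long);
}
```

On a conforming compiler the function should always return 1, whatever the individual sizes turn out to be.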

Casting `short` to `int`

"C" allows you to cast from a shorter int size to a larger int size. Let's assume that short and int are different sizes. Say, short uses 2 bytes while int uses 4 bytes.

Casting requires the CPU to make conversions between one data type and another. Since short and int are signed two's complement, the way to convert from a shorter data type to a longer data type is to sign extend.

Definition: To sign-extend an N-bit number to an (N+k)-bit number, set bits b_(N+k-1)-N = (b_(N-1))^k, that is, k copies of the sign bit b_(N-1), while the bits at index less than or equal to N - 1 remain unchanged.

To sign extend means to fill in the additional upper k bits with 0, if the sign bit is 0, and to fill those bits with 1, if the sign bit is 1.
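The definition can be sketched directly with bit operations. This is only an illustration, not how a CPU or compiler necessarily does it: the function name sign_extend_bits and its parameters are made up here, and it assumes 1 <= n and n + k <= 32. It works on unsigned bit patterns so that no signed conversion is involved.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extends the n-bit pattern in `bits`
   to an (n + k)-bit pattern. Assumes 1 <= n and n + k <= 32. */
uint32_t sign_extend_bits(uint32_t bits, unsigned n, unsigned k)
{
    uint64_t low_mask   = (UINT64_C(1) << n) - 1;         /* bits 0 .. N-1     */
    uint64_t upper_mask = ((UINT64_C(1) << k) - 1) << n;  /* bits N .. N+k-1   */
    uint64_t result     = bits & low_mask;                /* low N bits kept   */

    if ((bits >> (n - 1)) & 1u) {   /* sign bit b_(N-1) is 1 */
        result |= upper_mask;       /* fill the k new upper bits with 1 */
    }
    /* If the sign bit is 0, the new upper bits are already 0. */
    return (uint32_t) result;
}
```

For instance, sign_extend_bits(0xE, 4, 2) takes the 4-bit pattern 1110 and yields the 6-bit pattern 111110, while sign_extend_bits(0x3, 4, 2) leaves 0011 as 000011.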

Clearly, this works if the sign bit is 0. You write 3, in base 2 using 4 bits, as 0011. If you want to write it in 6 bits, it's 000011. Adding additional 0's to the high bits does not affect the final value.

What's less clear is that this is true if the sign bit is 1. Let's think about why this might be. Suppose you want to represent -2 in 2C using 4 bits. You'd write 1110. Again, think of the binary odometer example. Go back from 0000, to 1111, to 1110.

A binary odometer has its lower bits changing first. The higher bits change later. This is true whether you are going forward or backward in binary.

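For example, write -2 in 2C using 6 bits. Going back from 000000, to 111111, to 111110, the upper bits flip to 1 on the first step back and then stay put. The result, 111110, is just 1110 with two more 1's at the highest 2 bits.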
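In C, you don't perform the sign extension yourself; the conversion does it for you. Here is a small sketch, assuming the fixed-width types int16_t and int32_t from <stdint.h> stand in for a 2-byte short and a 4-byte int. The function names are made up for illustration.

```c
#include <stdint.h>

/* Widening a signed 16-bit value to 32 bits keeps its numeric value. */
int32_t widen_signed(int16_t s)
{
    return (int32_t) s;
}

/* Returns the 32-bit pattern of the widened value, so you can inspect
   the upper 16 bits that were filled in. */
uint32_t widened_bits(int16_t s)
{
    return (uint32_t) widen_signed(s);
}
```

Widening -2 still gives -2, and its bit pattern goes from 0xFFFE to 0xFFFFFFFE: the upper 16 bits are all copies of the sign bit. Widening 3 gives 0x00000003.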

Casting `int` to `short`

It turns out that you can cast in the opposite direction. If you cast a 32 bit int to a 16 bit int, the top 16 bits are chopped off. Thus, a number represented as b_31-0 is now b_15-0.

Clearly, the problem with casting to a shorter size is a possible loss of information. You can represent a larger number of values with more bits. Getting rid of bits may cause the number to be cut off.
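To make the loss concrete, here is a sketch that chops a 32-bit pattern down to 16 bits and checks whether a value would survive the trip. It uses unsigned types for the chopping so the result is fully defined by the language; both function names are made up for illustration.

```c
#include <stdint.h>

/* Keeps only bits 15..0 of a 32-bit pattern; bits 31..16 are discarded. */
uint16_t low_16_bits(uint32_t bits)
{
    return (uint16_t) (bits & 0xFFFFu);
}

/* Returns 1 if `value` can be represented in 16-bit 2C, 0 if narrowing
   it would change the number. */
int fits_in_int16(int32_t value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

For example, 70000 is 0x00011170 in 32 bits. Chopping off the top 16 bits leaves 0x1170, which is 4464, so the value has changed. A small value such as -2 fits in 16 bits and is not affected.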

If the compiler complains, an explicit cast to the shorter type will usually get rid of the complaints. Effectively, you are telling the compiler "I know this cast may lose data, but don't worry about it. This is OK for my application."

Casting `int` to `char`

The getchar() function in C returns an int. This is surprising because getchar() suggests it may return a char.

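The problem is returning an error value. getchar() must be able to return every possible character and, in addition, a value that means "no more input". When getchar() reaches the end of file, it returns EOF, which is almost always defined as -1. If getchar() returned a char, EOF could not be told apart from a legitimate character whose bits are all 1's, so it returns the wider int instead.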

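Unfortunately, this means you need to cast the result to get a char. Store the result in an int first, so that it can be compared against EOF, and then cast, as in:

   int c = getchar() ;    /* compare c against EOF before narrowing */
   char ch = (char) c ;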

If you were working in C++, you would use static_cast<char> instead.
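A typical reading loop keeps the result in an int until it has been compared against EOF, and only then narrows it to char. The sketch below uses getc() on a FILE pointer so it can read from any stream; getchar() behaves like getc(stdin). The helper name count_char is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many characters in the stream `in`
   are equal to `target`. */
long count_char(FILE *in, char target)
{
    long count = 0;
    int c;   /* int, not char, so EOF stays distinguishable from data */

    while ((c = getc(in)) != EOF) {
        char ch = (char) c;   /* c is a real character here, so narrow it */
        if (ch == target) {
            count++;
        }
    }
    return count;
}
```

Calling count_char(stdin, 'a') would count the a's typed on standard input. Had c been declared as char, the comparison with EOF could either never succeed or succeed too early, depending on whether char is signed.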

Zero extension

Sign extension applies when casting a signed 2C number with a smaller number of bytes (or bits) to a signed 2C number with a larger number of bytes (or bits).

If you are casting unsigned numbers to a larger width (i.e., a larger number of bits or bytes), you zero-extend. That is, you add zeroes to the upper bits. This should make sense. With no negative values, there's no need to copy 1's up.
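The difference shows up when the same 8-bit pattern is widened as unsigned and as signed. This is a small sketch using the fixed-width types from <stdint.h>; the function names are made up for illustration.

```c
#include <stdint.h>

/* Unsigned widening: the upper 24 bits are filled with 0. */
uint32_t zero_extend_8_to_32(uint8_t u)
{
    return (uint32_t) u;
}

/* Signed widening: the upper 24 bits are copies of the sign bit.
   The result is returned as a bit pattern for easy comparison. */
uint32_t sign_extend_8_to_32(int8_t s)
{
    return (uint32_t) (int32_t) s;
}
```

The pattern 11111110 read as unsigned is 254, and zero-extending it gives 0x000000FE. Read as 2C it is -2, and sign-extending it gives 0xFFFFFFFE.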

Zero-extension often occurs with logical operations. If you are doing a logical operation (e.g., bitwise AND, bitwise OR) where one argument has more bytes than the other, it's usually the case that the argument with fewer bytes is zero-extended.

This is especially true in assembly language, where it's often not clear whether a register is holding a UB or 2C representation (or whether it should simply be considered some kind of bitstring, and not even thought of as a number).

Summary

Sign-extension is a way to extend signed int to more bits. This is done by copying the sign bit to all the additional upper bits.

Zero extension is used to extend unsigned ints to more bits. This is done by copying 0's to all additional upper bits.

You can also reduce the number of bits (this doesn't have a particular name---i.e., it's NOT called sign-reduction) by casting to a smaller size, but this may entail loss of information.
