Floating Point Problems

An IEEE single-precision floating-point number z is stored in 32 bits with three fields: the sign bit s, the 8-bit excess-127 exponent e, and the 23-bit fractional part f. For normalized numbers ($1 \le e \le 254$), these fields encode the value shown below.

$$z = (-1)^s \times (1.f)_2 \times 2^{e-127}$$
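To make the field layout concrete, here is a minimal Python sketch that uses the standard `struct` module to reinterpret a number's single-precision bytes as an unsigned integer, split it into s, e and f, and evaluate the formula back. The helper names `float32_fields` and `value_from_fields` are hypothetical, and the sketch assumes a normalized number.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (s, e, f) of x after rounding it to IEEE single precision."""
    # Pack as big-endian float32, then read the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # 1 sign bit
    e = (bits >> 23) & 0xFF   # 8-bit excess-127 exponent
    f = bits & 0x7FFFFF       # 23-bit fraction; the leading 1 is not stored
    return s, e, f

def value_from_fields(s: int, e: int, f: int) -> float:
    """Evaluate z = (-1)^s * 1.f * 2^(e-127); valid only for 1 <= e <= 254."""
    significand = 1 + f / 2**23   # (1.f) in base 2, as a real number
    return (-1) ** s * significand * 2.0 ** (e - 127)
```

For example, `float32_fields(256.75)` returns `(0, 135, 24576)`: 256.75 is $1.0000000011_2 \times 2^8$, so e = 8 + 127 = 135, and f holds the bits 0000000011 followed by 13 zeros.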

Express the fixed point numbers below using 32-bit IEEE floating-point notation. (Hint: use your fixed point base 2 representation as a starting point.)
1. 256.75
2. 0.908
3. -4099.125
4. $0.00000004$ (hint: repeatedly multiply by 16 and collect the integer parts as hexadecimal digits)
5. $-64.00000004$
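The by-hand procedure for these conversions (write the number in base 2, normalize it to $1.f \times 2^E$, add the bias 127, keep 23 fraction bits) can be checked with the sketch below. It rests on assumptions: inputs are decimal strings kept exact with `fractions.Fraction`, rounding is IEEE round-to-nearest-even (an exercise may ask for truncation instead), and only normalized results are handled. `hex_fraction_digits` follows the multiply-by-16 hint; both function names are hypothetical.

```python
from fractions import Fraction

def hex_fraction_digits(frac: Fraction, n: int) -> str:
    """First n hex digits of frac in [0, 1): multiply by 16, keep the integer part."""
    digits = []
    for _ in range(n):
        frac *= 16
        d = int(frac)                      # integer part is the next hex digit
        digits.append("0123456789abcdef"[d])
        frac -= d                          # keep only the fractional part
    return "".join(digits)

def to_float32_bits(x: Fraction) -> int:
    """Encode nonzero x as a 32-bit IEEE pattern (round to nearest even, normalized only)."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    s = 1 if x < 0 else 0
    m, E = abs(x), 0
    while m >= 2:                          # normalize so that 1 <= m < 2
        m, E = m / 2, E + 1
    while m < 1:
        m, E = m * 2, E - 1
    scaled = (m - 1) * 2**23               # fraction part measured in units of 2^-23
    f = int(scaled)                        # truncate to 23 bits
    rem = scaled - f
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and f % 2 == 1):
        f += 1                             # round to nearest, ties to even
    if f == 2**23:                         # rounding overflowed into the hidden bit
        f, E = 0, E + 1
    e = E + 127                            # excess-127 exponent
    if not 1 <= e <= 254:
        raise ValueError("result is not a normalized single-precision number")
    return (s << 31) | (e << 23) | f
```

For example, `f"{to_float32_bits(Fraction('256.75')):08x}"` gives `43806000`, and `hex_fraction_digits(Fraction("0.00000004"), 12)` gives `"000000abcc77"`: six zero hex digits are 24 zero bits, and a = 1010, so the leading 1 sits at $2^{-25}$ and e = -25 + 127 = 102.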
Express both operands in 32-bit IEEE floating point and perform the indicated operations. Remember that addition and subtraction require both operands to have the same exponent so that their binary points are aligned: shift the significand of the operand with the smaller exponent right by the difference in exponents. Subtraction is then implemented by adding the 2's complement of the significand being subtracted. (Hint: some of the numbers to be converted were used in previous problems.)
2. $1 - 0.00000004$
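The alignment and 2's-complement steps can be sketched for the simple case a > b > 0 (no signs, zeros or subnormals). In this illustration, hypothetical helpers `float32_fields`, `float32_from_bits` and `float32_sub` use the standard `struct` module; the significand subtraction is done by adding the 2's complement in a fixed bit width, and the result is rounded to nearest even. One point the sketch makes visible: the single-precision spacing just below 1 is $2^{-24} \approx 5.96 \times 10^{-8}$, and 0.00000004 is more than half of that, so $1 - 0.00000004$ rounds to $1 - 2^{-24}$ rather than to 1. Simply shifting the small operand's bits off the end would wrongly give 1.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into sign, biased exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def float32_sub(a: float, b: float) -> float:
    """a - b for single-precision a > b > 0, both normalized (sketch only)."""
    _, ea, fa = float32_fields(a)
    _, eb, fb = float32_fields(b)
    ma = (1 << 23) | fa                    # restore the hidden 1: 24-bit significands
    mb = (1 << 23) | fb
    # Align binary points. Shifting mb right by (ea - eb) would discard its low bits,
    # so shift ma left instead: both now count exact units of 2**(eb - 150).
    ma <<= ea - eb
    width = ma.bit_length() + 1            # fixed adder width with one spare bit
    mask = (1 << width) - 1
    neg_mb = (~mb + 1) & mask              # 2's complement of mb in `width` bits
    diff = (ma + neg_mb) & mask            # ma - mb, carry out of the top discarded
    # Normalize back to a 24-bit significand, rounding to nearest even.
    drop = diff.bit_length() - 24
    if drop > 0:
        keep, rem = diff >> drop, diff & ((1 << drop) - 1)
        half = 1 << (drop - 1)
        if rem > half or (rem == half and keep & 1):
            keep += 1
        if keep == 1 << 24:                # rounding carried out; renormalize
            keep, drop = keep >> 1, drop + 1
    else:
        keep = diff << -drop
    e = eb + drop                          # biased exponent of the result
    if not 1 <= e <= 254:
        raise ValueError("result is not a normalized single-precision number")
    return float32_from_bits((e << 23) | (keep - (1 << 23)))
```

With these helpers, `float32_sub(1.0, 0.00000004)` equals $1 - 2^{-24}$ (bit pattern `3F7FFFFF`). Shifting the larger operand left keeps the sketch exact and short; hardware instead shifts the smaller significand right and keeps guard, round and sticky bits, which is enough to produce the same correctly rounded result.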

MM Hugue 2004-09-08