Problem 2:``Ding Dong Bell--Coding Goes Well" (36 pts)

This question addresses issues associated with developing variable length codes specific to known or anticipated data sets.

A file containing 11 unique symbols is to be encoded using Huffman coding. The frequencies of every letter occuring in the file were tabulated, and the Huffman code in the following table was generated by minimizing the weighted path length in the tree. A fixed length code for the 11 unique symbols was also generated.

Plain Text Huffman 4-Bit

Letter Code Word Code Word

A 100 0001

D 1111101 0010

E 0 0011

I 110 0100

M 11110 0101

O 1111100 0110

R 11101 0111

S 11100 1000

T 101 1001

space 1111110 0000

linefeed 1111111 1111

2.1 (6): Build the Huffman coding tree corresponding to the above code.
2.2 (6): True or False: Letter A occurs more frequently in the file than letter I. Explain your answer for full credit.
2.3 (6): True or False: The Huffman code for a given set of symbols and frequencies is unique. That is, exactly ONE coding scheme can be extracted from this information. Explain your answer for full credit.
2.4 (6): When decoding a string using a Huffman code, how do you tell when one character stops and another one begins? That is, why don't we need a ``new character'' or ``stop'' symbol?
2.5 (6): Is the term ``tree'' really an accurate name for the Huffman coding tree? That is, is there another name that we could give this structure in the context of the course thus far? Explain your answer for full credit.
2.6 (6): If we receive another file in ASCII containing only those 9 unique symbols and encode it using the table above, which would you expect to produce a shorter encoded file--the 4-bit fixed length code or the Huffman code given above? Explain your answer for full credit.

MM Hugue 2012-03-08

Plain Text	Huffman	4-Bit
Letter	Code Word	Code Word
A	100	0001
D	1111101	0010
E	0	0011
I	110	0100
M	11110	0101
O	1111100	0110
R	11101	0111
S	11100	1000
T	101	1001
space	1111110	0000
linefeed	1111111	1111