Project #7 (Huffman Trees)

CMSC 132

Due Date: Wed Nov 30, 6:00 pm

Object-Oriented Programming II

Type of Homework: Closed

Fall 2005


Objective

The objectives of this project are:

This homework is considered a closed homework.  Make sure you read the Open/Closed policy before continuing working on this project.

Overview

For this project you will implement the class HuffmanCodec which represents an encoder/decoder for text.  We use Huffman trees for the encoding process. We have provided an overview of huffman trees at the end.

Your project will be graded as follows:

For this project:

Specifications

Important Restrictions

The TAs have been instructed not to help you debug your code unless you have written a test case that demonstrates that your project is malfunctioning.  You should be writing and using your test cases WHILE you are implementing the project, not as an afterthought at the very end.  Testing and software development go hand-in-hand.

Class You Need to Implement

For this project, you need to implement the HuffmanCodec class.   This class allows you to encode and decode text using Huffman trees.  The class defines a Huffman tree which is then used to support the encoding and decoding tasks.  When implementing this class, you must observe the following restrictions/requirements:

A description of the class methods can be found at project javadoc.  Feel free to add any additional class(es) you understand you need, but make sure you place them in the Huffman package.

General Requirements

Honor Section Requirements

In addition to the above requirements, you must satisfy the following requirements:

Submission

Submit your project using the submit project option associated with Eclipse. 

Academic Integrity

Please make sure you read the academic integrity section of the syllabus so you understand what is permissible in our programming projects.  We want to remind you that we check your project against other students' projects and any case of academic dishonesty will be referred to the University's Office of Judicial Program

Driver Example

The following driver (DriverExample.java) can be found in the code distribution.  The expected output is included below.

package huffman;
import java.io.StringReader;
public class DriverExample {
	public static void main(String[] args) {
		HuffmanCodec huffmanCodec =  new HuffmanCodec(new StringReader("eebbeecdebeeebecceeeddebbbeceedebeeddeeeecceeeedeeedeeebeedeceedebeeedeceeedebee"));
		
		System.out.println("****** Frequency map ******");
		System.out.println(huffmanCodec.getFrequencyMap());
		
		System.out.println("****** Huffman Tree ******");
		System.out.println(huffmanCodec.getHuffmanTree());
		
		System.out.println("****** Coding/Decoding Examples ******");
		System.out.println("Huffman code for e is: " + huffmanCodec.encode('e'));
		System.out.println("Result of decoding 011 is: " + huffmanCodec.decode("011"));
		System.out.println("Result of decoding 00 is: " + huffmanCodec.decode("00"));
		
		System.out.println("****** Character to Huffman mapping ******");
		System.out.println(huffmanCodec.getCharToHuffmanMap());
	}
}

Output

****** Frequency map ******
{d=12, c=8, b=11, e=49}
****** Huffman Tree ******
        [e,49]
[ ,80]
                        [b,11]
                [ ,19]
                        [c,8]

        [ ,31]
                [d,12]


****** Coding/Decoding Examples ******
Huffman code for e is: 1
Result of decoding 011 is: b
Result of decoding 00 is: d
****** Character to Huffman mapping ******
{d=00, c=010, b=011, e=1}

Overview of Huffman Trees

NOTE: If you were to run your program to create a Huffman tree with the string provided in this overview then the tree created by your program will not look like the one presented below.   This has to do with the order of combination of nodes we use in our project.

A Huffman Tree is a special type of binary tree used for data compression.  A small Huffman Tree appears below:

Each leaf in the tree contains a symbol (in this case a Character) and an integer value.  Each non-leaf (internal node) contains just references to its left and right children.  The 0's and 1's that you see are not stored anywhere, we just mentally associate 0 with any left branch and 1 with any right branch.

Each character that appears in the tree is assigned a unique code (a sequence of 0's and 1's) obtained by following the path from the root of the tree to the leaf containing that character.   Below are the codes for the four characters found in this sample tree:

Encoding

A sequence of characters can be "encoded" into a String of 0's and 1's by using the codings as described above.  For example, the String "eddbc" would be encoded into the sequence "01111100101".

Decoding

A sequence of 0's and 1's can be "decoded" into a String of Characters by using the codings backwards.  For example, the sequence "11010111" could be decoded into the String "decd".

Constructing a Huffman Tree from a Stream of Characters

Given a "sample" stream of Characters, one can create a Huffman Tree which will do a very good job of compressing other streams whose characters share the same relative frequency as those in the sample stream used to create the Tree.

1.  The first step is to count the frequency of each character in the "sample" stream of Characters.  Suppose your sample stream is:

"eebbeecdebeeebecceeeddebbbeceedebeeddeeeecceeeedeeedeeebeedeceedebeeedeceeedebee" 

Below are the counts for each character that occurs in the stream:

2.  In this example, the next step is to create four little trees -- one for each character -- each consisting of just a single leaf.  These little trees should be put into some kind of list.  Below is a picture of the list of four little trees:

           

3.  The next step is to select the two trees with the SMALLEST frequencies.  In this example, that would be 8 and 11.  Remove them from the list of trees, and then join them as children of a new node to make a new tree.  Then add this new tree to the list of trees.  Below is the list of trees after this step has been accomplished.  There are now three trees in the list.

        

The "frequency" of a tree that is more than just a single leaf is the sum of the frequencies of all leaves below it.  So the frequency of the first tree above is 19, the sum of it's leaves.

Now we repeat step 3.  (This time, the two smallest frequencies are 19 and 12).  Below is the list of trees after we join them.  (We're down to TWO trees now).

  

Now we repeat step 3 again.  (This time, the two smallest frequencies are obvious -- 31 and 49).  Below is the list of trees after we join them.  (Just one tree now.)

Since the list now contains just one tree, we are done -- this is the Huffman Tree generated by the "sample" sequence of characters.

Web Accessibility