|
Project #7 (Huffman Trees) |
CMSC 132 |
|
Due Date: Wed Nov 30, 6:00 pm |
Object-Oriented Programming II |
|
Type of Homework: Closed |
Fall 2005 |
Objective
The objectives of this project are:
Implementing Huffman Trees
Practice using the Java Collections Framework
This homework is considered a closed homework. Make sure you read the Open/Closed policy before continuing working on this project.
Overview
For this project you will implement the class HuffmanCodec which represents an encoder/decoder for text. We use Huffman trees for the encoding process. We have provided an overview of huffman trees at the end.
Your project will be graded as follows:
Tests (90 %)
Public JUnit tests (24 %)
Release JUnit tests (66 %)
Style (10 %)
For this project:
Even if you are not planning to work on the project for a while, make sure that you can submit your project using the "Submit Project" option in Eclipse and verify your submission has been received in the submit server. It is important that you submit your project using the "Submit Project" option rather than uploading a jar file to the submit server. If you are having problems with this submission process, contact your TA or instructor immediately.
You must debug your code using the Eclipse debugger and should not use "System.out.println" as a debug tool. TAs in office hours will require you to use the debugger to show any problems you are experiencing.
Specifications
The TAs have been instructed not to help you debug your code unless you have written a test case that demonstrates that your project is malfunctioning. You should be writing and using your test cases WHILE you are implementing the project, not as an afterthought at the very end. Testing and software development go hand-in-hand.
For this project, you need to implement the HuffmanCodec class. This class allows you to encode and decode text using Huffman trees. The class defines a Huffman tree which is then used to support the encoding and decoding tasks. When implementing this class, you must observe the following restrictions/requirements:
You must define an inner class named HuffmanNode which represents a node in the Huffman tree. You are free to add any fields/methods to the HuffmanNode class.
You must use each of the following instance variables (which we have already left defined for you in the class shell you will find in the code distribution)
private HuffmanNode root = null; → Represents the Huffman tree
private HashMap<Character,Integer> charFrequencyMap; → Stores the frequency associated with each character of the tree.
private PriorityQueue<HuffmanNode> pqueue; → Priority queue that you must use when combining nodes in order to form the Huffman tree
private HashMap<Character,String> charToHuffmanCodeMap; → Maps each character to the corresponding Huffman code. When we need to decode a text we will look for entries in this map.
You must define the Huffman tree as part of the HuffmanCodec class (the root instance variable should refer to this tree).
IMPORTANT: Several correct Huffman trees can be created depending on the order subtrees are combined. For this project make sure that when combining subtrees, the frequency associated with the left subtree is less than the frequency associated with the right subtree.
The left edge of a node in a Huffman tree will have a value of 0 and the right edge a value of 1.
Feel free to add add any other instance variables to the HuffmanCodec class in order to complete the implementation.
A description of the class methods can be found at project javadoc. Feel free to add any additional class(es) you understand you need, but make sure you place them in the Huffman package.
General Requirements
Honor Section Requirements
In addition to the above requirements, you must satisfy the following requirements:
Submission
Submit your project using the submit project option associated with Eclipse.
Academic Integrity
Please make sure you read the academic integrity section of the syllabus so you understand what is permissible in our programming projects. We want to remind you that we check your project against other students' projects and any case of academic dishonesty will be referred to the University's Office of Judicial Program
Driver Example
package huffman; import java.io.StringReader;
public class DriverExample {
public static void main(String[] args) {
HuffmanCodec huffmanCodec = new HuffmanCodec(new StringReader("eebbeecdebeeebecceeeddebbbeceedebeeddeeeecceeeedeeedeeebeedeceedebeeedeceeedebee"));
System.out.println("****** Frequency map ******");
System.out.println(huffmanCodec.getFrequencyMap());
System.out.println("****** Huffman Tree ******");
System.out.println(huffmanCodec.getHuffmanTree());
System.out.println("****** Coding/Decoding Examples ******");
System.out.println("Huffman code for e is: " + huffmanCodec.encode('e'));
System.out.println("Result of decoding 011 is: " + huffmanCodec.decode("011"));
System.out.println("Result of decoding 00 is: " + huffmanCodec.decode("00"));
System.out.println("****** Character to Huffman mapping ******");
System.out.println(huffmanCodec.getCharToHuffmanMap());
}
}
Output
****** Frequency map ******
{d=12, c=8, b=11, e=49}
****** Huffman Tree ******
[e,49]
[ ,80]
[b,11]
[ ,19]
[c,8]
[ ,31]
[d,12]
****** Coding/Decoding Examples ******
Huffman code for e is: 1
Result of decoding 011 is: b
Result of decoding 00 is: d
****** Character to Huffman mapping ******
{d=00, c=010, b=011, e=1}
NOTE: If you were to run your program to create a Huffman tree with the string provided in this overview then the tree created by your program will not look like the one presented below. This has to do with the order of combination of nodes we use in our project.
A Huffman Tree is a special type of binary tree used for data compression. A small Huffman Tree appears below:

Each leaf in the tree contains a symbol (in this case a Character) and an integer value. Each non-leaf (internal node) contains just references to its left and right children. The 0's and 1's that you see are not stored anywhere, we just mentally associate 0 with any left branch and 1 with any right branch.
Each character that appears in the tree is assigned a unique code (a sequence of 0's and 1's) obtained by following the path from the root of the tree to the leaf containing that character. Below are the codes for the four characters found in this sample tree:
e is coded by "0"
b is coded by "100"
c is coded by "101"
d is coded by "11"
A sequence of characters can be "encoded" into a String of 0's and 1's by using the codings as described above. For example, the String "eddbc" would be encoded into the sequence "01111100101".
A sequence of 0's and 1's can be "decoded" into a String of Characters by using the codings backwards. For example, the sequence "11010111" could be decoded into the String "decd".
Given a "sample" stream of Characters, one can create a Huffman Tree which will do a very good job of compressing other streams whose characters share the same relative frequency as those in the sample stream used to create the Tree.
1. The first step is to count the frequency of each character in the "sample" stream of Characters. Suppose your sample stream is:
"eebbeecdebeeebecceeeddebbbeceedebeeddeeeecceeeedeeedeeebeedeceedebeeedeceeedebee"
Below are the counts for each character that occurs in the stream:
49 e's
11 b's
8 c's
12 d's
2. In this example, the next step is to create four little trees -- one for each character -- each consisting of just a single leaf. These little trees should be put into some kind of list. Below is a picture of the list of four little trees:

3. The next step is to select the two trees with the SMALLEST frequencies. In this example, that would be 8 and 11. Remove them from the list of trees, and then join them as children of a new node to make a new tree. Then add this new tree to the list of trees. Below is the list of trees after this step has been accomplished. There are now three trees in the list.
The "frequency" of a tree that is more than just a single leaf is the sum of the frequencies of all leaves below it. So the frequency of the first tree above is 19, the sum of it's leaves.
Now we repeat step 3. (This time, the two smallest frequencies are 19 and 12). Below is the list of trees after we join them. (We're down to TWO trees now).
Now we repeat step 3 again. (This time, the two smallest frequencies are obvious -- 31 and 49). Below is the list of trees after we join them. (Just one tree now.)

Since the list now contains just one tree, we are done -- this is the Huffman Tree generated by the "sample" sequence of characters.