CMSC 430, Spring 2009

Theory of Language Translation

Project 1

Due 11:59pm Monday, Feb 23nd, 2009


The goal of this project is to learn how scanner generators work. You are to implement in Java a DFA minimizer that

  1. converts a regular expression into an NFA (using Thompson's algorithm described in class)
  2. converts the NFA into a DFA (using subset algorithm discussed in class)
  3. minimizes the DFA (using Hopcraft algorithm discussed in class)
  4. executes the minimized DFA on a string, printing either "accept" or "reject"
The parts of the DFA minimizer that scan and parse input regular expressions have already been implemented. You only need to implement some methods for a transition graph class that is used to represent the underlying finite automata, both deterministic and nondeterministic. Don't worry if your NFA or your non-minimized DFA don't exactly match the ones generated by the ``official'' solution. Two finite automata may be isomorphic (i.e., nodes and edges can be mapped onto each other exactly), but your algorithm may assign different state numbers to a node than the official solution. This is acceptable. If you used the correct algorithms shown in class (especially Thompson's algorithm for closure), your NFA and DFA should have the identical number of states as the FA in the official solution. Each string input should also be accepted/rejected in the same manner as in the official solution.

Getting Started

Project 1 will be implemented in Java using Eclipse. Download the following file:
  • Eclipse project archive file
You can import the project into Eclipse as an existing project in an archive file. Directions for downloading, installing, and using Java and Eclipse are here.

Project Files

You should look at these files
  • - The main program
  • test*.in - Test case inputs
  • test*.log - Results of running your DFA minimizer on test inputs
  • test*.out - Official results demonstrating correct operation
  • goAll - Script to compile everything from scratch
  • goTest - Script to run DFA minimizer on some test input files
You probably don't have to worry about these files
  • re.lex - The scanner specification for regular expressions
  • re.cup - The grammar specification for regular expressions
  • - Code produced by scanner generator (JLex)
  • - Token value definitions
  • - Code generated by parser generator (CUP)


Use the submit server web interface to submit your file.

Hints and Tips

  • All the code you need to write is in the file I have already prepared most of the code. What you need to do is provide bodies to the seven functions at the bottom of the file. Start with choice(), concat(), and closure(). Leave minimize() for last.
  • You can write additional functions/methods for (besides the original functions) to act as helpers for your code. Doing so may be very helpful for the minimize() function.
  • You can apply the scanner/parser generators creating parts of your project using the script "goAll" on UNIX systems. It should be unnecessary since the appropriate output files have already been provided.
  • You should be able to test your code at each step of the way. The existing code will print out the status of the Transition Graph (TG) at each step of the process. The format is as follows:
    Transition Graph
    0: a -> 1 // state 0, edge from 0 to 1 (on 'a')
    1: epsilon -> 2 // state 1, edge from 1 to 2 (on empty str)
    2: epsilon -> 3 , b -> 1 // state 2, edges from 2 to 3 & 1 (on e, 'b')
    3: // state 3, no edges
    start state: 0
    final states: { 3 }
    alphabet: { b,a }
    number of states: 4
  • The scripts used to evaluate the correctness of your DFA minimizer will examine only the number of states found in each FA, and whether input strings are accepted or rejected. However, I reserve the right to look at the actual FA if necessary to judge correctness. Any lines of debugging or trace information your program prints should start with the character %, since all lines beginning with % are ignored.
  • You can debug your minimizer by looking at the output of your .log file and comparing against the provided .out files. The results will probably be mostly different, since different TGs may be numbered differently. The main parts in which the two areas need to match are 1) number of states in each FA, and 2) accept/reject decision for each input string.
  • You need to implement ALL of the functions in for the linking to be successful (to get the executable). However, during development and testing, you needn't write COMPLETE implementations of the functions. Nevertheless, you should at least return valid instances of the return types.
  • Do the DFA minimization step last. You can test whether you are generating correct DFAs by testing different inputs, even if the DFA is not minimized.
  • Implementing the algorithms for this project involves the careful application of a few non-trivial data types, such as sets. You should learn how to use Java class libraries and tools. HashSets and HashMaps are particularly useful for the project.

Academic Integrity

Please carefully read the academic honesty section of the course syllabus. In particular, remember that you are not allowed to show or copy any code for your project, either from the web or other students. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, or unauthorized use of computer accounts, will be submitted to the Student Honor Council, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to project assignments. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. Full information is found in the course syllabus---please review it at this time.