|
|
Project 1
Due 11:59pm Monday, Feb 23nd, 2009
Introduction
The goal of this project is to learn how scanner generators work.
You are to implement in Java a DFA minimizer that
- converts a regular expression into an NFA (using Thompson's algorithm described in class)
- converts the NFA into a DFA (using subset algorithm discussed in class)
- minimizes the DFA (using Hopcraft algorithm discussed in class)
- executes the minimized DFA on a string, printing either "accept" or "reject"
The parts of the DFA minimizer that scan and parse input regular expressions
have already been implemented. You only need to implement some methods for
a transition graph class that is used to represent the underlying finite
automata, both deterministic and nondeterministic.
Don't worry if your NFA or your non-minimized DFA don't exactly
match the ones generated by the ``official'' solution. Two finite
automata may be isomorphic (i.e., nodes
and edges can be mapped onto each other exactly), but your
algorithm may assign different state numbers to a node than
the official solution. This is acceptable.
If you used the correct algorithms shown in class (especially Thompson's algorithm for
closure), your NFA and DFA should have the identical number of states as the FA in the
official solution. Each string input should also be accepted/rejected in the same manner
as in the official solution.
Getting Started
Project 1 will be implemented in Java using Eclipse. Download the following file:
- Eclipse project archive file p1.zip
You can import the project into Eclipse as an existing project in an archive file.
Directions for downloading, installing, and using Java and Eclipse are
here.
Project Files
You should look at these files
- TG.java - The main program
- test*.in - Test case inputs
- test*.log - Results of running your DFA minimizer on test inputs
- test*.out - Official results demonstrating correct operation
- goAll - Script to compile everything from scratch
- goTest - Script to run DFA minimizer on some test input files
You probably don't have to worry about these files
- re.lex - The scanner specification for regular expressions
- re.cup - The grammar specification for regular expressions
- Yylex.java - Code produced by scanner generator (JLex)
- sym.java - Token value definitions
- parser.java - Code generated by parser generator (CUP)
Submission
Use the submit server web interface to submit your TG.java file.
Hints and Tips
- All the code you need to write is in the file TG.java. I have already
prepared most of the code. What you need to do is provide bodies to
the seven functions at the bottom of the file. Start with choice(), concat(),
and closure(). Leave minimize() for last.
- You can write additional functions/methods for TG.java (besides the
original functions) to act as helpers for your code. Doing so may
be very helpful for the minimize() function.
- You can apply the scanner/parser
generators creating parts of your project
using the script "goAll" on UNIX systems. It should be unnecessary
since the appropriate output files have already been provided.
- You should be able to test your code at each step of the way. The
existing code will print out the status of the Transition Graph (TG)
at each step of the process. The format is as follows:
------NFA
Transition Graph
0: a -> 1 // state 0, edge from 0 to 1 (on 'a')
1: epsilon -> 2 // state 1, edge from 1 to 2 (on empty str)
2: epsilon -> 3 , b -> 1 // state 2, edges from 2 to 3 & 1 (on e, 'b')
3: // state 3, no edges
start state: 0
final states: { 3 }
alphabet: { b,a }
number of states: 4
- The scripts used to evaluate the correctness of your DFA minimizer will
examine only the number of states found in each FA, and whether input strings
are accepted or rejected. However, I reserve the right to look at the
actual FA if necessary to judge correctness.
Any lines of debugging or trace information your program prints
should start with the character %, since all lines beginning with % are ignored.
- You can debug your minimizer by looking at the output of your .log file
and comparing against the provided .out files. The results will probably
be mostly different, since different TGs may be numbered differently.
The main parts in which the two areas need to match are 1) number of states
in each FA, and 2) accept/reject decision for each input string.
- You need to implement ALL of the functions in TG.java for
the linking to be successful (to get the executable). However, during development and testing, you needn't write
COMPLETE implementations of the functions. Nevertheless, you should
at least return valid instances of the return types.
- Do the DFA minimization step last. You can test whether you are
generating correct DFAs by testing different inputs, even if the DFA
is not minimized.
- Implementing the algorithms for this project involves the careful
application of a few non-trivial data types, such as sets. You should
learn how to use Java class libraries and tools. HashSets and HashMaps
are particularly useful for the project.
Academic Integrity
Please carefully read the academic honesty section of the
course syllabus. In particular, remember that you are not allowed
to show or copy any code for your project, either from the web or
other students. Any evidence of impermissible cooperation on
projects, use of disallowed materials or resources, or unauthorized
use of computer accounts, will be submitted to the Student
Honor Council, which could result in an XF for the course, or
suspension or expulsion from the University. Be sure you understand
what you are and what you are not permitted to do in regards to
academic integrity when it comes to project assignments. These
policies apply to all students, and the Student Honor Council does not
consider lack of knowledge of the policies to be a defense for
violating them. Full information is found in the course
syllabus---please review it at this time.
|