Project 4 - Regular Expression Interpreter
Due April 17, 2009
In this project, you will write an OCaml module that can parse regular
expressions, construct an NFA from a regular expression, and simulate
the execution of the NFA.
- Download the archive file
and extract its contents.
Along with files used to make direct submissions to the
submit server (submit.jar,
submit.rb), you will
find the following project files:
To test your implementation, you can execute the public tests from the
command line by typing commands like ocaml public_RE_to_str.ml.
- Your OCaml program
- Public tests
- Expected output for public test
- Ruby script to run public tests
For this project you are allowed to use the library
functions found in the Pervasives
module loaded by default, as well as the List, Char, and String modules.
Your job is to implement a module Nfa that includes
an API for implementing and using both NFAs and regular expressions.
The signature for the NFA module is provided.
You may not change the NFA signature in
any way, though of course your implementation may include more
types and functions than are listed in the signature.
Note that parts 1 & 2 may be implemented independently, but
completing part 3 will require implementing both parts 1 & 2.
Part 1: Regular Expressions
The first part of this project is to implement regular expressions.
The signature NFA contains the following
module type NFA =
type regexp =
| Char of char
| Union of regexp * regexp
| Concat of regexp * regexp
| Star of regexp
(* Given a regular expression, return a string for regular expression in
postfix notation (as in project 2), with symbols and operators separated
by a single space. Always print the first regexp operand first, so
the output string will always be same for each regexp.
val regexp_to_string : regexp -> string
(* Given a regular expression as string, parses it and returns the
equivalent regular expression represented as the type regexp.
val string_to_regexp : string -> regexp
Here regexp is an OCaml datatype that represents regular
You need to implement the following:
- Empty_String represents the regular expression recognizing the
empty string (not the empty set!). Written as a formal regular expression, this would be epsilon.
- Char c represents the regular expression that accepts the
single character c. Written as a formal regular expression, this would be c.
- Union (r1, r2) represents the regular expression that is
the union of r1 and r2. For example, Union(Char
'a', Char'b') is the same as the formal regular expression a|b.
- Concat (r1, r2) represents the concatenation of
r1 followed by r2. For example, Concat(Char
'a', Char 'b') is the same as the formal regexp ab.
- Star r represents the Kleene closure of regular
expression r. For example, Star (Union (Char 'a', Char
'b')) is the same as the formal regexp (a|b)*
- Write a function regexp_to_string that takes a
a regular expression and returns a string for the regular expression
in the postfix notation used in project 2. You can do this as a postorder
DFS traversal over the regexp data structure.
- Write a function string_to_regexp that takes one input parameter (a string), which is
a regular expression. It parses the string and outputs its equivalent regexp
Some examples of regular expressions and their equivalent regexp
data type are:
- The regular expressions can contain only (, ), |, *, a, b, ..., z and E (for epsilon).
- Note that the precedence for regular expressions are as follows, from highest to lowest:
- Note that all the binary operators are right associative.
- Your function should throw an IllegalExpression exception for invalid regular expressions.
||Union(Char 'a',Char 'b')
||a b |
||Concat(Char 'a',Char 'b')
||a b .
||Concat(Char 'a',Concat(Char 'a',Char 'b'))
||a a b . .
||a E | *
||Concat(Star(Union(Char 'a',Empty_String)),Union(Char 'a',Char 'b'))
||a E | * a b | .
Part 2: NFAs
For the second part of this project, you will write a series of
functions to simulate NFAs using OCaml.
module type NFA =
(* Abstract type for NFAs *)
(* Type of an NFA transition.
(s0, Some c, s1) represents a transition from state s0 to state s1
on character c
(s0, None, s1) represents an epsilon transition from s0 to s1
type transition = int * char option * int
(* Returns a new NFA. make_nfa s fs ts returns an NFA with start
state s, final states fs, and transitions ts.
val make_nfa : int -> int list -> transition list -> nfa
(* Takes a step in an NFA. step m ss c returns a list of states that m
could be in on seeing input c, starting from any state in ss.
There should be no duplicates in the output list of states.
If some state n is in the input list, then this function should
behave as if any states reachable from n via epsilon
transitions are also in the input list.
Similarly, if some state n is in the output list, then any
states reachable from n via epsilon transitions should also be
in the output list.
val step : nfa -> int list -> char -> int list
(* Returns true if the NFA accepts the string, and false otherwise *)
val accept : nfa -> string -> bool
Here are descriptions of the elements of this signature, and what you
need to do to implement them:
- type nfa - This is an abstract type representing
NFAs. It is up to you to decide exactly how NFAs are implemented.
Since the type is abstract, no client that uses your module will be
able to see exactly how they are implemented.
- type transition = int * char option * int - This is a
(non-abstract) type we've made up for convenience to describe an NFA
transition. In the NFAs for this project, states will simply be
identified by number. Then (s0, Some c, s1) represents a
transition from the state numbered s0 to the state numbered
s1, via an arc labeled with the character c. Notice that
the character is optional---the transition (s0, None, s1)
represents an epsilon transition from s0 to s1.
- make_nfa : int -> int list -> transition list -> nfa.
This function takes as input the starting state, a list of final
states, and a list of transitions, and returns an NFA. Again, it is
up to you to decide exactly how NFAs should be implemented, but you
probably do not need to do much more than track these three components
(the starting state, final states, and transition list). As one example,
let m = make_nfa 0  [(0, Some 'a', 1); (1, None, 2)]
sets m to be an NFA with start state 0, final state 2, a
transition from 0 to 1 on character a, and an epsilon
transition from 1 to 2.
- step : nfa -> int list -> char -> int list. This
function takes as input an nfa, a list of initial states, and a
character. The output will be a list (with no duplicates) of states
that the NFA might be in after scanning the character, starting from
the list of initial states given as an argument. For example, letting
m be the NFA above,
step m  'a' (* returns [1; 2] *)
step m  'b' (* returns  *)
step m [0;1] 'a' (* returns [1; 2] *)
step m  'a' (* returns  *)
The step function should account for epsilon transitions. In
particular, (1) you should add to the output list any states reachable via
epsilon transitions from the output list, and (2) you should add to
the set of initial states supplied as a parameter any states reachable
from them via epsilon transitions. Normally you would only do one or
the other of these in an implementation, but for purposes of this
project you must do both.
Also notice that there is no explicit dead state. Instead, if
s is a state in the input list and there are no transitions
from s on the input character, then all that happens is that
no states are added to the output list for s.
The order of states in the output does not matter.
- accept : nfa -> string -> bool. This function takes an
NFA and a string, and returns true if the NFA accepts the string, and
false otherwise. (Hint: You'll want to look at the functions in the
Hint: You need to be a bit careful whenever you combine NFA
representations to be sure that state names (i.e., integers) don't conflict.
You might use the following internal function as an aid in this process:
- next : unit -> int - Return a new integer, different from
any values previously returned by next. (This function is defined on
the OCaml slides.)
Part 3: Converting Regular Expressions to NFAs
The final part of this project is to combine parts 1 & 2 and
implement the regular expression to NFA construction.
module type NFA =
(* Given a regular expression, return an nfa that accepts the same
language as the regexp
val regexp_to_nfa : regexp -> nfa
Your job for this part is to write the function
regexp_to_nfa, which takes a regexp and returns an
NFA that accepts the same language. You'll want to refer back to the
slides from class on this construction. Unlike project 2, as long as
your NFA accepts the correct language, the structure of the NFA does
not matter (since the NFA produced by regexp_to_nfa will only be
tested by seeing which strings it accepts).
You can submit your project in two ways:
Submit your file nfa.ml directly to the
by clicking on the submit link in the column "web submission".
Next, use the submit dialog to submit your nfa.ml file.
Select your file using the "Browse" button,
then press the "Submit project!" button.
Submit directly by executing a Java program on a computer
with Java and network access. Use the submit.jar file
from the archive p4.zip,
To submit, go to the directory containing your project, then either
execute submit.rb or type the following command directly:
java -jar submit.jar
You will be asked to enter your class account and password, then
all files in the directory (and its subdirectories) will be
put in a jar file and submitted to the submit server.
If your submission is successful you will see the message:
Successful submission # received for project 4
The Campus Senate has adopted a policy asking students to include the
following statement on each assignment in every course: "I pledge on
my honor that I have not given or received any unauthorized assistance
on this assignment." Consequently your program is requested to
contain this pledge in a comment near the top.
Please carefully read the academic honesty section of the
course syllabus. Any evidence of impermissible cooperation on
projects, use of disallowed materials or resources, or unauthorized
use of computer accounts, will be submitted to the Student
Honor Council, which could result in an XF for the course, or
suspension or expulsion from the University. Be sure you understand
what you are and what you are not permitted to do in regards to
academic integrity when it comes to project assignments. These
policies apply to all students, and the Student Honor Council does not
consider lack of knowledge of the policies to be a defense for
violating them. Full information is found in the course
syllabus---please review it at this time.