CMSC 330, Fall 2007

Organization of Programming Languages

We meet in CSI 1115 on Tuesdays and Thursdays

Project 4

Due November 21, 2007
11:59:59pm

Introduction

In this project, you will write an OCaml module that can simulate an NFA and can construct an NFA from a regular expression.

What to Submit

You should submit the file nfa.ml.

You can find a p4 directory here that contains an nfa.ml to start from and a .submit file:

http://www.cs.umd.edu/~atif/Teaching/Fall2007/project4/p4.tar.gz

Part 1: NFAs in OCaml

In the first part of this project, you will write a series of functions to simulate NFAs using OCaml. We've supplied you with a file nfa.ml that contains the following module signature for NFAs:

module type NFA =
  sig
    (* You may NOT change this signature *)


    (* Abstract type for NFAs *)
    type nfa

    (* Type of an NFA transition.

       (s0, Some c, s1) represents a transition from state s0 to state s1
       on character c

       (s0, None, s1) represents an epsilon transition from s0 to s1
     *)
    type transition = int * char option * int
      
    (* Returns a new NFA.  make_nfa s fs ts returns an NFA with start
       state s, final states fs, and transitions ts.
     *)
    val make_nfa : int -> int list -> transition list -> nfa

    (* Takes a step in an NFA.  step m ss c returns a list of states that m
       could be in on seeing input c, starting from any state in ss.

       There should be no duplicates in the output list of states.

       If some state n is in the input list, then this function should
       behave as if any states reachable from n via epsilon
       transitions are also in the input list.

       Similarly, if some state n is in the output list, then any
       states reachable from n via epsilon transitions should also be
       in the output list.
     *)
    val step : nfa -> int list -> char -> int list

    (* Returns true if the NFA accepts the string, and false otherwise *)
    val accept : nfa -> string -> bool

    ...
end

Your job is to implement a module Nfa that has this signature. You may not change the NFA signature in any way, though of course your implementation may include more types and functions than are listed in the signature.

Here are descriptions of the elements of this signature, and what you need to do to implement them:

  • type nfa - This is an abstract type representing NFAs. It is up to you to decide exactly how NFAs are implemented. Since the type is abstract, no client that uses your module will be able to see exactly how they are implemented.
  • type transition = int * char option * int - This is a (non-abstract) type we've made up for convenience to describe an NFA transition. In the NFAs for this project, states will simply be identified by number. Then (s0, Some c, s1) represents a transition from the state numbered s0 to the state numbered s1, via an arc labeled with the character c. Notice that the character is optional---the transition (s0, None, s1) represents an epsilon transition from s0 to s1.
  • make_nfa : int -> int list -> transition list -> nfa. This function takes as input the starting state, a list of final states, and a list of transitions, and returns an NFA. Again, it is up to you to decide exactly how NFAs should be implemented, but you probably do not need to do much more than track these three components (the starting state, final states, and transition list). As one example,
      let m = make_nfa 0 [2] [(0, Some 'a', 1); (1, None, 2)]
    
    sets m to be an NFA with start state 0, final state 2, a transition from 0 to 1 on character a, and an epsilon transition from 1 to 2.
  • step : nfa -> int list -> char -> int list. This function takes as input an nfa, a list of initial states, and a character. The output will be a list (with no duplicates) of states that the NFA might be in after scanning the character, starting from the list of initial states given as an argument. For example, letting m be the NFA above,
      step m [0] 'a'    (* returns [1; 2] *)
      step m [0] 'b'    (* returns [] *)
      step m [0;1] 'a'  (* returns [1; 2] *)
      step m [1] 'a'    (* returns [] *)
    
    The step function should account for epsilon transitions. In particular, (1) you should add to the output list any states reachable via epsilon transitions from the output list, and (2) you should add to the set of initial states supplied as a parameter any states reachable from them via epsilon transitions. Normally you would only do one or the other of these in an implementation, but for purposes of this project you must do both.

    Also notice that there is no explicit dead state. Instead, if s is a state in the input list and there are no transitions from s on the input character, then all that happens is that no states are added to the output list for s.

  • accept : nfa -> string -> bool. This function takes an NFA and a string, and returns true if the NFA accepts the string, and false otherwise. (Hint: You'll want to look at the functions in the String library.)

Part 2: Regexps using NFAs

The next part of this project is to implement the regular expression to NFA construction. The signature NFA also contains the following declaractions:

module type NFA =
  sig
    (* You may NOT change this signature *)
  ...
    type regexp =
	Empty_String
      | Char of char
      | Union of regexp * regexp
      | Concat of regexp * regexp
      | Star of regexp

    (* Given a regular expression, return an nfa that accepts the same
       language as the regexp
     *)
    val regexp_to_nfa : regexp -> nfa
    
  ...  
  end

Here regexp is an OCaml datatype that represents regular expressions:

  • Empty_String represents the regular expression recognizing the empty string (not the empty set!). Written as a formal regular expression, this would be epsilon.
  • Char c represents the regular expression that accepts the single character c. Written as a formal regular expression, this would be c.
  • Union (r1, r2) represents the regular expression that is the union of r1 and r2. For example, Union(Char 'a', Char'b') is the same as the formal regular expression a|b.
  • Concat (r1, r2) represents the concatenation of r1 followed by r2. For example, Concat(Char 'a', Char 'b') is the same as the formal regexp ab.
  • Star r represents the Kleene closure of regular expression r. For example, Star (Union (Char 'a', Char 'b')) is the same as the formal regexp (a|b)*

Your job for this part is to write the function regexp_to_nfa, which takes a regexp and returns an NFA that accepts the same language. You'll want to refer back to the slides from class on this construction. As long as your NFA accepts the correct language, the structure of the NFA does not matter.

Hint: You need to be a bit careful whenever you combine NFA representations to be sure that state names (i.e., integers) don't conflict. You might write the following three internal functions as an aid in this process:

  • next : unit -> int - Return a new integer, different from any values previously returned by next. (This function is defined on the OCaml slides.)
  • max : nfa -> int - Return the maximum numbered state for which there is a transition
  • renumber : nfa -> int -> nfa - Given an NFA, add the integer uniformly to all of the states of the NFA

Part 3: Parsing Regular Expression

Write a function string_to_regexp that takes one input parameter (a string), which is a regular expression. It parses the string and outputs its equivalent regexp OCaml data type defined in Part 2.

module type NFA =
  sig
    (* You may NOT change this signature *)
  ...
    
    (* Given a regular expression as string, parses it and returns the
       equivalent regular expression represented as the type regexp.    
     *)
    val string_to_regexp : string -> regexp
  end

Note:

  • The regular expressions can contain only (, ), |, *, a, b, ..., z and E (for epsilon).
  • Note that the standard operator precedence rules apply.
  • Note that all the binary operators are right associative.
  • Your function should throw an IllegalExpression exception for invalid regular expressions.
Some examples of regular expressions and their equivalent regexp data type are:
    String Inputregexp Output
    aChar 'a'
    a|bUnion(Char 'a',Char 'b')
    (a|E)*Star(Union(Char 'a',Empty_String))
    (a|E)*(a|b)Concat(Star(Union(Char 'a',Empty_String)),Union(Char 'a',Char 'b'))

Academic Integrity

The Campus Senate has adopted a policy asking students to include the following statement on each assignment in every course: "I pledge on my honor that I have not given or received any unauthorized assistance on this assignment." Consequently your program is requested to contain this pledge in a comment near the top.

Please carefully read the academic honesty section of the course syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, or unauthorized use of computer accounts, will be submitted to the Student Honor Council, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to project assignments. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. Full information is found in the course syllabus---please review it at this time.

Valid HTML 4.01!