DFAs from Regular expressions
Yesterday we talked about how DFAs work. DFAs are simple computers compute answers to decision problems: questions with yes or no answers. In the last class, I asserted that DFAs and regular expressions were equivalent.
What is equivalence between REs and DFAs
DFAs and regular expressions are equivalent in the following sense: For any regular expression r
, I can generate a DFA that accepts exactly the same set of strings as r
, and vice versa. In other words, there are no regular expressions for which I can’t generate DFAs, and there are no DFAs for which I can’t generate regular expressions.
A few example DFAs and their langauges
Let’s start our discussion today with a few example problems to guide our intuition about DFAs and regular expressions.
English language descriptions
Problem
Give an english language description of the strings accepted by the following regular expression: ((01)(01)(01))*
.
Answer
Binary strings whose length is a multiple of three.
Problem
Give an english language description of the strings accepted by the following regular expression: (01)*(10)*(01)*0(10)*1
.
Answer
All alternating binary strings.
Problem
Give an english language description of the strings accepted by the following DFA:
Answer
All binary strings containing the substring 11
.
(Note: )
Cooking up DFAs from regular expressions
Now, let’s get some intuition about how to generate a DFA from a regular expression. Remember, this is always possible, so we’ll never run into a case where we can’t generate a DFA. We’ll see next how to use a cookiecutter technique to turn an arbitrary regular expression into a DFA.
Using the interactive diagrams
This page contains animated examples of state machines. To interact with them and try a few examples, note the following:

The initial node is always
q0
. There is no incoming arrow to the initial state, as we draw on paper. This is just a limitation of my animation! When you draw them on paper, you should draw them with an initial state like in this diagram 
The current state is always highlighted in blue.

The accepting states are written with double circles.

To move between states, click on the state machine (e.g., next to a transition) and then type the letter (e.g., 1 or 0) that corresponds to the state you want to move to.

You can change the test string (the string the state machine is running) by clicking on it in the upper left corner. It will turn to grey and allow you to edit it. Type a string and then press “enter” (or click it again) to start running the machine on that test string.

You can reset the state of the machine to the beginning by clicking the “reset” button in the upper right hand corner.
Straight line REs
Let’s start with an easy one: 10
.
How do we create a DFA to accept this. Well, first, we know we need an initial state, so let’s draw that:
Now, let’s say we’re in the start state, and we haven’t processed any input at all, what should happen? Should we be able to accept? No, because the empty string does not match the regular expression 10
. Now, let’s think about the following scenario: we’re in the start state, and we see a 1
, what should happen? There are two choices we could make:

We could say that we stay in the start state ()

We could move to a new state, let’s call it
The answer is the second one, we need to move to a new state. Why shouldn’t we loop back around to the start state again? Well, if we did, we’d have a loop from the start state to itself on 1. For example, let’s see what happens when we make loop back to itself:
This is bad, because the self loop means that gobbles up too many 1
s. Instead, we just want it to accept a single 1. In general, if we have a cycle in the state machine, it’s going to accept some expression e*
. We’ll use this intuition later on when we translate regular expressions. So, after accepting 1
, we move to a new state, . Now that we’re at , we know we’ve read the string 1
. Should be accepting? No, because then the string 1
would be accepted, but we want 10
. By similar reasoning, we need another state , and a transition between and :
Now, at q2, we know that we’ve read the string 10
. Now, we should make an accepting state.
But what happens now? Our FSM isn’t total: we don’t say what to do if we read a 1
in state , and we don’t say what to do if we read a 1
or 0
at state . If any of those situations occur, we need to go to an error state, which we’ll call . This state loops around on itself. It traps us, and never lets us escape. Here’s the complete DFA:
Consider the execution on various strings, if we have 101
, we go from , to , to , and then we’re okay to accept. But then we see 1
, and move to . This is required, because if we didn’t have this extra node, we’d accept the string 10
.
Branching REs
Let’s expand this, and write a DFA for the following regular expression: 10
. Well, I know I need a start state . And when we leave , we’re going to have two possibilities:

I read a
1

I read a
0
Should we go back to ? No, because if we did, we’d accept any number of 1
s or 0
s. Instead, we’re going to make new states for when we read a 1
, or when we read a 0
, and call them and . We’re going to make and both accepting states, and then we’re going to add a transition from both of them to an error state.
Now, note that this is not the smallest state machine we could draw. We could have instead had a single state , that went to on either 1
or 0
. In the next lecture, we’ll talk about optimizing regular expressions. Here’s an equivalent, but smaller, DFA:
Looping REs
Now let’s write a regular expression for the following regular expression: (1(10))*
.
In the last example, we talked about how the *
let us loop around. It turns out that the name of this operator is the “kleene star.” The kleene star lets us write zero or more of some regular expression.
Problem
How can I encode e+
in terms of e*
?
For now, let’s ignore the *
operator, and let’s start with the inner regular expression: 1(10)
. How do I build a state machine for that?
Well, we just built a state machine that recognizes (10)
. And then we can easily construct a machine that regoznies simply 1
(by the process in “Straight line REs”). So we take the machine for 1
and run it before the machine for (10)
:
Now for the *
part of (1(10))*
. Consider the set of strings that the machine should accept: 1
, 11
, 101
, 111
, etc.. But not 101
: why not? Because the machine accepts 10
for the first “unfolding” of the *
, and then needs to accept 1(10)
again, but it only has the last 1
to work with, so the regular expression is stuck “waiting for” for more input.
If we think about it a little bit, we’ll realize that the following machine works: