|
|
c m s c 214
s p r i n g 2 0 0 2 |
Prior to BNF, the grammar of programming languages were often expressed in English or some non-mathematical language. This lead to ambiguities.
Formalizing the grammar had two effects. First, it made the valid programs of a language more precise. Second, because it could be stated in a mathematical form, it made it easier to develop compilers for such languages.
If you want to read a CMSC 114 BNF tutorial, click here.
The cards are laid in front of you left to right.
Initially, you only have one green card. At each step, you are allowed to replace one green card with one or more cards based on something called production rules.
Suppose you have 3 cards (call them X, Y, and Z) which are laid left to right (X is in position 1, Y is in position 2, and Z is in position 3). You wish to turn in Y. You will get certain cards back. You must place these cards in the position that Y occupied. For example, if you get 2 cards back (call them A and B), then you will have four cards: X, A, B, Z. Notice A and B are in the position that Y used to be.
You stop when all you have left are red cards (terminals).
The goal of the game is to produce a sequence of red cards which represent a desired string.
<id> ::= <letter> "1" | <letter><letter>A production rule looks a lot like an assignment statement. However, in an assignment statement, you evaluate expressions on the right hand side of =, and take that value and place it into the memory location of the variable (more properly, an lvalue) on the left hand side.
By contrast, a production rule is often thought of from left to right. The rule above says that you can take a "red card" with <id> written on it, and replace it with the right hand side.
In the example above, there are two things on the right hand side.
When you have two things adjacent to each other, as in <letter> <letter> , this means the two things are concatenated. Realize that every non-terminal will eventually be replaced by a terminal, and the result will be a string.
<word> ::= <letter> | <letter><word>This is awkward to read. It would be nice to have an easier way of showing repetition. And there is. We will add three more operators to BNF to make it easier to read.
We will call the resulting enhancement of the BNF as EBNF, which you can call "extended BNF" or perhaps "enhanced BNF".
Two of these operators are often seen in languages that support regular expressions (e.g., Perl).
<word> ::= [<letter>]+If we wanted to say that a word consists of zero or more letters (thus, the empty string would be considered a valid "word"), the EBNF would be:
<word> ::= [<letter>]*Notice the Kleene star is used, instead of +.
That's all there is to EBNF, at least, as much as I need to use for the course.
Interestingly enough, EBNF is not really any more powerful than BNF. Anything you can write in EBNF, you can translate to something equivalent in BNF. EBNF is, what is often called, syntactic sugar. This means that it gives you convenience, without additional "power".
You might think that one advantage is precision. While that's true, BNF isn't always the easiest way to describe things. Sometimes using English, while more vague, is easier. Using BNF to describe input and output is like using a program to describe how to solve a problem. It often has many more details than needed and requires a lot of work to get it precisely right. It may be easier to write pseudo-code than real code, just as it may be easier to write English rather than BNF.
In fact, the advantages of using BNF is simply your ability to read it. As mentioned earlier, many languages have their syntax written in BNF. It's very possibly you have to read the syntax. Or you may have to develop your own. Either way, a working knowledge of BNF can help.
<lower> ::== "a" | "b" | "c" | "d" | "e" | "f" |
"g" | "h" | "i" | "j" | "k" |
"l" | "m" | "n" | "o" | "p" |
"q" | "r" | "s" | "t" | "u" |
"v" | "w" | "x" | "y" | "z"
<upper> ::== "A" | "B" | "C" | "D" | "E" | "F" |
"G" | "H" | "I" | "J" | "K" |
"L" | "M" | "N" | "O" | "P" |
"Q" | "R" | "S" | "T" | "U" |
"V" | "W" | "X" | "Y" | "Z"
<alpha> ::== <lower> | <upper>
<digit> ::== "0" | "1" | "2" | "3" | "4" |
"5" | "6" | "7" | "8" | "9"
<digits> ::== [ "+" | "-" ]? [ <digit> ]+
<udigits> ::== [ <digit> ]+
<empty> ::==
<endl> ::== "\n"
<left> ::== "<"
<right> ::== ">"
<lparen> ::== "("
<rparen> ::== ")"
<tag> ::== <left> <word> <right>
<under> ::== "_"
<blank> ::== " "
<blanks> ::== [ <blank> ]+
<mblanks> ::== [ <blank> ]+
<sep> ::== <blank> | <endl>
<seps> ::== [ <sep> ]+
<mseps> ::== [ <sep> ]*
<var> ::== <alpha> [ <alpha> | <digit> | <under> ]*
<word> ::== [ <alpha> | <digit> ]+
<bword> ::== [ <alpha> | <digit> | <blank> ]+
<dquote> ::== "\""
<squote> ::== "\'"
<dqstring> ::== <dquote> <bword> <dquote>
<sqstring> ::== <squote> <bword> <squote>