|
|
Project 2 - C-- Scanner & Parser
Due 11:59pm Thursday, March 12th, 2009 (Extension to Mon, March 16th)
Introduction
This project is the first in the series of projects that you have to
implement in order to build a compiler for 'C--', a variant of 'C'.
Your compiler will scan, parse, type check, and eventually generate
optimized Java class files from input C-- programs.
In this project you will implement the compiler front end
(scanner & parser) for C--.
Getting Started
Project 2 will be implemented in Java in Eclipse.
Download the following file:
- Eclipse project archive file p2.zip
You can import the project into Eclipse as an existing project
in an archive file.
Before making any changes to your code, make sure you can run
the code in StudentTests.java (applies skeleton scanner/parser
to toy1.c and toy2.c). Remember that you may need to apply
"Refresh" to the entire project so Eclipse can find the
files generated by JLex and CUP.
Along with files used to make direct submissions to the
submit server (submit.jar, .submit), you will
find the following project files:
- Scanner specification for JLex - mycc.lex
- Parser specification for Cup - mycc.cup
C-- Language Syntax
- Comments
// Ignore to the end of line
-
Data types: int & void. There can be arrays of int.
There are no explicit boolean variables, but expressions
may have type boolean if they are the result of a comparison.
-
Operators include:
- arithmetic: +, -, *, /
- relational: ==, !=, <, >
- logical: &&, ||, !
All binary arithmetic and logical operators are left-associative.
NOT (!) and UNARY_MINUS (-) are not associative. I.e., (- - 2)
is illegal; you must use -(-2). Arithmetic operators require
integer operands, logical operators require boolean operands,
The relational operators == and != work on both bool & int.
The relational operators < and > work on only int. Relational
operators ==, !=, <, > all return a boolean.
The operators have the following precedence:
| Highest | ! (unary minus)
|
| * /
|
| + -
|
| == != > <
|
| Lowest | && ||
|
You have to assign the appropriate precedences in CUP.
All math operators (+, -, *, /) and comparisons (<, >)
are defined for only integers. Logical operators (&&, ||, !) apply
only to results of comparisons (booleans). == and != apply to both
booleans and integers.
-
Control Statements:
C-- provides three control statements, if, if-else, and while.
if ( condition )
{
statement_list;
}
if ( condition )
{
statement_list;
}
else
{
statement_list;
}
while ( condition )
{
statement_list;
}
There can be any level of nesting of these control statements.
Curly braces are required for all C-- control statements.
For example, the code
if (x != 0)
a = b + c;
should produce a syntax error.
The correct syntax would be
if (x != 0)
{
a = b + c;
}
There is no implicit type casting in conditional statements.
For instance, "if (x)" is not valid if "x" is an int. You
need to write "if ( x != 0)"
Functions:
All functions must be declared with type int or void.
Function calls in C-- can appear either on the RHS of an
expression or as a statement:
x = a + foo() + 10; // foo must return an integer
foo(); // foo may return either integer or void
A function call will have either 0 or 1 arguments.
Functions may have 'return' statements. There can be one or more
return statements in the statement list. The syntax for the
return statements are
return; // when returns void
return exp; // when returns integer
Program Syntax:
A valid C-- program consists of zero or more global
variable declarations, followed by one or more function
definitions. The body of the functions may contain local
variable declarations, followed by the code.
The function structure will be
[ int | void ] function_name( 0 or 1 parameters )
{
local variable declarations; // optional
statement list;
}
Forward Function Declarations:
A function has to be defined before a call to it can be made.
For instance, the following code should cause bar() to be
marked as an undefined function.
void foo()
{
bar(1);
}
void bar(int x)
{
y = x;
}
However, we do allow forward function declarations. They are
similar to actual function declarations, except they do not
contain a function body.
For example, the following code contains a forward declaration
of bar(), which allows it to be used before it is actually defined:
void bar( int x );
foo()
{
bar();
}
void bar( int y )
{
y = 1;
}
Forward declarations specify the function return type and
parameter type. They must match with actual function declaration.
The name of the parameter in the forward declaration is not relevant.
Built-in functions
There are three built-in library functions: printInt(), printStr(),
and printLn(). The compiler will automatically generate code for
these functions if they are used. printInt() takes an int as an
argument, printStr() takes a literal constant string, and printLn()
has no parameters.
For instance:
int x;
printInt(1); printInt(x); printInt(x+1);
printStr("foo");
printLn();
Scoping
C-- applies nested lexical scoping, where every pair of curly
brackets creates a new nested scope (and can include new variable
declarations). Variables visible include those declared locally
and in enclosing scopes. For instance:
A: int x;
foo()
B: int x,y;
if (x == 1) // refers to x declared at B:
{
C: int x,z;
x = y+z; // refers to x,z declared at C:,
// y declared at B:
}
else
{
int i,j;
x = 3; // refers to x declared at B:
// x,z at C: are not visible
}
}
Requirements
Lexical Analyzer:
The lexical analyzer should identify the keywords, operators, identifiers,
strings, constants and other necessary characters correctly. You have to
generate the lexical analyzer using JLex, the java lexical analyzer generator.
The JLex specifications are very similar to the lex specifications.
Read the JLex manual to learn JLex specifications and syntax.
You may need to explicitly look for carriage returns on PC-based
systems, if the %notunix command to the lexer is not working. You
can do this by specifying "\r\n" instead of "\n" as the newline.
Goals:
- find all legal tokens
- handle comments
- Report the following simple scanning errors using parser.msg():
- "illegal character "
- "unterminated string"
Tasks:
- add comments
- extend strings
- extend numbers
Parser:
The parser is generated using the java parser generator, CUP.
CUP specifications are similar to yacc/bison, taking an LALR grammar.
Read the CUP manual to learn CUP specifications and syntaxes.
Report simple syntax errors, and attempt to recover.
Goals:
- Accept all legal C-- syntax
- Report the following simple syntax errors using "error" token and parser.msg():
- "illegal statement"
- "illegal expression"
- "illegal variable declaration"
- "missing semicolon"
Tasks:
- VOID functions
- array declarations
- multiple variable declarations
- nested lexical scoping
- arithmetic, logical, and relational operators
- forward function declarations
Project Files
You have been given a toy parser which parses a subset of
the C-- language. You need to enhance them to parse the full
C-- language. The files are listed below, with a brief description.
All your modifications will be to the files mycc.lex and mycc.cup.
-
mycc.lex
The JLex specification file for implementing the scanner.
Most of the basic tokens have been added for you. All you
need to do is extend it for comments, strings, and better
handling of identifiers and numbers.
-
mycc.cup
The CUP specification file. You need to extend this file in
order to implement the parser. Most of your changes should be to
this file. A lot of the grammar and actions have been provided.
-
symTabEntry.java
Code for implementing symbol table entries
You should not need to modify this file.
-
symTab.java
Code for implementing the symbol table
You should not need to modify this file.
-
expNode.java
Code for storing information with expression nodes
You should not need to modify this file.
-
Yylex.java, parser.java, sym.java
Files created by JLex & CUP
-
goAll, goAll.bat
Script for compiling mycc & testing it on test*.in
-
goTest, goTest.bat
Script testing mycc on test*.in
-
goJar, goJar.bat
Scriptfile for putting mycc.cup & mycc.lex in p2.jar for submission.
-
toy*.c
Some sample input C-- programs handled by the toy front end initially provided
-
test*.c
Some sample input C-- programs you should try to parse
-
test*.log
Log files created by mycc when using the goAll scripts
-
test*.out
Some sample output files from a fully implemented front end
Main Classes
- parser
Generated by CUP, this is the class which performs the actual
parse. All action code in your CUP grammar are executed as
methods of class "parser". Important public fields include
globalSymTab, currSymTab, and curType. These fields are used
to store information to be transferred between different actions
in CUP. The "parser" class also contains main(), the starting
point of the user code. You should use the static method msg()
to report any scanner & parser error messages.
- sym
Generated by CUP, this class contains the constants for all
the token in the grammar. Useful constants include sym.INT,
sym.BOOL, and sym.VOID. Also used by JLex for the scanner.
- Yylex
Generated by JLex, this class implements the scanner.
- symTab
Stores all the symbols in a scope as a HashMap. The
symbol table also keeps track of its parent. All
local nested scopes are kept in a children list,
except for function scopes, which are stored in the
entry for the function symbol.
- symTabEntry
Stores information for each symbol. Generic information
includes name (as a String) & type (sym.INT, etc). Also
includes information specific to arrays & functions.
Important to note the symTabEntry for a function stores
the symTab for that function.
- expNode
Stores information as to the type and value of an expression.
Helpful for keeping track of intermediate expressions (e.g., 1+2*4).
Hints and Tips
- When implementing your project in Eclipse, you will likely need to
frequently apply the Refresh command in order for Eclipse to
locate the latest files created by JLex, CUP, and your parser.
- If you examine TestUtils.java, you will see that the
test scripts
- Ignore line numbers and white space
- Ignores "Syntax error" warnings
- Only checks for the presence of the following error messages (ignoring
their location and line numbers)
- "unterminated string"
- "illegal expression"
- "illegal variable declaration"
- "missing semicolon"
However, we reserve the right to examine the actual output of your parser
when evaluating its correctness.
- You may add checks for additional syntax errors if you wish, but it
should not affect whether you pass any of the public/release/secret tests.
Submission
You can submit your project in two ways:
-
Submit your mycc.lex and mycc.cup files by
first putting both files in a single .zip or .jar archive.
The files should be at the top level of the archive,
not in a subdirectory.
You may create a jar archive containing mycc.cup & mycc.lex
by going to the directory containing your project, then either
execute goJar (or goJar.bat) or type the following command directly:
jar cf p2.jar mycc.cup mycc.lex
You can submit the resulting p2.jar archive to
to the submit server
by clicking on the submit link in the column "web submission".
Select the p2.jar archive using the "Browse" button,
then press the "Submit project!" button.
-
-
Submit directly by executing a Java program on a computer
with Java and network access. Use the submit.jar file
from the archive p2.zip,
To submit, go to the directory containing your project, then either
execute goSubmit (or goSubmit.bat) or type the following command directly:
java -jar submit.jar
You will be asked to enter your class account and password, then
all files in the directory (and its subdirectories) will be
put in a jar file and submitted to the submit server.
If your submission is successful you will see the message:
Successful submission # received for project 2
Academic Integrity
Please carefully read the academic honesty section of the
course syllabus. In particular, remember that you are not allowed
to show or copy any code for your project, either from the web or
other students. Any evidence of impermissible cooperation on
projects, use of disallowed materials or resources, or unauthorized
use of computer accounts, will be submitted to the Student
Honor Council, which could result in an XF for the course, or
suspension or expulsion from the University. Be sure you understand
what you are and what you are not permitted to do in regards to
academic integrity when it comes to project assignments. These
policies apply to all students, and the Student Honor Council does not
consider lack of knowledge of the policies to be a defense for
violating them. Full information is found in the course
syllabus---please review it at this time.
|