CMSC 430: Project 3: C-- Type checker / AST generator Your goal is to build a type checker for C-- and to generate an intermediate representation in AST form. ************************************** Language Type Checking ************************************** 1. Data types: int, void. There can be arrays of int. Booleans cannot be declared directly, but can be created by relational operators. 2. Operators include: arithmetic +, -, *, / relational ==, !=, <, > logical &&, ||, ! All math operators (+, -, *, /) and comparisons (<, >) are defined for only integers. Logical operators (&&, ||, !) apply only to results of comparisons (booleans). Equality comparison operators (==, !=) apply to both integers and booleans. 3. Functions: All functions must be declared with type int or void. Function calls in C-- can appear either on the RHS of an expression or as a statement: x = a + foo() + 10; // foo must return an integer foo(); // foo may return either integer or void A function call will have either 0 or 1 arguments. Functions may have 'return' statements. There can be one or more return statements in the statement list. The syntax for the return statements are return; // when returns void return exp; // when returns integer 4. Forward Function Declarations: A function has to be defined before a call to it can be made. For instance, the following code should cause bar() to be marked as an undefined function. void foo() { bar(1); } void bar(int x) { x = 1; } However, we do allow forward function declarations. They are similar to actual function declarations, except they do not contain a function body. For example, the following code contains a forward declaration of bar(), which allows it to be used before it is actually defined: void bar( int x ); foo() { bar(1); } void bar( int y ) { y = 1; } Forward declarations specify the function return type and parameter type. They must match with actual function declaration. The name of the parameter in the forward declaration is not relevant. 5. There are three built-in library functions: printInt(), printStr(), and printLn(). The compiler will automatically generate code for these functions if they are used. printInt() takes an int as an argument, printStr() takes a literal constant string, and printLn() has no parameters. For instance: int x; printInt(1); printInt(x); printInt(x+1); printStr("foo"); printLn(); 6. C-- applies nested lexical scoping, where every pair of curly brackets creates a new nested scope (and can include new variable declarations). Variables visible include those declared locally and in enclosing scopes. For instance: A: int x; foo() { B: int x,y; if (x == 1) // refers to x declared at B: { C: int x,z; x = y+z; // refers to x,z declared at C:, // y declared at B: } else { int i,j; x = 3; // refers to x declared at B: // x,z at C: are not visible } } ********************************************** Requirements ********************************************** Type checker: Make sure that all variables accessed are legally declared and have proper types. Warn of undefined variables or variables defined multiple times. A variable can be local or global or passed as a parameter. Goals: 1) enter types of all symbols in symbol table 2) properly handle two levels of lexical nesting: global & local 3) use type information to report the following type errors "undefined variable " "multiple declaration of " (e.g., int x, int x) "requires integer" (for + * < ...) "requires boolean" (for &&, ||, IF, WHILE) "assignment type mismatch" (e.g, x = i < 2) "operand type mismatch" (for ==, !=) "misuse of " (misuse x[i], x(i), x) "parameter mismatch" (e.g., foo(int x), foo()) "return mismatch" (e.g., void foo() { return 5; } "forward decl mismatch " (return type, param) Tasks: check operand types ensure boolean operands where required find undefined variables find multiply defined variables check function parameters/arguments check forward declarations match function decl support printInt, printStr, printLn Tasks: 1) Build scanner & parser using JLex and CUP. 2) Build symbol tables, perform type checking (in mycc.cup) 3) Build AST using action routines (in mycc.cup). Produce ASTs as combinations of expNode and astNode. You do not need to build the AST if any error is encountered. ********************************************** Setup ********************************************** Project 3 requires only JLex, CUP as before. ** Commands to build compiler (in script "go") java JLex.Main mycc.lex mv mycc.lex.java Yylex.java java java_cup.Main < mycc.cup javac -g -d . *.java ** Commands to test the compiler (in script "go1") java mycc.parser $prog.c > $prog.log cat $prog.log **** Description of project files (Files you need to modify) mycc.lex Regular expressions & actions for scanner mycc.cup Grammar & actions for parser (Files you should not need to modify) Const.java Constants for mycc symTabEntry.java Symbol table entries symTab.java Symbol tables astNode.java AST nodes astNodeList.java list of AST nodes expNode.java expression nodes sym.java produced by CUP from mycc.cup parser.java produced by CUP from mycc.cup Yylex.java produced by JLex go script for building compiler go1 script for testing compiler on one file goTest script for testing compiler on several test files goAll script for building & testing compiler on several test files toy*.c toy input files that the skeleton compiler can compile test*.c test input files for project test*.out example compiler output files from working project ********************************************** Description of main data structures ********************************************** 1) parser Generated by CUP, this is the class which performs the actual parse. All action code in your CUP grammar are executed as methods of class "parser". Important public fields include globalSymTab, currSymTab, and curType. These fields are used to store information to be transfered between different actions in CUP. The "parser" class also contains main(), the starting point of the user code. 2) sym Generated by CUP, this class contains the constants for all the token in the grammar. Useful constants include sym.INT, sym.BOOL, and sym.VOID. Also used by JLex for the scanner. 3) Yylex Generated by JLex, this class implements the scanner. 4) symTab Stores all the symbols in a scope as a HashMap. The symbol table also keeps track of its parent. All local nested scopes are kept in a children list, except for function scopes, which are stored in the entry for the function symbol. 5) symTabEntry Stores information for each symbol. Generic information includes name (as a String) & type (sym.INT, etc). Also includes information specific to arrays & functions. Important to note the symTabEntry for a function stores the symTab for that function. 6) expNode Stores information for individual instructions. Basic unit of abstract syntax tree. Contains a number of fields whose contents vary depending on node type. Fields: int nodeType; // type of the node: // Const.INSTRUCTION, Const.IDENTIFIER, // Const.INTEGER, Const.STRING, // Const.ARRAY, Const.CALL int expType; // type of the variable: // sym.INT, sym.VOID, sym.BOOL int val; // value of integer constant String str; // content of literal string symTabEntry ident; // pointer to the symbol table entry int opcode; // opcode of the instruction expNode op1; // operand 1 expNode op2; // operand 2 Node types: if Const.INTEGER expType = sym.INT val = value if Const.STRING expType = sym.VOID str = literal string if Const.ARRAY expType = sym.INT op1 = array subscript if Const.IDENTIFIER expType = sym.INT or sym.VOID ident = symTabEntry for identifier if Const.CALL expType = sym.INT or sym.VOID op1 = name of function op2 = function argument (may be null) if Const.INSTRUCTION expType = sym.INT, sym.VOID, or sym.BOOL opcode = code for instruction op1 = first operand op2 = 2nd operand 7) astNode Node of the abstract syntax tree. Contains a number of fields whose contents vary depending on node type. Fields: int treeType; // type of the tree expNode treeExp; // useful for if, while, instructions astNode node1; // useful for if and while astNode node2; // useful for else astNodeList nlist; // useful for block stmt Node types: if TREE_INSTR treeType = Const.TREE_INSTR expNode = instruction if TREE_IF treeType = Const.TREE_IF treeExp = condition node1 = if block statement node2 = else block statement if TREE_WHILE treeType = Const.TREE_WHILE treeExp = condition node1 = loop body if TREE_BLK treeType = Const.TREE_BLK nlist = list of block statements if TREE_PROC treeType = Const.TREE_PROC treeExp = name of function node1 = block statement body