CMSC 330, Fall 2015
Organization of Programming Languages
Project 1 - Maze Solver
As we saw in lecture, Ruby provides rich support for tasks that involve text processing. For this project, you'll write a Ruby program that processes text files containing maze data, and you will analyze that data to determine certain features of each maze. The goal of this project is to allow you to familiarize yourself with Ruby's built-in data structures and text processing capabilities.
Getting Started
Download the following zip archive p1.zip. It should include the following files:
Simple Maze Data File Format
Mazes are defined in text files according to a format we describe next, which we refer to as the simple maze data file format. The maze.rb we have provided includes a parser for files in this format. In the last part of the project you will have to write a parser for files in a different format.
The maze data files have a relatively simple structure. Here's an example:
16 0 2 13 11
0 0 du 123.456 0.123456
0 1 uldr 43.3 5894.2341 20.0 5896.904
...
path path1 0 2 urdl
path path2 0 2 drlr
The first line in the file is the maze header. It has the form:
<size> <start_x> <start_y> <end_x> <end_y>
These fields indicate (respectively) the size of the maze and the (x,y) coordinates for the start and end points. All mazes are square, so the size indicates both the length and the width. Coordinates start in the upper left-hand corner of the maze and increase as they move down and right. For example, in a maze of size 16, (0,0) is the upper-left corner, and (15,15) is the lower-right corner. With this coordinate system, moving down from a cell increases its y value, and moving right from a cell increases its x value. Thus going up from (5, 8) would take you to (5, 7), going down would take you to (5, 9); going left or right would respectively lead to (4, 8) and (6, 8).
Unlike common mazes that one might find on paper, the start and end points are arbitrary points inside the maze. A valid maze has no openings in the outer wall. The outer perimeter of the maze is a single, solid wall, so you needn't worry about accidentally walking through an open wall out into the space outside the maze.
Every line beyond the first can represent either a cell in the maze or a path through the maze. Each cell specifies where walls are (more precisely are not) in the maze, while a path is a trip through the maze defined by the cells.
Lines representing cells take the form:
<x> <y> <dirs> <weights>
The dirs part is a set of up to four "open wall" characters (any combination of 'udlr', representing up, down, left, right). It is followed by the weights part: up to four floating point weights (separated by spaces), one per character in dirs. For example,
4 7 lur 1.3 5.6 8.2
indicates that the cell at coordinates (4,7) has openings that lead left, up, and right from that cell (and thus there is a wall that prevents movement down). The characters can appear in any order, but may only include 'udlr', and each letter may appear at most once. A direction is not passable if its representative character is not in this list. Similarly, if a maze specification does not mention a particular cell, then you can presume that all of that cell's walls are closed.
Following the list of open walls is a list of weights for each wall opening. These appear in the same order as the open walls: in the example above, the left opening has weight 1.3, the up opening has weight 5.6, and the right opening has weight 8.2. We'll explain what these weights will be used for later.
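As a rough illustration of how a cell line in the simple format breaks down, here is a minimal sketch. The provided maze.rb already includes a parser, so this is not that code; the helper name parse_simple_cell and the hash layout are assumptions made for the example.

```ruby
# Hypothetical helper: turns a simple-format cell line into its coordinates
# and a direction => weight hash. Weights are paired with directions in order.
def parse_simple_cell(line)
  x, y, dirs, *weights = line.split
  # dirs is nil when a line lists no openings, so fall back to an empty string.
  openings = dirs.to_s.chars.zip(weights.map(&:to_f)).to_h
  { x: x.to_i, y: y.to_i, openings: openings }
end

cell = parse_simple_cell("4 7 lur 1.3 5.6 8.2")
puts cell[:openings]["u"]   # 5.6, the weight of the "up" opening
```

Looking up a direction that is absent from the hash (here "d") returns nil, which matches the rule that a direction is not passable unless it is listed.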
Lines representing paths take the form:
path <path_name> <start x> <start y> <move 1><move 2>...
In the simple format, there is one path per line. Each path consists of a name, a starting x/y coordinate, and a list of directions (which we'll call "moves"), all concatenated together, that the path takes to reach its destination. The start coordinates must consist only of integers, and directions may only include the letters "u," "d," "l," and "r,"; for example:
path path1 0 2 uurrddll
The path path1 starts at coordinates (0,2) and then proceeds up twice, right twice, down twice, and left twice, to reach its ending point (which happens to be the same as the starting point).
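The sketch below traces the cells a path line visits, using the coordinate system described earlier (y grows downward, x grows rightward). It only follows coordinates and does not check walls. The names DELTAS and trace_path are assumptions made for this example.

```ruby
# Coordinate change for each move letter.
DELTAS = { "u" => [0, -1], "d" => [0, 1], "l" => [-1, 0], "r" => [1, 0] }.freeze

# Hypothetical helper: returns the path name and the list of [x, y] cells
# visited, starting with the start cell. Walls are not considered here.
def trace_path(line)
  _keyword, name, x, y, moves = line.split
  pos = [x.to_i, y.to_i]
  visited = [pos]
  moves.to_s.each_char do |move|
    dx, dy = DELTAS.fetch(move)
    pos = [pos[0] + dx, pos[1] + dy]
    visited << pos
  end
  [name, visited]
end

name, visited = trace_path("path path1 0 2 uurrddll")
puts "#{name} ends at (#{visited.last.join(',')})"   # path1 ends at (0,2)
```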
The maze.rb file we have given you parses data in this format. The parser is invoked by the mode print, which prints its results so you can see how it has parsed the different parts of the maze. (You'll change the implementation of print before finishing the project, as described below.)
Part 1: Find Maze Properties
The first thing your program will do, of course, is to read in the maze using the parser provided. You may assume that maze files in the simple format, which we use in parts 1 through 5 of this project, are valid.
Once the maze is read in, your program will compute various properties of the maze, according to the command (mode) it is given. Here are three simple properties you'll compute: the number of open cells in the maze, the number of "bridges", and the list of all cells sorted by their number of openings.
First, if we invoke your script with the mode open, your script should output one line listing the number of cells for which all four directions are open. For example,
% ruby maze.rb open maze1
2
The two open cells are in the second row, at the second and third columns. (See the pretty-printed version of maze1, below, for a visual depiction.)
Second, if we invoke your script with the bridge mode, your script should output the number of vertically or horizontally open 1-by-3 locations in the maze. Bridges can overlap. For example,
% ruby maze.rb bridge maze1
6
The bridges are:
second row, column 0,1,2; second row column 1,2,3; third row column 0,1,2; second column row 0,1,2; third column row 0,1,2; third column row 1,2,3.
Finally, if we invoke your script with the sortcells mode, your script should print the cells sorted by the number of openings. For example,
% ruby maze.rb sortcells maze1
0,(1,3),(3,0)
The output indicates that two cells, (1,3) and (3,0), have no openings, four cells have one opening, etc. Cells with the same number of openings are sorted by their column, then row.
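The three properties above can be sketched as follows. This is one possible approach, not the required implementation. The openings in ROWS are read off the pretty-printed maze1 in Part 3, which is an assumption about that file's contents, and weights are left out. All method names are hypothetical, and the bridge count relies on neighboring cells agreeing about their shared openings.

```ruby
# Openings per cell, row by row (y = 0..3); "-" marks a cell with no openings.
# Assumption: derived from the pretty-printed maze1, not from the data file.
ROWS = ["d dr dl -", "ur udlr udlr dl", "dr ulr udl u", "u - ur l"].freeze

# cells maps [x, y] to a string of open directions.
cells = {}
ROWS.each_with_index do |row, y|
  row.split.each_with_index { |dirs, x| cells[[x, y]] = dirs.delete("-") }
end

# open mode: cells with all four directions open.
def count_open(cells)
  cells.count { |_pos, dirs| dirs.length == 4 }
end

# bridge mode: three cells in a line joined by two consecutive openings.
# Only "r" and "d" are inspected, so each bridge is counted exactly once.
def count_bridges(cells)
  cells.sum do |(x, y), dirs|
    horizontal = dirs.include?("r") && cells.fetch([x + 1, y], "").include?("r")
    vertical   = dirs.include?("d") && cells.fetch([x, y + 1], "").include?("d")
    (horizontal ? 1 : 0) + (vertical ? 1 : 0)
  end
end

# sortcells mode: one line per opening count, cells ordered by column, then row.
def sorted_cell_lines(cells)
  cells.keys.group_by { |pos| cells[pos].length }.sort.map do |count, positions|
    ([count] + positions.sort.map { |x, y| "(#{x},#{y})" }).join(",")
  end
end

puts count_open(cells)                # 2
puts count_bridges(cells)             # 6
puts sorted_cell_lines(cells).first   # 0,(1,3),(3,0)
```

Sorting the [x, y] pairs as arrays compares x first and y second, which gives the column-then-row order described above.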
Part 2: Process & Sort Paths By Cost
As described in the introduction, some maze files will contain paths. Only paths that travel between cells through openings are valid. For each valid path, you will need to use the weights for each opening in the maze to calculate the cost of the path. For example, if the coordinates (in a simple maze file)
path path1 0 1 drdu
appear in a path, and the cell at (0,1) is defined as
0 1 ldr 342.54 958.1 3.126
the cost of the first move in the path will be 958.1 (the weight for the "d" opening). The cost of a whole path is the sum of the weight of each opening through which it passes. You may assume no two paths will have the same cost.
Once you have found which paths are valid and calculated the cost of each valid path, you need to print out the cost and name of each valid path, in order of cost from lowest to highest. For each valid path print its cost (with exactly 4 decimal places) and name on a separate line, separated by a single space. Hint: you can use printf("%10.4f",x) to print out a float value to 4 decimal places.
% ruby maze.rb paths maze2
99.9958 path1
103.7790 path2
Any paths that are not valid should not be output. If a maze contains no valid paths (or no paths at all), your program should simply print none.
% ruby maze.rb paths maze1
none
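A minimal sketch of the validity check and cost calculation described above, assuming cells are stored as a hash from [x, y] to a direction => weight hash. The 2x2 maze, the path names, and the helper path_cost are all made up for illustration. A path is treated as invalid as soon as a move has no matching opening in the current cell.

```ruby
DELTAS = { "u" => [0, -1], "d" => [0, 1], "l" => [-1, 0], "r" => [1, 0] }.freeze

# Hypothetical 2x2 maze: [x, y] => { direction => weight }.
CELLS = {
  [0, 0] => { "d" => 1.5, "r" => 2.0 },
  [1, 0] => { "l" => 2.0, "d" => 0.25 },
  [0, 1] => { "u" => 1.5 },
  [1, 1] => { "u" => 0.25 }
}.freeze

# Returns the total cost of the path, or nil if any move hits a wall.
def path_cost(cells, x, y, moves)
  cost = 0.0
  moves.each_char do |move|
    weight = cells.fetch([x, y], {})[move]
    return nil if weight.nil?          # no such opening: the path is invalid
    cost += weight
    dx, dy = DELTAS.fetch(move)
    x += dx
    y += dy
  end
  cost
end

# Hypothetical paths: name => [start_x, start_y, moves].
paths = { "a" => [0, 0, "rd"], "b" => [0, 0, "du"], "c" => [0, 1, "r"] }

costs = paths.map { |name, (x, y, moves)| [path_cost(CELLS, x, y, moves), name] }
valid = costs.reject { |cost, _name| cost.nil? }.sort
if valid.empty?
  puts "none"
else
  valid.each { |cost, name| printf("%10.4f %s\n", cost, name) }
end
```

Storing each result as a [cost, name] pair lets a plain sort order the paths by cost, since costs are assumed to be distinct.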
Part 3: Pretty-print Maze
The textual specification of mazes makes them difficult to understand. For this part of the assignment, you'll implement a "pretty-printing" function for mazes. Your pretty print format will use the following conventions:
Your program will print a maze in this format when executed with the "print" command.
Here is an example maze that starts at (0,0) and ends at (3,3):
% ruby maze.rb print maze1
+-+-+-+-+
|s|   | |
+ + + +-+
|       |
+-+ + + +
|     | |
+ +-+ +-+
| | |  e|
+-+-+-+-+
% ruby maze.rb print maze2
In maze2, the shortest path includes the start and end cells. In maze3, the shortest path includes the start cell, but does not include the end cell.
Part 4: Find Distance of Cells From Start
For this part, you need to analyze all the openings in a maze to determine the distance of all cells reachable from the start of the maze. We define the distance between two cells x and y to be the number of up/down/left/right cell openings that are passed through when traveling from x to y. If there are multiple paths from x to y, the path with the shortest distance is the distance between x and y. The distance from the start cell to itself is always 0. If there is no valid path from x to y (i.e., y is not reachable from x) then the distance from x to y is undefined.
Once you have calculated the distance for all cells reachable from the start cell, print out the results in order of increasing distance. On each line, first print out the distance d, followed by all cells reachable from the start cell for that distance d. Cells should be printed as coordinates (x,y) in lexicographic order, separated by commas. Note that the first line will thus always be distance 0 followed by the location of the starting point of the maze.
% ruby maze.rb distance maze2
0,(0,3)
1,(0,2)
2,(1,2)
3,(1,1),(2,2)
4,(0,1),(1,0),(2,1),(2,3)
5,(0,0),(2,0),(3,1),(3,3)
6,(3,2)
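Because every opening counts as one step, a breadth-first search from the start cell yields these distances. The sketch below is one way to do it, assuming cells are stored as a hash from [x, y] to a string of open directions. The small maze and the method names are hypothetical. Cells are ordered numerically by x and then y, which is an assumption about what "lexicographic order" means here.

```ruby
DELTAS = { "u" => [0, -1], "d" => [0, 1], "l" => [-1, 0], "r" => [1, 0] }.freeze

# Breadth-first search from start; returns { [x, y] => distance } for every
# reachable cell. Unreachable cells are simply absent from the result.
def distances(cells, start)
  dist = { start => 0 }
  queue = [start]
  until queue.empty?
    x, y = queue.shift
    cells.fetch([x, y], "").each_char do |dir|
      dx, dy = DELTAS.fetch(dir)
      nxt = [x + dx, y + dy]
      next if dist.key?(nxt)           # already reached by a shorter route
      dist[nxt] = dist[[x, y]] + 1
      queue << nxt
    end
  end
  dist
end

# Formats one output line per distance, in increasing order of distance.
def distance_lines(dist)
  dist.keys.group_by { |pos| dist[pos] }.sort.map do |d, positions|
    ([d] + positions.sort.map { |x, y| "(#{x},#{y})" }).join(",")
  end
end

# Hypothetical 3x2 maze; cells (1,1) and (2,1) are closed off.
cells = { [0, 0] => "rd", [1, 0] => "lr", [2, 0] => "l",
          [0, 1] => "u", [1, 1] => "", [2, 1] => "" }
puts distance_lines(distances(cells, [0, 0]))
# 0,(0,0)
# 1,(0,1),(1,0)
# 2,(2,0)
```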
Part 5: Decide Whether Maze Is Solvable
Now use your script to determine whether or not a maze can be solved. Hint: the cell distances calculated previously might be useful to you in doing this part.
You do not need to return a path representing a solution from start to finish. Your program will only need to indicate whether a path exists by printing "true" when a maze can be solved and "false" otherwise.
% ruby maze.rb solve maze1
true
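Solvability only asks whether the end cell is reachable, so any graph search will do. The sketch below uses a stack and a set of seen cells instead of the distance table, purely to show an alternative. The method name solvable? and the small maze are hypothetical.

```ruby
require "set"

DELTAS = { "u" => [0, -1], "d" => [0, 1], "l" => [-1, 0], "r" => [1, 0] }.freeze

# Returns true when finish can be reached from start through openings.
# cells maps [x, y] to a string of open directions.
def solvable?(cells, start, finish)
  seen = Set[start]
  stack = [start]
  until stack.empty?
    x, y = stack.pop
    return true if [x, y] == finish
    cells.fetch([x, y], "").each_char do |dir|
      dx, dy = DELTAS.fetch(dir)
      nxt = [x + dx, y + dy]
      stack << nxt if seen.add?(nxt)   # add? returns nil for cells already seen
    end
  end
  false
end

# Hypothetical 3x2 maze; cells (1,1) and (2,1) are closed off.
cells = { [0, 0] => "rd", [1, 0] => "lr", [2, 0] => "l",
          [0, 1] => "u", [1, 1] => "", [2, 1] => "" }
puts solvable?(cells, [0, 0], [2, 0])   # true
puts solvable?(cells, [0, 0], [1, 1])   # false
```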
Part 6: Parse Standard Maze Files
For this last part, we consider a new maze file format, called standard maze files, which is more complex. If we invoke your script with the mode parse, your script needs to read in and parse a standard maze file using Ruby regular expressions, then output the maze in the simple maze file format. If any part of the maze file is invalid, then the output will be different; we discuss these situations further below.
Standard maze file format
We begin by describing the standard maze file format in full detail. Here's an example:
size=16 start=(0,2) end=(13,11)
0,0: du 123.456,0.123456
0,1: uldr 43.3,5894.2341,20.0,5896.904
...
path:"path1",(0,2),u,r,d,...,l; path:"path2",(0,2),d,r,l,...,r
A standard file contains several lines of text according to the following format. The first line is the header, as in the simple case, and is now formatted as follows:
size=<size> start=(<start_x>,<start_y>) end=(<end_x>,<end_y>)
Notice that now the size, starting position, and ending position are specifically identified by keywords (size, start, and end, respectively). The spacing should be just as shown above: no spaces before or after the equals sign, and one space between keywords. All of the elements in <> should be nonnegative integers (and size should be at least 1). Lines missing any of the above formatting or having extra or missing spaces are invalid.
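The header rules above map fairly directly onto an anchored regular expression. The following is a sketch under that reading; the names HEADER and parse_header are assumptions, and the function returns nil for anything that does not match exactly.

```ruby
# Anchored at both ends so extra or missing spaces cause a mismatch.
HEADER = /\Asize=(\d+) start=\((\d+),(\d+)\) end=\((\d+),(\d+)\)\z/

# Hypothetical helper: returns [size, start_x, start_y, end_x, end_y],
# or nil when the line is not a valid header.
def parse_header(line)
  m = HEADER.match(line.chomp)
  return nil unless m
  size, *coords = m.captures.map(&:to_i)
  return nil if size < 1               # size must be at least 1
  [size, *coords]
end

header = parse_header("size=16 start=(0,2) end=(13,11)")
puts header.join(" ")   # 16 0 2 13 11
```

Joining the captured numbers with single spaces gives the header line of the simple format.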
Lines representing cells take the form:
<x>,<y>: <dirs> <weights>
Following the coordinates (x and y, separated by a comma) is a colon, a space, a set of up to four "open wall" characters ('udlr'), and a comma-separated list of floating point weights (with no spaces around the commas). Recall the following cell earlier specified in the simple format:
4 7 lur 1.3 5.6 8.2
Here is the same specification in the standard format:
4,7: lur 1.3,5.6,8.2
It is acceptable for weights to be negative.
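A cell line can be checked and converted with a regular expression along these lines. Several details are assumptions rather than rules stated above: that at least one direction is present, that a weight may be written with or without a fractional part, and that the weight text is copied to the output unchanged. The names WEIGHT, CELL, and standard_cell_to_simple are hypothetical.

```ruby
# Assumption: a weight is an optional minus sign, digits, and an optional
# fractional part.
WEIGHT = /-?\d+(?:\.\d+)?/
CELL = /\A(\d+),(\d+): ([udlr]{1,4}) (#{WEIGHT}(?:,#{WEIGHT})*)\z/

# Hypothetical helper: returns the simple-format line, or nil if invalid.
def standard_cell_to_simple(line)
  m = CELL.match(line.chomp)
  return nil unless m
  x, y, dirs, weights = m.captures
  weights = weights.split(",")
  return nil unless dirs.chars.uniq.length == dirs.length   # no repeated letters
  return nil unless weights.length == dirs.length           # one weight per opening
  "#{x} #{y} #{dirs} #{weights.join(' ')}"
end

puts standard_cell_to_simple("4,7: lur 1.3,5.6,8.2")   # 4 7 lur 1.3 5.6 8.2
```

The regular expression handles the layout, while the two length checks cover rules that are awkward to express in a pattern: each letter at most once, and exactly one weight per letter.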
Lines representing paths take the form:
path:"path1_name",(<start x>,<start y>),<move1>,<move2>,...; path:"path2_name",(<start x>,<start y>),...;
There are several differences with how paths are formatted in standard maze files, compared to simple maze files. First, each line of text may contain more than one path, with each path separated by a semi-colon followed by a space; the final path also ends with a semi-colon, but no trailing space after it. Each path begins with the keyword path followed immediately by a colon, and then the path name, contained in quotes. Path names can contain any character except space or colon, and quotation marks in path names will be escaped (\"). The first line in the example below shows two path specifications; each path is identified by the second line below (which would not appear in an actual data file):
path:"path1",(0,2),l,r; path:"path2",(0,2),d,u,l;
Note that in these examples the last path ends in a semi-colon, but without a trailing space.
If path names have escaped quotes in them, there is no requirement that they correspond to open and closed marks, i.e., you can have any number of escaped quotes in a path's name. In the example
The first path (named hello-"world") starts at (0,2), continuing to (0,1), (1,1), and (1,2). The second path (named goodbye-world) also starts at (0,2), but instead moves to (0,3) and (1,3).
If any path names contain escaped quotes, they must be converted to normal quotes. For instance the names path\"3\" and \"path4\" should be converted to path"3" and "path4" in the simple maze file format output.
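Splitting a path line and unescaping names might look like the sketch below. It validates the whole line first, then extracts each path. Whether a path with zero moves is legal is not stated above; this sketch allows it, which is an assumption. The names PATH, PATH_LINE, and standard_paths_to_simple are hypothetical, and the hello-"world" input is rebuilt from the description above rather than copied from a data file.

```ruby
# One path: keyword, quoted name (escaped quotes allowed, no spaces or
# colons), start coordinates, then comma-separated moves and a semicolon.
PATH = /path:"((?:\\"|[^\s:"])+)",\((\d+),(\d+)\)((?:,[udlr])*);/
# A whole line: one or more paths separated by exactly one space.
PATH_LINE = /\A#{PATH}(?: #{PATH})*\z/

# Hypothetical helper: returns one simple-format line per path,
# or nil if the line is not well formed.
def standard_paths_to_simple(line)
  line = line.chomp
  return nil unless PATH_LINE.match?(line)
  line.scan(PATH).map do |name, x, y, moves|
    # Turn each escaped quote (backslash + quote) into a plain quote.
    "path #{name.gsub('\\"', '"')} #{x} #{y} #{moves.delete(',')}"
  end
end

puts standard_paths_to_simple('path:"path1",(0,2),l,r; path:"path2",(0,2),d,u,l;')
# path path1 0 2 lr
# path path2 0 2 dul
puts standard_paths_to_simple('path:"hello-\"world\"",(0,2),u,r,d;')
# path hello-"world" 0 2 urd
```

The inputs are written as single-quoted Ruby strings so that each backslash reaches the parser unchanged.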
In your output, all lines should be in the same order they were in the input file. For example, if path1 came first, and then path2, then make sure these come in the same order and same position in the output file.
Invalid standard mazes
Some lines in a standard maze file may be invalid, i.e., they may not be in the format described above. If any such invalid lines exist, your script should output invalid maze followed by each invalid line in the maze file.
% ruby maze.rb parse maze1-std
...prints out maze1-std in simple maze format...
% ruby maze.rb parse maze3-std
invalid maze
...prints out all invalid lines in maze3-std...
Furthermore, some lines in a standard maze file, while well formed, may make no logical sense. In particular, they may define cells within the maze that have openings that contradict neighboring cells within the maze or the outer walls of a maze (which should always be closed). If any such cells contradict with regard to their specified openings, your script should output invalid maze followed by the lines from the standard maze file corresponding to whichever cells contradict.
size=16 start=(0,2) end=(13,11)
0,0: u 0.123456
0,1: uldr 43.3,5894.2341,20.0,5896.904
...
The above two cells are correct in terms of formatting but contradict in terms of the cell openings they specify. As such, both cell lines should be printed as invalid, i.e.,
invalid maze
0,0: u 0.123456
0,1: uldr 43.3,5894.2341,20.0,5896.904
...
A standard maze file may contain invalid lines due to formatting and/or cell opening agreement issues; however, there will not be a line that has both formatting and cell opening agreement issues. Any maze file containing a cell opening agreement issue will have a validly formatted header line describing the size, start and end of the maze.
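One way to detect cell opening agreement issues is to check, for every opening, that it stays inside the maze and that the neighboring cell has the opposite opening. The sketch below does that for cells stored as a hash from [x, y] to a string of open directions. Flagging both cells of a disagreeing pair is an assumption about the intended behavior, and the names OPPOSITE and contradicting_cells are hypothetical.

```ruby
require "set"

DELTAS = { "u" => [0, -1], "d" => [0, 1], "l" => [-1, 0], "r" => [1, 0] }.freeze
OPPOSITE = { "u" => "d", "d" => "u", "l" => "r", "r" => "l" }.freeze

# Returns the set of [x, y] coordinates whose openings disagree with a
# neighbor or pass through the outer wall.
def contradicting_cells(cells, size)
  bad = Set.new
  cells.each do |(x, y), dirs|
    dirs.each_char do |dir|
      dx, dy = DELTAS.fetch(dir)
      nx = x + dx
      ny = y + dy
      if !nx.between?(0, size - 1) || !ny.between?(0, size - 1)
        bad << [x, y]                  # opening through the outer wall
      elsif !cells.fetch([nx, ny], "").include?(OPPOSITE.fetch(dir))
        bad << [x, y]
        # Assumption: the neighbor is flagged too when it has its own line.
        bad << [nx, ny] if cells.key?([nx, ny])
      end
    end
  end
  bad
end

# Only the two cells shown in the example above; the elided cells are omitted.
cells = { [0, 0] => "u", [0, 1] => "uldr" }
p contradicting_cells(cells, 16).to_a.sort   # [[0, 0], [0, 1]]
```

A cell that is never mentioned in the file is treated as fully closed, so an opening that leads into it also counts as a contradiction.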
Example standard mazes
For examples of how standard maze files are parsed and used to generate simple maze files (or report errors), look at the files maze1-std.parse.out, maze2-std.parse.out, maze3-std.parse.out, maze4-std.parse.out, and maze1b-std.parse.out generated from maze1-std, maze2-std, maze3-std, and maze4-std.
Hints and Tips
Project Submission
You should submit a file maze.rb containing your solution. You may submit other files, but they will be ignored during grading. We will run your solution by invoking:
ruby maze.rb <mode> <file-name>
where <mode> describes what the tool should do (see above), and <file-name> names the file containing the maze data.
Be sure to follow the project description exactly. Your solution will be graded automatically, and so any deviation from the specification will result in losing points. In particular, if you have any debugging output in your program, be sure to turn it off before you submit your program.
You can submit your project in two ways:
The Campus Senate has adopted a policy asking students to include the following statement on each assignment in every course: "I pledge on my honor that I have not given or received any unauthorized assistance on this assignment." Consequently your program is requested to contain this pledge in a comment near the top.
Please carefully read the academic honesty section of the course syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, or unauthorized use of computer accounts, will be submitted to the Student Honor Council, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to project assignments. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. Full information is found in the course syllabus---please review it at this time.
This course project is copyright of Dr. Michael Hicks. © Michael Hicks. All rights reserved. Any redistribution or reproduction of part or all of the contents in any form is prohibited without the express consent of the author.