Assignment #3: Perl


Abbreviation expansion                      DUE: October 11, 1999 - 6:00pm

Many interactive programs, such as text editors, allow a user to define abbreviations for commonly used strings. Once an abbreviation has been defined, it can be automatically translated into its expanded form. For example, if the abbreviation EPA abbreviates the expanded form Environmental Protection Agency, then the sentence:

Fred's Chemical Company and Taco Shack has been fined by
the EPA for illegally dumping toxic waste.

gets expanded into:

Fred's Chemical Company and Taco Shack has been fined by
the Environmental Protection Agency for illegally dumping toxic waste.

Your assignment is to write a Perl program that rewrites input lines, replacing abbreviations with their expanded forms.

Each input line will contain zero or more strings (non-empty sequences of characters), delimited by spaces. Input lines will be of three types:

  1. Definition lines are in the form: String1 #DEF# String2 String3 ··· StringN
    This means that String1 becomes an abbreviation for String2 String3  ··· StringN (i.e. String1 in the input expands to String2 String3  ··· StringN in the output).
  2. Undefinition lines are in the form: #UNDEF# String
    This removes the abbreviation represented by String from the current set of abbreviations.
  3. Any other line of text. Neither #DEF# nor #UNDEF# will appear on a regular text line.
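The three line types above can be told apart by splitting a line on whitespace and inspecting the first two words. The following is only a sketch: the helper name classify_line and the tags 'def', 'undef', and 'text' are made up for illustration and are not part of the assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one input line.
# Returns a list: a type tag followed by the relevant strings.
sub classify_line {
    my ($line) = @_;
    my @words = split ' ', $line;    # split on runs of whitespace

    # Type 1: String1 #DEF# String2 ... StringN (right-hand side may be empty)
    if (@words >= 2 && $words[1] eq '#DEF#') {
        return ('def', $words[0], @words[2 .. $#words]);
    }
    # Type 2: #UNDEF# String
    if (@words >= 2 && $words[0] eq '#UNDEF#') {
        return ('undef', $words[1]);
    }
    # Type 3: any other line of text
    return ('text', @words);
}

my @parts = classify_line('EPA #DEF# Environmental Protection Agency');
print "$parts[0]: $parts[1]\n";    # prints "def: EPA"
```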

When your program encounters a line of type 1, it should create a new abbreviation for String1 and produce no output. This abbreviation will be applied to all input lines between the definition line and a subsequent undefinition line. If one or more of the strings String2 String3 ··· StringN is itself a pre-existing abbreviation, replace each such string with its expansion before storing the expansion for String1. If an abbreviation already exists for String1, delete the old expansion after determining the new expansion.
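One possible way to store a definition is a hash keyed by abbreviation. The sketch below assumes a hash named %abbrev and a helper named define_abbrev, both invented here for illustration. It computes the new expansion before overwriting the old one, so a redefinition can still refer to its own previous value.

```perl
use strict;
use warnings;

my %abbrev;    # hypothetical table: abbreviation => expansion string

# Store a definition. Each right-hand-side word that is already an
# abbreviation is replaced by its expansion before the result is stored.
sub define_abbrev {
    my ($name, @rhs) = @_;
    my @expanded;
    for my $word (@rhs) {
        push @expanded, exists($abbrev{$word}) ? $abbrev{$word} : $word;
    }
    # The old entry for $name (if any) is replaced only now, after the
    # new expansion has been fully determined.
    $abbrev{$name} = join ' ', @expanded;
    return;
}

define_abbrev('foo', qw(foo bar baz));    # foo => "foo bar baz"
define_abbrev('foo', qw(foo bar baz));    # old foo is expanded first
print "$abbrev{foo}\n";                   # prints "foo bar baz bar baz"
```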

When your program encounters a line of type 2, it should delete the specified abbreviation.

When your program encounters a line of type 3, it should print out the input line, with each abbreviation expanded. Any abbreviation replaced must match an entire word, not just a substring (a part of a word). Words are delimited by whitespace.

No other words should be affected. Abbreviations within expanded text should be ignored. If the line contains no abbreviations, then it should be printed out unchanged (ignoring whitespace).
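The whole-word rule and the rule that expanded text is not expanded again both follow naturally if each whitespace-delimited word is looked up exactly once. A minimal sketch, assuming a pre-filled hash %abbrev and a helper named expand_line (both hypothetical):

```perl
use strict;
use warnings;

# Hypothetical table, pre-filled here only for illustration.
my %abbrev = (EPA => 'Environmental Protection Agency');

# Expand one text line. Splitting on whitespace means only whole words
# can match, and each word is looked up once, so words inside an
# expansion are never expanded again.
sub expand_line {
    my ($line) = @_;
    my @out;
    for my $word (split ' ', $line) {
        my $text = exists($abbrev{$word}) ? $abbrev{$word} : $word;
        push @out, $text if length $text;    # skip empty expansions
    }
    return join ' ', @out;
}

print expand_line('the EPA for illegally dumping toxic waste.'), "\n";
# prints "the Environmental Protection Agency for illegally dumping toxic waste."
```

Note that under this reading a word such as "EPA." (with attached punctuation) is a different word from "EPA" and is left alone, because words are delimited only by whitespace.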

A line of type 1 or 3 continues onto the following line if it ends in a backslash (\). This means that two or more input lines effectively translate to one abbreviation definition or output line.
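Backslash continuation can be handled as a separate pass that turns physical lines into logical lines before any other processing. The helper name join_continuations is hypothetical, and the sketch assumes that a continuation acts as a word boundary (a space is inserted at the join); the hyphen case described later in this handout is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper: join chomped physical lines that end in a
# backslash with the line that follows them.
sub join_continuations {
    my @physical = @_;
    my @logical;
    my $pending = '';
    for my $line (@physical) {
        if ($line =~ s/\\$//) {            # strip the trailing backslash
            $pending .= "$line ";          # the next line continues this one
        } else {
            push @logical, $pending . $line;
            $pending = '';
        }
    }
    push @logical, $pending if length $pending;    # input ended mid-continuation
    return @logical;
}

my @lines = join_continuations('baz \\', 'foobar', 'bar');
print scalar(@lines), "\n";    # prints "2"
```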

Finally, if the last character on a line of type 3 is a hyphen (-), then the rest of the last word on that line is at the beginning of the next line (which must be of type 3), up to the first whitespace on that line. In other words, the first word on the next line must be appended to the last word of the line with the hyphen, and the hyphenated word is part of the first line. Also, if the second line has only one word, and that word again ends in a hyphen, then the hyphenated word from the first line continues, and so on for subsequent lines. A hyphen may also be the character before a line continuation character (\), in which case the last word on that line is concatenated with the first word on the continuation line.
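The hyphenation rule can be sketched as a pass over a list of text lines. The helper name join_hyphenated is hypothetical. The sketch assumes that every line passed in is of type 3 and already chomped, and that the hyphen itself is dropped when the two pieces are joined. It does not handle a hyphen that comes before a continuation backslash.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve end-of-line hyphenation in a list of
# chomped text lines and return the rewritten lines.
sub join_hyphenated {
    my @lines = @_;
    my @out;
    while (@lines) {
        my $line = shift @lines;
        # While this line ends in a hyphen, move the first word of the
        # next line onto it, in place of the hyphen.
        while ($line =~ /-$/ && @lines) {
            last unless $lines[0] =~ /^\s*(\S+)\s*(.*)$/;
            my ($first, $rest) = ($1, $2);
            $line =~ s/-$/$first/;
            if (length $rest) {
                $lines[0] = $rest;    # the remainder stays as its own line
                last;
            }
            shift @lines;             # the next line was a single word
        }
        push @out, $line;
    }
    return @out;
}

print join('|', join_hyphenated('baz-', 'foobar bar')), "\n";
# prints "bazfoobar|bar"
```

Abbreviation expansion would then be applied to each resulting line, so the joined word is looked up as a single word.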

The output from your program should consist of one output line for each input line (or each set of input lines, all but the last ending in a backslash) of type 3. You can assume that the string on the left-hand side of every definition is a single word, and that the right-hand side of a definition is a (possibly empty) sequence of words separated by whitespace. Also, don't worry if the output lines are very long.

 

For example:

foo
foo #DEF# foo bar baz
bar
foo
baz \
foobar
foo #DEF# foo bar baz
bar
foo
baz-
foobar bar

 

should output:

foo
bar
foo bar baz
baz foobar
bar
foo bar baz bar baz
bazfoobar
bar

 

Handing in the assignment

Instructions for submitting your work:

  1. Name your Perl source file assignment3.pl, and make it executable;
  2. Your Perl program should read its test data from stdin, redirected from the input data file named on the command line, and write its output to stdout, which may also be redirected to a file (e.g., assignment3.pl < test_data.txt > test_output.txt);
  3. tar the file(s) for submission (e.g., tar cvf submit3.tar assignment3.pl);
  4. submit the tar file: {~as330003/alpha.bin/submit 3 submit3.tar}.

Your work may not be graded if these procedures are not followed exactly.

A large penalty will be assessed if the required output format is not followed exactly.