CMSC 106 | Project #6 | Spring 2003
This project will give you practice processing strings using
pointers and C's character and string processing library functions.
This will allow you to gain a fuller understanding of the
relationship in C between arrays and pointers.
In order to make sure you are using the string functions already defined
for you in <string.h>, you may not use any loops that process
an array character by character if there is a string function which would
perform the same operation.
You will be writing a simple web page formatter. The program will read a file containing text and HTML-like "tags" which will tell the program how to format the text for output. The tags will control such things as the line length, text spacing, and text case.
As you read this section, you may also want to refer to the ``Sample
output'' section below.
The input data file will consist of text to be printed and
"tags", which are commands enclosed in angle brackets.
The program will read in the text,
format it according to the tags, and print out the result.
Text will consist of any words and punctuation of the English
language, and it may or may not be arranged into sentences and paragraphs.
The input lines will be no longer than 100 characters each.
The size and arrangement of the output lines will be specified by
default values or tags,
but the maximum length of the output lines will be 120 characters.
There is no limit to the number of input lines.
The program will read, format, and print text until it reaches the end of the
input file.
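As a rough sketch of the read-until-end-of-file behavior described above, the fragment below reads one line at a time and strips the trailing newline with strcspn rather than a character loop. The names chomp, count_lines, and MAX_INPUT are hypothetical, and the use of fgets is an assumption; check that it is among the features you are permitted to use before relying on it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_INPUT 102   /* 100 characters + newline + '\0' */

void chomp(char *line);
int count_lines(FILE *in);

/* Remove the trailing newline, if any, without a character loop. */
void chomp(char *line)
{
    /* strcspn gives the index of the first '\n', or the length if none */
    *(line + strcspn(line, "\n")) = '\0';
}

/* Read lines until end of file and report how many were seen. */
int count_lines(FILE *in)
{
    char line[MAX_INPUT];
    int count = 0;

    while (fgets(line, MAX_INPUT, in) != NULL) {
        chomp(line);
        count++;        /* a real formatter would process the line here */
    }
    return count;
}
```

Calling count_lines(stdin) would consume redirected input; in a formatter the body of the loop would hand each line to the functions that look for tags and words.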
Tags may appear anywhere in the input text or not at all; they may or may not be separated from the text by blanks. Tag names are always given in upper case, but tag parameters are in lower case. Tags take effect immediately after appearing in the input. The program may assume that tags will be properly formatted (names spelled correctly, etc.). If an angle bracket is not immediately followed or preceded by a proper tag, then it is simply printed as part of the output (see the primary input file for an example). Tags are not printed in the output. The attributes of the text which are controlled by tags are:
These are the tags and their descriptions:
The MARGIN tag specifies the number of blanks to be printed before the text on any output line. The margin appears to the left of the output text, regardless of the alignment. The default margin size is 0.
The WIDTH tag specifies the maximum size of the text (not including the margin) to be printed on any output line. The default width is 80; the maximum is 120; there is no minimum.
The ALIGN tag specifies the alignment of text on a line, where a is either "left", "center", or "right".
When the ALIGN tag appears in the input, the current output line is printed as if there had been a break tag at that point. The next output line will be printed with the alignment specified on the tag. The default alignment is left.
The BR (break) tag acts like a newline character; it indicates the end of the current output line. The following text will appear on the next output line.
The P (paragraph) tag is similar to the break tag in that it causes a line break. However, it also causes a paragraph break, which means that the current line is followed by an empty line, and the next line begins with an indentation of 5 blanks from the margin. The blanks are added to the beginning of the text, regardless of the current alignment. For example, for center alignment, a line consisting of 5 blanks plus text will be centered within the number of columns given by the current width.
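One possible way to combine the margin with the alignment is sketched below. The function names and the LEFT, CENTER, and RIGHT constants are hypothetical, and the rounding for centered text (any odd leftover blank goes on the right) is an assumption, since the description above does not say which side gets it. The text passed in is assumed to already include the 5 paragraph blanks when they apply.

```c
#include <stdio.h>
#include <string.h>

#define LEFT   0
#define CENTER 1
#define RIGHT  2

int left_pad(int align, int width, const char *text);
void build_line(char *out, int margin, int align, int width,
                const char *text);

/* Number of blanks to put between the margin and the text. */
int left_pad(int align, int width, const char *text)
{
    int spare = width - (int) strlen(text);
    int pad = 0;

    if (spare > 0 && align == CENTER) {
        pad = spare / 2;    /* assumption: odd blank goes on the right */
    } else if (spare > 0 && align == RIGHT) {
        pad = spare;
    }
    return pad;
}

/* Write margin blanks, alignment blanks, then the text into out.
   The caller must supply a buffer large enough for margin + width. */
void build_line(char *out, int margin, int align, int width,
                const char *text)
{
    /* "%*s" with an empty string prints exactly that many blanks */
    sprintf(out, "%*s%s", margin + left_pad(align, width, text), "", text);
}
```

For example, build_line(out, 1, CENTER, 9, "abc") would place 4 blanks (1 for the margin, 3 for centering) before the text.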
The FONT tag specifies the type of text to be printed, where "f" is either "upper"
or "lower". If upper is specified, then all following text is to be printed in upper case.
If lower is specified, then all following text is to be printed in lower case. The FONT tag
remains in effect until it is turned off by a /FONT tag.
FONT tags are not nested; in other words, an upper font followed by a lower font followed
by a /FONT tag will go back to mixed case, not upper case.
The default font type is to print text exactly as shown in the input (mixed case).
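The font setting could be applied to a piece of text along the following lines. This is only a sketch: apply_font and the MIXED, UPPER, and LOWER constants are hypothetical names. A character-by-character pointer traversal is used here because no function in <string.h> converts case, so the character library functions toupper and tolower do the work.

```c
#include <ctype.h>

#define MIXED 0
#define UPPER 1
#define LOWER 2

void apply_font(char *text, int font);

/* Convert text in place according to font; MIXED leaves it unchanged. */
void apply_font(char *text, int font)
{
    char *p = text;

    while (*p != '\0') {
        if (font == UPPER) {
            *p = (char) toupper((unsigned char) *p);
        } else if (font == LOWER) {
            *p = (char) tolower((unsigned char) *p);
        }
        p++;
    }
}
```

Because FONT tags are not nested, a single integer holding the current font is enough in this sketch: a /FONT tag would simply set it back to MIXED.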
If you are using array subscripts such as arr[i]
and you have a loop that uses i++, you are traversing the
array character by character. If you are using pointers to a
character array char *ptr and you have a loop that uses
ptr++, you are traversing the array character by character.
Note that you may need a pointer to keep track of where
you are in a string, but you must use string library functions wherever
possible to perform operations on the string.
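To illustrate keeping a position pointer while letting the library do the searching, here is a hedged sketch that locates the next pair of angle brackets with strchr. The function next_tag and the constant MAX_TAG are hypothetical, and the sketch does not decide whether the bracketed text is a proper tag; the caller would still have to check the copied name, for example with strcmp or strncmp.

```c
#include <string.h>

#define MAX_TAG 101   /* an input line holds at most 100 characters */

const char *next_tag(const char *pos, char *name);

/* Find the next "<...>" at or after pos.  Copies the text between the
   brackets into name and returns a pointer just past the '>', or
   returns NULL (leaving name untouched) if no complete pair exists. */
const char *next_tag(const char *pos, char *name)
{
    const char *lt = strchr(pos, '<');
    const char *gt = NULL;
    const char *rest = NULL;

    if (lt != NULL) {
        gt = strchr(lt, '>');
    }
    if (gt != NULL) {
        strncpy(name, lt + 1, (size_t) (gt - lt - 1));
        *(name + (gt - lt - 1)) = '\0';   /* strncpy may not terminate */
        rest = gt + 1;
    }
    return rest;
}
```

Given the text "one<BR>two", this sketch would copy "BR" into name and return a pointer to "two".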
strlen, strchr, strcat, strcpy, strncat, strncpy, strcmp, strncmp,
strcspn, strpbrk, strrchr, strspn, strstr, or strtok.
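As one illustration of how two of these functions can replace a character loop, the sketch below pulls the next blank-separated word out of a string using strspn and strcspn. The function next_word is a hypothetical helper, and treating only the blank character as a separator is an assumption.

```c
#include <string.h>

const char *next_word(const char *pos, char *word);

/* Skip leading blanks, copy the next word into word, and return a
   pointer just past it.  Returns NULL when no word remains. */
const char *next_word(const char *pos, char *word)
{
    const char *start = pos + strspn(pos, " ");   /* skip blanks */
    size_t len = strcspn(start, " ");             /* length of the word */
    const char *rest = NULL;

    if (len > 0) {
        strncpy(word, start, len);
        *(word + len) = '\0';
        rest = start + len;
    }
    return rest;
}
```

Calling it repeatedly, passing back the pointer it returned, walks through a line one word at a time until it returns NULL.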
All your C programs in this course should be written in ANSI C, which
means they must compile and run correctly with cc -std1 -trapuv on the OIT
UNIX Class Cluster. You will lose credit if your program generates any
warning messages when it is compiled. Prototypes must appear for all
functions defined, prototypes must be listed at the top of the program file,
and at most one return statement may be used in any function, including main.
Even if you already know what they are, you may not use any C language
features other than those introduced in Chapters 1 through 13 of your
textbook, plus those in Section 23.4 and any function in Section 23.5 whose
name begins with ``str'', and the language features presented in
lecture while these chapters were covered. In addition neither the
goto nor the continue statement may be used, and the
break statement may not be used in any loop. Your program
may not use the exit() library function at all. Lastly, no
global variables may be used. Using any of these disallowed C features
will result in losing credit.
Your program must make use of C's string library functions anywhere possible. You will lose substantial credit if you write code duplicating their effects, rather than just calling them (e.g., if you write loops to copy or compare strings, search for something in a string, find the length of a string, etc., you will lose credit). The character library functions (textbook section 23.4) may also be used, but you will not be penalized if you duplicate their effects rather than using them. You will lose substantial credit if you use any traversals of an array character by character to duplicate the operation of a string library function. If you do need to access individual elements of an array, you MUST use pointer notation, NOT subscripts. (This means that the only square brackets in your program will be used in array declarations, not executable statements.)
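The pointer-notation rule can be pictured with a small sketch. Both functions below are hypothetical examples, not required parts of the project: the first reads one element with *(s + n) where subscript notation would have written s[n], and the second changes an element through the pointer that strchr returns.

```c
#include <string.h>

char last_char(const char *s);
void replace_first(char *s, char from, char to);

/* Return the last character of s, or '\0' if s is empty. */
char last_char(const char *s)
{
    size_t len = strlen(s);
    char result = '\0';

    if (len > 0) {
        result = *(s + len - 1);    /* pointer form of s[len - 1] */
    }
    return result;
}

/* Replace the first occurrence of from in s with to, if there is one. */
void replace_first(char *s, char from, char to)
{
    char *hit = strchr(s, from);

    if (hit != NULL) {
        *hit = to;
    }
}
```

Note that the only square brackets a caller would need are in declarations such as char line[121].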
In addition to using the string library functions, your program must be written using user-defined functions where appropriate. Your program must define and call at least six functions you have written; writing more than six such functions would be extremely good practice and the best way to avoid errors in developing your program. If your program doesn't contain, and call, at least six such functions, it will be graded as if it does not work on the primary input, even if its output is correct. Below are ideas for some possible functions; feel free to use others. You are permitted to write as many functions as you want, and if you write any of the functions suggested below they need not perform the exact tasks as described. Keep in mind that the more of these functions you write, the easier your program will be to test and implement correctly. Each function may be no longer than 30 lines of executable code (not counting declarations, punctuation, or comments). Functions should call the others described where appropriate. You should write and test each function separately, starting with the simpler ones, before implementing those which could depend upon them.
Again, these are only suggestions to get you started. You can use whatever functions you choose. Keep in mind that some of your functions may need to call other functions.
Your program must have a comment near the top which contains your name, login ID, student ID, your section number, your TA's name, and an original description of the action and operation of the program. In addition, you must have a comment before each function, explaining its action and operation. Your program should be written using good programming style and formatting, as discussed in class and throughout your textbook. For this project, style is considered to consist of:
You may want to skip this section at first, read the rest of the project,
and come back to study it carefully when you are about to begin writing your
program.
Mistakes with string library functions or pointers are very likely to
result in a fatal program execution error (core dump). A statement may look
completely correct, but just because some string or pointer contains a
certain value the program will fail. This makes it extremely
important to use debug printf statements to narrow down where in your
program the error occurs, before you can even begin to figure out what's
wrong and how to fix it.
Be sure to test each function as it is implemented, before
integrating it with the rest of the program! It is frequently very easy to
test a function which has character array parameters, because you can often
call it by passing any string literals you like into the parameters. As an
example, say you write a function named split, which is supposed to
find the position where a string is to be split in half (perhaps it has
another integer parameter as well). You can call your function several
times, as split("This is a character string", 15) or
split("Try another character string", 12), and print the result which
your function produces each time. It's easy to see by hand what result your
function should produce, so test it with a number of strings and be certain
it gives the right answer every time. You can write a little test program
file to call your function, or just add these test calls right at the
beginning of main. Once you are positive your function works, then you can
call it as part of your project and be confident that your
program will be likely to work fine.
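The testing approach described above might look like the following sketch. The behavior given to split here is purely an assumption made for illustration (it returns the index of the last blank at or before the given position, or -1 if there is none); your own split may be defined differently. The helper run_split_tests and the constant MAX_LINE are likewise hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 121

int split(const char *s, int limit);
void run_split_tests(void);

/* Assumed behavior: index of the last blank at or before position
   limit in s, or -1 if there is no such blank. */
int split(const char *s, int limit)
{
    char prefix[MAX_LINE];
    char *blank = NULL;
    int pos = -1;

    if (limit >= 0 && limit < MAX_LINE - 1) {
        strncpy(prefix, s, (size_t) (limit + 1));
        *(prefix + limit + 1) = '\0';
        blank = strrchr(prefix, ' ');
        if (blank != NULL) {
            pos = (int) (blank - prefix);
        }
    }
    return pos;
}

/* Call this at the very beginning of main while testing. */
void run_split_tests(void)
{
    printf("%d\n", split("This is a character string", 15));   /* 9 */
    printf("%d\n", split("Try another character string", 12)); /* 11 */
}
```

Counting by hand, the last blank at or before position 15 of the first string is at index 9, and for the second string with position 12 it is at index 11, so those are the values to look for in the output.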
Here are several common string-related compilation errors produced by the cc compiler on our class machines, and what they mean:
(where X is a statement using one of the string library functions)
You probably forgot the #include <string.h> at the top of
your program file.
You are trying to assign something to the name of an array. You
can assign something (a pointer value, or NULL) to a
pointer variable, but not to the name of an array. Maybe you
meant the variable on the left of an assignment to be declared as
a pointer instead of an array.
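The difference between assigning to an array name and assigning to a pointer variable can be seen in a short sketch. The function names below are hypothetical; the statement that would not compile is shown only inside a comment.

```c
#include <string.h>

#define MAX_LINE 121

void fill_line(char *line);
const char *after_blanks(const char *text);

/* The caller's array is changed by copying into it, not by assignment. */
void fill_line(char *line)
{
    /* With a declaration such as  char buf[MAX_LINE];  the statement
       buf = "abc";  is rejected by the compiler, because buf is the
       name of an array.  Copy the characters instead: */
    strcpy(line, "abc");
}

/* A pointer variable, unlike an array name, may be assigned to. */
const char *after_blanks(const char *text)
{
    const char *p = NULL;

    p = text + strspn(text, " ");   /* fine: p is a pointer variable */
    return p;
}
```

A caller would declare char line[MAX_LINE]; and pass line to fill_line.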
Debug printf statements must end with a newline character, or their
results may not show up on the screen if the program has a fatal
execution error (core dump).
Insert a debug printf statement at the beginning of each
function and at the end of each function, before it returns.
These debug print statements can just say something like
``Starting function X'' and ``Leaving function X''. Run your
program again, and if the output shows that some function was
entered but never completed, that function, or one of the
functions it calls, may be where your problem lies. Remember the
terminating \n when inserting debug printf statements.
'\0').
NULL before dereferencing them!
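The debugging advice above can be pictured with a small sketch: debug printf statements, each ending in \n, mark entry to and exit from a function, and the pointer returned by strchr is compared against NULL before it is used. The function blank_position is a hypothetical example.

```c
#include <stdio.h>
#include <string.h>

int blank_position(const char *s);

/* Return the index of the first blank in s, or -1 if there is none. */
int blank_position(const char *s)
{
    const char *blank = NULL;
    int pos = -1;

    printf("Starting function blank_position\n");   /* debug */
    blank = strchr(s, ' ');
    if (blank != NULL) {        /* check before using the pointer */
        pos = (int) (blank - s);
    }
    printf("Leaving function blank_position\n");    /* debug */
    return pos;
}
```

If a run printed the ``Starting'' line without the matching ``Leaving'' line, the failure would lie somewhere between the two statements.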
Any evidence of unauthorized use of computer accounts or cooperation on
projects will be submitted to the Student Honor Council, which could result
in an XF for the course, suspension, or expulsion from the University.
Projects are to be written INDIVIDUALLY. For academic honesty
purposes, projects are to be considered comparable to a take-home exam. Any
cooperation or exchange of ideas which would be prohibited on an exam is
also prohibited on a project assignment, and WILL BE REPORTED to
the Honor Council.
VIOLATIONS OF ACADEMIC HONESTY INCLUDE:
IT IS THE RESPONSIBILITY, UNDER THE UNIVERSITY HONOR POLICY, OF ANY STUDENT WHO LEARNS OF AN INCIDENT OF ACADEMIC DISHONESTY TO REPORT IT TO THEIR INSTRUCTOR.
Your project must be electronically submitted by the date above, before
11:00 pm, to avoid losing credit as described in the syllabus.
No projects more than
two days late will be accepted for credit without prior permission or a
valid medical excuse, as described on your syllabus. Only the project which
you electronically submit, according to the procedures provided, can be
graded; it is your responsibility to test your program and verify
that it works properly before submitting. Lost passwords or other system
problems do not constitute valid justifications for late projects, so
do not put off working on your program or wait to submit it at the last
minute!
Turn in your assignment using the ``submit'' program as before, except
using ``6'' for the project number. You are to submit only the .c file
containing your source code, not the executable version of your program!
If your program is in a file named ``p6.c'', submit would be run as
submit 6 p6.c.
Assuming the name of the executable version of the program is
``format.x'', here is a sample execution for one input data set.
The input file is named ``primary_input'', whose contents are shown as displayed by
the UNIX ``cat'' command. Following the data file's contents, the results of
running the program with input redirected from that file are shown. The
primary_input file will be available in your class posting account.
The input and output files are shown here for reference only.
In order to be sure that your project works correctly on the primary input,
YOU MUST USE THE INPUT AND OUTPUT FILES IN THE CLASS POSTING ACCOUNT WITH THE
DIFF COMMAND AS DESCRIBED ABOVE.
Be sure to test your program against a variety of inputs, so you are sure it works in all circumstances!
For this project, the primary input consists of the contents of the input file given above. DO NOT TYPE THESE FILES YOURSELF OR COPY THEM FROM THE WEB PAGE. Use copies of the files from the class posting account. Note that this primary input does not exercise several conditions discussed above which your program should work for in order to earn credit for the secondary inputs.