CMSC 106 Project #6 Spring 2003

Due date: Wednesday, April 30, 2003


1 Purpose


This project will give you practice processing strings using pointers and C's character and string processing library functions. This will allow you to gain a fuller understanding of the relationship in C between arrays and pointers. In order to make sure you are using the string functions already defined for you in <string.h>, you may not use any loops that process an array character by character if there is a string function which would perform the same operation.

You will be writing a simple web page formatter. The program will read a file containing text and HTML-like "tags" which will tell the program how to format the text for output. The tags will control such things as the line length, text spacing, and text case.


2 Project description


As you read this section, you may also want to refer to the ``Sample output'' section below.


2.1 Program input

The input data file will consist of text to be printed and "tags", which are commands enclosed in angle brackets. The program will read in the text, format it according to the tags, and print out the result. Text will consist of any words and punctuation of the English language, and it may or may not be arranged into sentences and paragraphs. The input lines will be no longer than 100 characters each. The size and arrangement of the output lines will be specified by default values or tags, but the maximum length of the output lines will be 120 characters. There is no limit to the number of input lines. The program will read, format, and print text until it reaches the end of the input file.

2.2 Processing and output


The program reads text from the input file, arranges it into lines of the proper length, and prints it out under the control of any tags, which are commands enclosed in angle brackets. Words are added to an output line until the line would become longer than the current value of the line length. At this point, the current line is printed, and a new line is started. If a single word is longer than the line length, then it is printed on a line by itself. Multiple blanks in the input are replaced by single blanks in the output. All punctuation in the input is preserved in the output. A hyphenated word is treated as a single word. A punctuation mark immediately preceding or following a word (without any white space) is treated as if it is part of the word.
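The line-filling rule above can be sketched as follows. This is only an illustration under stated assumptions, not the required design: the names fill_lines and MAX_LINE are hypothetical, tags are ignored entirely, and the finished lines are collected in a caller-supplied array (rather than printed) so the result is easy to check.

```c
#include <string.h>

#define MAX_LINE 120   /* maximum output line length from section 2.1 */

/* Appends the words of text to out, packed into lines of at most width
   characters, each line followed by '\n'.  strtok modifies text, so the
   caller must pass a writable array, not a string literal.  Assumes out
   is large enough, width <= MAX_LINE, and no word exceeds MAX_LINE. */
void fill_lines(char *text, int width, char *out)
{
    char line[MAX_LINE + 1] = "";
    char *word = strtok(text, " \t\n");   /* runs of blanks are skipped */

    *out = '\0';
    while (word != NULL) {
        /* would a blank plus this word make the line too long? */
        if (*line != '\0' &&
            strlen(line) + 1 + strlen(word) > (size_t) width) {
            strcat(out, line);
            strcat(out, "\n");
            *line = '\0';                 /* start a new line */
        }
        if (*line != '\0')
            strcat(line, " ");
        strcat(line, word);               /* a long word gets its own line */
        word = strtok(NULL, " \t\n");
    }
    if (*line != '\0') {                  /* flush the final partial line */
        strcat(out, line);
        strcat(out, "\n");
    }
}
```

The loop advances one word at a time through strtok, so it does not traverse the array character by character. For example, filling "This  project will give   you practice" to a width of 12 should produce the three lines "This project", "will give", and "you practice".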

Tags may appear anywhere in the input text or not at all; they may or may not be separated from the text by blanks. Tag names are always given in upper case, but tag parameters are in lower case. Tags take effect immediately after appearing in the input. The program may assume that tags will be properly formatted (names spelled correctly, etc.). If an angle bracket is not immediately followed or preceded by a proper tag, then it is simply printed as part of the output (see the primary input file for an example). Tags are not printed in the output. The attributes of the text which are controlled by tags are:

These are the tags and their descriptions:

2.3 Processing requirements

  1. THE MAIN RULE - YOU MAY NOT USE ANY CHARACTER BY CHARACTER TRAVERSAL OF ANY CHARACTER ARRAY WHICH DUPLICATES THE OPERATION OF A STRING LIBRARY FUNCTION!!! If you use the name of a character array with an index (e.g., arr[i]) in a loop that uses i++, you are traversing the array character by character. If you use a pointer into a character array (e.g., char *ptr) in a loop that uses ptr++, you are also traversing the array character by character. Note that you may need a pointer to keep track of where you are in a string, but you must use string library functions wherever possible to perform operations on the string.

  2. If you do need to access individual elements of an array, you MUST use pointer notation, NOT subscripts. (This means that the only square brackets in your program will be used in array declarations, not executable statements.)

  3. You must use at least 6 string.h functions from the following list: strlen, strchr, strcat, strcpy, strncat, strncpy, strcmp, strncmp, strcspn, strpbrk, strrchr, strspn, strstr, or strtok.
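As one possible illustration of these requirements, the sketch below locates a tag using only library calls and pointer notation. The names find_tag, tag_length, and MAX_TAG are hypothetical, and the tag shapes ("<NAME>" or "<NAME=value>") are assumed from the sample input in section 9.

```c
#include <string.h>

#define MAX_TAG 20   /* assumed upper bound on "<" plus a tag name */

/* Returns a pointer to the '<' of the first tag called name in s, or
   NULL if there is none.  A tag is taken to be '<', the name, and then
   either '>' or '=' (for tags that carry a parameter). */
char *find_tag(char *s, const char *name)
{
    char pattern[MAX_TAG + 1] = "<";
    char *hit;
    char *result = NULL;

    strncat(pattern, name, MAX_TAG - 1);      /* pattern is now "<NAME" */
    hit = strstr(s, pattern);
    while (hit != NULL && result == NULL) {
        char next = *(hit + strlen(pattern)); /* character after the name */
        if (next == '>' || next == '=')
            result = hit;
        else
            hit = strstr(hit + 1, pattern);   /* skip this false match */
    }
    return result;
}

/* Number of characters in the tag starting at tag, counting both angle
   brackets.  Assumes the closing '>' is present. */
size_t tag_length(const char *tag)
{
    return strcspn(tag, ">") + 1;
}
```

The while loop moves from one strstr match to the next rather than stepping through individual characters. With this sketch, searching "in <BR> <string.h>,<BR>" for "BR" should return a pointer to the fourth character, while searching it for "P" should return NULL.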


3 Required functions and implementation


All your C programs in this course should be written in ANSI C, which means they must compile and run correctly with cc -std1 -trapuv on the OIT UNIX Class Cluster. You will lose credit if your program generates any warning messages when it is compiled. Prototypes must appear for all functions defined, prototypes must be listed at the top of the program file, and at most one return statement may be used in any function, including main.

Even if you already know what they are, you may not use any C language features other than those introduced in Chapters 1 through 13 of your textbook, plus those in Section 23.4 and any function in Section 23.5 whose name begins with ``str'', and the language features presented in lecture while these chapters were covered. In addition neither the goto nor the continue statement may be used, and the break statement may not be used in any loop. Your program may not use the exit() library function at all. Lastly, no global variables may be used. Using any of these disallowed C features will result in losing credit.

Your program must make use of C's string library functions anywhere possible. You will lose substantial credit if you write code duplicating their effects, rather than just calling them (e.g., if you write loops to copy or compare strings, search for something in a string, find the length of a string, etc., you will lose credit). The character library functions (textbook section 23.4) may also be used, but you will not be penalized if you duplicate their effects rather than using them. You will lose substantial credit if you use any traversals of an array character by character to duplicate the operation of a string library function. If you do need to access individual elements of an array, you MUST use pointer notation, NOT subscripts. (This means that the only square brackets in your program will be used in array declarations, not executable statements.)

In addition to using the string library functions, your program must be written using user-defined functions where appropriate. Your program must define and call at least six functions you have written; writing more than six such functions would be extremely good practice and the best way to avoid errors in developing your program. If your program doesn't contain, and call, at least six such functions, it will be graded as if it does not work on the primary input, even if its output is correct. Below are ideas for some possible functions; feel free to use others. You are permitted to write as many functions as you want, and if you write any of the functions suggested below they need not perform the exact tasks as described. Keep in mind that the more of these functions you write, the easier your program will be to test and implement correctly. Each function may be no longer than 30 lines of executable code (not counting declarations, punctuation, or comments). Functions should call the others described where appropriate. You should write and test each function separately, starting with the simpler ones, before implementing those which could depend upon them.


Again, these are only suggestions to get you started. You can use whatever functions you choose. Keep in mind that some of your functions may need to call other functions.

4 Checking Your Outputs

You should use the ``diff'' command to check your output for the primary case. For this project, you MUST use ``diff'' without specifying ``-bwi''.

5 Project requirements

Your program must have a comment near the top which contains your name, login ID, student ID, your section number, your TA's name, and an original description of the action and operation of the program. In addition, you must have a comment before each function, explaining its action and operation. Your program should be written using good programming style and formatting, as discussed in class and throughout your textbook. For this project, style is considered to consist of:



6 Developing your program


You may want to skip this section at first, read the rest of the project, and come back to study it carefully when you are about to begin writing your program.

Mistakes with string library functions or pointers are very likely to result in a fatal program execution error (core dump). A statement may look completely correct and still cause the program to fail, simply because some string or pointer holds an unexpected value when the statement runs. This makes it extremely important to use debug printf statements to narrow down where in your program the error occurs before you begin to figure out what's wrong and how to fix it.


6.1 Possible development steps

Be sure to test each function as it is implemented, before integrating it with the rest of the program! It is frequently very easy to test a function which has character array parameters, because you can often call it by passing any string literals you like into the parameters. As an example, say you write a function named split, which is supposed to find the position where a string is to be split in half (perhaps it has another integer parameter as well). You can call your function several times, as split("This is a character string", 15) or split("Try another character string", 12), and print the result which your function produces each time. It's easy to see by hand what result your function should produce, so test it with a number of strings and be certain it gives the right answer every time. You can write a little test program file to call your function, or just add these test calls right at the beginning of main. Once you are positive your function works, you can call it as part of your project with much more confidence that the program as a whole will work.
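A minimal sketch of such a test follows. The behavior given to split here is an assumption made purely for illustration (it returns the index of the last blank among the first limit characters, or -1 if there is none); your own split may do something different. The names MAX_LEN and test_split are also hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEN 120

/* Hypothetical split: returns the index of the last blank among the
   first limit characters of s, or -1 if there is none.
   Assumes 0 <= limit <= MAX_LEN. */
int split(const char *s, int limit)
{
    char head[MAX_LEN + 1];
    char *blank;
    int position = -1;

    strncpy(head, s, limit);
    *(head + limit) = '\0';        /* strncpy may not terminate the copy */
    blank = strrchr(head, ' ');
    if (blank != NULL)
        position = (int) (blank - head);
    return position;
}

/* Prints each result so it can be compared with a value worked out by hand. */
void test_split(void)
{
    printf("%d\n", split("This is a character string", 15));
    printf("%d\n", split("Try another character string", 12));
    printf("%d\n", split("nospaceshere", 5));
}
```

Calling test_split() at the very beginning of main should print 9, 11, and -1 for this version of split. Those values can be confirmed by counting characters by hand.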


6.2 Finding compilation errors

Here are several common string-related compilation errors produced by the cc compiler on our class machines, and what they mean:


X is being converted to "pointer to char"

(where X is a statement using one of the string library functions)

You probably forgot the #include <string.h> at the top of your program file.

"X" is not an lvalue, but occurs in a context that requires one.

You are trying to assign something to the name of an array. You can assign something (a pointer value, or NULL) to a pointer variable, but not to the name of an array. Maybe you meant the variable on the left of an assignment to be declared as a pointer instead of an array.
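The difference between assigning to a pointer and assigning to an array name can be seen in this small sketch. The function store_word and the constant SIZE are hypothetical names chosen only for the example.

```c
#include <string.h>

#define SIZE 20

/* Copies word into result by way of a local array, showing where = is
   legal and where a string function is needed instead.  result must
   have room for at least SIZE characters. */
void store_word(const char *word, char *result)
{
    char buffer[SIZE];
    const char *ptr;

    /* buffer = word;   would not compile: an array name is not an lvalue */
    strncpy(buffer, word, SIZE - 1);   /* copy the characters instead */
    *(buffer + SIZE - 1) = '\0';       /* make sure the copy is terminated */

    ptr = buffer;                      /* fine: ptr is a pointer variable */
    strcpy(result, ptr);
}
```

If the commented-out assignment is restored, the compiler should reject it with a message like the lvalue error described above.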


6.3 Program debugging


  1. Don't forget that, for technical reasons, all debug printf statements must end with a newline character, or their results may not show up on the screen if the program has a fatal execution error (core dump).

  2. As mentioned above, core dumps are a common result from problems with strings and pointers. If a program with several functions has a core dump, and you don't know where the problem is, you can add a debug printf statement at the beginning of each function and at the end of each function, before it returns. These debug print statements can just say something like ``Starting function X'' and ``Leaving function X''. Run your program again, and if the output shows that some function was entered but never completed, that function, or one of the functions it calls, may be where your problem lies. Remember the terminating \n when inserting debug printf statements.

  3. If you are having trouble getting a string to print in its entirety, try printing the first character or characters from that string using either pointer or subscript notation. This could tell you if the string is at least starting in the correct place. The problem could involve a missing or misplaced null character ('\0').

  4. Draw lots of pictures to trace exactly where things are in memory, and be sure never to attempt to dereference or follow a pointer which is not pointing at something valid!

  5. If after you have tried these techniques, and tested each of your functions, you still can't figure out why your program doesn't work, bring a printout to our office hours, and we can help you learn how to track the problem down.
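The tracing technique from items 1 through 3 might look like the sketch below. The function count_chars is hypothetical and stands in for any function you are trying to debug.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical function instrumented with debug printf statements. */
int count_chars(const char *s)
{
    int length;

    printf("Starting function count_chars\n");   /* note the final \n */
    length = (int) strlen(s);
    /* print the first character to check where the string starts */
    printf("  s starts with '%c', length %d\n", *s, length);
    printf("Leaving function count_chars\n");
    return length;
}
```

If the ``Starting'' line appears in the output but the ``Leaving'' line does not, the failure most likely happened somewhere between the two. Remove or comment out the debug statements before submitting, since the extra output would cause diff to report differences.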


6.4 Helpful hints



7 Academic integrity statement


Any evidence of unauthorized use of computer accounts or cooperation on projects will be submitted to the Student Honor Council, which could result in an XF for the course, suspension, or expulsion from the University. Projects are to be written INDIVIDUALLY. For academic honesty purposes, projects are to be considered comparable to a take-home exam. Any cooperation or exchange of ideas which would be prohibited on an exam is also prohibited on a project assignment, and WILL BE REPORTED to the Honor Council.


VIOLATIONS OF ACADEMIC HONESTY INCLUDE:


  1. failing to do all or any of the work on a project by yourself, other than assistance from the instructional staff.

  2. using any ideas or any part of another student's project, or copying any other individual's work in any way.

  3. giving any parts or ideas from your project, including test data, to another student.

  4. having programs on an open account or on a PC that other students can access.

  5. transferring any part of a project to or from another student or individual by any means, electronic or otherwise.


IT IS THE RESPONSIBILITY, UNDER THE UNIVERSITY HONOR POLICY, OF ANY STUDENT WHO LEARNS OF AN INCIDENT OF ACADEMIC DISHONESTY TO REPORT IT TO THEIR INSTRUCTOR.


8 Submitting your project


Your project must be electronically submitted by the date above, before 11:00 pm, to avoid losing credit as described in the syllabus. No projects more than two days late will be accepted for credit without prior permission or a valid medical excuse, as described on your syllabus. Only the project which you electronically submit, according to the procedures provided, can be graded; it is your responsibility to test your program and verify that it works properly before submitting. Lost passwords or other system problems do not constitute valid justifications for late projects, so do not put off working on your program or wait to submit it at the last minute!

Turn in your assignment using the ``submit'' program as before, except using ``6'' for the project number. You are to submit only the .c file containing your source code, not the executable version of your program! If your program is in a file named ``p6.c'', submit would be run as submit 6 p6.c.



9 Sample output


Assuming the name of the executable version of the program is ``format.x'', here is a sample execution for one input data set. The input file is named ``primary_input'', whose contents are shown as displayed by the UNIX ``cat'' command. Following the data file's contents, the results of running the program with input redirected from that file are shown. The primary_input file will be available in your class posting account. The input and output files are shown here for reference only. In order to be sure that your project works correctly on the primary input, YOU MUST USE THE INPUT AND OUTPUT FILES IN THE CLASS POSTING ACCOUNT WITH THE DIFF COMMAND AS DESCRIBED ABOVE.

Be sure to test your program against a variety of inputs, so you are sure it works in all circumstances!

% cat primary_input
<MARGIN=0> <WIDTH=60>
<ALIGN=center>
Project 6
<ALIGN=left>
This project will give you practice<BR>processing strings using  pointers
 and C's character and string processing
 library functions. This will allow you to gain a fuller understanding
of the relationship in C between arrays
and  pointers. In order to make sure you are using the string functions
already defined for you in <BR> <string.h>,<BR>
<FONT=upper>you may not use  any loops that process an array character by character
if there is a string function which
would perform the same operation.</FONT>
<P> You will be writing a simple web page formatter.
The program will read a file containing text and HTML-like "tags" which
will tell the program how to format the text for output.
The tags will control such things as the line length, text spacing, and
text case.
%

% format.x < primary_input
                          Project 6
This project will give you practice
processing strings using pointers and C's character and
string processing library functions. This will allow you to
gain a fuller understanding of the relationship in C
between arrays and pointers. In order to make sure you are
using the string functions already defined for you in
<string.h>,
YOU MAY NOT USE ANY LOOPS THAT PROCESS AN ARRAY CHARACTER
BY CHARACTER IF THERE IS A STRING FUNCTION WHICH WOULD
PERFORM THE SAME OPERATION.

     You will be writing a simple web page formatter. The
program will read a file containing text and HTML-like
"tags" which will tell the program how to format the text
for output. The tags will control such things as the line
length, text spacing, and text case.
%


For this project, the primary input consists of the contents of the input file given above. DO NOT TYPE THESE FILES YOURSELF OR COPY THEM FROM THE WEB PAGE. Use copies of the files from the class posting account. Note that this primary input does not exercise several conditions discussed above which your program should work for in order to earn credit for the secondary inputs.



Steve Scolnik 2003-04-21