Formatted Text

 

ASCII files are easy to edit and maintain on the UNIX system, but they are not as readable as text that is formatted with bolding, underlining, different font sizes and special (non-ASCII) characters. To produce formatted text on a microcomputer you would use a word processing package such as Microsoft Word, but on the UNIX host machine the editors you have learned (vi and emacs) can only edit ASCII text files.

 

Text Layout Languages

 

In order to get the formatted text on a UNIX machine, we usually use a text layout language.  There are several text layout languages available.  Once you learn one text layout language, it is easy to learn a second or a third because they all operate on the same basic structure.  It is just a different set of commands you have to learn when switching between layout languages.   The most common text layout languages include:

  1. nroff/troff, developed on the UNIX system
  2. HTML, used for the World Wide Web
  3. TeX/LaTeX, used on several platforms
  4. Bookmaster, developed for IBM mainframes

 

When you are working in a word processor and want some text formatted differently, you select the text, choose the font or format you would like applied to it, and see the change right there on the screen in front of you. With a text layout language, the major difference is that you do not see the effects of your font and formatting choices immediately, because using one is more like writing a program. The file you create contains both the text you would like to display and the commands that say in what font or format that text should be displayed. That file then has to be interpreted and translated before your text can be displayed in the fonts and formats you chose. The steps you need to follow when using a text layout language are:

  1. edit an ASCII text file using a text editor (e.g., vi or emacs on UNIX)
  2. in that file, type both the text to be displayed and the commands telling how it should be displayed
  3. save this single file
  4. give that file to the program written to interpret its commands
  5. print or display the result of the translation
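To make steps 2 and 3 concrete, here is what such a single source file might contain. The file name memo.ms and its text are made up for illustration, and the commands (.TL, .PP, \fB and \fP) come from nroff/troff with the ms macro package, which is described later in this lesson. Lines that begin with a period are commands; the other lines are text to be displayed.

```roff
.\" memo.ms - a hypothetical example source file.
.\" The title macro from the ms package:
.TL
Meeting Notes
.\" A new paragraph; \fB switches to bold, \fP switches back.
.PP
The meeting will be held on \fBFriday\fP.
Please bring your notes.
```

Steps 4 and 5 could then be carried out with a command such as nroff -ms memo.ms | more.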

 

For the rest of the lesson we will concentrate on nroff and troff. These are the text layout languages the UNIX system uses primarily for its own files. For example, the manual pages displayed by the man command have bolding, underlining and different fonts because they are laid out using these packages.

 

nroff/troff Commands

 

nroff and troff accept basically the same set of commands, so they can be learned together. The difference between them is that troff was designed to work with high-resolution devices such as laser printers, while nroff is designed for letter-quality printers and CRT terminals. A command that only one of them understands is simply ignored by the other.

 

The file you create with your ASCII text editor contains both the text and the commands, so there must be a way for the interpreter to tell which is which. The commands you give come in two basic forms.

  1. The first form of command is called an escape sequence (it is also called a function). Escape sequences begin with a backslash. For example, the escape sequence \fI indicates that you want the font to change to italics (or to underlining in nroff, since it cannot do italics), and the escape sequence \fP indicates that you want the font to change back to the previous font. So if your ASCII file contains

Jandelyn \fIDawn\fP Plane

            the result (after interpretation) would be the name with Dawn in italics:

Jandelyn Dawn Plane

  2. The second form of command is either a primitive request or a request for a macro, as described below. These commands must begin on a new line and must begin with either a period (.) or a single quotation mark ('). For example, the command

.ce 5

will center the 5 lines that follow this command.  The period indicates that it is a command, the ce is the name of the command and the 5 is the argument to the command.  Some commands will take no arguments, others will take one or two - depending on the command itself.  Some of these arguments will have default values if no value is specified.  The command

   .PP

will start a new paragraph with an indented first line. Traditionally, primitive requests are named with two lowercase letters and the ms macros with two uppercase letters, so in these examples ce is a primitive request and PP is a macro. It is not really important to remember which is which unless you plan to work without the macros.
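The following fragment, with invented sample text, puts both forms of command together: escape sequences inside lines of text, and a macro and a primitive request on lines of their own. Besides \fI and \fP it uses \fB, the standard escape sequence for a bold font.

```roff
.\" .PP is a macro from the ms package (uppercase name).
.PP
This sentence has one \fIitalic\fP word
and one \fBbold\fP word.
.\" .ce is a primitive request (lowercase name); its
.\" argument says how many of the following lines to center.
.ce 2
This line is centered.
So is this one.
```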

 

nroff and troff are often used together with macro packages written to make them easier to use. A macro is a small program with a name: you call it by name, and it fills in all of the details for you. For example, giving a document a title is a very common task, so instead of specifying exactly what size and font you would like for the title and how much space to leave before and after it, you can simply say that this text is a title. The macro is programmed to know the details and supplies the size, font and spacing information. The rest of this document assumes that you are using the macro package named ms.
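As a rough illustration of the work a macro saves, the fragment below lays out a title by hand with primitive requests. It only approximates the kind of thing a title macro does; it is not the actual definition of the ms .TL macro, and the title text is made up.

```roff
.\" Laying out a title by hand: center the next line,
.\" switch to bold, enlarge by 2 points (nroff ignores
.\" this), then restore both and leave two blank lines.
.ce 1
.ft B
.ps +2
A Study of Text Layout
.ps -2
.ft P
.sp 2
.\" With the ms package, the same intent is expressed as:
.\"   .TL
.\"   A Study of Text Layout
```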

 

You will create and save the file using an editor - for example

   % emacs file.1

Then you translate that file with the nroff or troff interpreter, possibly naming a macro package as an option. A common practice is to put the interpreted version into a file so that it can be viewed or printed later. The commands below interpret the file named file.1 using the ms macro package and put the output into the file named output.file.

 

   % nroff -ms file.1 > output.file

or

   % troff -ms file.1 > output.file

 

Basic Commands

 

When nroff/troff interprets the input file, it treats white space, including the end of each input line, simply as a separator between words. This means that even if the words are typed into the input file in a single column, the output is filled: each output line gets as many words as will fit across the page. There are a few exceptions to this rule; for example, a blank input line, or a line that begins with white space, starts a new output line. The default is also to produce fully justified text, meaning that both the left and the right margins are straight. To do this, nroff/troff widens the spaces between words as needed. These features can be turned off and on using the commands in the following table:

.nf

no fill - accept the lines as typed in the input file (do not treat the ends of lines as mere delimiters)

.fi

fill - turn filling back on (the default)

.na

no adjust - do not have the line be full justified

.ad

adjust - full justification should be turned back on
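The fragment below, with made-up text, switches filling and adjusting off and back on. It also uses .fi, the request that turns filling back on after .nf.

```roff
.\" Default: filled and adjusted. These three short input
.\" lines fit together on a single output line.
Roses are red,
violets are blue,
and text is filled.
.sp
.\" No fill: each input line stays a line of its own.
.nf
Roses are red,
violets are blue,
.fi
.sp
.\" Filled but not adjusted: the right margin is ragged.
.na
This paragraph is filled, but the right margin
is left ragged because adjustment is turned off.
.ad
```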

 

Document Wide Settings

 

Before you start formatting the text of a document, there are several document-wide decisions to make and specify at the beginning of the document. These include the length of the page, the length of a line and the amount of indenting. Because the interpreted file could be sent to many different types of printers, the nroff/troff interpreter cannot make these decisions for you, although it does supply defaults. The choices are held in storage spaces called "registers" so that they can be remembered and used during interpretation.

 

.pl N

or

.nr PL N

page length - the length of the entire page (including header margin and footer margin) -- default = 11 inches -- the N is replaced by the new value

 

.ll N

or

.nr LL N

line length - the length of the used portion of a typical line -- default = 6 inches -- the N is replaced by the new value

.po N

or

.nr PO N

page offset (or left margin)  -- default = 0 inches in nroff and 1 inch in troff -- the N is replaced by the new value

.nr HM N

header margin (the top margin) -- the N is replaced by the new value

.nr FM N

footer margin (the bottom margin) -- the N is replaced by the new value

 

The primitive requests (.pl, .ll, .po) take effect immediately. The registers set with .nr are read by the ms macro package, which applies them only at certain points, such as the start of the next paragraph or the next page, so they are best set at the very beginning of the document.
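A document that uses ms might begin by setting these registers before any text, as sketched below. The values are arbitrary choices for illustration, not recommendations.

```roff
.\" Document-wide settings for the ms macros, placed at
.\" the very top of the file before any text.
.nr PO 1.25i    \" page offset (left margin)
.nr LL 6i       \" line length
.nr HM 1i       \" header (top) margin
.nr FM 1i       \" footer (bottom) margin
.\" A primitive request, which takes effect at once:
.pl 11i         \" page length
```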

 

Comments

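Comments can be put into the source file by using \" to mark the beginning of a comment. Everything from the \" to the end of the current line is ignored by the nroff/troff interpreter. A comment on a line by itself is usually written as .\" so that the line is treated as a command line rather than as an empty line of text, which would put a blank line into the output.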
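A short sketch of both kinds of comment, with made-up text:

```roff
.\" A whole-line comment: this line produces no output.
.PP
This text is printed.\" everything from here on is ignored
This line is printed too.
```

In the output the two text lines are filled together into one line; only the comment text disappears.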

 

Units of Measure

When a measurement is given, you can follow the number with a unit: i for inches or c for centimeters (for example, 1i or 2.5c). Character size is measured in points; a point is 1/72 of an inch. Since nroff cannot use different font sizes, it ignores requests that change the point size.
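The fragment below attaches units to a few measurements; the values are arbitrary. The arithmetic in the comments follows from the definitions above (1 inch = 2.54 centimeters = 72 points).

```roff
.\" Half an inch of vertical space.
.sp 0.5i
.\" Indent the next output line by 2.54c, which is 1 inch.
.ti 2.54c
This output line is indented one inch.
.\" 12 points is 12/72 = 1/6 of an inch.
.\" troff uses this request; nroff ignores it.
.ps 12
This text is set in 12-point type by troff.
```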

Command List

.ND

no date should be displayed (default is that the date of interpretation is put into the file)

.ds CH

no page numbers should be displayed; this defines the center-header string CH as empty (the default is that page numbers are put into the file)

.sp N

N indicates a number of blank lines to be inserted at that point - if the N is not specified, it is assumed to be 1

.ti +N

temporary indent - indent only the next output line, by an additional N

\*(DY

(escape sequence) insert the value of the string DY, which holds the current date, at this point
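A small fragment using several of these commands, with invented text:

```roff
.\" Remove the page number from the top of each page by
.\" defining the center header string CH as empty.
.ds CH
.LP
This report was formatted on \*(DY.
.\" Two blank lines.
.sp 2
.\" Only the next output line gets the extra indent.
.ti +0.5i
This first output line is indented an extra half inch,
and the lines after it return to the normal indent.
```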

Paragraph Types

.PP

Regular paragraph (only the first line is indented)

.IP L I

Indented paragraph with hanging tag where L indicates the label and I indicates the amount of indenting

.LP

Block paragraph (no indent)

.QP

Quoted paragraph (indented from both margins)

.XP

Exdented paragraph (the first line starts at the left margin and the following lines are indented)
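The fragment below shows each paragraph type in turn; the wording is made up, and the indent of 0.5i given to .IP is an arbitrary choice.

```roff
.LP
A block paragraph: no line is indented.
.PP
A regular paragraph: only the first line is indented.
.\" Label "1.", text indented by half an inch.
.IP 1. 0.5i
An indented paragraph whose label hangs to the left
of the indented text.
.QP
A quoted paragraph, indented from both margins.
.XP
An exdented paragraph: the first line starts at the
margin and the lines after it are indented.
```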

Display commands

These "display with keep" commands from the ms macro package do not fill lines, and they keep the whole display together on one page. In other words, each line ends in the result exactly where it ends in the source. Each display is closed with .DE.

.DS L

start a left (only) justified group of lines

.DS I N

start a left (only) justified group of lines that is indented N from the margin

.DS C

start a center justified group of lines (each centered individually)

.DS B

start a center justified group of lines (where they are centered as a block rather than individually)
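A sketch of two displays, one left-aligned and one centered; the name and address lines are placeholders. Each display is closed with .DE.

```roff
.\" Left-aligned display: lines stay exactly as typed
.\" and the display is kept together on one page.
.DS L
Your Name
123 Any Street
Anytown
.DE
.\" Centered display: each line is centered on its own.
.DS C
Roses are red,
violets are blue.
.DE
```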

 

Type of Information -  Formatting Commands

These commands let you specify what the text represents, and the macros then determine how it should be displayed.

.TL

Title lines

.AU

Author

.FS

Footnote start

.FE

Footnote end

.SH

Unnumbered Section Heading

.NH N

Numbered Section Heading where N indicates the section level (up to 5 levels deep)

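The fragment below sketches how these commands fit together at the start of a document. The title, author and section text are placeholders, and the footnote is marked by hand with the dagger symbol \(dg, one of the special characters listed in the next section.

```roff
.TL
Formatting with the ms Macros
.AU
A. Student
.NH 1
Introduction
.PP
This section has a numbered heading.\(dg
.FS
\(dg This footnote text is moved to the bottom of the page.
.FE
.NH 2
A Subsection
.PP
Second-level headings are numbered within their section.
.SH
An Unnumbered Heading
.PP
Some closing text.
```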

Escape Sequence Special Characters

These escape sequences let you insert characters that are not available on the keyboard.

\(*G

Γ - uppercase Greek Gamma

\(*g

γ - lowercase Greek Gamma

\(dg

† - common symbol used for footnote marks

\(!=

≠ - not equal to sign

\(co

© - copyright symbol

\*Q

“ - opening quotation mark

\*U

” - closing quotation mark
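A one-paragraph sketch using several of these special characters; the sentences themselves are made up:

```roff
.PP
The Greek letters \(*G and \(*g, the sign \(!=,
the dagger \(dg and the copyright symbol \(co
are all typed with escape sequences.
\*QQuoted text\*U gets proper opening and closing marks.
```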

 

------------------------------------------------------------

The rest of this lecture will be done mostly in the form of examples. I have put several examples into the jplane directory. You can change directory into that directory and display the files either in their non-interpreted form using the more command, or in their interpreted form using nroff piped through more.

 

 

  1. The first is an example showing the different paragraph options. There are several ways to write a paragraph, so nroff and troff with the ms macro set give several options. Each paragraph format is demonstrated by a paragraph written in that format (so be sure to read the text rather than just looking at how it is formatted). To display this file you would use either the command:

 

 % more ~jplane/nroff.paragraph.example

(to see the unformatted file and the nroff commands used to create it)

 

or

 

 % nroff -ms ~jplane/nroff.paragraph.example | more

(to see the formatted result)

 

  2. The second is an example using displays. It is named nroff.display.example. Displays differ from paragraphs in that each display is kept together on one page and its lines are not "filled" to make them line up nicely. We usually type in the editors so that the end of a sentence is also the end of a line (where the Enter key is pressed). We do this because the UNIX editors make it easier to edit by lines; notice that sed, vi and even emacs are significantly line-based in the structure of their commands. nroff gives us the no-fill option because there are things we do not want filled, such as poems and addresses.

 

  3. The third example shows the use of footnotes. The text for each footnote is inserted into the middle of the document in such a way that the source file is extremely hard to read. The interpreted file is not so difficult to read, because the text of the footnotes is moved to the bottom of the page where footnotes should be. For this example, you may want to look first at the interpreted file and then at the source that created it.

 

  4. The last example uses another UNIX tool called tbl. This tool sets up tables so that the nroff or troff layout language can format them. To use tbl with nroff, both must be applied to the source file before the result is displayed on the screen. Because tbl prepares the file for nroff, the tbl command comes first in the pipeline. The format of the UNIX command would be

 

% tbl file | nroff -ms | more

 

    1. .TS and .TE are ms macro commands that tell the interpreter (and tbl) where the table starts and ends.
    2. The data lines of the table use tabs as delimiters to mark where the individual fields begin and end. This may make the source file difficult to read, but the result will look nice because a tab is taken only as a field delimiter, not as a unit of space.
    3. Before the data lines, the options for the overall format of the table are specified. The examples show the use of center, box and allbox as formatting options for your table. This portion is ended by a semicolon (;), which separates it from the rest of the formatting commands.
    4. After the overall table formatting has been specified, the column formats are indicated by key letters, and the last line of column formats ends with a period. Several of these are shown and explained in the example file.
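A minimal table in the format just described, with invented data. It assumes one extra global option, tab(:), which tells tbl to separate fields with colons instead of tabs so that the delimiters are visible here; without that option, each colon would be a tab character. The format line c c applies to the heading row, and the last format line, l n., applies to all remaining rows (l means left-aligned, n means numeric).

```roff
.TS
center box tab(:);
c c
l n.
Item:Count
Apples:12
Oranges:7
.TE
```

It would be processed the same way as before: tbl file | nroff -ms | more.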