Week 3
We covered how to read from and write to files.
Reading Files
To read a file, we open it, loop through each line, and then close it.
A file is opened using the open command:
open (FILE, "myFile.txt");
The word "FILE" in this example is called a file handle. It is a way for
us to refer to the file we have just opened.
Note that open is a function, like the functions you had in math. You've
seen something like f(x) = x*2 in high school. The name of that function
is f and the argument goes in parentheses. In the case of the open
command, the name of the function is "open" and there are two arguments:
the handle (FILE) and the name of the file to open (myFile.txt). We will
see lots of functions like this in Perl.
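Like f(x), open also gives back a value: true if the file was opened and false if it was not. The notes do not check that value, so the following is only a minimal sketch of one way to do it. It reuses the file name myFile.txt from above, which may or may not exist on your computer, and the variable $status is a name made up for this example.

```perl
#!/usr/bin/perl
# open returns a true value on success and a false value on failure,
# so its result can be tested before we try to read anything.
if (open (FILE, "myFile.txt")) {
    $status = "opened myFile.txt";
    close FILE;
} else {
    # $! holds the system's error message for the failed call
    $status = "could not open myFile.txt: $!";
}
print "$status\n";
```

Either branch prints one line, so the script runs whether or not the file is there.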
The next step is to go line by line through the file. This is done with a
while loop like the ones we have already used.
However, instead of using a conditional statement, we use the < >
operator around the name of the file handle:
while (<FILE>) {
}
This tells the computer to get the first line, then get the second line,
and so on. After we
have read every line in the file, the loop quits.
Perl uses a special variable called $_ to store the current line we are on
in this loop. In class,
we created a new variable with a cleaner name to use.
while (<FILE>) {
$line = $_;
}
This looks nicer, but it isn't necessary. You can use the $_ variable with
no problems.
Now that we are in the loop, we can do whatever we need to do with the
line. We can start
by simply printing it out:
while (<FILE>) {
$line = $_;
print $line;
}
This code will reproduce the file by printing each line. That's not very
exciting or useful. More on that in a minute...
When we're done with the file, after the loop, we need to close it. We
just do this:
close FILE;
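Putting the three steps together (open, loop, close), here is a small sketch that can be run as is. So that it works without a file on disk, it reads from a string through an in-memory file handle. That is a standard Perl feature not covered in these notes, and the sample text and the $count variable are made up for this example. With a real file you would use the open line shown earlier instead.

```perl
#!/usr/bin/perl
# Sample text standing in for the contents of a file (made up for this sketch).
$text = "first line\nsecond line\nthird line\n";

# Open a handle that reads from the string instead of from disk.
# With a real file this would be: open (FILE, "myFile.txt");
open (FILE, "<", \$text) or die "Could not open: $!";

$count = 0;
while (<FILE>) {
    $line = $_;      # $_ holds the current line, including its newline
    $count++;
    print "$count: $line";
}
close FILE;

print "Read $count lines\n";
```

The loop body runs once per line, and the loop ends by itself when there are no more lines to read.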
The Split Command
The split command allows you to break up a line into pieces, and store
each piece. In the sample
file provided, the lines look like this:
1946,1.63,17.26
1947,2.16,20.29
1948,2.77,24.21
1949,2.77,24.44
1950,2.77,24.18
1951,2.77,22.42
The values are separated by commas. The first value is a year, the second
value is a price, and the third value is a price adjusted for inflation.
It would
be nice to store each value separately. To do that, we can split up the
line wherever we see a comma.
($year, $price, $adj_price) = split(/,/,$line);
First, we define the variables that will store each piece. We put them in
parentheses so Perl knows that the first piece of the line goes to the
first variable, the second piece goes to the second variable, and so on.
The split command has two parts in the parentheses. The first is what's
called a pattern. Patterns are always between two slashes (/). This simple
pattern is a comma, because that is what we're splitting the line on. We
will learn much more complicated patterns eventually. The second argument
to the split command is the variable storing the thing we're splitting up.
In this case, it's the line we're splitting. When it is done, our three
variables $year, $price, and $adj_price will have each of the three parts
of our line. From there, we can use them. As a simple example, let's just
print out the years:
while (<FILE>) {
$line = $_;
($year, $price, $adj_price) = split(/,/,$line);
print "$year\n";
}
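To see what split produces for a single line, here is a small sketch using the first row of the sample file. One detail the notes do not mention: a line read from a file still ends in a newline, which would stay attached to the last piece. The chomp function (standard Perl, but not covered in class) removes it. Whether you need it depends on what you do with the last value.

```perl
#!/usr/bin/perl
# One line as it would arrive from a file, newline included.
$line = "1946,1.63,17.26\n";

# chomp removes the trailing newline so it does not stay
# attached to the last piece after the split.
chomp($line);

($year, $price, $adj_price) = split(/,/, $line);

print "year: $year\n";
print "price: $price\n";
print "adjusted price: $adj_price\n";
```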
That is simple, but we can do much more complicated examples. In class, we
computed the
average price of oil. To do that, we need to keep a running count of how
many lines we've
looked at, as well as a running total of the price each year:
#!/usr/bin/perl
open (FILE, "test1.csv");
$total = 0;
$count = 0;
while (<FILE>) {
$line = $_;
($year, $price, $adj_price) = split(/,/, $line);
#print "$year\n";
$count++;
$total = $total + $price;
}
close FILE;
$avg = $total / $count;
print "$avg\n";
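One thing to watch for in the averaging code: if no lines are read (for example, the file is empty or could not be opened), $count stays at 0 and the division fails. The sketch below shows one possible guard. It is not the version from class. The six rows are the sample rows shown earlier, read through an in-memory file handle (an assumption made so the example runs without test1.csv on disk).

```perl
#!/usr/bin/perl
# The six sample rows from these notes, held in a string for this sketch.
$data = "1946,1.63,17.26\n1947,2.16,20.29\n1948,2.77,24.21\n"
      . "1949,2.77,24.44\n1950,2.77,24.18\n1951,2.77,22.42\n";

# With the real file this would be: open (FILE, "test1.csv");
open (FILE, "<", \$data) or die "Could not open: $!";

$total = 0;
$count = 0;
while (<FILE>) {
    $line = $_;
    ($year, $price, $adj_price) = split(/,/, $line);
    $count++;
    $total = $total + $price;
}
close FILE;

# Only divide when at least one line was read.
if ($count > 0) {
    $avg = $total / $count;
    print "average of $count prices: $avg\n";
} else {
    print "no lines were read\n";
}
```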
Reading from the Web
The examples above will read files from your computer, but not from the
web. There are only a few changes to make so this will read a file from
the web. First, we want to use the LWP module (more details at
http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP.pm). LWP is
a module for Perl that gives us a lot of easy ways to interact over the
web. In your code, you should add this line, just below the first line:
#!/usr/bin/perl
use LWP::Simple;
This lets us use all the great functionality that LWP provides.
Next, we need to get the file from the web. This line replaces the open
statement used above.
$file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
This retrieves the file at the URL and stores it in the variable $file.
However, unlike opening a file from the computer, we can't go line by line
yet because the variable stores the WHOLE file. Thus, we need to add one
more step to break up the file into lines. We do that with a split command,
just like we used to break up the line with the commas. Instead of using a
comma as the pattern, we use a \n character, so the file is broken up every
time there is a new line. With the commas, we knew how many pieces the line
would be broken into. In this case, we don't know the number of lines the
file will be split into. The solution is to store all the pieces in a list
(called an array), rather than in named variables. Array names start with
an @ sign instead of the $ used for ordinary variables.
$file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
@lines = split(/\n/, $file);
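Here is a small sketch of what that split does, using a short string in place of the downloaded file so it runs without a network connection. The three rows are taken from the sample data, and $n is a name made up for this example. Individual lines in the array are reached with $lines[0], $lines[1], and so on, counting from 0.

```perl
#!/usr/bin/perl
# A string standing in for the contents returned by get.
$file = "1946,1.63,17.26\n1947,2.16,20.29\n1948,2.77,24.21\n";

# Break the string into lines wherever there is a newline.
@lines = split(/\n/, $file);

# scalar(@lines) gives the number of elements in the array.
$n = scalar(@lines);
print "number of lines: $n\n";
print "first line: $lines[0]\n";
```

This prints "number of lines: 3" followed by "first line: 1946,1.63,17.26".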
There is one final change to make. We do not use the while loop with the
file handle in this
case. Instead, we use a different kind of loop to go through the elements
of the array. We replace
the while loop with this code:
foreach (@lines) {
The only other change is that we don't need the "close FILE" command
anymore, since we didn't open a file.
Thus, our final file looks like this:
#!/usr/bin/perl
use LWP::Simple;
$file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
@lines = split(/\n/, $file);
$total = 0;
$count = 0;
foreach (@lines) {
$line = $_;
($year, $price, $adj_price) = split(/,/, $line);
$count++;
$total = $total + $price;
}
$avg = $total / $count;
print "$avg\n";
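If the file cannot be retrieved (the URL is wrong, the server is down, or there is no network connection), the get function from LWP::Simple returns an undefined value, and the script above would end up dividing by zero. The sketch below shows one way to check for that before going on. The $url and $message variables are names made up for this example, and it assumes LWP::Simple is installed.

```perl
#!/usr/bin/perl
use LWP::Simple;

$url = "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
$file = get $url;

# get returns an undefined value when the download fails,
# so test for that before splitting the contents into lines.
if (defined $file) {
    @lines = split(/\n/, $file);
    $count = scalar(@lines);
    $message = "retrieved $count lines";
} else {
    $message = "could not retrieve $url";
}
print "$message\n";
```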
Writing Files
Finally, we saw how to write files today. To write a file, you open it the
same way you
do when you're reading in the file. The only difference is that we add the
> character to the
beginning of the file name. One > will open a file for writing and
overwrite an existing file with the same name. Using two > characters
will open the file and append the text to the end. Here are two examples:
open (OUTPUT, ">myFile.txt");
open (BOB, ">>theBobFile.csv");
Note that if we open multiple files, each must have a unique handle. Now
that we have these
files open, we can write to them using the print command:
print OUTPUT "This line will go in the file\n";
When you are done printing to the file, just close it like we closed the
file we were reading
in.
close OUTPUT;
It's important to close files you are writing to, or else all the content
may not be written properly.
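To see the difference between > and >>, here is a small sketch that writes a file, appends to it, and then reads it back. The file name example_output.txt and the handle name INPUT are made up for this example. Running the script overwrites any existing file with that name in the current directory.

```perl
#!/usr/bin/perl
# One > creates the file, or empties it if it already exists.
open (OUTPUT, ">example_output.txt") or die "Cannot write: $!";
print OUTPUT "first line\n";
close OUTPUT;

# Two > characters keep what is already there and add to the end.
open (OUTPUT, ">>example_output.txt") or die "Cannot append: $!";
print OUTPUT "second line\n";
close OUTPUT;

# Read the file back to check what was written.
open (INPUT, "example_output.txt") or die "Cannot read: $!";
$count = 0;
while (<INPUT>) {
    $count++;
    print $_;
}
close INPUT;
print "lines in file: $count\n";
```

Because the first open uses a single >, running the script again still leaves exactly two lines in the file.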
In class, we did an example where we printed to a file with the year and
ratio of the adjusted price to the current price. Here is that code.
#!/usr/bin/perl
use LWP::Simple;
open (FILE, ">myOutput.tdf");
$file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
@lines = split(/\n/, $file);
foreach (@lines) {
$line = $_;
($year, $price, $adj_price) = split(/,/, $line);
$ratio = $adj_price / $price;
print FILE "$year\t$ratio\n";
}
close FILE;
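The ratio code divides by $price, so a blank line or a row with a price of 0 would stop the script with a division by zero error. The sketch below shows one possible way to skip such rows. It is not the code from class. The rows are made up for illustration, the output goes to the screen instead of a file so nothing is written to disk, and $written is a name invented for this example.

```perl
#!/usr/bin/perl
# Made-up rows: one is blank and one has a price of 0.
@lines = ("1946,1.63,17.26", "", "1947,0,20.29", "1948,2.77,24.21");

$written = 0;
foreach (@lines) {
    $line = $_;
    ($year, $price, $adj_price) = split(/,/, $line);

    # Skip rows where the price is missing or zero,
    # since we cannot divide by it.
    if (!defined($price) || $price == 0) {
        next;
    }

    $ratio = $adj_price / $price;
    print "$year\t$ratio\n";
    $written++;
}
print "rows written: $written\n";
```

The next command jumps straight to the following element of the array, so only the 1946 and 1948 rows are printed.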