Week 3

We covered how to read files, read data from the web, and write to files.

Reading Files

To read a file, we open it, loop through each line, and then close it.

A file is opened using the open command:

    open (FILE, "myFile.txt");

The word "FILE" in this example is called a file handle. It is a way for us to refer to the file we have just opened.

Note that open is a function, like the ones you saw in math. You've seen something like f(x) = x*2 in high school. The name of that function is f, and the argument goes in parentheses. In the case of the open command, the name of the function is "open" and there are two arguments: the handle (FILE) and the name of the file to open (myFile.txt). We will see lots of functions like this in Perl.

The next step is to go line by line through the file. This is done with a while loop like the ones we used earlier. However, instead of using a conditional statement, we put the < > operator around the name of the file handle:

    while (<FILE>) {
    }

This tells the computer to get the first line, then get the second line, and so on. After we have read every line in the file, the loop quits.

Perl uses a special variable called $_ to store the current line we are on in this loop. In class, we created a new variable with a cleaner name to use.

    while (<FILE>) {
        $line = $_;
    }

This looks nicer, but it isn't necessary. You can use the $_ variable with no problems.

Now that we are in the loop, we can do whatever we need to do with the line. We can start by simply printing it out:

    while (<FILE>) {
        $line = $_;
        print $line;
    }

This code will reproduce the file by printing each line. That's not very exciting or useful. More on that in a minute...

When we're done with the file, after the loop, we need to close it. We just do this:

close FILE;
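Putting the pieces of this section together, here is a minimal sketch of the whole read loop. Several things in it are assumptions and were not part of the class code: the file name sample_lines.txt is hypothetical, the first few lines create that file only so the sketch runs on its own (writing files is covered later in these notes), "or die" is added so the program stops with a message if a file cannot be opened, and chomp is used to strip the newline from each line. The "use strict" and "use warnings" lines, together with "my" in front of new variables, are optional safety checks.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the class code
use warnings;

# Setup only: create a small sample file (hypothetical name) to read.
open(SETUP, ">sample_lines.txt") or die "Cannot create sample_lines.txt: $!";
print SETUP "first line\nsecond line\nthird line\n";
close SETUP;

# Open the file for reading. If open fails, "or die" stops the program
# and prints the reason, which Perl stores in the special variable $!.
open(FILE, "sample_lines.txt") or die "Cannot open sample_lines.txt: $!";

my $count = 0;
while (<FILE>) {
    my $line = $_;   # copy the current line into a clearer name
    chomp($line);    # remove the newline at the end of the line
    $count++;
    print "$count: $line\n";
}

close FILE;
unlink "sample_lines.txt";   # remove the sample file again
```

Without the "or die" check, a misspelled file name would not produce any error message. The loop would simply never run, which can be confusing to track down.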

The Split Command

The split command allows you to break up a line into pieces, and store each piece. In the sample file provided, the lines look like this:

    1946,1.63,17.26
    1947,2.16,20.29
    1948,2.77,24.21
    1949,2.77,24.44
    1950,2.77,24.18
    1951,2.77,22.42

The values are separated by commas. The first value is a year, the second value is a price, and the third value is a price adjusted for inflation. It would be nice to store each value separately. To do that, we can split up the line wherever we see a comma.

    ($year, $price, $adj_price) = split(/,/,$line);

First, we define the variables that will store each piece. We put them in parentheses so Perl knows that the first piece of the line goes to the first variable, the second piece goes to the second variable, and so on. The split command takes two arguments in its parentheses. The first is what's called a pattern. Patterns are written between two slashes (/). This simple pattern is a comma, because that is what we're splitting the line on. We will learn much more complicated patterns eventually. The second argument to the split command is the variable storing the thing we're splitting up. In this case, it's the line we're splitting. When it is done, our three variables $year, $price, and $adj_price will each hold one of the three parts of our line. From there, we can use them. As a simple example, let's just print out the years:

    while (<FILE>) {
        $line = $_;
        ($year, $price, $adj_price) = split(/,/,$line);
        print "$year\n";
    }

That is simple, but we can do much more complicated examples. In class, we computed the average price of oil. To do that, we need to keep a running count of how many lines we've looked at, as well as a running total of the prices:

    #!/usr/bin/perl
    open (FILE, "test1.csv");
    $total = 0;
    $count = 0;
    while (<FILE>) {
        $line = $_;    # split reads from $line, so it must be set here
        ($year, $price, $adj_price) = split(/,/, $line);
        #print "$year\n";
        $count++;
        $total = $total + $price;
    }
    close FILE;
    $avg = $total / $count;
    print "$avg\n";
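One detail of split is easier to see in isolation. A line read from a file still has its newline character at the end, so the last piece that split returns carries that newline with it. The sketch below uses a hard-coded string as a stand-in for a line read from the file, and calls chomp before splitting. The chomp call, "use strict", "use warnings", and "my" are additions that were not in the class code.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the class code
use warnings;

# Stand-in for a line read from the file; note the newline at the end.
my $line = "1946,1.63,17.26\n";

chomp($line);    # remove the trailing newline before splitting

# Split on commas and store the three pieces in three variables.
my ($year, $price, $adj_price) = split(/,/, $line);

print "year=$year price=$price adjusted=$adj_price\n";
```

If the chomp line is removed, $adj_price holds "17.26" followed by a newline, so printing it produces an extra blank line.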

Reading from the Web

The examples above will read files from your computer, but not from the web. Only a few changes are needed to read a file from the web. First, we want to use the LWP module (more details at http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP.pm). LWP is a module for Perl that gives us a lot of easy ways to interact over the web.

In your code, you should add this line, just below the first line:

    #!/usr/bin/perl
    use LWP::Simple;

This lets us use the functions that LWP::Simple provides, such as get.

Next, we need to get the file from the web. This line replaces the open statement used above.

    $file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";

This retrieves the file at the URL and stores it in the variable $file. However, unlike opening a file from the computer, we can't go line by line yet because the variable stores the WHOLE file. Thus, we need to add one more step to break up the file into lines. We do that with a split command, just like we used to break up the line at the commas. Instead of using a comma as the pattern, we use the \n character, so the file is broken up everywhere there is a new line.

With the commas, we knew how many pieces the line would be broken into. In this case, we don't know the number of lines the file will be split into. The solution is to store all the pieces in a list (called an array), rather than in named variables. Array names start with an @ sign instead of the $ used for regular variables.

    $file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
    @lines = split(/\n/, $file);

There is one final change to make. We do not use the while loop with the file handle in this case. Instead, we use a different kind of loop to go through the elements of the array. We replace the opening line of the while loop with this:

    foreach (@lines) {

The only other change is that we don't need the "close FILE" command anymore, since we didn't open a file. Thus, our final program looks like this:

    #!/usr/bin/perl
    use LWP::Simple;
    $file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
    @lines = split(/\n/, $file);
    $total = 0;
    $count = 0;
    foreach (@lines) {
        $line = $_;
        ($year, $price, $adj_price) = split(/,/, $line);
        $count++;
        $total = $total + $price;
    }
    $avg = $total / $count;
    print "$avg\n";
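The split-into-lines step can be tried without a network connection. In the sketch below, a hard-coded string containing the first three lines of the sample data stands in for what get would return. That stand-in, the two safety checks (an undefined $file and a zero count), printf for rounding, and "use strict" / "use warnings" / "my" are all additions that were not in the class code. The check for an undefined value is there because get in LWP::Simple returns undef when the download fails.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the class code
use warnings;

# Stand-in for:  $file = get "http://...";
# With LWP::Simple, get returns undef if the download fails.
my $file = "1946,1.63,17.26\n1947,2.16,20.29\n1948,2.77,24.21\n";
die "No data to process\n" unless defined $file;

# Break the whole file into an array with one element per line.
my @lines = split(/\n/, $file);

my $total = 0;
my $count = 0;
foreach (@lines) {
    my $line = $_;
    my ($year, $price, $adj_price) = split(/,/, $line);
    $count++;
    $total = $total + $price;
}

# Avoid dividing by zero if there were no lines at all.
my $avg = 0;
if ($count > 0) {
    $avg = $total / $count;
}
printf "%.2f\n", $avg;   # prints 2.19 for the three sample lines
```

Because split on \n removes the newline characters, the lines in @lines do not need chomp here.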

Writing Files

Finally, we saw how to write files today. To write a file, you open it the same way you do when you're reading a file. The only difference is that we add the > character to the beginning of the file name. One > will open a file for writing and overwrite an existing file with the same name. Two > characters will open the file and append the text to the end. Here are two examples:

    open (OUTPUT, ">myFile.txt");
    open (BOB, ">>theBobFile.csv");

Note that if we open multiple files, each must have a unique handle. Now that we have these files open, we can write to them using the print command, giving the handle of the file we want to write to:

    print OUTPUT "This line will go in the file\n";

When you are done printing to the file, just close it like we closed the file we were reading in.

    close OUTPUT;

It's important to close files you are writing to, or else some of the content may not be written properly.
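The difference between > and >> can be checked with a small experiment. This is only a sketch: the file name write_demo.txt and the handle name CHECK are hypothetical, and "or die", "use strict", "use warnings", and "my" are additions that were not in the class code. It writes to the same file three times and then reads the file back to see what survived.

```perl
#!/usr/bin/perl
use strict;      # optional safety checks, not used in the class code
use warnings;

my $name = "write_demo.txt";   # hypothetical file name

# One > creates the file, or empties it if it already exists.
open(OUTPUT, ">$name") or die "Cannot write $name: $!";
print OUTPUT "first\n";
close OUTPUT;

# Opening with one > again throws away "first".
open(OUTPUT, ">$name") or die "Cannot write $name: $!";
print OUTPUT "second\n";
close OUTPUT;

# Two > characters keep what is there and add to the end.
open(OUTPUT, ">>$name") or die "Cannot append to $name: $!";
print OUTPUT "third\n";
close OUTPUT;

# Read the file back to see what it contains now.
my $contents = "";
open(CHECK, $name) or die "Cannot open $name: $!";
while (<CHECK>) {
    $contents = $contents . $_;   # the . operator joins two strings
}
close CHECK;

print $contents;   # prints "second" and "third", each on its own line
unlink $name;      # remove the demo file again
```

The word "first" never appears in the output, because the second open used a single > and started the file over.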

In class, we did an example where we printed to a file with the year and ratio of the adjusted price to the current price. Here is that code.

    #!/usr/bin/perl
    use LWP::Simple;
    open (FILE, ">myOutput.tdf");
    $file = get "http://www.cs.umd.edu/~golbeck/perl/test1.csv";
    @lines = split(/\n/, $file);
    $total = 0;
    $count = 0;
    foreach (@lines) {
        $line = $_;
        ($year, $price, $adj_price) = split(/,/, $line);
        $ratio = $adj_price / $price;
        print FILE "$year\t$ratio\n";
    }
    close FILE;