Homework 3

Exercise 1

This is a list of email addresses from Enron. It is a comma-separated list of people who emailed each other: the sender is listed first, then a comma, then the recipient.
lavorato@enron.com,wang@enron.com
chapman@enron.com,reed@enron.com
reyes@enron.com,westbrook@enron.com
Write a program to count the number of times each unique email address appears in the file as a sender, and how many times it appears as a receiver. Store the values using a hash.

Your output should have the email address followed by a comma followed by the number of times the address was a sender followed by a comma followed by the number of times the address was a receiver.

x@example.com,1,1
y@example.com,8,2
test@example.com,61,900

If the email address has not been a sender or has not been a receiver, print a 0 in the right place. Name your output file lastname_firstname_hw3_ex1.csv

Some sample correct output:

gray@enron.com,5,1
donahue@enron.com,2,0
presto@enron.com,3,7
gold@enron.com,2,1
doucet@enron.com,5,0
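The counting step described above can be sketched roughly as follows. This is only an illustration in Python, where a dict plays the role of the hash; the function names count_addresses and format_rows are made up for this sketch, and skipping blank or malformed lines is an assumption, not something the assignment specifies.

```python
import csv


def count_addresses(lines):
    """Return {address: [times_sender, times_receiver]} for 'sender,recipient' lines."""
    counts = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            # Assumption: blank or malformed lines are skipped.
            continue
        sender, recipient = (field.strip() for field in row)
        # setdefault creates a [0, 0] entry the first time an address is seen,
        # so an address that never sends (or never receives) keeps a 0 there.
        counts.setdefault(sender, [0, 0])[0] += 1
        counts.setdefault(recipient, [0, 0])[1] += 1
    return counts


def format_rows(counts):
    """Turn the counts into 'address,sent,received' output lines."""
    return [f"{addr},{sent},{received}" for addr, (sent, received) in counts.items()]


if __name__ == "__main__":
    # The three sample input lines from the assignment text.
    sample = [
        "lavorato@enron.com,wang@enron.com",
        "chapman@enron.com,reed@enron.com",
        "reyes@enron.com,westbrook@enron.com",
    ]
    for line in format_rows(count_addresses(sample)):
        print(line)
```

In a real run, the lines would come from an open file handle on the local input file, and the formatted rows would be written to the required output file instead of printed.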

Practice / Challenge (not required)

Exercise 2

I am providing you with a list of tags that users applied to a series of images in the steve project and a list of titles of Wikipedia articles (note: this is a huge file. Don't just click on it. Right click and save it). Each line in the steve data file has four tab-separated fields: two identifiers, the tag, and a third identifier.

Your job is to find multi-word tags (i.e. tags that are more than one word) and, if a tag matches a Wikipedia article title, print either the tag or the entire line (your choice) from the file that has that tag.

You must use hashes and loop through the contents of each file only once. You may NOT put the values in a data structure and loop many times - you can only loop once.

You must work with local versions of these files - do NOT access them using LWP even while you are working on the assignment. Keep the names of the files identical to how they are linked here. Do not upload copies of these files when you turn in your assignment.
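
One way to satisfy the "loop through each file only once" requirement is to read the titles into a hash in a single pass, then test each tag with a constant-time lookup during a single pass over the tag file. The sketch below shows the idea in Python, where a set is the hash-based structure. The helper names are invented for this example, and the normalization (ignoring case and treating underscores as spaces) is an assumption about the title file, not something the assignment states.

```python
def normalize(text):
    # Assumption: match case-insensitively and treat underscores as spaces.
    return text.strip().replace("_", " ").lower()


def load_titles(title_lines):
    """Single pass over the titles: store each one as a key in a hash-based set."""
    return {normalize(line) for line in title_lines if line.strip()}


def matching_tags(tag_lines, titles):
    """Single pass over the tag lines: yield multi-word tags that are article titles."""
    for line in tag_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Assumption: lines without a tag field are skipped.
            continue
        tag = fields[2]  # third tab-separated field, per the assignment text
        key = normalize(tag)
        # A multi-word tag has more than one whitespace-separated word.
        if len(key.split()) > 1 and key in titles:
            yield tag


if __name__ == "__main__":
    # Made-up data, for illustration only.
    titles = load_titles(["Mona_Lisa", "Portrait", "Still_life"])
    tags = ["1\t2\tmona lisa\t3", "4\t5\tportrait\t6", "7\t8\tblue sky\t9"]
    for tag in matching_tags(tags, titles):
        print(tag)
```

Because the lookup `key in titles` is a hash lookup, no title is ever revisited in a loop: each file is traversed exactly once.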