Final Project
This project can be done in groups of up to 4.
The goal of this project is for you to carefully build a social network 
and to analyze it using the knowledge and skills you have developed in 
class. You will be required to produce evidence to support any of your 
claims.
Choose a social network. This can be an online network (Facebook, Twitter, 
YouTube), a network extracted from other data (like your email network, 
discussion boards, interactive game playing), or built to represent a 
network that exists offline. 
- Define the nodes in the network. Who are they and what do you want to 
know about them?
- What are the links in the network? What do they represent?
- Collect the data for the network. Using the definitions you have 
chosen for nodes and links, build a representation of the network (an 
adjacency list or matrix). You may use tools like NodeXL to get this data, 
or you can build the network by hand. However, you must have real data for 
all the entities; you cannot just make up connections or hypothesize what 
they may be. If you want to use a simulated network, you must have a good 
basis for your simulation and use tested techniques. This is not 
recommended unless you have a good reason for doing it. If you want to 
simulate a network, you must talk to me first.
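As a rough illustration of what "an adjacency list or matrix" means, here is a minimal Python sketch. The names and links in it are made up purely to show the format (your project must use real data), and the function names are my own, not part of any tool mentioned above. It assumes an undirected network.

```python
from collections import defaultdict

# Hypothetical links, only to show the format. Replace with the real
# connections you collected.
edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"), ("Carol", "Dave")]


def build_adjacency_list(edges):
    """Return {node: sorted list of neighbors} for an undirected network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in sorted(neighbors.items())}


def build_adjacency_matrix(adj_list):
    """Return (ordered node names, 0/1 matrix) for the same network."""
    nodes = list(adj_list)
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for node, nbrs in adj_list.items():
        for nbr in nbrs:
            matrix[index[node]][index[nbr]] = 1
    return nodes, matrix


adj_list = build_adjacency_list(edges)
nodes, matrix = build_adjacency_matrix(adj_list)

# Adjacency list: one line per node, followed by its neighbors.
for node, nbrs in adj_list.items():
    print(node, "->", ", ".join(nbrs))

# Adjacency matrix: row i, column j is 1 if node i and node j are linked.
for name, row in zip(nodes, matrix):
    print(name, row)
```

Either form is fine to submit. The list is usually easier to read and type for sparse networks, while the matrix makes it easy to check whether a specific pair is connected.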
Once you have the data, you should perform the following tasks.
- Visualize this network. You can create a visualization in Gephi, NodeXL, 
ManyEyes, some other software, or you can draw it by hand.
- Identify structural features of the network. Are there interesting 
hubs or clusters? Is there a high or low clustering coefficient? Does it 
look like a small world network? Describe interesting features.
- Explain how those structural features relate to real factors in the 
network. For example, if you were studying the network of Congresspeople 
who use Twitter and you see that John McCain has more followers than anyone 
else, you can explain this by the fact that he ran for President. All your 
explanations must be grounded in fact with supporting evidence from 
primary sources; they cannot be intuitive guesses.
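If you want to check hubs and clustering by computation rather than by eye, the following is a small Python sketch of degree and the local clustering coefficient. The network in it is hypothetical and the function names are mine. It assumes an undirected network stored as a dictionary that maps each node to the set of its neighbors.

```python
from itertools import combinations

# Hypothetical undirected network: node -> set of neighbors.
adj = {
    "Alice": {"Bob", "Carol", "Dave"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Alice", "Bob"},
    "Dave": {"Alice"},
}


def degree(adj, node):
    """Number of links attached to a node."""
    return len(adj[node])


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


hub = max(adj, key=lambda n: degree(adj, n))
print("highest degree node:", hub, "with degree", degree(adj, hub))
for node in adj:
    print(node, "clustering:", round(local_clustering(adj, node), 3))
print("average clustering:", round(average_clustering(adj), 3))
```

Numbers like these only identify the features. You still need evidence from primary sources to explain why a node is a hub or why a cluster exists.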
Here are some example projects:
- Email Network: You create a social network from your email (or 
someone else's email, e.g. the Enron Email Corpus). The nodes 
are people and the edges indicate that people were on the same email. This 
would add a link between the sender and all the recipients. People cc'ed on 
the same message could also be linked. Edges can be labeled with the 
number of messages exchanged. They can also be directed (Alice -> Bob if 
Alice emailed Bob, and Bob -> Alice if Bob emailed her back). Who are your 
most frequent correspondents? Which nodes are most central? Using your own 
knowledge of your strong ties and weak ties, does centrality, degree, 
frequency of emails, or other factors relate to the tie strength? Does 
your email network have clusters of people you email for different things 
(e.g. family, classmates, work people, friends, etc)?
 
- Discussion board network: Based on posted questions and replies, build 
a network of people who interact on a discussion board online. You could 
also include topics as nodes in the network and connect people to the 
topics they have discussed. Who are the most central people? Do they start 
a lot of discussions or mostly reply? 
Do they engage in a lot of back and forth or do they disappear? Who 
connects to the most topics? Are those people more likely to share 
information? Can you find different types of users based on how they look 
in the social network?
- Sexual Contacts Network: Admittedly, it's hard to get this kind of 
data where people are your nodes and they are linked if they slept 
together. However, it's an *extremely* common kind of network studied in 
epidemiology to better understand transmission of STDs. Who is most 
central in the network you create? Who has a high degree or low degree? 
Based on what we learn about spreading of disease, what is the best way to 
stop it? Which nodes do you target, when, and how?
- I have a few research projects that could be done as class projects. 
These are a good opportunity for students interested in possibly going on 
to graduate school, since we will try to generate publishable results. 
These look very good on grad school applications. If you're interested in 
something like that, please let me know.
- You are also welcome to choose your own topic. If you want to do that, 
please email Dr. Golbeck to discuss your ideas ahead of the first 
deadline. 
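To make the Email Network example above more concrete, here is one possible Python sketch of turning messages into directed edges labeled with message counts. The message records and field names ("sender", "recipients") are hypothetical; how you get real messages out of your mail client or a corpus is up to you and is not shown here.

```python
from collections import Counter

# Hypothetical messages, only to show the idea. Use your real email data.
messages = [
    {"sender": "alice", "recipients": ["bob", "carol"]},
    {"sender": "bob", "recipients": ["alice"]},
    {"sender": "alice", "recipients": ["bob"]},
]


def directed_edge_counts(messages):
    """Count messages for each directed (sender, recipient) pair."""
    counts = Counter()
    for msg in messages:
        for rcpt in msg["recipients"]:
            if rcpt != msg["sender"]:  # ignore mail sent to yourself
                counts[(msg["sender"], rcpt)] += 1
    return counts


edge_counts = directed_edge_counts(messages)

# Undirected view: total messages exchanged between each pair of people.
pair_totals = Counter()
for (src, dst), n in edge_counts.items():
    pair_totals[frozenset((src, dst))] += n

for (src, dst), n in sorted(edge_counts.items()):
    print(f"{src} -> {dst}: {n} message(s)")
for pair, n in sorted(pair_totals.items(), key=lambda item: sorted(item[0])):
    print(" and ".join(sorted(pair)), "exchanged", n, "message(s)")
```

Linking people who were cc'ed on the same message would need an extra step that adds an edge between every pair of recipients. Whether to do that is one of the definitional choices you should explain in your paper.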
Timeline
Note: These deadlines are mostly there to ensure you are making progress 
at the right rate to successfully complete the project. I will provide 
some guidance in the early stages of the project, but I will not be 
reviewing your drafts nor making comments about what you need to change to 
get an A on the paper. I will not grade your assignments ahead of time. If 
you have specific questions, please ask, but do not send your full paper 
and just ask me to look it over and comment. I won't do it.
 
- April 3: Short (1-2 paragraph) description of your network, data 
source, and groups chosen. Email this info to Dr. Golbeck with subject 
line "INFM289I Project Update". Include an estimate of how big your 
network will be. I will review these and let you know if they are big 
enough or too big. Groups should have networks that are significantly 
larger than one person could work with on their own; each member must 
collect the same amount of data that they would if they were working 
alone.
- April 10: Data collection should be complete. An adjacency list for 
your network along with a 1/2 page description of the collection process 
is due, emailed to Dr. Golbeck with the subject line "INFM289I Project 
Update".
 
- April 17: Visualizations and a 1/2 to 1 page list of bullet points 
describing interesting features of the network are due, emailed to Dr. 
Golbeck with the subject line "INFM289I Project Update".
- April 24: 2 page single spaced extended outline of paper due. This 
should include descriptions of all the analysis questions, short answers 
that you will support with evidence, and data collection methods.
- May 1: Complete draft of papers due.
- May 1 and 3: In class presentations
- May 8: Final Papers due in class
You can work alone or in groups of up to 4. The workload should scale with 
the number of group members, i.e. a group of 4 must produce 4X the work of 
a person working alone.
Final papers should be 5 pages single spaced for a person working alone 
and for groups there should be an additional 4 pages for each additional 
member (i.e. a group of 2 needs a 9 page paper, a group of 3 needs a 13 
page paper, and a group of 4 needs a 17 page paper). This means that you 
should increase the number of analysis questions per person in order to 
substantially increase the size of the project for groups. Graphs, charts, 
visualizations, tables, etc. all count toward your page total.
Keep this in mind - if you do not feel like you can fill 5 single spaced 
pages with analysis of your network, then you have picked something too 
simple.
 Things I want to see in the final paper:
- What is the network you looked at? If appropriate, why does this 
network exist or what is it about?
- How did you collect your data? Was it by hand? Was the network already 
available? Why did you make the choices you did?
- Show visualizations. They count toward the page total but don't go 
overboard. They shouldn't be HUGE nor should you use a dozen of them just 
to take up space.
- Analysis - this is most important. Tell me interesting things you 
discovered based on collecting and looking at this network. How do 
features like centrality, tie strength, clustering, etc. relate to actions 
or roles in the network? Use all the features from class that are 
appropriate. If you apply a network principle (e.g. Small Worlds), explain 
why that is appropriate (e.g. give the statistics that show a high 
clustering coefficient and low average shortest path length).
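For the Small Worlds example, the average shortest path length is the statistic students most often skip. Here is a minimal Python sketch of computing it with breadth-first search. The network is hypothetical, the function names are mine, and it assumes an unweighted, undirected network. Pairs of nodes with no path between them are simply left out of the average, which is a simplifying choice you should mention if your network is disconnected.

```python
from collections import deque

# Hypothetical undirected network: node -> list of neighbors.
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}


def bfs_distances(adj, source):
    """Shortest hop count from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path_length(adj):
    """Mean distance over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


print("average shortest path length:", average_shortest_path_length(adj))
```

A number on its own does not show a small world. Report it next to the clustering coefficient and say what you are comparing both against.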
 
Things I *don't* want:
- A list describing each person in the network and their relationship to 
others. E.g. "Bob is the father in this network. Jim, Joe, and Frank are 
his sons. Jim is the oldest son. He is a carpenter and likes social 
networks..." There's no analysis there and you won't get much credit for 
this.
- A list of actions, e.g. "Bob commented 3 times on Frank's posts. Frank 
commented 10 times on Jim's posts and 4 times on Joe's posts". A little of 
that is ok, but it's also not analysis and it bores me. It also shows you 
haven't learned much if this is the best you can do!
Standard things you should be doing anyway: All statements you make must 
have evidence to support them. If you are using outside sources, cite them 
properly. Everything you submit must be your own work. Use standard 1" 
margins with 12 point Times font. 
	
 Grading
- 10% for meeting each of the first 5 deadlines above. No credit for 
late submissions. This is 50% total.
- 20% in-class presentation
- 30% final paper
Paper Grading
- Length: 2
- Writing quality: 2
- Description of data collection process: 1
- Visualization present: 1
- Analysis: 6
- Note: Depending on your network and the process, I may weight certain 
things toward analysis. For example, if you wrote a lengthy computer 
program to gather complex data, description of that code and analysis of 
the data online would count toward this.