CMSC828G Spring 2008: Link Mining and Dynamic Network Analysis

Course Description

There has been a recent surge of interest in the analysis of data describing all forms of networks, including communications networks, biological networks, social networks, financial transaction networks and more. Despite the diversity of domains, common difficulties and challenges include noisy and incomplete data, dynamic and streaming data, issues of scalability and statistical issues such as identifiability, stationarity, and so on. There are a number of different research communities working on network analysis including statisticians, physicists and computer scientists; each comes with their own view on the problem of network analysis, their own set of tools and their own style of analysis.

In this seminar, we will survey some of the recent work in network analysis. We will cover topics such as: random graph models, information diffusion and social contagion, probabilistic inference and collective classification in network data, link prediction, community formation and detection, clustering in graph data, entity resolution, visual analytic tools for network data, and other topics from graph mining and statistical network analysis as time permits.

The seminar will be very interactive and collaborative. The topics covered and the depth of coverage will depend on the participants' input and interests.

The goal of the course is to give you an overview of current topics in link mining and dynamic network analysis and practical machine learning and statistical modeling experience to analyze network data that arises in your research. The methods are applicable in many areas such as bioinformatics, computer vision, computational linguistics, databases, program analysis, networks and systems. Along the way, you will pick up some practical experience in reading and presenting research papers, writing a literature survey, and doing a course project that ideally will lead to a publishable paper.

In tandem with the course, throughout the semester there will be several invited speakers presenting current work in network analysis. Some of these will be during the scheduled course time, while others, due to schedule constraints, will be outside the regular course time. Students are highly encouraged to attend the invited talks and meet with the speakers.

Prerequisites: Background in machine learning and graphical models suggested. Mathematical maturity and a basic course in probability required.

Course Format

This is a seminar course. Each class will consist of presentations and discussion. Students will be required to do a survey paper (30%) and a class project for the course (45%) . A significant portion of the grade will be based on class participation, which includes paper presentations, contributions to the wiki, and demonstrations (25%).

Because of the interactive nature of the course, auditing is discouraged. However the sessions which have invited speakers will be open to all. For information and announcements about the invited speakers, please join the dynamic network analysis (dna) mailing list:: dna-seminar@cs.umd.edu. To sign up, go here http://mailman.cs.umd.edu/mailman/listinfo/dna-seminar.

The web page for the seminar series is here: http://www.cs.umd.edu/projects/linqs/dna-seminar/

Course Credit

This course does not count as a PhD Core or MS Comps course. This course can be used toward PhD coursework as part of the non-core classes required or towards MS coursework (but it is not an MS qualifying coure).

Course Information

Time: Tue. & Thurs. at 11:00-12:15pm in room: CSI 2107
Professor: Lise Getoor - getoor AT cs.umd.edu
Office hours: 3:00 - 4:00 Tue. in AVW 3217
Web site: http://www.cs.umd.edu/class/spring2008/cmsc828g/

Course Wiki

Tthe class wiki, http://linqs.cs.umd.edu/cmsc828g/, is for students enrolled in the course to share material and discuss content.

Course Mailing List

Tthe class mailing list is for announcements relevant to the class. If you are enrolled in the class please sign up here http://mailman.cs.umd.edu/mailman/listinfo/cmsc828g

Schedule / Syllabus (Subject to Change)

Date & Location Topic / Papers Notes
Tue 1/29, CSIC 2107

Introduction

Students should add their introductions and defintions to the class wiki
Thu 1/31 Defintions and Link Mining Survey

Students present their definitons

Link Mining: A Survey, Lise Getoor, Christopher Diehl, SigKDD Explorations Special Issue on Link Mining, Volume 7, Number 2 - December, 2005

Mon 2/4, 11AM - 12PM, AVW 2460 Aron Culotta Talk, Learning to Extract Knowledge from Text
optional
Tue 2/5 Michell Girvan, Contagion Processes in Complex Networks (slides) The structure and function of complex networks , M. E. J. Newman, SIAM Review 45 , 167-256 (2003).
Tue 2/12 Discussion Link Mining Survey and Newman article

Thu 2/14 Adam Perer, Integrating Statistics and Visualization: Gaining Clarity during Exploratory Data Analysis of Social Networks Intro to the SigKDD Explorations Special Issue on Visual Analytics , by Daniel Keim, and Joern Schneidewind, 2007, (very short).

Balancing Systematic and Flexible Exploration of Social Networks, Adam Perer and Ben Shneiderman, IEEE Transactions on Visualization and Computer Graphics, 12, 5, (October 2006), 693 - 700.

Tue 2/19, AVW 2460

Edo Aroldi, Princeton
Bayesian analysis of complex biological and social systems (abstract)

Mixed membership analysis of high-throughput interaction studies, http://arxiv.org/abs/0706.0294/ (required)

Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolicctivity in Yeast, http://www.molbiolcell.org/cgi/reprint/19/1/352 (optional)

Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes http://www.molbiolcell.org/cgi/reprint/11/12/4241 (optional)

Thu 2/21, AVW 2460

Jure Leskovec, CMU
Network cascades: observations, models and algorithms (abstract)

Empirical+models for viral marketing
http://www.cs.cmu.edu/~jure/pubs/viral-tweb.pdf
Empirical+models for blogs
http://www.cs.cmu.edu/~jure/pubs/blogs-sdm07.pdf
Algorithms for detecting cascades:
http://www.cs.cmu.edu/~jure/pubs/detect-kdd07.pdf
Tue 2/26, 11AM-12PM, AVW 2460 Doug Downey, UW

 

Thu 2/28

 

Jen Golbeck , UMD
Computing Trust in Social Networks (abstract) ( Slides )

 
Tue 3/4

Discussion and Project One Minutes

Slides on How to Write a Paper , courtesy of Mike Hicks
Instructions on Project Proposal  
Thu 3/6,

Luke McDowell , United States Naval Academy (abstract)

Cautious Inference in Collective Classification, Luke McDowell, Kalyan Gupta and David Aha, AAAI 2007.
Why Collective Inference Improves Relational Classification, D. Jensen, J. Neville and B. Gallagher, KDD 2004.
Tue 3/11

Survey 1: Collective Classification
Mustafa Bilgic
slides

Collective Classification in Network Data Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, Tina Eliassi-Rad. UMD Tech Report 4905, 2008.
Discriminative probabilistic models for relational data B. Taskar, P. Abbeel, and D. Koller. UAI, 2002.
Thu 3/13

Survey 2: Information Diffusion
Saket Navlakha and Barna Saha
slides

Maximizing the Spread of Influence through a Social Network D. Kempe, J. Kleinberg, E. Tardos. Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
Patterns of Influence in a Recommendation Network Jure Leskovec, Ajit Singh, Jon Kleinberg. Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2006.
Fri 3/14, AVW 2460 (updated!)

Lada Adamic, UMich
Expertise Sharing Dynamics in Online Forums (abstract)

 
Tue 3/25

Survey 3: Node Ranking
Brad Skaggs and Lidan Wang
slides

David Austin How Google Finds Your Needle in the Web's Haystack (html)

J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999. (HITS) (pdf)

Amy N. Langville and Carl D. Meyer Deeper Inside PageRank. Internet Mathematics, Vol. 1(3):335-380, 2005. pages 1-20. (pdf) (optional)

S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proc. 7th International World Wide Web Conference, 1998. (pdf) (optional)

Thu 3/27

No class

 
Tue 4/1

No class

 
Thu 4/3

Survey 4: Entity Resolution
Matthias Broecheler and Preetam Maloor
slides

Collective entity resolution in relational data, I. Bhattacharya and L. Getoor. ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 1, 2007. pdf
Toward conditional models of identity uncertainty with application to proper noun coreference, A. McCallum and B. Wellner. IJCAI Workshop on Information Integration on the Web, 2003. pdf
Tue 4/8

Survey 5: Link Prediction
Walaa El-Din Moustafa and Hossam Sharara
slides

The Link Prediction Problem for Social Networks, D. Liben-Nowell, J. Kleinberg. Proc. 12th International Conference on Information and Knowledge Management (CIKM), 2003. pdf
Link Prediction in Relational Data, B. Taskar, M. F. Wong, P. Abbeel and D. Koller. Neural Information Processing Systems Conference (NIPS03), Vancouver, Canada, December 2003. pdf
Thu 4/10

No Class

 
Tue 4/15

Survey 6: Group Discovery - Stochastic Block Models
Galileo Namata and Elena Zheleva
slides

Learning systems of concepts with an infinite relational model. C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada and N. Ueda. National Conference on Artificial Intelligence (AAAI), 2006. pdf
K. Faust and S. Wasserman. Blockmodels: Interpretation and evaluation. Social Networks (Invited Paper), 14, 561, 1992. pdf
Wed 4/17, AVW 2120 Vitor Carvalho , CMU
Modeling Intention in Email: Speech Acts, Information Leaks and User Ranking Methods (abstract)
 
Thu 4/17 Cosma Shalizi, CMU
Discovering Functional Communities in Dynamical Networks (abstract)
 
Tue 4/22

Survey 7: Group Discovery 2
Adam Phillipy and Sam Huang
slides

Community structure in social and biological networks. M. Girvan and M. E. J. Newman, Proceedings of the National Academy of Sciences, 2002. pdf
Clustering Relational Data Using Attribute and Link Information. J. Neville, M. Adler, and D. Jensen. In Proceedings of the Text Mining and Link Analysis Workshop, 18th International Joint Conference on Artificial Intelligence, 2003. pdf
Optional:
Modularity and community structure in networks. M. E. J. Newman.- Proceedings of the National Academy of Sciences, 2006. pdf
Thu 4/24

Survey 8: Topic Models
Prithvi Sen and Roman Stanchak
slides

Probabilistic Topic Models, M. Steyvers and T. Griffiths, in T. Landauer, D McNamara, S. Dennis, and W. Kintsch (eds), Handbook of Latent Semantic Analysis, 2007. pdf
The Missing Link: A Probabilistic Model of Document Content and Hypertext Connectivity. T. Hoffman and D. Cohn, Conference on Neural Information Processing Systems (NIPS), 2001, pdf
Fri 4/25, 2pm AVW 2120

Eyal Amir, UIUC
Lifted Relational Probabilistic Inference (abstract)

 
Tue 4/29

Discussion

 
Thu 5/1

Survey 9: Organization and Taxonomy Formation
Rob Patro and Mike Pekala
slides

Structural Inference of Hierarchies in Networks, A. Clauset, C. Moore, M.J. Newman, ICML Workshop on Social Network Analysis, 2006. pdf
Learning Annotated Hierarchies from Relational Data, D. Roy, C. Kemp, V. Mansinghka and J. Tenenbaum, Conference on Neural Information Processing Systems (NIPS), 2006, pdf
Tue 5/6

Survey 10: Graph Mining
Hassan Sayyadi and Shanchan Wu