CMSC828G: Statistical Relational Learning and Link Mining

Course Description

Statistical machine learning is in the midst of a "relational revolution". After many decades of focusing on learning statistical models from examples described by a simple vector of attribute values, where each example is assumed to be independent and identically-distributed, many researchers are now studying problems in which the examples are linked together into complex networks. These networks can be a simple as sequences and 2-D meshes (such as those arising in part-of-speech tagging and remote sensing) or as complex as social networks, the world wide web, complex visual scenes, dynamic software communication structures and biological data.

This course will cover recent approaches in machine learning which combine probabilistic and logical representations that deal with graph, network and relational data. In this course we will review many of the recently proposed approaches. We will identify the fundamental algorithmic challenges. We will survey work from machine learning, graphical models, and theoretical computer science. In the latter part of the course we will explore several specific applications areas, to be selected based on class interest.

The goal of the course is to give you the practical machine learning and statistical modeling expertise to analyze the types of network data that arise in your research. This should be applicable in areas such as bioinformatics, computer vision, computational linguistics, databases, networks and systems.

Prerequisites: Background in machine learning and graphical models suggested. Mathematical maturity and a basic course in probability required.

Class Format

This is a seminar course. Each class will consist of instructor presentations, student presentations and discussion. Students will be required to do a significant project for the course. A significant portion of the grade will be based on class participation. Additional coursework will include knowledge engineering and programming exercises.

Class Materials

The readings will be based on materials that will be part of an upcoming SRL book and on selected additional readings. Course materials will be available from the class wiki,

1/28 Introduction to SRL
2/4 Introduction to Graphical Models
Graphical Models.ppt
Presenter: Indrajit Bhattacharya

Chapters 1-3 from 'An Introduction to Probabilistic Graphical Models' by Michael Jordan


Hardcopies of ch2 and ch3 are available in the CS Library. These are draft chapters of an upcoming book, please do not distribute

2/11 Frame-based SRL Approaches: Probabilistic Relational Models and
Probabilistic Relational Models.ppt
Due: Focus Problem writeup

Learning Probabilistic Relational Models, L. Getoor, N. Friedman, D. Koller, A. Pfeffer.
chapter from book, Relational Data Mining, edited by S. Dzeroski and N. Lavrac, Eds., Springer-Verlag, 2001.



2/18 Frame-based SRL Approaches continued
slides continued
from last time
Learning Probabilistic Models of Link Structure, L. Getoor, N. Friedman, D. Koller, B. Taskar. Journal of Machine Learning Research, 2002.

PRMs with Class Hierarchies, chapter 5 of Learning Statistical Models of Relational Data, Lise Getoor, PhD Thesis, Stanford University, 2001.


2/25 Probabilistic Entity Relationship Models

ER Review

Probabilistic Entity Relation

Due: Focus Problem description uploaded to wiki, with PRM and PER descriptions


Entity-Relationship Model, ch. 2 Korth and Silberschatz Database Systems Concepts. Hardcopy available in the CS Library.

Probabilistic Models for Relational Data, David Heckerman, Christopher Meek and Daphne Koller, Microsoft Technical Report TR-2004-30.


3/4 Introduction to Logic-based Approaches

Logic Review(1,2,5,6)



Bayesian Logic Programs, Kristian Kersting and Luc DeRaedt, Institure for Computer Science, University of Frieburg, technical report 151,

Learning Bayesian Logic Programs, Kristian Kersting and Luc DeRaedt, Institure for Computer Science, University of Frieburg, technical report 174.


3/11 Logic-based Approaches cont.

LBN Slides

LPRM Slides

Due: Focus Problem logic-based description

Due: Project Proposal

Logical Bayesian Networks, Daan Fierens, Hendrik Blockeel, Jan Ramon and Maurice Bruynooghe. In Proc. 3rd Int'l Workshop on Multi-Relational Data Mining (MRDM-2004),
Seattle, WA, USA, 2004.

Logic-based Approach to PRMs (draft), Lise Getoor and John Grant. Draft available from class wiki


3/18 ICL Slides  

Answering queries from context-sensitive probabilistic
knowledge bases
, Liem Ngo and Peter Haddawy. Theoretical Computer
Science, 171(1-2):147-177, 1996.

The Independent Choice Logic for modelling multiple agents under uncertainty'', David Poole. Artificial Intelligence, 94(1-2), special issue on economic principles of multi-agent systems, pages 7-56, 1997.


3/25 Spring Break      
4/1 BLOG and IBAL

BLOG slides

IBAL slides


BLOG: Probabilistic Models with Unknown Objects, Brian Milch Bhaskara Marthi, Stuart Russell David Sontag, Daniel L. Ong, Andrey Kolobov. Draft available from wiki.

The Design and Implementation of IBAL: A General-Purpose Probabilistic Language. Avi Pfeffer. Draft available from wiki. Please do not distribute.

4/8 SLPs and PRISM Cussens Slides Due: Project Progress Report

Statistical aspects of stochastic logic programs, James Cussens. In Artificial Intelligence and Statistics 2001: Proceedings of the Eighth International Workshop, January 2001.

Parameter learning of logic programs for symbolic-statistical modeling. Sato, T. and Kameya, Y. Journal of Artificial Intelligence Research (JAIR), Vol.15, pp.391–454, 2001.

Integrating by Separating: Combining Probability and Logic with ICL, PRISM and SLPs. James Cussens. Dagstuhl seminar on Probability, Logic and Relational Learning, February 2005.

4/15 Undirected Models: RMNs and Markov Logic

Ben's Slides

MLN Slides


Richardson and Domingos, Markov Logic: A Unifying Framework for Statistical Relational Learning. Draft available from class wiki.

Discriminative Probabilistic Models for Relational Data, B. Taskar, P. Abbeel and D. Koller. Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI02?), Edmonton, Canada, August 2002.

Link Prediction in Relational Data, B. Taskar, M. F. Wong, P. Abbeel and D. Koller. Neural Information Processing Systems Conference (NIPS03?), Vancouver, Canada, December 2003.


An analysis of first-order logics of probability, Artificial Intelligence 46, 1990, pp. 311-350

Maximum Entropy Probabilistic Logic. Mark A. Paskin. Technical Report UCB/CSD-01-1161, University of California, Berkeley, 2002.

4/29   Final focus problem writeups due Discussion
5/6     no class

Project Presentations

Tentative Syllabus:

Review (as necessary throughout)
Machine Learning
Models for sequences and spatial data
Graphical Models
First-Order Logic
Inductive Logic Programming

Statistical Relational Learning (SRL) Introduction (1-2 classes)

SRL Representations (3-4 classes)
Probabilistic Relational Models
Relational Markov Networks
Probabilistic Entity Relation Models
Bayesian Logic Programs
Stochastic Logic Programs
Probabilistic Context-free Grammars
and others

Descriptive Modeling in Graphs and Social Networks (2-3 classes)
Network Properties
Random Graphs and the P* Model

SRL Inference (2-3 classes)
Link Prediction
Inference reuse in OOBNs
Collective Classification
Belief Propagation
Approximate Inference

SRL Learning (2-3 classes)
Parameter estimation in Markov Random Fields
Learning in Probabilistic Relational Models
Learning in Relational Markov Networks

Plus we will cover some of the following topics, depending on student interest, paying particular attention to how to approach these problems from an SRL perspective:

Special Topics (5 classes)
Collaborative Filtering and Recommender Systems
Graph Mining
Information Extraction and Integration
Language and Speech Modeling
Malware (virus, worm, etc.) detection
Semantic Web Mining
Sensor Networks
Social Networks and Link Analysis
Spam and Email filtering
Trust Networks
Viral marketing
Web Analysis



Course Credit

This course does not count as a PhD Core or MS Comps course. This
course can be used toward PhD coursework as part of the 2 additional
non-core classes required or as MS qualifying coursework.