Invited Speakers
Presentation Details
Graphs for Machine Learning: Useful Metaphor or Statistical Reality
Stephen E. Fienberg, Carnegie Mellon
Slides: pdf (1.7mb)
Graphs play an important role in two dual representations of
statistical models: one where the nodes correspond to units and
the edges to variables that relate them to one another, and the
other where the nodes are variables and the edges represent
relationships among them. In the former, the units are inherently
dependent and relationships may or may not be, whereas in the
latter the units are inherently independent and the focus is
on independence relationships among the variables. In this talk I
describe both types of representations and how to think about them
in the context of large-scale data examples, especially those
involving discrete relationships or variables.
Stephen E. Fienberg is Maurice Falk University
Professor of Statistics and Social Science at Carnegie Mellon
University, with appointments in the Department of Statistics, the
Machine Learning Department, CyLab, and i-lab. He has served as
Dean of the College of Humanities and Social Sciences at Carnegie
Mellon and as Vice President for Academic Affairs at York
University, in Toronto, Canada, as well as on the faculties of the
University of Chicago and the University of Minnesota. He was
founding co-editor of Chance and served as the Coordinating and
Applications Editor of the Journal of the American Statistical
Association. He is currently one of the founding editors of the
Annals of Applied Statistics and is co-founder of the new online
Journal of Privacy and Confidentiality, based in CyLab. He has
been Vice President of the American Statistical Association and
President of the Institute of Mathematical Statistics and the
International Society for Bayesian Analysis. His research
includes the development of statistical methods, especially tools
for categorical data analysis, from both likelihood and Bayesian
perspectives. Fienberg is the author or editor of over 20 books
and 400 papers and related publications. His 1975 book on
categorical data analysis with Bishop and Holland, Discrete
Multivariate Analysis: Theory and Practice, and his 1980 book The
Analysis of Cross-Classified Categorical Data are both Citation
Classics and were recently reprinted by Springer. He is a member
of the U.S. National Academy of Sciences, and a fellow of the
Royal Society of Canada, the American Academy of Arts and
Sciences, and the American Academy of Political and Social
Science.
Efficient tools for mining large graphs: Indexing, sampling, counting, and predicting
Aristides Gionis, Yahoo! Research
Slides: pdf (2.9mb)
Graphs provide a general framework for modeling entities and their
relationships, and they are routinely used to describe a wide
variety of data such as the Internet, the Web, social networks,
biological data, citation networks, and more. To deal with large
graphs one needs not only to understand which graph features to
mine for the application at hand, but also to develop efficient
tools that cope with graphs having millions of nodes. In this talk
we will review some recent work in this area. We will discuss
algorithms for indexing distances in graphs, sampling and counting
patterns, finding frequent patterns of evolution, and classifying
nodes on a graph. We motivate the problems we address with real
applications.
Aristides Gionis is a senior research scientist at Yahoo!
Research, Barcelona. He received his Ph.D. from the Computer
Science department of Stanford University in 2003, and between
2003 and 2006 he was a senior researcher at the Basic
Research Unit of Helsinki Institute of Information Technology,
Finland. His research interests include algorithms for data
analysis and applications in the Web domain.
Kernel Methods for Structured Inputs and Outputs
Thomas Gärtner, University of Bonn and Fraunhofer IAIS
Slides: pdf (623k)
In this talk I will introduce the principles of kernel methods and
show how this popular class of learning algorithms can be extended
to handle structured inputs and outputs. I will concentrate on
highlighting conceptual differences and similarities rather than
on technical details.
Thomas Gärtner is the head of an Emmy-Noether
research group at the University of Bonn and lead scientist for
machine learning at the Fraunhofer Institute for Intelligent
Analysis and Information Systems IAIS. He holds a PhD from the
University of Bonn, an MSc from the University of Bristol, and a
Diplom from the University of Cooperative Education in
Mannheim. During his career he was employed by the University of
Bonn, Fraunhofer IAIS, the University of Bristol, GMD IPSI, and
Alcatel SEL. His work on kernels for structured data and
structured output prediction is highly cited and earned him
several awards. He serves as an action editor for the Machine
Learning Journal; has given tutorials as well as invited talks at
premier venues such as ICML; has served as a program committee
member for many major conferences on Machine Learning, and as an
area chair for ECML/PKDD. This year he was a member of the senior
program committee of AAAI and an area chair of ICML.
Evaluation Strategies for Network Classification
Jennifer Neville, Purdue University
Slides: pdf (1.6mb)
A central methodological question in machine learning research is
how to accurately compare two learning algorithms and assess
whether the observed performance difference is significant. We
investigate this issue in the context of collective classification
in networks, where there are dependencies among both the labeled
(training) and unlabeled (test) instances. These dependencies can
complicate the direct application of conventional statistical
tests, which assume independent samples. Empirical exploration of
potential sources of bias due to network dependencies shows,
surprisingly, that a commonly used form of evaluation can result
in unacceptably high levels of Type I error. In other words, as
much as 50% of the time, an observed difference between algorithms
may be incorrectly judged significant when it is not. We
propose two solutions to this bias: the first is a network
cross-validation sampling method and the second is an analytical
correction to conventional t-tests. We evaluate the corrections on
both synthetic and real-world data, with simulated and real
classifiers, showing that the tests successfully adjust for the
bias, while maintaining reasonable levels of statistical power.
Jennifer Neville is an assistant professor at Purdue
University with a joint appointment in the Departments of Computer
Science and Statistics. She received her PhD from the University
of Massachusetts Amherst in 2006. She received a DARPA IPTO Young
Investigator Award in 2003 and was selected as a member of the
DARPA Computer Science Study Group in 2007. In 2008, she was
chosen by IEEE as one of "AI's 10 to watch." Her research focuses
on developing data mining and machine learning techniques for
relational domains, including citation analysis, fraud detection,
and social network analysis.
Network Event Data over Time: Prediction and Latent Variable Modeling
Padhraic Smyth, UC Irvine
Slides: pdf (2.1mb)
In this talk I discuss the problem of modeling and prediction of
relational event data in the form of time-stamped events involving
a set of actors. This type of data is increasingly common in a
number of different application contexts, such as email and
blogging. The talk will begin by motivating the problem of
modeling such data, discussing for example the difference between
discrete-time aggregated network representations and
continuous-time event-based representations. We will review some
of the basic strategies in building statistical models for such
data, starting with models for static (non-temporal) data and
moving to temporal models. In particular we will focus on
latent-variable models which are emerging as a broadly applicable
and flexible framework for network modeling. Recent ideas in this
area will be discussed as well as new ongoing work. We will also
emphasize the importance of predictive evaluation in network
modeling and discuss a number of issues that arise in this
context. Experimental results will be presented comparing
different modeling approaches using a variety of real-world
event-based network data sets. The talk will conclude with some
speculative comments on future research directions.
Joint work with Arthur Asuncion, Chris DuBois, and Jimmy Foulds.
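The distinction between continuous-time event-based data and discrete-time aggregated networks can be illustrated with a minimal Python sketch. It is not taken from the talk: the toy events, the function name aggregate, and the 60-second window are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (timestamp in seconds, sender, receiver).
# This list is the continuous-time, event-based representation.
events = [
    (5.0, "a", "b"),
    (12.0, "a", "b"),
    (47.0, "b", "c"),
    (61.0, "a", "c"),
    (118.0, "c", "a"),
]


def aggregate(events, window):
    """Bin time-stamped events into consecutive windows of equal length.

    Returns {window_index: Counter({(sender, receiver): count})},
    i.e. one weighted directed network per time window.
    """
    snapshots = defaultdict(Counter)
    for t, sender, receiver in events:
        snapshots[int(t // window)][(sender, receiver)] += 1
    return dict(snapshots)


# Discrete-time aggregated representation with an assumed 60 s window.
snapshots = aggregate(events, window=60.0)
for index in sorted(snapshots):
    print(index, dict(snapshots[index]))
```

Aggregation discards the ordering and exact timing of events within each window, which is one reason the choice between the two representations matters for modeling.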
Padhraic Smyth is a Professor in the Department of
Computer Science and also serves as Director of the Center for
Machine Learning and Intelligent Systems, both at the University
of California, Irvine. He also has joint appointments in the
Statistics and Biomedical Engineering Departments at UC Irvine.
His research interests include machine learning, data mining,
pattern recognition, and applied statistics. He was a recipient of
best paper awards at the 2002 and 1997 ACM SIGKDD Conferences,
received the NSF CAREER award in 1999, the ACM SIGKDD Innovation
Award in 2009, and is a AAAI Fellow. He is co-author of Modeling
the Internet and the Web: Probabilistic Methods and Algorithms
(with Pierre Baldi and Paolo Frasconi in 2003), and was also
co-author of Principles of Data Mining, MIT Press, August 2001,
with David Hand and Heikki Mannila. He received a first-class
honors degree in Electronic Engineering from University College
Galway (National University of Ireland) in 1984, and the MSEE and
PhD degrees from the Electrical Engineering Department at the
California Institute of Technology in 1985 and 1988 respectively.
From 1988 to 1996 he was a Technical Group Leader at the Jet
Propulsion Laboratory, Pasadena, and has been on the faculty at UC
Irvine since 1996. In addition to his academic research he is
also active in industry consulting, working with companies such as
Netflix (on the Netflix Prize), eBay, Oracle, Yahoo!, Nokia, and
AT&T.
Mining Massive Graphs for Telecommunication Applications
Chris Volinsky, AT&T Labs
Slides: pdf (1.3mb)
Telecommunications data is all about networks: packet delivery
networks, cell tower networks, fiber optic networks. But perhaps
the most interesting network is the virtual one created by
billions of telephony transactions every day. This callgraph
network represents hundreds of millions of devices and the
billions of connections between them. How do we make sense of
such a massive graph? How do we find communities, or look for
influential members? In this talk I will present various
applications of callgraphs at AT&T, from fraud detection to
customer loyalty to targeted marketing. I will cover our
ego-centric representation of the graph (Communities of Interest)
and discuss how it helps us to analyze the graph at speed and
scale.
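As a rough illustration of what an ego-centric view of a callgraph can look like, the Python sketch below stores, for each device, only its heaviest contacts. It is an assumption-laden toy and not the Communities of Interest definition used at AT&T: the call records, the function name ego_views, and the top-k truncation rule are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (caller, callee, number_of_calls).
calls = [
    ("alice", "bob", 5),
    ("alice", "carol", 2),
    ("alice", "dave", 1),
    ("bob", "carol", 4),
    ("erin", "alice", 3),
]


def ego_views(calls, k):
    """Map each device to its k heaviest contacts, ignoring call direction."""
    totals = defaultdict(Counter)
    for caller, callee, count in calls:
        totals[caller][callee] += count
        totals[callee][caller] += count
    # Keeping only the top k contacts bounds the storage per device.
    return {node: contacts.most_common(k) for node, contacts in totals.items()}


views = ego_views(calls, k=2)
print(views["alice"])
```

Because each device carries a small, fixed-size summary, queries about one device touch only its own entry rather than the whole graph.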
Chris Volinsky is Executive Director of the
Statistics Research Department at AT&T Labs-Research in
Florham Park, N.J. Chris received his PhD from the University of
Washington in 1997, studying Bayesian Model Averaging. He joined
AT&T in 1997 and became Director of the Statistics Research
Department in 2004. His research at AT&T focuses on large-scale
data mining: recommendation systems, social networks,
statistical computation, and anomaly detection. In 2009, Chris was
a member of the 7-person, 4-country team BellKor's Pragmatic Chaos
that won the $1M Netflix Prize, an open competition for improving
Netflix's online recommendation system.
Dynamic Network Analysis: Model, Algorithm, Theory, and Application
Eric Xing, CMU
Slides: pdf (29mb)
Across the sciences, a fundamental setting for representing and
interpreting information about entities, the structure and
organization of communities, and changes in these over time, is a
stochastic network that is topologically rewiring and semantically
evolving over time, or over a genealogy. While there is a rich
literature on modeling invariant networks, until recently little
has been done toward modeling the dynamic processes underlying
rewiring networks, or toward recovering such networks when they are
not observable.
In this talk, I will present two recent developments in
analyzing what we refer to as the dynamic tomography of evolving
networks. I will first present new sparse-coding algorithms for
estimating the topological structures of latent evolving networks
underlying nonstationary time-series or tree-series of nodal
attributes, along with theoretical results on the asymptotic
sparsistency of the proposed methods; then, I will present a new
Bayesian model for estimating and visualizing the trajectories of
latent multi-functionality of nodal states in the evolving
networks.
I will show some promising empirical results on recovering and
analyzing the latent evolving social networks in the US Senate and
the Enron corporation, and the evolving gene network of the fruit
fly as it ages, at a time resolution limited only by sampling
frequency. In all cases, our methods reveal interesting dynamic
patterns in the networks.
Dr. Eric Xing is an associate professor in the
School of Computer Science at Carnegie Mellon University. His
principal research interests lie in the development of machine
learning and statistical methodology, especially for solving
problems involving automated learning, reasoning, and
decision-making in high-dimensional and dynamic possible worlds,
and for building quantitative models and predictive understandings
of biological systems. Professor Xing received a Ph.D. in
Molecular Biology from Rutgers University, and another Ph.D. in
Computer Science from UC Berkeley. His current work involves 1)
foundations of statistical learning, including theory and
algorithms for estimating time/space varying-coefficient models,
sparse structured input/output models, and nonparametric Bayesian
models; 2) computational and statistical analysis of gene
regulation, genetic variation, and disease associations; and 3)
application of statistical learning in social networks, data
mining, and vision. Professor Xing has published over 100
peer-reviewed papers; he is an action editor of the Machine
Learning Journal and an associate editor of the Annals of Applied
Statistics and of the PLoS Journal of Computational Biology. He is a
recipient of the NSF Career Award, the Alfred P. Sloan Research
Fellowship in Computer Science, and the United States Air Force
Young Investigator Award.