PhD Defense: Collective Relational Data Integration with Diverse and Noisy Evidence
Driven by the growth of the Internet, online applications, and data sharing initiatives, available structured data sources are now vast in number. There is a growing need to integrate these structured sources to support a variety of data science tasks, including predictive analysis, data mining, improving search results, and generating recommendations. A particularly important integration challenge is dealing with the heterogeneous structures of relational data sources. In addition to the large number of sources – both individual sources and their versions over time – the difficulty also lies in the growing complexity of sources, and in the noise and ambiguity present in real-world sources. Existing automated integration approaches handle the number and complexity of sources, but nearly all are based on brittle technologies that cannot handle noise and ambiguity. Corresponding progress has been made in probabilistic learning approaches to handle noise and ambiguity in inputs, but until recently those technologies have not scaled to the size and complexity of relational data integration problems. This dissertation addresses fundamental challenges arising from this gap in existing approaches, and demonstrates promising new relational data integration approaches employing collective, probabilistic reasoning to handle inputs that can be diverse, noisy, and ambiguous.
Chair: Dr. Lise Getoor
Co-Chair: Dr. Dana Nau
Dean's rep: Dr. Louiqa Raschid
Members: Dr. Héctor Corrada Bravo, Dr. Alan Ritter