PhD Proposal: Structural Scaffolding for Making Sense of Document Collections

Talk
Joseph Barrow
Time: 03.08.2021 08:00 to 10:00
Location: Remote

Readers of all kinds are called upon to make sense not only of individual documents but often of sets of relevant documents. When making sense of a set of documents, readers can often make use of structural cues, such as product comparison tables, paper citations that provide supporting evidence, or outlines. Without these cues it is easy to "miss the forest for the trees" and focus on details within documents rather than on the broader picture. General solutions to this problem are difficult because both the types of documents and what readers need from them vary widely. To address this, this work proposes "structural scaffolding", which aims to induce certain classes of structures over the information in the documents, and explores two different types of scaffolds.

In this proposal, a "scaffold" is defined as a data structure built from a set of relevant documents that meets two criteria: (1) it manifests a structure implicit in the documents, and (2) it is purposefully chosen to support a known task. Scaffolds can be broadly categorized into different types. Of the many possible types, this proposal focuses on two: dialectical scaffolds of collections, which relate arguments across documents to each other; and topical scaffolds of documents, which capture aboutness structure within individual documents.

For each type, enabling models, enabling data structures, and downstream uses are explored. A common thread for enabling applications is jointly considering content and structure, through both the model and the data structures used. For instance, work presented on segmenting and segment-labeling documents as a topical scaffold shows that a model which jointly learns from the content and structure of segments achieves significant gains over content-only baselines.
Similarly, the work presented and proposed on dialectical scaffolding is motivated by the importance of considering the relationships between arguments rather than just the text of the arguments, which is the key intuition behind "syntopical graphs". Some downstream applications of scaffolds are then explored, including their use as a training signal for representation learning, in support of a collection-browsing task called syntopical reading, and in an interactive interface for collection browsing.

Examining Committee:

Chair: Dr. Philip Resnik
Co-Advisor: Dr. Doug Oard
Dept rep: Dr. John Dickerson
Members: Dr. Jordan Boyd-Graber