PhD Proposal: Analyzing communicative choices to understand their motivations, context-based variation, and social consequences

Talk
Pranav Goel
Time: 11.15.2022 11:00 to 13:00
Location: IRB 3137

In many settings, communicating in a language requires making choices among different possibilities: the issues to focus on (and/or exclude), the aspects to highlight within any issue, the narratives to include, and more. These choices, deliberate or not, are socially structured. The ever-increasing availability of unstructured large-scale textual data, in part due to the bulk of communication and information dissemination happening in online or digital spaces, makes natural language processing (NLP) techniques a natural fit for helping understand socially situated choices (communicative choices) using that textual data. Within NLP, unsupervised methods are often needed, since large-scale textual data in the wild often does not have accompanying labels, and any existing labels or categorization may not be appropriate for answering specific research questions.

This proposal seeks to address the following question: how can we use unsupervised NLP methods to study texts authored by specific people or institutions in order to effectively explicate the communicative choices being made as well as investigate their potential motivations, context-based variation, and consequences?

Our first set of contributions centers on methodological innovation. We focus on topic modeling, a class of generally unsupervised NLP methods that can automatically discover authors’ communicative choices in the form of topics or categorical themes present in a collection of documents. We introduce a new neural topic model (NTM) that effectively incorporates contextualizing sequential knowledge. Next, we find critical gaps in the near-universal automated evaluation paradigm used to compare models in topic modeling methods research, and we then operationalize different evaluation criteria grounded in the needs of the well-defined use case of content analysis. The latter two works call into question much of the recent work in NTM development claiming “state-of-the-art” and emphasize the importance of validating the outputs of NLP methods.

To use unsupervised NLP methods to investigate potential motivations, context-based variation, and consequences of communicative choices, we link textual data with information about the authors, social contexts, and media involved in their production, and use these connected information sources to help conduct empirical research in social sciences.

In our second set of contributions, we analyze a previously unexplored connection between a politician’s donors and their communicative choices in their floor speeches to show how donations influence issue-attention in the US Congress, enabling a new look at money in politics and providing an example of studying motivations behind communicative choices.

Our third set of contributions uses text-based ideal point extraction to better understand the role of institutional constraints and audience considerations in the varying expression and ideological positioning of politicians. Domain experts validate and annotate modeling outputs to establish the reliability of the automated tool. Proposed work will extend the existing text-based ideal point extraction tool, validate our new method, and use it for empirical research on the impact of issue-context on ideological frames.

In our fourth set of contributions, we demonstrate the potential of both unsupervised NLP techniques and social network data and methods in better understanding the downstream consequences of communicative choices by focusing on misinformation narratives in mainstream media, viewing and highlighting misinformation as something beyond just false claims published by certain bad actors.

Our final piece of proposed work will use our experiences with diverse kinds of data and methods to make our fifth set of contributions: a way of finding and analyzing perspectives not present in (or excluded from) one particular discourse. Specifically, we will propose a new method and create a new dataset to find cases where certain themes and frames (communicative choices) present in the public discourse (social media, open-ended surveys, etc.) on a specific issue are absent from or given no attention in elite discourse (government communiques, mainstream news media, scientific literature).

Examining Committee

Chair:

Dr. Philip Resnik

Department Representative:

Dr. John Dickerson

Members:

Dr. Jordan Boyd-Graber

Dr. Naeemul Hassan

Dr. Kris Miler

Dr. David Lazer (Northeastern University)