PhD Proposal: Exploring Blind and Sighted Users' Interactions with Error-Prone Speech and Image Recognition

Talk
Jonggi Hong
Time: 01.30.2020 12:00 to 14:00
Location: IRB 5105

Speech and image recognition, already employed in many mainstream and assistive applications, hold great promise for increasing independence and improving the quality of life for people with visual impairments. However, their error-prone nature, combined with challenges in visually inspecting errors, can hold back their use for more independent living. This thesis explores blind users’ challenges and strategies in handling speech and image recognition errors through non-visual interactions, looking at two perspectives: that of an end-user interacting with an already trained and deployed model, such as an automatic speech recognizer (ASR), and that of an end-user who is empowered to attune the model to their idiosyncratic characteristics, such as with a teachable object recognizer. To better contextualize the findings and account for human factors beyond visual impairments, the user studies also include sighted participants as a parallel thread.

More specifically, Part I of this thesis explores blind and sighted participants’ experience with automatic speech recognition errors through audio-only interactions. Here, the recognition result is not displayed; instead, it is played back through text-to-speech. Through carefully engineered speech dictation tasks in both crowdsourcing and controlled lab settings, this part investigates the percentage and type of errors that users miss, their strategies for identifying errors, and potential manipulations of the synthesized speech that may help users better identify the errors.

Part II investigates effective interactions for identifying and reducing errors in the context of teachable object recognizers. In this case, users are not simply consumers of model predictions but can take a more active role by tweaking the behavior of the model with their training examples. Thus, new errors can be introduced that are related to their training examples. Through crowdsourcing and controlled lab studies, Part II decouples these errors based on inexperience in machine teaching and challenges in photo-taking.

Examining Committee:

Chair: Dr. Hernisa Kacorri
Dept rep: Dr. Marine Carpuat
Members: Dr. Huaishu Peng