Kim, J., Oard, D., Soergel, D. (January 2003)
This paper reports on an exploratory study of the criteria searchers use when judging the relevance of recorded speech from radio programs, and of the attributes of a recording on which those judgments are based. Five volunteers each performed three searches using two systems (NPR Online and SpeechBot) for three questions and judged the relevance of the results. Data were collected through observation and screen capture, think-aloud protocols, and interviews, then coded and analyzed for patterns. The criteria used as a basis for selection were found to be similar to those observed in relevance studies with printed materials, but the attributes used to assess those criteria were found to exhibit modality-specific characteristics. For example, audio replay was often necessary to assess story genre (e.g., report, interview, commentary) because of limitations in currently available metadata. Participants reported a strong preference for manually prepared summaries over passages extracted from automatic speech recognition transcripts, and consequential differences in search behavior were observed between the two conditions. Several implications for interface and component design are drawn. These include the utility of summaries at multiple levels of detail, given the difficulty of skimming imperfect transcripts, and the potential of automatic speaker identification to support authority judgments in such systems.