1
  • Extract from Catherine Plaisant’s talk at the Human-Computer Interaction Lab (HCIL) annual symposium (www.cs.umd.edu/hcil/soh)


  • About FeatureLens (a MONK prototype)
  • Interface developed at Maryland using D2K frequent pattern analysis from NCSA
2
Exploring and Visualizing Patterns in Literary Text Collections with FeatureLens

    • Anthony Don, Catherine Plaisant, Tanya Clement
    • University of Maryland


    • Loretta Auvil, NCSA


    • With the help of others from the MONK project



3
Supporting Literary Scholars
5
Motivating Example
  • Study of Gertrude Stein’s “The Making of Americans” (MoA)
  • Tanya Clement, PhD student in the English Department
6
Sample paragraph
  • [1086] Always from the beginning there was to me all living as repeating. This is now a description of my feeling. As I was saying listening to repeating is often irritating, always repeating is all of living, everything in a being is always repeating, more and more listening to repeating gives to me completed understanding.
7
Sounds
  • Same paragraph [1086], examined for repeated sounds
8
Words
  • Same paragraph [1086], examined for repeated words
9
Higher-level concepts
  • Same paragraph [1086], examined for repeated higher-level concepts
10
N-grams (sequences of consecutive words)
  • Same paragraph [1086], examined for repeated n-grams
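To make the n-gram idea concrete, here is a minimal Python sketch (not FeatureLens code) that tokenizes paragraph [1086] and lists the n-grams it repeats. The tokenizer (lowercase alphabetic runs, punctuation dropped) and the helper names `tokenize` and `ngrams` are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens, in text order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


PARAGRAPH_1086 = (
    "Always from the beginning there was to me all living as repeating. "
    "This is now a description of my feeling. As I was saying listening to "
    "repeating is often irritating, always repeating is all of living, "
    "everything in a being is always repeating, more and more listening to "
    "repeating gives to me completed understanding."
)

if __name__ == "__main__":
    tokens = tokenize(PARAGRAPH_1086)
    for n in (1, 2, 3):
        counts = Counter(ngrams(tokens, n))
        # Keep only the n-grams that occur more than once in this paragraph.
        repeated = [(" ".join(g), c) for g, c in counts.most_common() if c > 1]
        print(n, repeated)
```

On this paragraph the only trigram that occurs twice is “listening to repeating”; repeated bigrams include “listening to”, “to repeating” and “always repeating”.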
11
Example of a “literary” question about MoA

  • Do the changes in repetition correspond to the novel’s evolving theories about identity and representation? And how?


12
Questions the tool tries to address
  • What text features are highly repeated in the text?
    • Frequent words
    • Frequent n-grams (consecutive words)
    • Frequent patterns of n-grams (more “fuzzy”, non-consecutive matches)
  • How do they change over time (i.e. along the text)?
    • Locate features in text
    • Compare features
    • Distribution over time
    • Find features that exhibit specific distributions (e.g., spikes)
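The questions above reduce to counting features and recording where they occur along the text. The following is a minimal sketch, assuming the text has already been split into ordered segments (for example paragraphs) and that “frequent” means a minimum total count; `frequent_features`, `ngram_positions` and `distribution` are hypothetical names, not the FeatureLens or D2K API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphabetic word tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_positions(segments, n):
    """Map each n-gram to the segment index of every occurrence (locating features)."""
    positions = defaultdict(list)
    for seg_idx, text in enumerate(segments):
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            positions[tuple(tokens[i:i + n])].append(seg_idx)
    return positions


def frequent_features(segments, n, min_count):
    """(n-gram, occurrence list) pairs seen at least min_count times, most frequent first."""
    positions = ngram_positions(segments, n)
    frequent = [(g, occ) for g, occ in positions.items() if len(occ) >= min_count]
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(frequent, key=lambda item: (-len(item[1]), item[0]))


def distribution(occurrences, num_segments):
    """Turn a list of segment indices into a per-segment count vector."""
    counts = [0] * num_segments
    for seg_idx in occurrences:
        counts[seg_idx] += 1
    return counts
```

Plotting the vectors returned by `distribution` for several features side by side is one way to compare them over the course of the text.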
13
Trying existing tools – TextArc
28
Trends in distributions
  • Define metrics on distributions and rank features accordingly
    • increase/decrease: topic evolution
    • spikes/sinks: specific events
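The slide does not say which metrics FeatureLens uses. As an illustrative assumption, a least-squares slope can stand in for increase/decrease, and the distance of the highest (or lowest) segment from the mean, in standard deviations, for spikes (or sinks). The helpers below, all hypothetical, rank features given as per-segment count vectors by any such metric.

```python
from statistics import mean, pstdev


def slope(counts):
    """Least-squares slope of counts vs. segment index: > 0 rising, < 0 falling."""
    n = len(counts)
    xs = range(n)
    x_bar = (n - 1) / 2
    y_bar = mean(counts)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den if den else 0.0


def spike_score(counts):
    """How far the highest segment sits above the mean, in standard deviations."""
    sd = pstdev(counts)
    return (max(counts) - mean(counts)) / sd if sd else 0.0


def sink_score(counts):
    """How far the lowest segment sits below the mean, in standard deviations."""
    sd = pstdev(counts)
    return (mean(counts) - min(counts)) / sd if sd else 0.0


def rank(distributions, metric, top=5):
    """Rank features (name -> per-segment counts) by metric, highest score first."""
    return sorted(distributions, key=lambda f: metric(distributions[f]), reverse=True)[:top]
```

Ranking by `lambda c: -slope(c)` surfaces decreasing features. A flat distribution scores 0 on every metric, so it never outranks a real trend.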
35
Evaluation strategy
  • Ongoing longitudinal case study
    • Tanya Clement and “The Making of Americans”
  • Pilot user study with 8 users
    • 3 tasks, then free exploration (30 min)
    • think-aloud protocol to gather insights about the text
36
Example of a finding by Tanya Clement in her study of Gertrude Stein’s text
48
Questions we addressed
  • What text features are highly repeated in the text?
    • Frequent words
    • Frequent n-grams (consecutive words)
    • Frequent patterns of n-grams (more “fuzzy”, non-consecutive matches)
  • How do they change over time (i.e. along the text)?
    • Locate features in text
    • Compare features
    • Distribution over time
    • Find features that exhibit specific distributions (e.g., spikes)
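One possible reading of “frequent patterns of n-grams” is a set of n-grams that appear together in the same segment, in any order and not necessarily adjacent, in enough segments. The sketch below restricts patterns to pairs and uses an Apriori-style pruning step. This is an assumption for illustration, not a description of the D2K frequent pattern analysis behind FeatureLens; `frequent_pairs` and `segment_ngram_sets` are hypothetical names.

```python
import re
from collections import Counter
from itertools import combinations


def segment_ngram_sets(segments, n):
    """For each segment, the set of distinct n-grams it contains."""
    result = []
    for text in segments:
        tokens = re.findall(r"[a-z]+", text.lower())
        result.append({tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})
    return result


def frequent_pairs(segments, n, min_support):
    """Pairs of distinct n-grams that co-occur in at least min_support segments.

    The two n-grams only need to share a segment, not be adjacent or in a fixed
    order, so this is a loose ("fuzzy") match rather than an exact phrase match.
    """
    sets = segment_ngram_sets(segments, n)
    # Apriori pruning: a pair can only be frequent if each of its members is.
    single = Counter(g for s in sets for g in s)
    keep = {g for g, c in single.items() if c >= min_support}
    pair_counts = Counter()
    for s in sets:
        for pair in combinations(sorted(s & keep), 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

Because each segment contributes a set, support counts segments rather than raw occurrences, which matches the usual definition in frequent itemset mining.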
49
Thank you
  • Text mining requires good user interfaces to analyze its results
  • Live demo and technical report at www.hcil.cs.umd.edu/hcil/textvis/featurelens



  • Support from HCIL and Andrew W. Mellon Foundation