1
 Exploring High Dimensional Data with
the Rank-by-Feature Framework




Ben Shneiderman  ben@cs.umd.edu

Founding Director (1983-2000), Human-Computer Interaction Lab
Professor, Department of Computer Science
Member, Institutes for Advanced Computer Studies &
Systems Research




University of Maryland
College Park, MD 20742
3
Scientific Approach (beyond user friendly)
  • Specify users and tasks
  • Predict and measure
    • time to learn
    • speed of performance
    • rate of human errors
    • human retention over time
  • Assess subjective satisfaction
         (Questionnaire for User Interface Satisfaction)
  • Accommodate individual differences
  • Consider social, organizational & cultural context
4
Design Issues
  • Input devices & strategies
    • Keyboards, pointing devices, voice
    • Direct manipulation
    • Menus, forms, commands
  • Output devices & formats
    • Screens, windows, color, sound
    • Text, tables, graphics
    • Instructions, messages, help
  • Collaboration & communities
  • Manuals, tutorials, training
5
U.S. Library of Congress







  • Scholars, Journalists, Citizens
  • Teachers, Students
6
Visible Human Explorer (NLM)
  • Doctors
  • Surgeons


  • Researchers
  • Students
7
NASA Environmental Data
  • Scientists
  • Farmers


  • Land planners
  • Students
8
Bureau of the Census

  • Economists, Policy makers, Journalists
  • Teachers, Students
9
NSF Digital Government Initiative

  • Find what you need
  • Understand what you find
10
Information Visualization
      The eye… the window of the soul,
      is the principal means
      by which the central sense
      can most completely and
      abundantly appreciate
      the infinite works of nature.

            Leonardo da Vinci (1452 - 1519)

11
Using Vision to Think
  • Visual bandwidth is enormous
    • Human perceptual skills are remarkable
      • Trend, cluster, gap, outlier...
      • Color, size, shape, proximity...
    • Human image storage is fast and vast
  • Opportunities
    • Spatial layouts & coordination
    • Information visualization
    • Scientific visualization & simulation
    • Telepresence & augmented reality
    • Virtual environments
12
Information Visualization: US Research Centers
  • Xerox PARC
    • 3-D cone trees, perspective wall, spiral calendar
    • table lens, hyperbolic trees, document lens
  • Univ. of Maryland
    • dynamic queries, range sliders, starfields, treemaps, timeboxes, zoombars
    • tight coupling, dynamic pruning, lifelines
  • IBM, Microsoft, AT&T
  • Georgia Tech, MIT Media Lab
  • Univ. of Wisconsin, Minnesota,
      Calif-Berkeley, CMU
  • Pacific Northwest National Labs


20
Information Visualization: Mantra
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand
  • Overview, zoom & filter, details-on-demand


21
Information Visualization: Data Types
  • 1-D Linear: Document Lens, SeeSoft, Info Mural, Value Bars
  • 2-D Map: GIS, ArcView, PageMaker, Medical imagery
  • 3-D World: CAD, Medical, Molecules, Architecture
  • Multi-Var: Parallel Coordinates, Spotfire, XGobi, Visage,
    Influence Explorer, TableLens, DEVise
  • Temporal: Perspective Wall, LifeLines, Lifestreams,
    Project Managers, DataSpiral
  • Tree: Cone/Cam/Hyperbolic, TreeBrowser, Treemap
  • Network: Netmap, netViz, SeeNet, Butterfly, Multi-trees


22
Treemap: view large trees with node values
  • Space filling
  • Space limited
  • Color coding
  • Size coding
  • Requires learning
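
The space-filling and size-coding ideas can be illustrated with a minimal sketch of a slice-and-dice layout, which alternates the split direction at each tree level. The node format (dicts with "name", "size", "children") and the function names are assumptions made for this example, not part of any Treemap release.

```python
def subtree_size(node):
    """Total size of a node: its own size for a leaf, else the sum of its children."""
    children = node.get("children")
    if not children:
        return node["size"]
    return sum(subtree_size(c) for c in children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Fill the rectangle (x, y, w, h) with leaf rectangles.

    Each child receives a strip whose area is proportional to its size.
    Even depths split along the width, odd depths along the height.
    Returns a list of (name, x, y, w, h) tuples, one per leaf.
    """
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = subtree_size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = subtree_size(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical toy tree, for illustration only.
tree = {"name": "root", "children": [
    {"name": "a", "size": 1},
    {"name": "b", "children": [{"name": "c", "size": 1},
                               {"name": "d", "size": 2}]},
]}
for rect in slice_and_dice(tree, 0, 0, 4, 4):
    print(rect)
```

The leaf rectangles tile the whole display area (space filling), and a leaf with twice the size gets twice the area (size coding). Color coding would be a separate mapping from a second attribute to each rectangle's fill.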
23
Treemap: Stock market, clustered by industry
25
Treemap: Gene Ontology
30
LifeLines: Customer Histories
  • Temporal data visualization
  • Medical patient histories
  • Customer relationship management
  • Legal case histories


31
Temporal Data: TimeSearcher 1.3
  • Time series
    • Stocks
    • Weather
    • Genes
  • User-specified
      patterns
  • Rapid search
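
A rough sketch of how a user-specified pattern could be matched, assuming the pattern is expressed as rectangular timeboxes (a time range plus a value range; timeboxes appear in the Univ. of Maryland list earlier in the talk). The data layout and function names are hypothetical and do not reflect TimeSearcher's actual code.

```python
def matches_timeboxes(series, timeboxes):
    """True if the series stays inside every timebox.

    series: list of values indexed by time step.
    timeboxes: list of (t_start, t_end, v_min, v_max), all bounds inclusive.
    """
    return all(
        v_min <= value <= v_max
        for (t_start, t_end, v_min, v_max) in timeboxes
        for value in series[t_start:t_end + 1]
    )


def timebox_query(dataset, timeboxes):
    """Names of all series in the dataset that satisfy every timebox."""
    return [name for name, series in dataset.items()
            if matches_timeboxes(series, timeboxes)]


# Made-up toy series, for illustration only.
dataset = {"A": [1, 2, 3, 4], "B": [1, 5, 3, 4], "C": [0, 2, 2, 9]}
print(timebox_query(dataset, [(1, 2, 2, 3)]))
print(timebox_query(dataset, [(1, 2, 2, 3), (3, 3, 0, 5)]))
```

Adding a second timebox narrows the result set, which mirrors the filtering behaviour of dynamic queries.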


32
Temporal Data: TimeSearcher 2.0
  • Long time series (>10,000 time points)
  • Multiple variables
  • Controlled precision in match
       (Linear, offset, noise, amplitude)


33
Goal: Find Features in Multi-D Data
  • Finding correlations, clusters, outliers, gaps → cognitive difficulties in >3D


  • Therefore utilize low-dimensional projections
    • Perceptual efficiency in 1D and 2D
    • Use Rank-by-Feature Framework to guide discovery
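
One way to picture ranking 2D projections: score every pair of variables by a feature and sort, so the most interesting scatterplots come first. The sketch below assumes Pearson correlation as the ranking criterion. The function names and toy columns are hypothetical, and this is not HCE's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_2d_by_correlation(table):
    """Rank all 2D projections (pairs of columns) by |r|, strongest first.

    table: dict mapping a column name to its list of values.
    Returns a list of (name_a, name_b, r) tuples.
    """
    scored = [(a, b, pearson(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


# Made-up toy table, for illustration only.
table = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, 3, 2, 5]}
for a, b, r in rank_2d_by_correlation(table):
    print(f"{a} vs {b}: r = {r:.3f}")
```

With d variables there are d(d-1)/2 such projections, so an ordered list saves the analyst from inspecting every scatterplot by hand.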
34
Multi-V: Hierarchical Clustering Explorer
35
What’s interesting?
36
What’s interesting?
37
Information Visualization: Tasks
  • Overview: Gain an overview of the entire collection
  • Zoom: Zoom in on items of interest
  • Filter: Filter out uninteresting items
  • Details-on-demand: Select an item or group and
    get details when needed
  • Relate: View relationships among items
  • History: Keep a history of actions to support
    undo, replay, and progressive refinement
  • Extract: Allow extraction of sub-collections and
    of the query parameters


38
Goal: Find Features in Multi-D Data
  • Finding correlations, clusters, outliers, gaps → cognitive difficulties in >3D


  • Therefore utilize low-dimensional projections
    • Perceptual efficiency in 1D and 2D
    • Use Rank-by-Feature Framework to guide discovery
39
Do you see anything interesting?
40
What features stand out?
41
Correlation…What else?
42
… and Outliers
43
Demonstration
  • Breakfast Cereals
    • 77 cereals
    • 8 dimensions (or variables): sugar, potassium, fiber, protein, etc.
  • US counties census data
    • 3138 counties
    • 14 dimensions: population density, poverty level, unemployment, etc.
44
Rank-by-Feature Framework: 1D
45
Rank-by-Feature Framework: 2D
47
HCE Status
  • In collaboration with and sponsored by Eric Hoffman, Children’s National Medical Center
  • Categorical variables: 4.0 beta, May 2005
  • 60K lines of C++ code, 58 classes
  • 2,000+ downloads since April 2002
  • www.cs.umd.edu/hcil/hce
48
GRID Principles
  • Graphics, Ranking & Interaction for Discovery (GRID)
  • Study 1D,
         Study 2D,
              Then find features
  • Ranking guides insight,
         Statistics confirm
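
The "study 1D" step can be sketched as ranking each variable by a single-variable feature. The example below assumes the feature is the number of outliers under the common 1.5 × IQR rule. That criterion, the function names and the toy data are assumptions chosen for illustration and may differ from what HCE computes.

```python
from statistics import quantiles


def outlier_count(values):
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < low or v > high)


def rank_1d_by_outliers(table):
    """Rank variables (1D projections) by outlier count, most outliers first.

    table: dict mapping a column name to its list of values.
    Returns a list of (name, count) tuples.
    """
    scored = [(name, outlier_count(column)) for name, column in table.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up toy table, for illustration only.
table_1d = {"a": [1, 2, 3, 4, 5, 6, 7, 8],
            "b": [1, 2, 3, 4, 5, 6, 7, 100]}
print(rank_1d_by_outliers(table_1d))
```

The ranking only points the analyst at variables worth a look. A histogram or box plot of the top-ranked variable, plus the underlying statistic, is what confirms the finding.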


