A Framework for Auditory Data Exploration and Evaluation with Geo-referenced Data Sonification
Haixia Zhao, Catherine Plaisant, Ben Shneiderman
Dept. of Computer Science & Human-Computer Interaction Lab
Univ. of Maryland, College Park, MD 20742
Dept. of Computer and Info Science
Towson Univ., Towson, MD 21252
We first describe an Action-by-Design-Component (ADC) framework to guide auditory interface designs for exploratory data analysis. Applying the framework to the interactive sonification of geo-referenced data, we systematically explored and evaluated its design space. A data exploration tool, iSonic, was implemented for users with vision impairment. In-depth case studies with 7 blind users showed that iSonic enabled them to find facts and discover trends in geo-referenced data, even in unfamiliar geographical contexts. Analysis of user task behaviors and usage patterns confirmed that the framework captured auditory information seeking actions and components that were naturally adopted by subjects to accomplish geo-referenced data exploration tasks. The results provide evidence for us to extend the framework, and guidance for designers of unified auditory workspaces for general exploratory data analysis.
Vision impairment, Sonification, Auditory user interfaces, Information seeking, Universal usability, Maps
H5.m. Information interfaces and presentation (e.g., HCI)
Information visualization has produced many innovative techniques/interfaces for people with normal vision to use their tremendous visual ability to explore and discover data facts/trends. When information is presented with visual properties, such as color and spatial location, it is not easily viewable by users with vision impairment. In addition, visual data interaction is typically done by using pointing devices, such as computer mice, to directly manipulate the visual objects displayed on the screen. Such interaction is hard without sustained visual feedback. Although a few visualization tools (e.g., ) allow keyboard-only navigation inside some visual graphs, most visualizations are not usable for users with vision impairment.
One example is the current access to government statistical data. Such data is often geography-related, such as population distribution by geographical regions, and often presented as choropleth maps that typically use colors to show the value for each map region. Required by Section 508 (www.section508.gov), all USA federal agencies need to make such data accessible.
A widely used accommodation for users with vision impairment to access digital information is to rely on screen readers to speak the textual content. To make non-textual elements accessible to screen readers, textual equivalences are needed. For static graphs, it is a standard practice to provide textual labels during the system development . For dynamic graphs, tabular data presentations are used instead (e.g., ), or textual summaries can be automatically generated from the data set.
Several problems exist in the current approaches. First, while a concise textual description is helpful, the data interaction that is a critical part of the data exploration process is lost. Automatic textual summarization techniques require pre-defined summary templates and do not have enough flexibility to support all user needs in exploratory data analysis. Second, a tabular presentation may be good for basic data browsing but is hard to use for in-depth data comprehension and analysis. Third, speech can accurately describe information but tends to be long in duration, which makes complex information hard to grasp.
Data interaction has been extensively investigated in visualization systems. But little was done regarding whether techniques in visualizations can be translated for use in auditory data exploration without visual aids, and what design implications are involved. Some research used musical sounds to present sonified “overviews” of simple graphs (e.g., ) but support for other task-oriented data interactions is typically missing.
We believe it is important to investigate whether an analogue to standard techniques in visualizations can be established for the auditory mode. In this paper, we first describe an Action-by-Design-Component (ADC) framework for designing auditory interfaces for analytical data exploration. We use a set of Auditory Information Seeking Actions (AISA) to characterize task-oriented data interaction without visual aids, identify Design Components for supporting AISAs, and discuss their general design considerations. This framework has been used to investigate the design space of geo-referenced data sonification. In our earlier work, we reported on some initial sonification designs [27, 28], but the focus was limited to conveying data distribution patterns on maps and the studies were conducted with blind-folded sighted users.
Guided by the ADC framework, we have now developed a general exploratory data analysis tool for users with vision impairment, called iSonic. iSonic contains a highly coordinated map view and table view, and supports AISAs within and across the two views (Fig 1). We will describe iSonic features and discuss the design rationale to illustrate the framework.
Afterwards, we report an empirical evaluation of the keyboard-only version of iSonic with 7 users with complete vision impairment (42 hours of in-depth observation and interview data), which enabled us to examine the effectiveness of iSonic design choices. After extracting common iSonic usage patterns and analyzing user feedback, we discuss the benefits and limitations of the ADC framework.
Sonification, the use of non-speech sound, has been used in various interface designs (e.g., non-visual GUI presentations [3, 13]), as well as data presentations . Using the highly structured nature of musical sounds to convey information works even when no everyday auditory equivalence exists, and is less tiring and generally more appropriate than using everyday sounds . Research has shown that musical sounds enhance numeric data comprehension (e.g. ) and humans can interpret a quick sonified overview of simple data graphs (e.g., [2, 6, 7]). Some guidelines were extracted (e.g., [2, 23]) and toolkits were developed to help researchers try different data-to-sound attribute mappings (e.g. [15, 24]). While some allow basic user movements in the graph (e.g., ), previous data sonification typically lacks support for task-oriented data interactions.
In visual data exploration, the information seeking mantra “overview first, zoom, filter, then details-on-demand”  characterizes the general visual information seeking process and was an effective visualization design guideline. Several visualization interfaces (e.g., Sage, Snap-together) were designed that allow users to construct multiple graphical data views and perform data exploration through unified interaction methods within and across the views.
However, such a framework or interface is absent for data exploration in the auditory mode without visual aids. Some recent models (e.g. [9, 18]) tried to describe interactive data sonification, but they emphasize spatial immersion effects in a physical world modeling of the data set hence may not be suitable for abstract data. More importantly, none has characterized task-oriented information seeking needs in the auditory mode, or addressed design considerations for interaction without visual aids, such as “can users with complete vision impairment operate multiple coordinated auditory views”.
In this section, we first describe the Auditory Information Seeking Actions (AISA), contrasting them with visual actions. Then we briefly mention some general design considerations for the Design Components that support AISAs. Those considerations will be reviewed in more detail when we discuss the design of iSonic.
Auditory Information Seeking Actions (AISA)
We believe that an exploratory data analysis task in the auditory mode can be accomplished by a series of Auditory Information Seeking Actions (AISA). Many of the actions resemble those in visual information seeking but involve different cognitive processes and present special design challenges due to the highly transient nature of sound.
Figure 1: highly coordinated table and map views of the counties of the state of Maryland. Superimposed on the color coded (choropleth) map is a representation of the recursive 3x3 keyboard exploration grid.
Obtaining a gist is to experience the overall data trend via a short auditory message. It guides further exploration and may allow the detection of anomalies and outliers. A gist is an auditory “overview” but has special design and cognition challenges (see next section) because human auditory perception is much less synoptic than visual perception.
Navigation is “moving around” to examine portions of the data set by listening to a sub-gist of that portion. It needs to follow paths that are natural to the data relations. A visual interface provides a sustained display for users to directly manipulate. In auditory interfaces, users need to construct a mental representation of the display space and virtual navigation structures in order to efficiently move in the data set. Without a persistent display, they can easily get lost. To regain the orientation, users need to situate themselves by requesting their status. While navigation is an exploratory action, searching is a more fixed-goal action that directly lands on the data items by specifying search criteria. Searching breaks the process of mental representation construction, so situating may be needed to regain orientation after the search is completed.
Filtering out unwanted data items according to some query criteria helps to trim a large data set to a manipulable size, and allows users to quickly focus on items of interest. In visualization, dynamic query coupled with rapid (less than 100 milliseconds) display update is the goal . In the auditory mode, different goals need to be established because such a short time is usually not enough to present a gist of changes. Results may need to be given after filtering is done instead of continuous display updates during the filtering process.
By selecting, users specify special interest in particular data items. Those data items are marked and can be revisited later or examined in other contexts. When the number of items is small, users can listen to the details. While speech is often too lengthy for obtaining an overall gist, it can be an effective presentation at the details-on-demand level.
In visualization, linked brushing allows users to manipulate the data in one view while seeing the results in other views. It requires users to construct and maintain multiple mental representations of the data views simultaneously, which can be mentally intensive in the auditory mode. Additionally, auditory feedback from multiple views needs to be clearly distinguished to avoid confusion and overloading. In the auditory mode, brushing can be done in a sequential style by selecting data items in one view, then explicitly switching to another view to examine them in a different data relation.
Each AISA consists of one or multiple interaction loops in which the user uses an input device to issue a command and listen to the auditory feedback. The center of the loop is the data view that governs the navigation structure, allowing the user to build a mental representation of the data space and correctly interpret the auditory feedback.
A data view is a form of presenting the data items and their relations, such as a table, map, or line graph. Research has shown that users with vision impairment were able to learn, interpret, and benefit from non-tabular data presentations. There is also evidence [1, 27] that choosing the right data view for a given task dramatically influences performance.
Navigation structures should reflect the data relations in the data view. In some previous work, users used a mouse or other input devices to move in the 2-D or 3-D data space to activate sounds of the data items within a certain distance from the cursor position. Such a “torch metaphor”  navigation could be useful for some data views, e.g., a scatterplot, but may be inefficient for others, e.g., a node-link diagram.
The choice of input device needs to consider both effectiveness and universal availability. Speech as input can be tempting but lacks the kinesthetic feedback users can get from operating physical input devices. Sensory feedback can help with users’ orientation and mental representation in the interaction. Card et al.  categorized physical input devices by their physical manipulation properties and defined several choice factors such as the cost. We can maximize users’ situation awareness by matching an input device’s properties with those of the navigation structures. However, it is important to keep the system device-independent by providing good alternatives in the absence of the desired device. For success by users with vision impairment, a system should provide interactions optimized for keyboard-only operations.
As a general principle, the auditory feedback should have a low latency. It should be generally short to fit the short-term memory (STM) or allow pauses for midpoint STM processing. Short and responsive feedback increases user engagement and allows users to quickly refine their control activities in the exploration process. It should synchronize with other display modalities to allow perceptual combinations. While humans are good at selective listening, attending to multiple simultaneous sounds is difficult and the amount of accurate information that can be extracted from simultaneous sound streams is limited . The sounds of multiple items often need to be sequenced along the time dimension instead of being played all at once. This imposes special design challenges when no natural mapping exists from the data relation to the time dimension. When the number of data items is large, data aggregation may be necessary to design short feedback.
Guided by the ADC framework, we have systematically explored the design space for geo-referenced statistical data and designed iSonic (Figure 1). Two users without residual vision were involved in the iterative design process. Many iSonic design decisions were based on their suggestions, as well as results from the evaluation of some design choices with blind-folded sighted subjects. Auditory interfaces are difficult to describe on paper, so we also submitted a supplementary video.
iSonic provides two highly coordinated data views – a region-by-variable table and a choropleth map (Fig 1). The table shows multiple statistical variables simultaneously. Each row corresponds to a geographical region and columns to variables. Table rows can be sorted by pressing ‘o’ while at the desired column, allowing quick location of low or high values. While geographical coordinates and adjacencies could be added as table columns, such information is better displayed on a map. Subjects in our previous study  strongly preferred a map over a table for discovering geographical value trends and performed better on pattern recognition tasks with a map than with a geographical knowledge enhanced table. Other views, such as line graphs or scatterplots, can be helpful for some analytical tasks, but were not used at the current work stage. We wanted to first examine how users could operate multiple coordinated auditory views. Auditory and visual displays are synchronized to allow communication between sighted and blind users.
When choosing input devices, we considered both device availability and how effectively their physical properties match the navigational properties of the two data views.
In iSonic, the table navigation follows the row and column table structure. It is discrete and relative because what matters is the relative row/column order, not the exact spatial location or size of each table cell. On the other hand, the map navigation follows the regions’ positions and adjacencies. Both the relative region layout and the absolute region locations and sizes are useful.
iSonic works with a keyboard alone. A keyboard is available on most computers and blind users are very comfortable using it. We use the arrow keys as natural means for relative movements in the left, right, up, and down directions. The numerical keypad potentially allows relative movements in 8 directions. The keyboard can also be transformed into a low resolution 2-D absolute pointing device, e.g., by mapping the whole keyboard layout to 2-D screen positions. In iSonic, we map the 3x3 layout of the numeric keypad.
iSonic also works with a touchpad. Touchpads are relatively common. A 14” touchpad costs less than $150. A touchpad provides high resolution 2-D absolute pointing and allows continuous movements by fingers. The kinesthetic feedback associated with arm and finger movements, combined with the touchpad frame as the position reference, may help with users’ position awareness on maps. Tactile maps placed on the touchpad can be helpful , but we chose not to rely on them because they need to be changed when the map changes and tactile printers are expensive and rarely available. When resources are available, a generic grid with subtle tactile dots may be used instead as a position and direction aid.
iSonic integrates the use of speech and musical sounds. Values are categorized into 5 ranges, as in many choropleth maps, and mapped to five violin pitches. The same mapping is used in the table view. Various musical instruments are used to indicate when users are outside the map or crossing a region border in the touchpad interface, or crossing a water body to reach a neighboring region in the keyboard interface. Stereo panning effects are used to indicate a region’s azimuth position on the virtual auditory map. It is also used in the table to indicate the column order. Using the plus and minus keys, users can switch among four information levels for each region: region name only, musical sound only, name and sound, name and sound plus reading of the numerical value.
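The value-to-pitch and position-to-pan mappings described above can be sketched as follows. This is a minimal illustration, not iSonic's implementation: the equal-width class breaks, the specific MIDI note numbers in `PITCHES`, and the function names are all assumptions made for the example. The only details taken from the text are that values fall into five ranges, each range has its own pitch, and stereo panning follows the region's horizontal position.

```python
from bisect import bisect_right

# Hypothetical pitch set (MIDI note numbers, ascending); the paper does not
# state which five violin pitches iSonic uses.
PITCHES = [55, 60, 64, 67, 72]


def class_breaks(values, n_classes=5):
    """Return the interior breakpoints of n_classes equal-width ranges.

    Equal-width classing is an assumption; choropleth maps may also use
    quantiles or other schemes.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def value_to_pitch(value, breaks, pitches=PITCHES):
    """Map a value to the pitch of the range it falls in (low value, low pitch)."""
    return pitches[bisect_right(breaks, value)]


def x_to_pan(x, x_min, x_max):
    """Map a horizontal map position to a MIDI pan value (0 left, 127 right)."""
    if x_max == x_min:
        return 64  # centre when the map has no horizontal extent
    frac = (x - x_min) / (x_max - x_min)
    return round(frac * 127)
```

Because the same `value_to_pitch` function can serve both views, a value sounds identical whether it is reached through the table or the map, which matches the shared mapping described in the text.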
There are many alternatives. Sound duration can present the value but would significantly prolong the feedback and is not appropriate when values of many regions need to be presented. Region locations could be mapped to sound locations using virtual spatial sound synthesized with Head Related Transfer Functions (HRTF) . Spatial sound provides high perceptual resolution in the azimuth plane, but is not satisfactory in the elevation plane, especially when a generic HRTF is used. Using individualized HRTF could improve the elevation perception but its measurement is a long process requiring special equipment and careful calibration. Additionally, HRTF spatial sound is computing intensive. While we have connected iSonic to a virtual spatial sound server and plan to investigate the use of individualized HRTF spatial sound, we currently focus on MIDI stereo sound. We also tried to play a piano pitch after each violin value pitch to indicate the region’s elevation position. Unfortunately, such extra sound was not found to increase performance .
iSonic supports AISAs in both the table and the map views, including sequential brushing between the two views. Each interface function can be activated from a menu system that also gives the hotkey and a brief explanatory message.
In the table view, a gist is produced by automatically playing all values in a column or a row. The sequencing follows the values’ order in the table, from top to bottom, or left to right. In the map view, there is no natural mapping from the geographical relation to the time relation. Research has shown that sequencing that preserves spatial relations helps users to construct a mental image of the 2-D representation. Sequencing is done by spatially sweeping the map horizontally from left to right then vertically, as on a typewriter. When the end of a sweep row is reached, a tick mark sound is played and the stereo effect reinforces the change. A bell indicates the end of the sweep of the whole map. The same sweep order holds for sub-gists of parts of the map. For both views, the current information level controls the amount of detail in the gist, thus controlling its duration. For example, when the information level is set to “musical sound only”, a sweep of the entire US state map containing 51 regions lasts for 9 seconds.
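The typewriter-style sweep described above can be sketched as an ordering over region positions. This is an illustration under stated assumptions: each region is represented by a single centroid, the map is cut into a fixed number of horizontal bands (`n_rows`), and a tick separates rows while a bell replaces the final tick. None of these details is specified in the text beyond the left-to-right, top-to-bottom order and the tick and bell cues.

```python
def sweep_order(regions, n_rows=3):
    """Return the sequence of sound events for a spatial sweep.

    regions: dict mapping region name -> (x, y) centroid, with y increasing
             downward as in screen coordinates (an assumption).
    n_rows:  number of horizontal bands to sweep (hypothetical parameter).
    """
    if not regions:
        return []
    ys = [y for _, y in regions.values()]
    top, bottom = min(ys), max(ys)
    height = (bottom - top) or 1.0  # avoid division by zero for a flat map

    # Assign every region to a horizontal band.
    rows = [[] for _ in range(n_rows)]
    for name, (x, y) in regions.items():
        idx = min(int((y - top) / height * n_rows), n_rows - 1)
        rows[idx].append((x, name))

    # Play each band left to right, with a tick at the end of each band.
    events = []
    for row in rows:
        if not row:
            continue
        for _, name in sorted(row):
            events.append(("region", name))
        events.append(("tick", None))

    # The end of the whole sweep is marked by a bell instead of a tick.
    events[-1] = ("bell", None)
    return events
```

For scale, the 9-second sweep of 51 regions reported above works out to roughly 0.18 seconds per region if the time taken by the tick and bell sounds is ignored.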
Table navigation is done by using arrow keys to move up, down, left, right, and to top, bottom, left and right edges. Users can press ‘u’ to switch between two modes. In the cell mode, the current cell is played. In the row/column mode, a sub-gist of the whole row or column is played. While it is easy to navigate the table, using a keyboard to navigate maps with irregularly shaped and sized regions brings special design challenges. Relative movements between neighboring regions reveal region adjacency but do not convey region shapes, sizes, or absolute locations. Subjects in our previous studies reported that they only had weak location awareness by using this navigation method. Furthermore, it is a challenge to define a good adjacency navigation path for a map that is not a perfect grid. A movement may deviate from the direction users expect. Reversibility of movements can also be a problem in which a reversed keystroke may fail to take the user back to the original region. To tackle some of the problems, we tested cell-by-cell movements on a mosaic version of the map . However, it did not improve users’ location awareness, and was much less preferred because it required more keystrokes to move around.
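The reversibility problem mentioned above can be made concrete with a small check over an adjacency navigation table. This is a hypothetical sketch: the `nav` data structure and the function name are assumptions for illustration and do not describe how iSonic stores its navigation paths.

```python
OPPOSITE = {"left": "right", "right": "left", "up": "down", "down": "up"}


def irreversible_moves(nav):
    """Find moves that the opposite arrow key does not undo.

    nav: dict mapping region -> {direction: neighbouring region}.
    Returns (region, direction, target) triples for which pressing the
    opposite direction at `target` does not lead back to `region`.
    """
    problems = []
    for region, moves in nav.items():
        for direction, target in moves.items():
            back = nav.get(target, {}).get(OPPOSITE[direction])
            if back != region:
                problems.append((region, direction, target))
    return problems
```

On an irregular map some such triples are unavoidable, because several regions may share the same neighbor in one direction; a check like this only shows where users are likely to be surprised.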
We expect that navigations based on absolute pointing may help. Kamel and Landay first used a 3x3 grid recursion method via the keypad in a drawing tool . In iSonic, the map is divided into 3x3 ranges (Fig 1) and users use a 3x3 numerical keypad to activate a spatial sweep of the regions in each of the nine map ranges. For example, hitting ‘1’ plays all regions in the lower left of the map, using the same sweep scheme as the overall gist. Users can use Ctrl+[number] to zoom into any of the ranges, within which they can recursively explore using the 3x3 pattern or use arrow keys to move around. Pressing ‘0’ sweeps the current zoomed map range or the whole map.
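The recursive 3x3 keypad scheme can be sketched as a function from a keypad digit to a sub-rectangle of the current map range. The bounding-box representation and the function names are assumptions made for the example; the keypad layout (7-8-9 on top, 1-2-3 at the bottom, so that '1' is the lower left) follows the description above.

```python
def keypad_range(bounds, key):
    """Return the sub-rectangle of `bounds` addressed by a keypad digit 1-9.

    bounds: (x0, y0, x1, y1) with y increasing downward (screen coordinates,
            an assumption).
    """
    if key not in range(1, 10):
        raise ValueError("key must be a keypad digit from 1 to 9")
    x0, y0, x1, y1 = bounds
    col = (key - 1) % 3          # 0 = left, 2 = right
    row = 2 - (key - 1) // 3     # keypad rows count from the bottom
    w = (x1 - x0) / 3
    h = (y1 - y0) / 3
    return (x0 + col * w, y0 + row * h, x0 + (col + 1) * w, y0 + (row + 1) * h)


def zoom(bounds, keys):
    """Apply a sequence of Ctrl+[number] zooms, one recursion level per key."""
    for key in keys:
        bounds = keypad_range(bounds, key)
    return bounds
```

For example, `zoom((0, 0, 9, 9), [5, 1])` first narrows to the centre ninth of the map and then to the lower-left ninth of that range. How regions that straddle a range boundary are assigned to a range is not stated in the text and is left out of this sketch.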
With the touchpad, users drag their fingers or press spots on the smooth surface touchpad to activate the sound of the region at the finger position. Stereo sounds provide some complementary direction cues. The sound feedback stops when the finger lifts off. The touchpad is calibrated so that the current map range is mapped to its entire surface. Preliminary observations suggest that both the keyboard and touchpad navigations allow users to gain geographical knowledge. A controlled experiment is planned to compare them in detail.
Pressing ‘space’ plays the details of the current region. Another way to get the details is to increase the information level to the maximum level in which all details of a region are given by default when users navigate to that region.
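The four information levels and their effect on per-region feedback can be sketched as below. The level order follows the description of the plus and minus keys given earlier; whether the keys clamp at the ends or wrap around, and the order in which name, note, and value are presented, are assumptions of this example.

```python
# Each level is (speak_name, play_sound, speak_value), in the order:
# name only, sound only, name and sound, name and sound plus value.
LEVELS = [
    (True, False, False),
    (False, True, False),
    (True, True, False),
    (True, True, True),
]


def change_level(level, key):
    """Move one level up ('+') or down ('-'), clamped to the valid range."""
    step = {"+": 1, "-": -1}[key]
    return max(0, min(len(LEVELS) - 1, level + step))


def feedback(level, name, pitch, value):
    """Return the list of output events for one region at the given level."""
    speak_name, play_sound, speak_value = LEVELS[level]
    events = []
    if speak_name:
        events.append(("speak", name))
    if play_sound:
        events.append(("note", pitch))
    if speak_value:
        events.append(("speak", str(value)))
    return events
```

With this structure, the same `feedback` function serves both single-region navigation and gists, so lowering the level shortens every sweep.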
When users press ‘I’ (as for ‘Information’), iSonic speaks the current interface operational status. In the table, it includes the row/column counts, headings of the current table position, navigation mode, sorting status, regions selected, and so on. In the map, it includes the name of the variable displayed, navigation position, regions selected, and so on.
In both views, users can press ‘L’ (as for ‘Lock’) to select/unselect the current region and press ‘A’ to switch between “all regions” and “selected regions only”. In ‘selected regions only’, AISAs only activate sounds of the selected regions.
Brushing is done by users switching back and forth between the two views. The views are tightly coupled so that action results in one view are always reflected in the other. For example, users can select a region in the table view and show “selected regions only”. When users switch to explore the map view, only the selected region will be played. By sweeping each of the 9 map ranges, users can roughly but rapidly locate the region on the map.
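Sequential brushing of this kind can be sketched with a single selection model shared by both views. The class and method names are hypothetical; the sketch only illustrates the idea that the 'L' and 'A' commands change shared state, so whichever view is active plays the same subset of regions.

```python
class SharedSelection:
    """Selection state shared by the table view and the map view (a sketch)."""

    def __init__(self):
        self.selected = set()
        self.selected_only = False

    def toggle_lock(self, region):
        """'L' key: select the region if unselected, otherwise unselect it."""
        if region in self.selected:
            self.selected.remove(region)
        else:
            self.selected.add(region)

    def toggle_scope(self):
        """'A' key: switch between all regions and selected regions only."""
        self.selected_only = not self.selected_only

    def audible(self, candidates):
        """Filter the regions a view is about to play, preserving their order."""
        if self.selected_only:
            return [r for r in candidates if r in self.selected]
        return list(candidates)
```

A table view would pass its row order to `audible`, and a map view would pass its sweep order; because both consult the same object, a region locked in the table is the only one heard during a subsequent map sweep.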
Filtering was done by slider-based queries. It is complex even for sighted novice users and was not evaluated in the current study. Searching is obviously helpful but was not implemented at the time of the study.
During early design iteration for iSonic, controlled experiments were conducted to compare the effectiveness of design alternatives including the choice of data views, map navigation methods and sound encoding schemes [27, 28]. However, an exploratory data analysis task is a complex process that involves many interface components. During the process, many inherent human subject variations can come into play, such as experience and cognitive styles. In order to obtain insights into users' auditory information seeking behaviors, we chose to conduct case studies. Through a combination of direct observation, thinking aloud protocol, and in-depth interview, case studies can reveal the underlying design strengths and weaknesses, and capture common user behaviors as well as individual differences.
During the summer of 2005, we conducted intensive case studies with 7 local blind users, producing 42 hours of observation and interview data, with an average of 6 hours per user. Using cross-case analysis, we were able to extract common user behaviors and feedback that allowed us to (1) evaluate the effectiveness of iSonic design choices; (2) identify features helpful to each data exploration task category and examine the utility of the ADC framework; (3) identify task road blocks in order to target training and modifications to the interface and the framework.
All seven subjects possessed basic computer skills and relied on screen readers to access computer information. They were all comfortable with maps and tables, had experience with numerical data sets, and used government statistical data at work. All subjects were in the age range of 23 to 55. Three of them were born blind (P2, P3, P4) and the others became legally blind after age 15 (P1, P5, P6, P7). None of them had residual vision. Among those born blind, 2 were males, one with a college degree (P2) and the other with a doctorate degree in law (P3). The remaining female (P4) had a master’s degree in English. Among the subjects who became blind after 15, one was a male (P7) with a college degree in business and commerce. The other male (P1) was about to finish college in science and technology. Of the two females, one had a college degree (P5) and the other had a master’s degree (P6), both in social science. All subjects volunteered to participate, and were compensated for their time.
The studies used the basic iSonic configuration that is accessible to most computer users: stereo auditory feedback through a pair of speakers and a standard computer keyboard as the input device.
Three data sets were used, one for training, one for testing, and one for post-test free exploration. The data was 2003 census data on general population information, employment of population with a disability, housing value and vacancy, education levels, and household income. The training data set contained 8 variables and was about the 50 US states plus the District of Columbia. The test data set contained 12 variables and was about the 24 counties in the state of Maryland. The post-test data set was about the 44 counties in the state of Idaho (but subjects were not told what it was). Subjects’ geographical knowledge of US states and Maryland counties ranged from excellent to very poor. This allowed us to observe the influence of geographical knowledge on task behaviors and interface usability.
Seven tasks were designed for each data set. Three tasks required value comparison in the geographical context (T5, T6, T7), and four did not need any geographical knowledge (T1, T2, T3, T4). Task orders were different between the training and testing sessions, but were consistent for all subjects. The testing tasks are summarized below.
T1: (Find min/max) Name the bottom 5 counties with the lowest housing unit value.
T2: (Find the value for a specific item given the name) What is the population of Dorchester county?
T3: (Correlation) Which of the two factors is more correlated to “Median household income”: “percent population with bachelor's degree and above”, or “Percent employed population”?
T4: (Close-up item comparison) For what factor(s) does Montgomery county do better than Frederick county: (1) employment rate for population with a disability, (2) percent population with at least college education, (3) household income, and (4) average housing unit value.
T5: (Find items restricted first by value relations then by geographical locations) How many of the bottom 5 counties with the lowest housing unit value are in the western part of the state? Name them?
T6: (Find items restricted first by geographical locations then by value relations) For all three counties that border Frederick, plus Frederick, which one has the highest percent housing unit vacancy?
T7: (Value pattern in geographical context) Comparing “population with a disability” and “percent population with a disability”, which variable generally increases when you go from east to west and from north to south?
Subjects also performed a similar set of testing tasks in Microsoft Excel 2002 with their usual screen readers (all happened to use JAWS), and compared the task experience. It was not our intention to compete with Excel. Rather, we considered Excel as the standard tabular data viewer, and used the comparison as a method to solicit user comments on what interface features were helpful to each task. All subjects had some previous experience with Excel, and some were expert users. We did not provide tactile maps when subjects used Excel, because many blind users do not have access to tactile maps (only P5 owns one for Maryland).
Each case study was carried out in two sessions on consecutive days, at the subject’s home or office. In the first session, the subject listened to a self-paced auditory step-by-step tutorial, tried out all iSonic features and practiced seven sample tasks with the training data set. For each training task, a sample solution and the correct answer were given. Subjects could either first try to solve the task on their own, or directly follow the sample solution.
In the beginning of the second session, those subjects with limited Excel experience were given time to practice. After adjusting the speech rates to the subjects’ satisfaction, they performed seven tasks similar to the training tasks in both Excel and iSonic. For each pair of tasks, the subject first did the Excel task then the iSonic task and finally compared the interface experience for that task. The iSonic task was similar to the Excel task but modified. They used the same testing data set but involved different variables, so data learning between tasks can be ignored. We asked the subject to do the Excel task first because we wanted to minimize the effect on the Excel task resulting from the geography learning in the corresponding iSonic task. While there was a chance of strategy transfer from the Excel task to the iSonic task, the Excel task might also have benefited strategically from the iSonic training task. An interview was conducted after subjects performed all the testing tasks in both interfaces. Finally, subjects were asked to freely explore an unknown map and data (the post-test data set) for 5 minutes and report things they found interesting. This was to observe what users would do when they encountered a new map and data.
After spending an average of 1 hour 49 minutes going through all the interface features by following the tutorial, subjects successfully completed 67% of the training tasks without referring to the sample solution or any other help. After the training, subjects were able to retain their newly acquired knowledge and successfully completed 90% of the tasks on the next day in a different context without any help. For 74% of the tasks in which subjects had used a strategy different from the sample solution during training, they adopted the sample strategy in the test session.
For tasks that did not require geographical knowledge, the average testing success rates were similar for iSonic and Excel, both at 86%, although subjects ranked iSonic easier than Excel, at 7.9 vs. 7.0 on a 10-point scale (a higher number being easier). The explicitly reported reasons, in decreasing order of frequency, included: (1) the pitch was helpful in getting the value pattern and comparing values; (2) it was easier to sort in iSonic because sorting was done by pressing one key in the desired column to toggle the sorting status, instead of handling multiple widgets in the dialog window as in Excel; (3) it was helpful to isolate a few regions from other interfering information by selecting; (4) it was flexible to adjust the information level during the task; (5) there was more than one way to get the same information.
For geography-related tasks, the average testing success rate was 95% in iSonic. In Excel, the two subjects with excellent knowledge about Maryland geography (P3 and P5) achieved a success rate of 67%. Other subjects either skipped some tasks due to the lack of geographical knowledge or tried to make an educated guess but gave incorrect answers, resulting in an average success rate of 20%. On a 10-point scale, subjects gave iSonic an average of 8 on easiness for all the 3 geography-related tasks, and gave Excel an average of 5.8 for the tasks they performed. The explicitly reported reasons included: (1) the map was easy to use and very helpful (mentioned by all subjects in all 3 tasks); (2) it was great to be able to switch between the map and table, select things in one view then look at them in another; (3) the pitch was helpful in getting the value pattern and comparing values; (4) there was more than one way to get the same information.
Overall it was easy for the subjects to choose an efficient combination of interface features to do the tasks (average 7.4 on a 10-point scale with 10 being easy). Correlation tasks, however, turned out to be challenging. Most subjects understood the concept but did not know how to do it efficiently in iSonic until they viewed the sample solution. Only P7 easily came up with the sample solution. He sorted the main variable in ascending order in the table view, then, in the row/column navigation mode, swept the other columns with “pitch only” to check which one had the more consistently increasing pitch pattern. Other subjects mostly went across all requested columns to check if the pitches or numbers were consistently small or large for each region. Some also sorted one or all columns. One subject (P6) said she would have the data plotted in a scatterplot or multi-line graph and have her human reader look for the highest correlation. All subjects, except P4, were able to learn from the sample solution and successfully applied it in the test session. The geographical value pattern tasks were easy for most subjects except for P7, who guessed the answer correctly but was very uncomfortable. Instead of “visualizing the map”, he emphasized accuracy by trying to calculate and compare the average value for each of the 9 map ranges. This was consistent with our earlier finding that task strategies affect geographical value pattern recognition.
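The sort-then-sweep strategy for correlation tasks can be loosely mimicked in code. The sketch below is only an illustration, not part of iSonic: it orders the rows by the main variable and scores every other column by the fraction of adjacent steps that rise, a crude stand-in for listening for a "consistently increasing pitch pattern". The function names, the scoring rule, and the data are all hypothetical, and the score only detects positive relationships.

```python
from typing import Dict, List


def rising_fraction(values: List[float]) -> float:
    """Fraction of adjacent steps that go up (ties count as not rising)."""
    steps = list(zip(values, values[1:]))
    if not steps:
        return 0.0
    return sum(1 for a, b in steps if b > a) / len(steps)


def most_consistent_column(table: Dict[str, List[float]], main: str) -> str:
    """Sort rows by the main variable, then pick the column that rises most steadily."""
    # Row order that sorts the main variable ascending (the "sort" step).
    order = sorted(range(len(table[main])), key=lambda i: table[main][i])
    # Score each remaining column in that row order (the "sweep" step).
    scores = {
        name: rising_fraction([column[i] for i in order])
        for name, column in table.items()
        if name != main
    }
    return max(scores, key=scores.get)


# Hypothetical data: five regions, three variables.
table = {
    "population": [10, 40, 20, 30, 50],
    "housing_units": [4, 17, 9, 12, 21],
    "farm_acres": [90, 15, 60, 70, 5],
}
best = most_consistent_column(table, "population")
print(best)
```

A rank correlation coefficient would be the more standard numerical measure; the step-counting score is used here only because it is closer to what a listener can judge from a single sweep.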
Setting aside the above strategic difficulties, incorrect answers in iSonic were caused by two common errors: (1) subjects sorted the wrong variable (a third of all errors), possibly because the variable names were highly similar and the interface did not confirm the variable being sorted when the sorting key was pressed; (2) some subjects skipped the 1st region in the table (a third of all errors) because pressing the down arrow key after hearing “already top edge” took them to the 2nd row instead of the 1st.
Task Strategies and iSonic Usage Patterns
Map vs. table: All subjects used the table for most value comparisons, and used the map when they needed to compare items in the geographical context (e.g., T7) or to acquire/confirm region locations. The table was often used to change the variable to display on the map, but more importantly, the sorting feature was used to find minimum or maximum values, named regions, and values of specific regions. The table was also used to compare the values of multiple regions, and to check correlations. The map was used sometimes by a few subjects to find regions.
Brush: All subjects became proficient in switching between the table and the map views according to the changing needs for data relations during the task. The tight coordination between the map and the table views was considered the most significant strength of iSonic by all subjects. “It is cool to select things in one view and look at them in the other”. “The biggest advantage of this tool is the ability to quickly change between the table view and the map view”. To find items restricted first by value relations then by geographical locations (e.g., T5), most subjects first used the table to find items meeting the value restriction, selected to isolate them, then switched to the map to check their geographical locations. Some subjects skipped the use of the map and used their pre-test geographical knowledge to judge if the selected items satisfied the geographical restriction. A few subjects first used the map to find all items that met the geographical restriction, remembered them, then sorted the table to find items satisfying the value restriction, and reported the intersection of the two sets. The latter two strategies relied on subjects’ memory of the intermediate results and caused some errors. Subjects said they would have used selecting to mark items during view switching if the number of items had been larger. To find items restricted first by geographical locations then by value relations (e.g., T6), most subjects first found and selected items meeting the geographical restriction on the map, then either used the pitch and value in speech to check if they met the value restriction, or switched to the table and used sorting to compare their values.
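The table-then-map strategy amounts to a selection shared between two coordinated views. The following is a minimal sketch of that idea under stated assumptions: the `Workspace` class, its method names, the region names, and the coarse "W"/"E" map ranges are all hypothetical and are not iSonic's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Workspace:
    """Two views (table and map) that share one selection set."""
    values: Dict[str, float]    # region -> value of the displayed variable
    map_range: Dict[str, str]   # region -> coarse map range (hypothetical labels)
    selected: Set[str] = field(default_factory=set)

    def select_top(self, n: int) -> None:
        """Table view: sort descending by value and select the first n regions."""
        ranked = sorted(self.values, key=self.values.get, reverse=True)
        self.selected = set(ranked[:n])

    def selected_in_range(self, rng: str) -> List[str]:
        """Map view: list only the selected regions that fall in one map range."""
        return sorted(r for r in self.selected if self.map_range[r] == rng)


# Hypothetical data set with five regions.
ws = Workspace(
    values={"R1": 5, "R2": 9, "R3": 7, "R4": 2, "R5": 8},
    map_range={"R1": "W", "R2": "W", "R3": "E", "R4": "E", "R5": "E"},
)
ws.select_top(3)                   # value restriction, applied in the table
east = ws.selected_in_range("E")   # geographical restriction, checked on the map
print(east)
```

Because the selection lives in the workspace rather than in either view, the intermediate result does not have to be held in memory by the user, which is the step that caused errors in the memory-based strategies described above.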
Use of Pitch: Using pitches to present numeric values was considered intuitive, entertaining and very helpful to data comprehension. It took some subjects a few tasks to get used to this idea but they became increasingly inclined to use pitches for both trend analysis and close-up value comparison. “Pitch makes it a lot easier and quicker to compare values”. “Tones are very helpful to find patterns in a series of values. In some extent it helps me to do things I used to do with (visual) graphs”. “All the other applications are boring. iSonic has its personality. It has the map that I really enjoyed. The tones are entertaining and fun”. To use pitches, most subjects either changed to the pitch-only information level (especially for trend analysis), or used the level with both pitches and numbers in speech but navigated quickly through items, waiting for a number to be spoken only when they wanted confirmation (in value comparison). Some subjects were able to tell the absolute value category using only one pitch while some needed to use other pitches as references. All subjects, except P4, were comfortable with the simultaneous pitch and speech presentations. P4 reported that pitches and speech interfered with each other, and requested to tone down the pitch volume. However, she declined the suggestion to completely remove pitches, because she used pitches exclusively in trend discovery.
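As a rough illustration of presenting numeric values as pitches, the sketch below maps a value linearly onto a range of MIDI note numbers and converts a note to its frequency in equal temperament. This is not necessarily the mapping iSonic uses; the note range (48 to 84) and the function names are assumptions made for the example.

```python
def value_to_midi(value: float, lo: float, hi: float,
                  low_note: int = 48, high_note: int = 84) -> int:
    """Map a value in [lo, hi] linearly to a MIDI note (higher value, higher pitch).

    The default note range is an arbitrary assumption for illustration.
    """
    if hi == lo:
        # Degenerate data range: fall back to the middle of the note range.
        return (low_note + high_note) // 2
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    return low_note + round(t * (high_note - low_note))


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note, with A4 (note 69) at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical values on a 0..100 scale.
notes = [value_to_midi(v, 0, 100) for v in (0, 50, 100)]
print(notes, midi_to_hz(69))
```

Quantizing values into a small number of pitch categories instead of a continuous range would make single pitches easier to identify in isolation, at the cost of resolution in close-up comparisons.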
Information Level and Details-on-demand: All subjects frequently adjusted the information level during a task. Subjects mostly used name plus pitch or name plus pitch together with the value in speech. When the information level with value in speech was used, many subjects cut it short by navigating to another item before the value speech finished, and only waited for it to finish when they wanted to confirm the value. In automatic map sweep searching for a region, spoken values were typically removed. To sweep the map or a table column for value patterns, e.g., for geographical patterns or correlations, most subjects used the pitch-only level because it let them skim through the data the fastest. A few subjects chose to keep the names on to keep track of the meaning of each sound while still being able to go through the data at a decent pace. To find a named region on the map, P7 often used the “name only” level. Details-on-demand was mostly done by increasing the information level to the maximum level instead of pressing the ‘space’ key.
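The adjustable information levels can be thought of as a setting that decides which audio events are produced for each item. The sketch below assumes the four levels are the ones named in this section (name only, pitch only, name plus pitch, and name plus pitch plus spoken value); the enum, the event tuples, and the `render` function are hypothetical and say nothing about playback timing.

```python
from enum import Enum
from typing import List, Tuple


class InfoLevel(Enum):
    NAME_ONLY = 1
    PITCH_ONLY = 2
    NAME_PITCH = 3
    NAME_PITCH_VALUE = 4


def render(region: str, value: float, level: InfoLevel) -> List[Tuple[str, str]]:
    """Return the audio events for one item as (kind, payload) pairs.

    The list says which events exist at a level, not when each one is played.
    """
    events: List[Tuple[str, str]] = []
    if level is not InfoLevel.PITCH_ONLY:
        events.append(("speech", region))
    if level is not InfoLevel.NAME_ONLY:
        events.append(("tone", str(value)))
    if level is InfoLevel.NAME_PITCH_VALUE:
        events.append(("speech", str(value)))
    return events


# Hypothetical item rendered at the lowest-detail and highest-detail levels.
skim = render("R1", 42, InfoLevel.PITCH_ONLY)
full = render("R1", 42, InfoLevel.NAME_PITCH_VALUE)
print(skim, full)
```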
Gist: Table sweep was very intuitive. To check value patterns, e.g., for the correlation tasks, some subjects used an automatic pitch-only sweep of each column by navigating the table in the row/column mode.
Automatic sweep of the whole map was typically done with pitch only or with the region name spoken along with the pitch. P3 said “automatic sweep will be my first step to get acquainted with a new map to get the big picture”. During the post-test free exploration of an unknown map, P3 swept the map several times in pitch-only to obtain a rough idea of where the highly populated regions were before starting to explore. P2 swept the unknown map once and accurately reported that most highly populated regions were in the west, by judging from the pitches and the sound panning positions. P2 was the only one that consciously used stereo panning cues in tasks. Most subjects said it was not difficult to understand how the sweep was done, but they needed to know what the map looked like to make sense of it. Once they broke the whole map into nine smaller ranges and swept each range using the keypad, it made more sense. All subjects, except P7, were able to easily tell if a variable had a given geographical distribution pattern, by sweeping the nine ranges in pitch only. Unexpectedly, map sweep was also frequently used by all subjects to locate a region on the map. This was typically done with the region names spoken, and often combined with the arrow key navigation and 9-range sweep. It was also used to check what regions had been selected.
Navigate: Navigating the table was easy. All subjects mostly used cell mode because “it allows finer control of what to play”. The row and column mode was used by some subjects to sweep a column for the correlation and close-up comparison tasks.
All subjects reported that overall it was very easy to navigate the map. The 3x3 exploration was frequently used by all subjects except P2 who mainly used arrow keys to navigate and used sound panning to judge region locations. All subjects understood the mapping between map locations and the 3x3 layout of the keypad. They were able to use the 3x3 exploration to find the map location of a specific region, and to find what regions are in each map part. While subjects mostly looked for a region by navigating the table (typically by first ordering it alphabetically), sometimes they used the map. They often first used the 9 numeric keys to find out which range contains that region, then used arrow keys to move to that region. The 3x3 exploration also allowed some subjects to acquire knowledge about the overall map shape and the region layouts. During the study, P3, P5, P6, and P7 reported the overall map shape and region density distribution. P7 also used two-level recursive 3x3 exploration to find the county layout in the central and eastern parts of northern Maryland.
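The mapping between map locations and the 3x3 keypad layout can be sketched as a simple partition of the map's bounding box. The code below is an illustrative assumption rather than iSonic's implementation: it assigns a point (for instance, a region centroid) to a keypad digit, and returns the bounds of a keypad cell so that the same partition can be applied again for a second level of exploration.

```python
from typing import Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Numeric keypad layout, rows listed from top (north) to bottom (south).
KEYPAD = [[7, 8, 9],
          [4, 5, 6],
          [1, 2, 3]]


def keypad_key(x: float, y: float, bounds: Bounds) -> int:
    """Keypad digit of the 3x3 cell containing (x, y); y is assumed to grow northward."""
    min_x, min_y, max_x, max_y = bounds
    col = int(3 * (x - min_x) / (max_x - min_x))
    row = int(3 * (max_y - y) / (max_y - min_y))
    # Clamp so points on or beyond the outer edges still fall in an edge cell.
    col = min(2, max(0, col))
    row = min(2, max(0, row))
    return KEYPAD[row][col]


def sub_bounds(key: int, bounds: Bounds) -> Bounds:
    """Bounding box of one keypad cell, usable for a recursive 3x3 partition."""
    min_x, min_y, max_x, max_y = bounds
    row, col = next((r, c) for r in range(3) for c in range(3) if KEYPAD[r][c] == key)
    w = (max_x - min_x) / 3
    h = (max_y - min_y) / 3
    return (min_x + col * w, max_y - (row + 1) * h,
            min_x + (col + 1) * w, max_y - row * h)


# Hypothetical 9x9 map: a point in the northwest corner.
whole = (0, 0, 9, 9)
first = keypad_key(1, 8, whole)                       # top-level cell
second = keypad_key(1, 8, sub_bounds(first, whole))   # cell within that cell
print(first, second)
```

Assigning each region to exactly one cell by a single point is what makes some adjacent regions land in different ranges, the limitation noted for 3x3 zooming.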
Subjects seemed to be able to zoom into/out of the 9 map ranges and stay aware of their zooming positions. Many subjects played with zooming extensively in training but did not use it in the test. They explained that the tasks did not require it and that the Maryland map has only 24 counties. If there were a need to focus on one area on a much bigger map, zooming could be helpful. Many subjects expressed the concern that zooming may become mentally intensive as the map scale grows. One observed problem with 3x3 zooming is that some adjacent regions are assigned to different ranges and thus not reachable from each other after zooming. The problem can be remedied by allowing zooming centered on a region of interest.
Arrow key navigation was essential to find a region’s geographical neighbors and was used by all subjects in adjacency tasks. It was also used often to explore regions in a small map range, typically identified earlier with the 3x3 exploration. While P2 mostly used arrow keys to navigate the map, most subjects were inclined to use the 3x3 exploration because it gave the absolute region locations.
“Arrow key navigation takes me everywhere on the map. It is not efficient especially when I am not familiar with the map”. “The nine keys tell me what are in the northwest and so on. It narrows me down to a specific range”.
To address the irreversibility problem in arrow key navigation, iSonic supports previous/next navigation to let users go through every region once and only once, following their order in the map sweep. Although a few subjects mentioned the irreversibility problem, they thought it was a natural fact about maps and no one seemed to be bothered. No one used previous/next navigation after the training because “there is no need for it” or “it does not make sense on maps”.
Situate: Subjects used situating to get the table sorting status, the current table position, the current map and map position, and the number of selected regions. Many subjects reset the interface before each task and did not use situating much since they remembered what they had done. However, all subjects considered this function essential so that they would not need to redo the work “after a bathroom visit”.
Select, Search and Filter: All subjects were able to use selection and switch their focus between “all regions” and “selected regions only”, even across the two data views. Some subjects requested the ability to select variables besides selecting regions. Subjects also requested first-letter searching of regions. Filtering was not requested since the data sets are small.
It is clear that iSonic enabled subjects to find facts and discover data trends within geo-referenced data, even in unfamiliar geographical contexts. The design choices in iSonic were overall easy to use and allowed subjects to effectively explore data in the map and table views without special interaction devices.
The studies do have limitations. The subjects might have made favorable comments because they wanted to please the experimenters. An average of 6 hours’ use was not enough to go beyond the novice usage stage. Investigation of the tool’s long-term use in real work circumstances will provide further understanding. We only tested users without any residual vision. Further studies with partially sighted users may reveal different usage patterns and visual-auditory interactions that may modify our results and framework.
However, the studies provided clear evidence that the Action-by-Design-Component framework captured the actions that were naturally employed by blind users during data exploration. The framework, which is analogous to many interactions in visualizations, works for auditory interfaces when applied properly. The key conclusions and design implications were:
(1) All subjects were capable of choosing and switching between highly coordinated table and map auditory views, in order to complete the tasks. We believe users could also deal with more and different views such as graphs.
(2) Using musical pitches to present numerical data makes it easier to perceive data trends in data series and enhances close-up value pair comparison. The integrated use of musical sounds and speech allows users to listen to overall trends and to get details.
(3) A single auditory feedback detail level is not sufficient. Our 4 levels were all used productively. While it is hard to understand a data element without the appropriate context, too much detail slows down the sequential presentation and can be overwhelming for gaining the big picture. Designers need to carefully select multiple information levels and let users adjust them to fit their tasks.
(4) A rapid auditory gist is valuable in conveying overall data trends and guiding exploration. For maps, perceiving spatial relation from a sequence of sounds can be difficult, but sweeping the map as separate smaller ranges in a consistent order was effective.
(5) Navigation structures should reflect the data relation presented by the data view. In the map, designers would do well to provide 3x3 exploration using the numeric keypad and adjacency navigation using arrow keys. Users benefited from absolute localization and relative movements. Even a coarse map partitioning mapped to the physical spatial layout of a numeric keypad can provide valuable geographical knowledge. Stereo sound panning can be helpful but seems to be secondary in giving location cues for most subjects.
(6) Selecting was valuable for all subjects in focused data examination. They were able to operate selection within and across data views and accomplish brushing.
CONCLUSION AND FUTURE WORK
We described an Action-by-Design-Component framework for designing auditory interfaces for analytical data exploration. We applied the framework to geo-referenced data and evaluated the resulting interface with 7 blind users. The effectiveness of the interface suggests that the framework may also be useful for other designers. In the future, we hope to extend this framework by applying it to the interactive sonification of graphs. By defining a programming paradigm and incorporating more data views into iSonic, we intend to develop a unified auditory workspace for general analytical data exploration.
We thank our subjects for participating and thank Alex Silver for helping to run the studies. This work is supported by the National Science Foundation under Grant No. EIA 0129978 and ITR/AITS 0205271.
1. Bennett, D.J., Edwards, A.D.N., Exploration of non-seen diagrams, Proc. Intl. Conf. on Auditory Display (ICAD), (1998).
2. Brown, L., Brewster, S.A., Ramloll, R., Burton, M., Riedel, B., Design guidelines for audio presentation of graphs and tables, Proc. ICAD (2003).
3. Brewster, S.A., Using nonspeech sounds to provide navigation cues, ACM Trans. on CHI, 5, 3 (1998), 224-259.
4. Card, S., Mackinlay, J.D., Robertson, G., The design space of input devices, Proc. ACM CHI (1990).
5. Donker, H., Klante, P., Gorny, P., The design of auditory user interfaces for blind users, Proc. NordiCHI (2002).
6. Flowers, J.H., Buhman, D.C., Turnage, K.D., Cross-modal equivalence of visual and auditory scatterplots for exploring bivariate data samples, Human Factors, 39, 3 (1997), 340-350.
7. Franklin, K.M., Roberts, J.C., Pie chart sonification, Proc. IEEE Information Visualization (2003), 4-9.
8. Handel, S. Listening: An Introduction to the Perception of Auditory Events. MIT Press (1989).
9. Hunt, A., Hermann, T., Pauletto, S., Interacting with sonification systems: closing the loop, Proc. IEEE Information Visualization (2004).
10. Kamel, H.M., Landay, J.A., The integrated communication 2 draw (IC2D): a drawing program for the visually impaired. Proc. ACM CHI (1999).
11. Kramer, G., Walker, B., Bonebright, T., Cook, P., Flowers, J., Miner, N., Neuhoff, J., Sonification Report: Status of the Field and Research Agenda (1997) http://www.icad.org/websiteV2.0/References/nsf.html
12. Landau, S., and Gourgey, K., Development of a talking tactile tablet, Information Technology and Disabilities, VII(2), 2001.
13. Mynatt, E.D., Weber, G., Nonvisual presentation of graphical user interfaces: Contrasting two approaches, Proc. ACM CHI (1994).
14. North, C., Shneiderman, B., Snap-Together Visualization: A user interface for coordinating visualizations via relational schemata, Proc. Advanced Visual Interfaces (2000).
15. Pauletto, S., Hunt, A., A toolkit for interactive sonification, Proc. ICAD (2004).
16. Ramloll, R., Yu, W., Riedel, B., Brewster, S.A., Using non-speech sounds to improve access to 2D tabular numerical information for visually impaired users. Proc. BCS IHM-HCI (2001).
17. Roth, S. F., Chuah, M. C., Kerpedjiev, S., Kolojejchick, J. A., Lucas, P., Towards an information visualization workspace: combining multiple means of expression, Human-Computer Interaction, 12, 1-2 (1997), 131-185.
18. Saue, S., A model for interaction in exploratory sonification displays, Proc. ICAD (2000).
19. Shneiderman, B., The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations, Proc. IEEE Symposium on Visual Language (1996).
20. Shneiderman, B., Plaisant, C., Designing the User Interface: Strategies for Effective Human-Computer Interaction, 4th Edition, Addison Wesley (2005).
21. Sikora, C.A., Roberts, L.A., Murray, L., Musical vs. Real world feedback signals, Proc. ACM CHI (1995).
22. W3C, Web Content Accessibility Guidelines, http://www.w3.org/WAI/intro/wcag.php
23. Walker, B.N., Lane, D.M., Psychophysical scaling of sonification mappings: a comparison of visually impaired and sighted listeners, Proc. ICAD (2001).
24. Walker, B.N., Cothran, J.T., Sonification sandbox: a graphical toolkit for auditory graphs, Proc. ICAD (2003).
25. Wenzel, E.M., Arruda, M., Kistler, D.J., Wightman, F.L., Localization using nonindividualized head-related transfer functions, Journal of the Acoustical Society of America, 94, 1 (1993),111-123.
26. Willuhn, D., Schulz, C., Knoth-Weber, L., Feger, S., Saillet, Y., Developing accessible software for data visualization, IBM Systems Journal, 42, 4 (2003). http://www.research.ibm.com/journal/sj/424/willuhn.html
27. Zhao, H., Plaisant, C., Shneiderman, B., and Duraiswami, R., Sonification of geo-referenced data for auditory information seeking: design principle and pilot study, Proc. ICAD 2004.
28. Zhao, H., Smith, B.K., Norman, K., Plaisant, C., and Shneiderman, B., Listening to maps: user evaluation of multiple designs of interactive geo-referenced data sonification, IEEE Multimedia Special Issue on Interactive Sonification, Apr-Jun 2005.