Dynamic Neighborhood Labeling for Data Visualization
Jean-Daniel Fekete
Ecole des Mines de Nantes
4, rue Alfred Kastler, La Chantrerie
44307 Nantes, France
Jean-Daniel.Fekete@emn.fr
Catherine Plaisant
Human-Computer Interaction Laboratory
University of Maryland
College Park, MD 20782, USA
Plaisant@cs.umd.edu
Project website: http://www.cs.umd.edu/hcil/excentric/
ABSTRACT
The widespread use of information visualization is hampered
by the lack of effective labeling techniques. A taxonomy of labeling methods
is proposed. We then describe "excentric labeling", a new dynamic technique
to label a neighborhood of objects located around the cursor. This technique
does not intrude into the existing interaction, is not computationally
intensive, and was easily applied to several visualization applications.
A pilot study indicates a strong speed benefit for tasks that involve the
rapid exploration of large numbers of objects.
KEYWORDS
Visualization, Label, Dynamic labeling
INTRODUCTION
A major limiting factor to the widespread use of information
visualization is the difficulty of labeling information-abundant displays.
Information visualization uses the powerful human visual abilities to extract
meaning from graphical information [Card et al., 1998; Cleveland, 1993].
Color, size, shape, position, or orientation are mapped to data attributes.
This visualization helps users find trends, and spot exceptions or relationships
between elements on the display. Experimental studies have been able to
show significant task completion time reduction and recall rate improvements
when using graphical displays instead of tabular text displays (e.g., [Lindwarm-Alonso
et al., 1998]). However, textual information in the form of labels remains
critical in identifying elements of the display. Unfortunately, information
visualization systems often lack adequate labeling strategies. Often labels
are entirely missing and users have to peck at graphical objects one at
a time. Sometimes labels overlap each other to the point of obscuring the
data and being barely usable; or they are spread out in such a way that
the relation between objects and labels becomes ambiguous. The problem
becomes acute when the data density increases and the labels are very long.
To address this problem we propose "excentric labeling"
as a new dynamic technique to label a neighborhood of objects (Figures 1
to 3). Because it does not interfere with normal interaction and has a
low computational overhead, it can easily be applied to a variety of visualization
applications.
The labeling problem is not new. It has been extensively studied for cartographic purposes [Christensen et al., 1998] where printing or report generation is the main purpose of the application. But very few solutions have been proposed to automate the labeling process of interactive applications. In this paper we propose a taxonomy of labeling methods, then describe our excentric labeling technique in detail, discuss its benefits and limitations, and illustrate how it can benefit a variety of applications.
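The labeling challenge can be stated as follows: given a set of graphical objects, find a layout to position all names so that each name (label) is readable, is non-ambiguously related to its graphical object, and does not hide any other pertinent information.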
Labeling techniques can be classified into two categories:
static and dynamic. The goal of static labeling is to visually associate
labels with as many graphic objects as possible (ideally all) in the best possible
manner. But good static techniques are usually associated with delays not
suitable for interactive exploration. Dynamic labeling began with interactive
computer graphics and visualization. Two attributes account for the "dynamic"
adjective: the set of objects to be labeled can change dynamically, and
the number and layout of displayed labels can also change in real time,
according to user actions.
Static Techniques
Static techniques have been used for a long time in cartography. Christensen et al. (to appear) wrote a recent summary of label placement algorithms. Cartography also needs to deal with path labeling and zone labeling, which are less widespread in visualization. We do not address those two issues in this article, but the same algorithms can be used for both cartography and general visualization. Since static techniques have to find "the" best labeling possible, the set of objects has to be carefully chosen to avoid too high a density of objects or labels. In cartography, this is achieved by aggregating some information and omitting (sampling) the rest; this process is called "generalization". This technique could be nicknamed the "label-at-all-cost" technique since one of the constraints is to label all objects of the display.
For data visualization, a similar process of aggregation can be applied to achieve a reasonable result with static techniques (e.g., aggregation is used in the semantic zooming of Pad++ [Bederson, 1994] or LifeLines [Plaisant et al., 1998]), but the logic of aggregation and sampling is mainly application dependent. Label sampling has been used occasionally (e.g., [Chalmers et al., 1996]).
The most common techniques remain the "No Label" technique, and the "Rapid Label-all" technique, which leads to multiple overlaps and data occlusion (e.g., in the hyperbolic browser [Lamping et al., 1995]). Also common is the "Label-What-You-Can" technique, in which only labels that fit are displayed; other labels that would overlap or occlude data objects are not shown (e.g., in LifeLines).
Some visualizations avoid the problem completely by making
the labels the primary objects. For example, WebTOC [Nation et al., 1997]
uses a textual table of contents and places color and size coded bars next
to each label.
Dynamic techniques
Dynamic labeling techniques are more varied (see Table 1). The classic infotip or "cursor sensitive balloon label" consists of showing the label of an object right next to the object when the cursor passes over it. The label can also be shown in a fixed side window, which is appropriate when labels are very long and structured.
In the "All or Nothing" technique, labels appear when the number of objects on the screen falls below a fixed limit (e.g., 25 for the dynamic query and starfield display of the FilmFinder [Ahlberg et al., 1994]). This is acceptable when the data can be easily and meaningfully filtered to such a small subset, which is not always the case. Another strategy is to require zooming until enough space is available to reveal the labels, which requires extensive navigation to see all labels. This technique can be combined elegantly with the static aggregation technique to progressively reveal more and more details, and refined labels, as the zoom ratio increases.
The overview and detail view combination is an alternative zooming solution [Plaisant et al., 1994]. The detail view can also be deformed to spread objects until all labels fit (i.e., in the way of a labeling magic lens). Those last two techniques require either a tool selection or dedicated screen space.
Chalmers et al. proposed dynamic sampling where only
one to three labels are displayed, depending on the user's activity. Cleveland
describes temporal brushing: labels appear as the cursor passes over the
objects (similarly to the infotip), but those labels remain on the screen
while new labels are displayed, possibly overlapping older ones.
Type | Technique | Comments
STATIC | No label | No labels!
STATIC | Label-only-when-you-can (i.e., after filtering objects) | Needs effective filters. Labels are rarely visible.
STATIC | Rapid Label-All | High risk of overlaps or ambiguous linking to objects.
STATIC | Optimized Label-All | Often slow; may not be possible.
STATIC | Optimized Label-All with aggregation and sampling | Effective but application dependent; may not be possible.
DYNAMIC, one at a time | Cursor sensitive balloon label | Requires a series of precise selections to explore the space (slow); cannot reach overlapped objects.
DYNAMIC, one at a time | Cursor sensitive label in side-window | Same as above. Constant eye movement can be a problem, but avoids occlusion of other objects.
DYNAMIC, one at a time | Temporal brushing (Cleveland) | More labels visible at a time, but overlapping problem.
DYNAMIC, global display change | Zoom until labels appear | May require extensive navigation to see many labels (can be effectively combined with semantic zooming, e.g., Pad++).
DYNAMIC, global display change | Filter until labels appear | May require several filtering steps to see labels (can be effectively combined with zooming, e.g., starfields).
DYNAMIC, focus + context | Overview and detail view without deformation | Effective when objects are separated enough in the detail view to allow labels to fit (not guaranteed).
DYNAMIC, focus + context | Overview and detail with deformation/transformation (i.e., fisheye or magic lenses) | Deformation might allow enough room for labels to fit (not guaranteed). May require a tool or mode to be selected.
DYNAMIC, focus + context | Global deformation of space (e.g., Hyperbolic Browser) | Requires intensive navigation and dexterity to rapidly deform the space and reveal all labels (e.g., by fanning the space).
DYNAMIC, sampling | Dynamic sampling (Chalmers et al.) | Few labels are visible.
NEW | Excentric labeling | Fast; no tool or special skill needed. Spreads overlapping labels and aligns them for ease of reading.
Table 1: Taxonomy of labeling techniques
EXCENTRIC LABELING
Excentric labeling is a dynamic technique of neighborhood
labeling for data visualization (Figures 1 to 3). When the cursor stays
more than one second over an area where objects are available, all labels
in the neighborhood of the cursor are shown without overlap, and aligned
to facilitate rapid reading. A circle centered on the position of the cursor
defines the neighborhood or focus region. A line connects each label to
the corresponding object. The style of the lines matches the object attributes
(e.g., color). The text of the label always appears in black on a white
background for better readability. Once the excentric labels are displayed,
users can move the cursor around the window and the excentric labels are
updated dynamically. Excentric labeling stops either when an interaction
is started (e.g., a mouse click) or the user moves the cursor quickly to
leave the focus region. This labeling technique does not require the use
of a special interface tool. Labels are readable (non-overlapping and aligned),
they are non-ambiguously related to their graphical objects, and they don't
hide any information inside the user's focus region.
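As an illustration of how the focus region selects the objects to label, the following minimal Python sketch keeps only the objects that fall inside a circle centered on the cursor. The function name `objects_in_focus`, the `(label, x, y)` tuple representation, and the parameter names are assumptions made for this example; they do not come from the implementations described in this paper.

```python
from math import hypot

def objects_in_focus(objects, cursor, radius):
    """Return the (label, x, y) items lying inside the focus circle.

    `objects` is a list of (label, x, y) tuples in screen coordinates;
    this representation is an assumption made for the sketch.
    """
    cx, cy = cursor
    return [(label, x, y) for (label, x, y) in objects
            if hypot(x - cx, y - cy) <= radius]
```

A real application would probably run this test each time the cursor moves, so that the set of labels can be updated dynamically as described above.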
Algorithm and Variations
To compute the layout of labels, we experimented with several variants of the following algorithm:
Non-Crossing Lines Labeling – Radial Labeling
The non-crossing lines labeling layout (Figure 4) does not maintain the vertical or horizontal ordering of labels, but avoids line crossings. This technique facilitates the task of tracing the label back to the corresponding object. It can be used in cartography-like applications where ordering is unimportant. The initial position on the circle (step 2 of the previous section) is computed by projecting each object radially onto the circumference of the focus circle. It is always possible to join the object to the circumference without crossing another radial spoke (but two radii, or spokes, may overlap). Then, we order the spokes in counter-clockwise order starting at the top (step 3). The left set is filled with labels from the top to the bottom and the right set is filled with the rest.
Labels are left justified and regularly spaced vertically. We maintain a constant margin between the left and right label blocks and the focus circle to draw the connecting lines.
For the left part, three lines are used to connect objects to their label: from the object to the position on the circumference, then to the left margin, and to the right side of the label box. This third segment is kept as small as possible for compactness, and is therefore barely visible in Figure 4, except for the bottom-left label. For the right labels, only two lines are used: from the object to the initial position, then to the left side of the label. The margins contain the lines between the circumference and the labels.
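The radial projection and counter-clockwise ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our applications: the function name `radial_layout`, the `(label, x, y)` tuples, the rule that spokes pointing into the left half-circle go to the left set, and the reversal of the right set so that it also reads top to bottom are all choices made for this example. Screen coordinates are assumed, with y growing downward.

```python
from math import atan2, cos, sin, pi

def radial_layout(objects, cursor, radius):
    """Split objects into left/right label sets using radial projection.

    Returns two lists of (label, anchor) pairs, where anchor is the point
    on the focus circle reached by the spoke through the object.
    """
    cx, cy = cursor
    spokes = []
    for label, x, y in objects:
        # Angle measured counter-clockwise from the top of the circle,
        # as seen on a screen whose y axis points down.
        theta = atan2(cx - x, cy - y) % (2 * pi)
        anchor = (cx - radius * sin(theta), cy - radius * cos(theta))
        spokes.append((theta, label, anchor))
    # Counter-clockwise order starting at the top.
    spokes.sort(key=lambda s: s[0])
    # Assumption: spokes in the left half-circle feed the left set,
    # which is therefore already ordered from top to bottom.
    left = [(label, anchor) for theta, label, anchor in spokes if theta < pi]
    right = [(label, anchor) for theta, label, anchor in spokes if theta >= pi]
    # Assumption: reverse the right set so it also reads top to bottom.
    right.reverse()
    return left, right
```

Because both sets follow the angular order of their spokes, stacking the labels in this order keeps the connecting lines from crossing within each side.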
Vertically Coherent Labeling
When the vertical ordering of graphic objects has an important
meaning, we use a variant algorithm that does not avoid line crossings but
maintains the relative vertical order of labels. This will be appropriate
for most data visualizations. For example, in the starfield application
FilmFinder [Ahlberg et al., 1994], films can be sorted by attributes like popularity
or length, so labels should probably be ordered by the same attribute.
Instead of computing the initial position in step 2 by projecting the labels
radially to the circumference, we start at the actual Y position of the
object. The rest of the algorithm is exactly the same. Figures 1 and 2 show
examples using the vertically coherent algorithm, which is probably the
best default algorithm. Crossings can occur, but we found that slightly moving
the cursor animates the label connecting lines and helps find
the correspondence between objects and their labels.
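The vertically coherent variant can be sketched in the same spirit. In this minimal example each object goes to the stack on its own side of the cursor, each stack is sorted by the object's actual Y position, and labels are spaced regularly. The function name `vertical_stacks`, the `line_height` parameter, and the choice to center each stack vertically on the cursor are assumptions made for the illustration.

```python
def vertical_stacks(objects, cursor, line_height=14):
    """Stack labels on each side of the cursor, ordered by object Y.

    `objects` is a list of (label, x, y) tuples in screen coordinates.
    Returns {"left": [(label, label_y), ...], "right": [...]}.
    """
    cx, cy = cursor
    stacks = {"left": [], "right": []}
    for label, x, y in objects:
        # Objects left of the cursor are labeled in the left stack.
        stacks["left" if x < cx else "right"].append((y, label))
    layout = {}
    for side, items in stacks.items():
        # Keep the relative vertical order of the objects.
        items.sort(key=lambda item: item[0])
        # Assumption: center the stack vertically on the cursor.
        top = cy - line_height * (len(items) - 1) / 2
        layout[side] = [(label, top + i * line_height)
                        for i, (_, label) in enumerate(items)]
    return layout
```

Connecting lines may cross with this ordering, which is the trade-off discussed above.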
Horizontally Coherent Labeling
When the horizontal ordering of graphic objects has a
special meaning, we further modify the algorithm in step 5. Instead of
left justifying the labels, we move them horizontally, so that they follow
the same ordering as the graphic objects (Figure 5).
Dealing with window boundaries
When the focus region is near the window boundaries, chances are that the label positions computed by the previous algorithms will fall outside of the window and the labels appear truncated (e.g., the first characters of the left stack labels would not be visible when the cursor is on the left side of the window).
Excentric labeling fills a gap in information visualization
techniques by allowing the exploration of hundreds of labels in dense visualization
screens in a matter of seconds. Many labels can be shown at once (optimally
about 20 at a time). They are quite readable and can be ordered in a meaningful
way. Links between objects and labels remain apparent. The technique is
simple and computationally inexpensive enough to allow for smooth exploration
while labels are continuously updated. Of course these algorithms don't
solve all the problems that may occur when labeling. Three important challenges
remain, and we propose partial solutions for them:
Dealing with too many labels
We estimate that about 20 excentric labels can reasonably be displayed at a time. When more objects fall in the focus region, the screen becomes filled with labels and there is often no way to keep some labels from falling outside the window. We implemented two "fallback" strategies: (1) showing the number of items in the focus region, and (2) showing a sample of those labels in addition to the number of objects (see Figure 3). The sample could be chosen randomly or by using the closest objects to the focus point. Although not entirely satisfactory, this method is a major improvement over the usual method of showing no labels at all, or a pile of overlapping labels.
The dynamic update of this object count allows a rapid
exploration of the data density on the screen. Of course (this is
data visualization after all) the number of objects could also be shown
graphically by changing the font or box size to reflect its level of magnitude.
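The second fallback strategy, showing the object count together with a sample of the closest objects, could look like the following minimal sketch. The function name `labels_to_show` and the default limit of 20 labels (taken from the estimate above) are assumptions made for this example.

```python
from math import hypot

def labels_to_show(objects, cursor, max_labels=20):
    """Return (count, sample) for the objects in the focus region.

    `objects` is a list of (label, x, y) tuples already known to be in
    the focus region. When there are too many, only the `max_labels`
    objects closest to the cursor are kept in the sample.
    """
    cx, cy = cursor
    count = len(objects)
    if count <= max_labels:
        return count, list(objects)
    by_distance = sorted(objects, key=lambda o: hypot(o[1] - cx, o[2] - cy))
    return count, by_distance[:max_labels]
```

The returned count can be displayed alongside the sampled labels so that users still see how many objects lie under the focus region.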
Dealing with long labels
Labels can be so long that they just don't fit on either
side of the focus point. There is no generic way to deal with this problem
but truncation is likely to be the most useful method. Depending on the
application, labels may be truncated on the right, or on the left (e.g.,
when the labels are web addresses), or they may be truncated following
special algorithms. Some applications may provide a long and a short label
to use as a substitute when needed (e.g., acronyms). Using smaller fonts
for long labels might help in some cases. If long labels occur infrequently,
breaking them into multiple lines is also possible.
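As a simple illustration of left and right truncation, the sketch below shortens a label to a maximum number of characters and marks the cut with an ellipsis. The function name `truncate_label`, its parameters, and the use of a character count rather than a pixel width are assumptions made for the example; a real implementation would likely measure the rendered text width.

```python
def truncate_label(text, max_chars, side="right", ellipsis="..."):
    """Shorten `text` to at most `max_chars` characters.

    side="right" keeps the beginning of the label; side="left" keeps
    the end, which may suit labels such as web addresses.
    """
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(ellipsis)
    if keep <= 0:
        # Not even room for the full ellipsis: return what fits.
        return ellipsis[:max_chars]
    if side == "left":
        return ellipsis + text[-keep:]
    return text[:keep] + ellipsis
```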
Limiting discontinuities
One of the drawbacks of the dynamic aspect of excentric
labeling is that the placement of an object’s label will vary while the
cursor is moving around the object. This is needed to allow new labels
to be added when the focus area covers more objects, but can lead to discontinuities
in the placement of labels. For example when the cursor moves from the
left side of an object to its right side, the label will move from the
right to the left stack. This effect is actually useful to confirm the
exact position of a label but might be found confusing by first-time users.
We found that discontinuities were more common with the non-crossing algorithm
than with the vertically coherent algorithm, which we favor despite the risk of lines
crossing.
POSSIBLE IMPROVEMENTS
Depending on the application, several improvements might be considered:
We have implemented excentric labels within three different
applications: a Java version of starfield display/dynamic query visualization
[Ahlberg et al., 1994] (Figure 7), a Java implementation of LifeLines (Figure
9), and a map applet to be used for searching for people in a building. The
addition of excentric labeling to the first two applications was done in
a few hours. The last program was built from scratch as an evaluation tool.
We are in the process of comparing excentric labeling with a purely zoomable interface. The map of a building is displayed with workers' names assigned randomly to offices. Subjects have to figure out if a given person is assigned to a room close to one of three red dots shown on the map (symbolizing three areas of interest that a visualization would have revealed, e.g., areas close to both vending machines and printers). Each subject has to repeat the task ten times with new office assignments and red dot locations. The questions asked are of the form: "is <the person> in the neighborhood of one of the red dots?" Subjects reply by selecting "yes" or "no". The time to perform each task and the number of errors are recorded. Subjects using excentric labels (Figure 9) have to move the cursor over and around each highlighted point and read the labels. Subjects using the zooming interface have to move the cursor over each highlighted point, left click to zoom until they can read the labels (one or two zoom operations), then right click to zoom back out or pan to the next point.
Our initial test of the experiment highlighted how crucial speed
and smoothness of zooming are for zooming interfaces. In our test
application a zoom or pan takes about 3/4 seconds to redraw. This is representative
of many zooming interfaces, but in order to avoid any bias in favor of
the excentric labeling we chose to ignore the redisplay time (the clock
is stopped during redraws in the zooming interface version).
CONCLUSION
Despite the many techniques used in visualization
systems to label the numerous graphical objects of a display, labeling
remains a challenging problem for information visualization. We believe
that excentric labeling provides a novel way for users to rapidly explore
object descriptions once patterns have been found in the display and effectively
extract meaning from information visualization. Early
evaluation results are promising, and we have demonstrated that the technique
can easily be combined with a variety of information visualization applications.
ACKNOWLEDGEMENT
This work was mainly conducted while Jean-Daniel Fekete
visited Maryland during the summer of 1998. We thank all members of the HCIL
lab for their constructive feedback, especially Julia Li for her initial
research of the labeling problem, and Ben Shneiderman for suggesting the
main-axis projection. This work was supported in part by IBM through the
Shared University Research (SUR) program and by NASA (NAG 52895).
REFERENCES