Interactive Information Visualization of a Million Items

One million items treemap

One Million Items Treemap

40,000 items population of the USA scatter plot (Census 2000 data)

40,000 items population of the USA scatter plot (Census 2000 data)

12,134 items Linux Kernel 2.5.33 source

12,134 items Linux Kernel 2.5.33 source

Scroll to:
Project description
Related Sites


New: the code and executable are now available, see Downloading.

Project description:

To what extent can information visualization scale?

This project designs new algorithms and interaction techniques to adapt popular information visualization representations, such as treemaps and scatter plot diagrams, for displaying one million of items or more in a effective way, without resorting to any aggregation technique.

To achieve this goal, we are developing special techniques to:

We are relying on hardware graphics acceleration to allow for smooth transitions between views, interpolation between layouts and synthesis of graphics attributes such as "overlaps" (among other things).



Fekete, J.-D., Plaisant, C.
Interactive Information Visualization of a Million Items, Proceedings of IEEE Symposium on Information Visualization 2002 (InfoVis 2002), Boston, USA, Octobre 2002.

Interactive Information Visualization to the Million , Jan 2002 HCIL Technical Report.

Related sites

See the other Visualization Projects at HCIL.

Visualizing One Million Items

Visualizing one million of items on a 1600x1200 screen is a challenge in term of visualization, graphics, perception and interaction. We have designed new techniques to achieve it for treemaps and scatter plots.

Appropriate Visual Attibutes

Shaded rectangles close-up Shaded rectangles

Visualization systems typically draw items with a one-pixel border, spending two lines and two columns of pixels and sending the geometry twice. We use slightly shaded quadrilaterals so that they remain distinguishable when tiled or stacked.

Synthetic Overlap Attribute

Overlap Count Overlap Count

Whereas treemaps are space-filling visualization techniques where areas never overlap, scatter plots cannot avoid overlap. Even with hundreds of items, the distribution tends to be sparse with areas of high density that are hard to see. Transparency is useful when up to five items overlap, but with one million items, hundreds of overlapping items are not rare. To solve that problem, we synthesize an overlap attribute to show the item density.

The upper right white rectangle shows that several items are overlapping exactly at that position. They are of same type and size and otherwise unnoticeable.

Transparency and Stereovision

50% Transparent
	   Scatter Plot

Using accelerated graphics card provides graphic attributes not available on traditional graphics APIs. We have experimented with transparency and stereovision. Transparency is required with overlapped items but is not sufficient by itself to understand the number of overlaps. Furthermore, by blending colors, it interferes with its preattentive processing. Therefore, transparency is only useful when it can be varied interactively to reveal overlaps and density of overlapping items.

Perspective view

Stereovision is now available on all the standard graphics cards for less than $100 through shutter glasses. Stereovision is preattentive, but there again, overlaps interfere with it and cannot be avoided at all, even with space-filling visualization techniques, since stereovision requires a perspective projection that introduces occlusion and therefore overlaps. Like transparency, we have found stereovision mostly useful for transient inspections.

Animation and Interaction

Exploring a large data set without a-priori knowledge requires trying several mappings from data attributes to visual attributes. A special problem arises when changing views using time multiplexing: the whole screen changes and the user has usually no clue on where the regions of the previous view have gone on the new one. We have designed techniques to manage these changes in a smoother manner.

View Flipping

When the positions of the data items are preserved - i.e. when only changing colors or stacking order - flipping between views enables quick comparisons thanks to the retina persistency. See the animation (1Mb file).

flip2 flip1

In this example, the first configuration shows a directory tree using a coloring by file types (categorical) and the second by file age (numerical). Flipping shows the relationships between them.

Interpolation of geometrical attributes

When the geometry is modified between two views and the display flips from one to the other, it is hard and long to understand relationships between views, even with a few data items.

We have implemented a set of interpolation techniques to animate the transformations from one view to the other so that the eye can follow one or a set of close items and understand their trends between both views. When some items are selected, we only animate them. The animation lasts one or two second and can be replayed back and forth interactively. We also provide a slider to interactively interpolate from the current view to the previous for further control and understanding of the movements of items.

The simplest technique for animation is linear interpolation. However, when several visual attributes change, linear interpolation is confusing and only allow users to track position changes at best. To help users understanding trends, we split the animation into one or two stages: 1) changing positions and 2) changing dimensions.

Animation of Scatter plot configurations

sp2sp-small2 sp2sp-small

For scatter-plot-like visualization, position and size are independent so the middle configuration is easy to compute. Linear interpolation of positions is used to reach the middle configuration from the initial and linear interpolation of sizes is used to reach the final configuration. Both show trends when they exist. See the animation (13Mb file).

Animation of Space-filling configurations

Space-filling visualization techniques compute positions according to the size of items. Most of them [SeeSys,SeeSoft, Slice&Dice Treemaps] can be animated by linearly changing the size attributes since they use a layout that remains stable.

sd2sd-small2 sd2sd-small

Clicking on an image plays the animation of two configurations of Slice&Dice treemap showing the logarithmic size of files on the first and the linear size on the second. Large files, such as videos, use most of the screen real-estate and hide smaller files, only visible on the first configuration. See the animation (12Mb file).

Only some treemaps (squarified, strip, ordered, etc) use layout algorithms that are not geometrically stable. However, we found a general way to stabilize all these layouts for stage 2 by computing the final layout and linearly changing the sizes of all areas according to their initial values.

sq2st-small3 sq2st-small2 sq2st-small

Clicking on an image plays the animation of a squarified treemap configuration showing the logarithmic size of files changing to a strip treemap showing the linear size of files. The first stage uses linear interpolation whereas the second changes the size of zones but keeps the final layout. See the animation (15Mb file).

sq2sp-small2 sq2sp-small

Animation between unrelated layouts is also possible, such as squarified treemap to scatter plots. See the animation (14Mb file).

Dynamic Queries

Dynamic queries rely on interactively filtering and redisplaying a data set through a continuous interaction. Current systems use "range-sliders" to filter one attribute either changing the smallest value, the largest, or sweeping a range of values between the smallest and the largest. To achieve the redisplay speed required for smooth interaction, we have designed a technique that relies on hardware acceleration. The data set should be loaded into main memory. When the user activates a slider to perform the query dynamically, all the items are sent to the GPU and stored in a display list. The Z coordinate is calculated according to the attribute being filtered by the dynamic query so, for example, if a film database is displayed and the user wants to filter on the size of the film, the size is assigned to the Z-axis. Each time the slider moves, a new near or far plane value is computed and sent to the GPU and the list is redisplayed, leaving the visibility computation to the hardware.


Sliders or range-sliders are used to control the interaction. On current systems, their precision is limited to their size, augmenting the size increasing the precision at the cost of screen real-estate and longer movements to reach the slider. Our sliders and range-sliders are small (around 100 pixels) but their precision increase when the mouse leave their region. Fast changes can be done by keeping the mouse on the control's region while precision is achieved by going farther away from the control. See the animation (32Mb file).

Dynamic Labeling

Textual labels are not preattentive but are nevertheless important to understand the context in which visualized data appear. Labeling each item cannot be done statically on a dense visualization so we have used and improved the Excentric Labeling dynamic techniques.


For Treemaps, when left clicking, a popup menu is triggered to select the depth level of interest. Dynamically, labels at this level are displayed on the outside region, which is also grayed out. See the animation (10Mb file).


For scatter plots, moving the mouse over items shows their labels around the cursor. See the animation (4Mb file).

Displaying readable text over colored areas is achieved by outlining it. We use black text with a white outline. The outline is computed by applying a one pixel morphological dilation on each character.


Our system reads data encoded in XML or a directory structure as input formats. It is made of 23,000 lines of C++, using high-performance techniques such as template metaprogramming [21] to achieve the required speed. We have used it with an NVidia GeForce3 board on a 2Ghz Pentium and a 3Dlab Wildcat 5110 on a dual 1.7GHz Pentium. To scale to the million, the computation of layouts should be done in time linear with the number of items. This is the case with treemaps and scatter plots but not with VisDB for example. Even using the fastest techniques, layout computation takes about 50% of the redisplay time.

Despite the high theoretical performance of the boards, we have not been able to go beyond 6 millions quads per second on any of the boards we tried. The theoretical speed of 15 million triangles per second is only achievable for triangle strips, which is of no use for scatter plots and would require expensive computation for treemaps.

Combining software and hardware techniques provides a sustained performance around 2.5 million quads per second. By using texture mapping for animating treemaps, we achieve 10 frames per second for animating across any family of treemap.

For scatter plots, we have only reached 3 frames per second for animations on 1 million items and 6 frames per second as worst for dynamic queries. Finding techniques for improving that speed would be useful but the next generation of graphics cards and computers will solve the problem.

These results are an improvement by a factor 20 to 100 on the best systems available.


The system is available under the Free Software Foundation GPL (General Public Licence). You can download the latest version from this page below.

Download the binary version millionvis-0.2-win2k.tar.gz or

Download the source version millionvis-0.2.tar.gz or

Web Accessibility