InfoVis 2003 Contest - EVAT - Environment for Visualization and Analysis of Trees

David Auber, Maylis Delest, Jean-Philippe Domenger, Pascal Ferraro, Robert Strandh

{maylis,auber,domenger,ferraro,strandh}@labri.fr
LaBRI - University Bordeaux1

See Infovis 2003 Contest rules and task at http://www.cs.umd.edu/hcil/iv03contest/

Ratings used below: (Strength,Possible,Difficult,Not Available)

Pairwise comparisons of trees: Topological changes

Did anything change, in general, or in a subtree?

Rating:
Strength
Process:
Two overviews are shown side by side. Because of the properties of the layout algorithm, changes in a tree are propagated to changes in the labels, so both minor and major changes are visible. Zooming then allows us to detect any change.
Image:

Overview and zoom on LogsA and LogsD
Answer:

What nodes were added, deleted?

Rating:
Strength
Process:
The fast heuristic is applied iteratively to extract the changes, which are then highlighted.
Image:

Overview and zoom on LogsA and LogsD
Answer:

Did any node or subtrees "move" in the tree? Can you characterize those movements?

Rating:
Strength
Process:
After extraction of similar subtrees, the fast heuristic is applied once more. Parts of the trees that share the same color suggest movement within the tree.
Image:

Overview and zoom on ClassifA and ClassifB
Answer:
  • In view 1, one can see, looking at the colors, that something has moved in the upper-right corner.
  • In view 2, this part is displayed. It corresponds to the Mammalia directory. The colors given by the fast heuristic identify the same subtrees. For instance, some violet or blue subtrees have moved.
  • In view 3, for instance, the Scandentia directory has moved. In the ClassifA file, it was a child of Eutheria (child of Theria, itself a child of Mammalia). In the ClassifB file, it is a direct child of Mammalia.
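The move described in view 3 can be illustrated with a small sketch. It is not EVAT's fast heuristic, whose details are not given here; it only compares the parent of each label in two trees stored as hypothetical child-to-parent dictionaries.

```python
# Hypothetical child -> parent maps for a fragment of each classification.
classif_a = {
    "Mammalia": None,
    "Theria": "Mammalia",
    "Eutheria": "Theria",
    "Scandentia": "Eutheria",
}
classif_b = {
    "Mammalia": None,
    "Theria": "Mammalia",
    "Eutheria": "Theria",
    "Scandentia": "Mammalia",
}


def moved_nodes(a, b):
    """Return {label: (parent in a, parent in b)} for labels whose parent differs."""
    return {
        label: (a[label], b[label])
        for label in a.keys() & b.keys()
        if a[label] != b[label]
    }


moved = moved_nodes(classif_a, classif_b)
print(moved)
```

Matching nodes by label alone is a simplification that only works when labels are unique in both trees.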

Pairwise comparisons of trees: Attribute value changes

Global impression: did things change a lot or not?

Rating:
Strength
Process:
The size and hitCount attributes have been mapped to the size and to the color of the nodes, respectively. Big blue circles represent the biggest files with high access counts.
Image:

Overview and zoom on LogsA and LogsB
Answer:
  • In view 1, with the uniform mapping algorithm of hitCount to RGB color and the linear mapping algorithm of Size to the size of nodes, one can see that the biggest files are around the monte directory. In the upper-right corner, around the papers directory, the hitCount values have changed.
  • In view 2, a scatter plot is drawn in log scale. No big changes can be detected, but a lot of small changes are visible. The biggest and most popular pages can be selected in the upper-right corner.
  • In view 3, one can see the previous selection on the global view.
  • In view 4, one can see the selection using the expert search. For instance, the file PFWeb3.jpg has a size of 773 155 and its popularity has decreased from 263 to 249.
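The linear mapping of Size to node size mentioned in view 1 can be sketched as follows. This is only an illustration: the function name, the output range of 2 to 20, and the sample bounds are assumptions, not EVAT's actual parameters.

```python
def linear_map(value, lo, hi, out_lo, out_hi):
    """Map value from the range [lo, hi] linearly onto [out_lo, out_hi]."""
    if hi == lo:
        return out_lo  # degenerate range: every node gets the same size
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)


# Hypothetical example: attribute values between 0 and 1000 drawn with radii 2..20.
radius = linear_map(500, 0, 1000, 2, 20)
print(radius)
```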

What nodes or subtrees changed the most?

Rating:
Strength
Process:
Same algorithm as in the previous rows. Changes are detected visually on the view.
Image:

Overview and zoom on LogsA and LogsB
Answer:
  • In view 1, hitCount is uniformly mapped to the size of the node. Down in the middle, one can see a node whose size has drastically changed.
  • In view 2, zooming on the node shows that it is the csicCurrent.jpg file whose hitCount has decreased from 3693 to 2367.
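The visual detection above has a simple numerical counterpart. The sketch below takes the two hitCount pairs quoted in this document for LogsA and LogsB and finds the file with the largest absolute change; the dictionary layout and function name are assumptions made for illustration.

```python
# hitCount values quoted in this document for LogsA and LogsB.
hits_a = {"csicCurrent.jpg": 3693, "PFWeb3.jpg": 263}
hits_b = {"csicCurrent.jpg": 2367, "PFWeb3.jpg": 249}


def hit_changes(before, after):
    """Signed hitCount difference for every file present in both logs."""
    return {name: after[name] - before[name] for name in before.keys() & after.keys()}


changes = hit_changes(hits_a, hits_b)
most_changed = max(changes, key=lambda name: abs(changes[name]))
print(most_changed, changes[most_changed])
```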

Did the value of attribute XYZ for this node increase or decrease? In absolute terms, or relatively to other siblings or other nodes.

Rating:
Not Available
Process:
Image:
Answer:

General visualization of trees: Topology

Overall characteristics: How large is the tree? How many levels deep? What is the deepest branch? Does the depth vary between subtrees or not?

Rating:
Strength
Process:
One can select nodes using the depth histogram; the deepest branches are then displayed.
Image:

Overview and zoom on ClassifA
Answer:
In all the views, the number of nodes is displayed in the left corner of the window. The ClassifA file has 190 264 nodes.
  • In view 1, the histogram of the depths is shown. The maximum is 14. One can select the nodes whose value is 14.
  • In view 2, the extraction of the selected nodes is displayed in a hierarchical layout. Thus, the deepest branches are displayed.
  • In view 3, the global view shows the differences in depth between subtrees. They are large.
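A depth histogram of this kind can be computed as in the sketch below. The tree fragment and the child-to-parent representation are assumptions chosen for illustration; they do not reproduce the ClassifA data.

```python
from collections import Counter

# Hypothetical child -> parent map for a tiny tree fragment.
parent = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Vertebrata": "Chordata",
    "Mammalia": "Vertebrata",
    "Arthropoda": "Animalia",
}


def depth(node, parent):
    """Number of edges between node and the root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


depths = {node: depth(node, parent) for node in parent}
histogram = Counter(depths.values())  # depth -> number of nodes at that depth
max_depth = max(depths.values())
deepest = [node for node, d in depths.items() if d == max_depth]
print(histogram, max_depth, deepest)
```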

What is the path of this node?

Rating:
Strength
Process:
All the nodes in the path are shown in pink.
Image:

Zoom on ClassifA
Answer:
In the upper-left window, the path to Mammalia is shown. In the central view, the path is displayed. The path is Animalia/Chordata/Vertebrata/Mammalia.
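Retrieving such a path amounts to following parent links up to the root. The sketch below assumes a child-to-parent dictionary, which is an illustrative representation and not necessarily the one used by EVAT.

```python
# Child -> parent map for the path quoted in the answer.
parent = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Vertebrata": "Chordata",
    "Mammalia": "Vertebrata",
}


def path_to(node, parent):
    """Return the root-to-node path as a slash-separated string."""
    parts = []
    while node is not None:
        parts.append(node)
        node = parent[node]
    return "/".join(reversed(parts))


print(path_to("Mammalia", parent))
```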

Local relatives: What are the children, siblings, or cousins of this node?

Rating:
Not Available
Process:
Image:
Answer:

Filtering by level: Show only the first level, or show only 3 levels down, or remove all the leaves

Rating:
Strength
Process:
Filter on the Depth and/or Arity variables.
Image:

Overview and zoom on ClassifA
Answer:
  • In view 1, the first level is shown.
  • In view 2, the first three levels under the node Mammalia are displayed.
  • In view 3, the global view with leaves suppressed is drawn. The size of the subtree is 35342.
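Both kinds of filter can be sketched on a parent-to-children dictionary. The fragment below is hypothetical (Prototheria is added only to have a second leaf), and the function names are not EVAT's.

```python
# Hypothetical parent -> children map.
children = {
    "Mammalia": ["Theria", "Prototheria"],
    "Theria": ["Eutheria"],
    "Eutheria": ["Scandentia"],
    "Prototheria": [],
    "Scandentia": [],
}


def nodes_within(root, children, max_levels):
    """Nodes at most max_levels below root, root included, level by level."""
    result, frontier = [root], [root]
    for _ in range(max_levels):
        frontier = [child for node in frontier for child in children[node]]
        result.extend(frontier)
    return result


def without_leaves(children):
    """Copy of the tree in which every leaf has been removed."""
    return {
        node: [child for child in kids if children[child]]
        for node, kids in children.items()
        if kids
    }


first_level = nodes_within("Mammalia", children, 1)
three_levels = nodes_within("Mammalia", children, 3)
pruned = without_leaves(children)
print(first_level, three_levels, pruned)
```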

Topology questions that involve counting nodes can be seen as attribute-dependent questions: e.g. Which branch contains the largest number of nodes? or Which branch has the largest fan-out?

Rating:
Strength
Process:
Filter using the expert search mode. Use the variables node and leaves.
Image:

Overview and zoom on ClassifA and LogsA
Answer:
  • In view 1, one can see that Arthropoda is the biggest branch, with 107038 nodes.
  • View 2 is a zoom on the Arthropoda subtree.
  • In view 3, in the logsA file, Hollings is the subtree with the largest fan-out.
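Counting nodes per branch and finding the largest fan-out can be sketched as below. The tree and its placeholder names are hypothetical, and the recursive count is only one possible way to obtain the node attribute.

```python
# Hypothetical parent -> children map; the names are placeholders.
children = {
    "root": ["a", "b"],
    "a": ["a1", "a2", "a3"],
    "b": ["b1"],
    "a1": [], "a2": [], "a3": [], "b1": [],
}


def subtree_size(node, children):
    """Number of nodes in the subtree rooted at node, the node itself included."""
    return 1 + sum(subtree_size(child, children) for child in children[node])


# Branch (child of the root) containing the most nodes.
biggest_branch = max(children["root"], key=lambda n: subtree_size(n, children))

# Node with the largest fan-out, i.e. the most direct children.
widest = max(children, key=lambda n: len(children[n]))
print(biggest_branch, subtree_size(biggest_branch, children), widest)
```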

General visualization of trees: Attribute based

Find nodes with high values of a numerical attribute X? (relative query)

Rating:
Strength
Process:
In expert search mode, one can choose an attribute and filter highest values. The nodes are highlighted in pink.
Image:

Overview on LogsA
Answer:
In this view the biggest files are highlighted. The largest one is coopingstreamlarge.mov whose size is 239520417.

Find nodes with given value of a numerical attribute X? (absolute query)

Rating:
Strength
Process:
In search mode, choose the attribute X and choose the value. The nodes are outlined in red.
Image:

Overview on LogsA
Answer:
In the logsA file there are 43718 nodes with a hitCount of 0. Labels correspond to the root of the largest subtree.

Find nodes with value Y of categorical attribute X - What value of a categorical attribute occurs more often? e.g. Are there more farm animals or pets?

Rating:
Strength
Process:
In search mode, choose the attribute X and choose the value. The nodes are outlined in red.
Image:

Overview on ClassifA
Answer:
In the ClassifA file there are 26164 nodes with rank equal to Genus. Labels correspond to the root of the largest subtree.

Find nodes with certain values of two or more attributes (What video file is used the most?)

Rating:
Strength
Process:
The scatter plot layout allows the user to see the cross distribution of variables. In expert search mode, one can select several attributes and combine them into a boolean expression. Selected nodes appear in pink.
Image:

Overview and zoom on LogsA
Answer:
  • In view 1, the scatter plot of hitCount in x and size in y is displayed.
  • In view 2, we have selected the files located in the upper-right corner of the scatter plot. They are displayed in a hierarchical layout, so the user can see their position in the tree.
  • In view 3, we select a file by inspecting three parameters: the size, the hitCount, and a substring of the name. Only the file icdl.mpg has a size of 20097952, a hitCount greater than or equal to 71, and a .mpg suffix.
  • In view 4, we see that PhotoDB.zip, whose size is 44321336 and is thus larger than icdl.mpg, is as popular as icdl.mpg. The largest file and more popular is CS-TR-4405.pdf. Its size is 1246206 and its hitCount is 251. The second largest .mov file was visited only 7 times.
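The boolean expression used in view 3 can be sketched as a filter over file records. The sizes and the hitCount of CS-TR-4405.pdf are taken from the answer above; the hitCount of 71 for icdl.mpg and PhotoDB.zip is an assumption (the text only says it is at least 71 and that the two files are equally popular), and the record layout is hypothetical.

```python
# File records; hitCount values marked "assumed" are not given exactly in the text.
files = [
    {"name": "icdl.mpg", "size": 20097952, "hitCount": 71},      # hitCount assumed
    {"name": "PhotoDB.zip", "size": 44321336, "hitCount": 71},   # hitCount assumed
    {"name": "CS-TR-4405.pdf", "size": 1246206, "hitCount": 251},
]

# Combine three conditions: exact size, minimum hitCount, and name suffix.
selected = [
    f["name"]
    for f in files
    if f["size"] == 20097952 and f["hitCount"] >= 71 and f["name"].endswith(".mpg")
]
print(selected)
```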

Number of nodes in a tree or subtree? (How many animals? How many mammals?) Comparison of branches of the tree (Subtrees with most nodes; are there more mammals or fish?) Largest fan-out (What is the largest group of animals with same lineage?)

Rating:
Strength
Process:
As shown in the previous section, the number of nodes and the fan-out are precomputed automatically in EVAT. Each chosen attribute can then be mapped to node size and color, and highlighted.
Image:

Overview and zoom on ClassifA
Answer:
  • View 1 shows the Mammalia subtree. The number of leaves is mapped to the size of the nodes. The total number of nodes in ClassifA is 190264 and the number of leaves is 154922. The total number of nodes in Mammalia is 3023 and the number of leaves is 2137.
  • View 2 presents the extraction of the Bonyfishes subtree, which has 13180 nodes. Thus there are more Bonyfishes than Mammalia. Bonyfishes has 175 piranhas and 299 Soles.
  • View 3 shows the branch with the highest number of animals (leaves): 91218.
  • View 4 shows the node that has the highest number of leaves (378): Drosophila.

General visualization of trees: Known items

Which nodes have a particular string in their label? (Find "giraffe" in a tree of animals)

Rating:
Strength
Process:
Use a regular expression in the search mode. Nodes appear in red in real time.
Image:

Overview on ClassifB
Answer:
Using Common name, there is no giraffe in ClassifA file. There are two in ClassifB: Giraffe and giraffeSeaHorse.
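A regular-expression search over labels can be sketched as below. The two matching labels are the ones reported for ClassifB; the third label and the choice of a case-insensitive match are assumptions.

```python
import re

# "Giraffe" and "giraffeSeaHorse" come from the answer above; "Horse" is a filler label.
labels = ["Giraffe", "giraffeSeaHorse", "Horse"]

pattern = re.compile(r"giraffe", re.IGNORECASE)
matches = [label for label in labels if pattern.search(label)]
print(matches)
```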

Locate a node knowing its path

Rating:
Not Available
Process:
Not really useful. If one knows a node, one is surely able to give its label. The problem is then solved by task T3.2.
Image:
Answer:

Go back to a node you have visited before

Rating:
Not Available
Process:
Image:
Answer:

General visualization of trees: Labeling

Review all the labels in a subtree

Rating:
Strength
Process:
After extraction of the subtree, the search menu gives access to all information for any attribute.
Image:

Overview on ClassifA
Answer:
The Aphididae subtree is shown. All labels (common name or latin name) are displayed in the lower-left window.

General visualization of trees: Browsing

Explore the tree by performing a series of up and downs in the tree

Rating:
Strength
Process:
These actions can be performed by mouse movements and/or through the resize and extract menus. For instance, in the search menu, one can extract the subtree rooted at a node (extract down) or extract the path to the node (extract up).
Image:

Overview and zoom on ClassifA
Answer:
  • View 1 shows a zoom of Petrygota.
  • View 2 presents a focus on Miridae. Note that Petrygota was not lost after the zoom action.
  • View 3 shows the path from Animalia to Miridae (extract up action).
  • View 4 shows the Miridae subtree (extract down action).

General visualization of trees: Managing the analysis

Marking nodes of interest

Rating:
Strength
Process:
Nodes can be selected through the search menu. The shape menu in the visualization part then allows one to mark the nodes of interest.
Image:

View on phylo_A_ABC_03-02-01
Answer:
All the vertices whose labels include the substring abc are displayed as a square. Others are displayed as a circle.

Removing special anomalies

Rating:
Not Available
Process:
Image:
Answer:

Saving visualization settings for future reference

Rating:
Strength
Process:
File - Save in TLP format
Image:
Answer:
A file containing all the loaded or created views.

Keeping the history of your analysis, reviewing it and replaying it with different parameters

Rating:
Not Available
Process:
Image:
Answer:

Phylogenies: Application specific tasks

Rating:
Difficult
Process:
We have only implemented the well-known Zhang algorithm. The result is shown as colors on the two trees. If two nodes of two different trees have the same color, the Zhang algorithm suggests mapping one node to the other. If a node is white, the Zhang algorithm does not map it to any node.
Image:

View on phylo_A_ABC_03-02-01

Overview on phylo_A_BAD1_ABC_03-02-01 and phylo_A_BAD2_ABC_03-02-01.xml
Answer:
  • The first view shows the two trees.
  • The colors are given by the Zhang algorithm. This suggests that branches with the same colors are similar.

Classifications: Application specific tasks

To what extent are the differences in the classifications due to differences in how animals are thought to be related? Are there other kinds of differences and can you explain them?

Rating:
Strength
Process:
Use fast heuristic and the union-intersection selection tool.
Image:

Overview on ClassifA and ClassifB
Answer:
  • In view 1, all the nodes that are not in the two trees have been suppressed. One can note that the two trees are the same.
  • In view 2, the nodes which appear in one tree and not in the other are displayed in red. For instance, the Aves directory creates a big difference.
  • In view 3, the fast heuristic has been applied. The colors suggest that some parts are the same in the two subtrees.
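The union-intersection selection used in views 1 and 2 corresponds to plain set operations on node labels. The label sets below are hypothetical; in particular, which tree contains Aves is not stated here, and Reptilia is a placeholder.

```python
# Hypothetical node labels for two small trees.
nodes_a = {"Animalia", "Mammalia", "Aves"}
nodes_b = {"Animalia", "Mammalia", "Reptilia"}

common = nodes_a & nodes_b        # nodes kept when the others are suppressed (view 1)
only_in_one = nodes_a ^ nodes_b   # nodes present in exactly one tree (red in view 2)
print(sorted(common), sorted(only_in_one))
```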

Can you say in how many different subtrees a particular common name (such as "dolphin" or "horse") is used? How closely are these animals related? Are common names a good guide to understanding relationships?

Rating:
Strength
Process:
Search for the string in the common name attribute. The extract down action shows the subtrees. The extract up action shows their common ancestor in a new view.
Image:

Exploration of ClassifA file
Answer:
  • In view 1, we choose to extract "horse" and "Horse". There are 46 nodes having such a label; 3 are roots of subtrees and 22 are isolated leaves.
  • In view 2, we extract "dolphin". One can see that common names are not very relevant for the classification. There are dolphin mollusks and dolphin mammals.
  • In view 3, the same common name, river dolphins, is used as the label of two different nodes.
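How closely two nodes with the same common name are related can be estimated from their lowest common ancestor, which is what the extract up action reveals. The sketch below uses a hypothetical child-to-parent fragment; the node names for the two dolphins are placeholders.

```python
# Hypothetical child -> parent fragment with two unrelated "dolphin" nodes.
parent = {
    "Animalia": None,
    "Mollusca": "Animalia",
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "dolphin (mollusk)": "Mollusca",
    "dolphin (mammal)": "Mammalia",
}


def ancestors(node, parent):
    """Chain of nodes from node up to the root, node included."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def common_ancestor(a, b, parent):
    """Lowest node that lies on the root paths of both a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    return None


print(common_ancestor("dolphin (mollusk)", "dolphin (mammal)", parent))
```

A common ancestor as high as the root indicates that the shared name says little about relatedness.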

How many species or subspecies are named after biologists named "Townsend"?

Rating:
Strength
Process:
Search for the string in the latin name attribute. Display the rank.
Image:

Exploration of ClassifA file
Answer:
  • View 1 shows that the name Townsend or townsend appears in 51 nodes.
  • View 2 suggests that Townsend has done research on many groups of animals. Indeed, it is the whole tree and there are many pink circles on the drawing. He worked on 47 species and 9 subspecies.

What kind of feedback does your tool provide to alert the user quickly when a wrong name is entered?

Rating:
Not Available
Process:
Image:
Answer:

For the top five subtrees with the most nodes-- are they likely to have a parent of a particular rank? Or does this happen in many ranks? Can you comment on how useful "rank" is?

Rating:
Strength
Process:
Filter on the size intrinsic attribute.
Image:

Exploration of ClassifA file
Answer:
The five biggest subtrees are in the same branch. Thus they have a common parent, which is the biggest. The rank is Phylum. In the lower-right corner of the view, the ranks are displayed in decreasing order. The roots of the five subtrees appear outlined in pink.

File system and usage logs: Application specific tasks

Where are the big directories?

Rating:
Strength
Process:
On the visualization menu, map extrinsic cumulative attribute size.
Image:

Exploration of LogsA file
Answer:
The biggest directories are displayed in red. The directories users, projects and movies have a cumulated size greater than 10^9. The largest directory is users.
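A cumulative size attribute of this kind is the sum of a node's own size and the cumulated sizes of its children. The sketch below uses a hypothetical directory fragment with made-up file names and sizes.

```python
# Hypothetical directory fragment: parent -> children, and each node's own size.
children = {"users": ["alice", "notes.txt"], "alice": ["thesis.pdf"]}
own_size = {"users": 0, "alice": 0, "notes.txt": 100, "thesis.pdf": 2500}


def cumulated_size(node):
    """Own size of node plus the cumulated size of everything below it."""
    return own_size[node] + sum(cumulated_size(child) for child in children.get(node, []))


print(cumulated_size("users"), cumulated_size("alice"))
```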

Can you see different patterns in the files? (Can you make out the difference between personal pages, class pages and research project pages?)

Rating:
Strength
Process:
Use the Fast heuristic in the comparison menu and zoom.
Image:

Exploration of LogsA file
Answer:
We have chosen to compare two subtrees usershollings and hollings.
  • The first view shows the coloring on the two subtrees. The same color suggests identical subtrees.
  • View 2 uses a bubble layout. One can see some similarity between subtrees.
  • A zoom of lectures shows that the two subtrees are the same.

Were there a lot of pages created recently? If so, in which part of the file system?

Rating:
Strength
Process:
Visualization, colors, RGB, linear - map extrinsic attribute ctime to color.
Image:

Exploration of LogsA file
Answer:
  • In view 1, the red nodes are those created recently.
  • In view 2, the 1003 most recent files are selected. Some of them are displayed in the central view.
  • View 3 shows an extraction of the 1003 files. One can see the directories in which they appear.
  • In view 4, the skeleton of the previous extraction is shown.

Are the newer directories bigger than the older projects?

Rating:
Strength
Process:
Because the directories have no creation time, we have considered that the ctime of a directory is the cumulated ctime of its nodes. We use the expert search menu and the scatter plot.
Image:

Exploration of LogsA file
Answer:
  • View 1 shows the newer directories.
  • View 2 is a scatter plot with cumulated ctime on the x axis and leaves on the y axis. Of course, the same action can be used on other parameters. One can see that the biggest of the newer directories are users and class. The usershollings directory is really old but has a big subtree.

When was the page giving directions to the department last updated?

Rating:
Strength
Process:
Use search menu and navigate.
Image:

Exploration of LogsA file
Answer:
In this view, we have selected all the files whose names contain the substring direction. Only one of them is located under the department directory. Its mtime attribute has the value 1013635672.
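The mtime value can be turned into a readable date. The sketch below assumes that mtime is a Unix timestamp in seconds interpreted in UTC, which the data description does not state here.

```python
from datetime import datetime, timezone

mtime = 1013635672  # value reported in the answer above

# Assumption: seconds since 1970-01-01 00:00:00 UTC.
updated = datetime.fromtimestamp(mtime, tz=timezone.utc)
print(updated.isoformat())
```

Under that assumption, the page was last updated on 13 February 2002.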

Which are the popular webpages?

Rating:
Strength
Process:
Use expert search menu for the selection and then extract the nodes.
Image:

Exploration of LogsA file
Answer:
The 26 most popular pages are displayed in red.

Are there some labs more popular than others?

Rating:
Strength
Process:
Search and extract.
Image:

Exploration of LogsA file
Answer:
We have selected all the files that belong to a lab directory. The hitCount is cumulated on each node. There are three lab directories:
  • the one under the directory users/gaburici has a score of 0
  • the one under the directory class/.../labs has a score of 13
  • the one under the directory users/gaburici/cmcs330c/old has a score of 79 and is the most popular.

Which areas are getting more popular? Less popular?

Rating:
Not Available
Process:
Image:

Exploration of LogsA and LogsD file
Answer:
  • View 1 shows the selection.
  • View 2 shows the difference between A and D. In the left window, the red nodes are those that become less popular.
  • For instance, the lectures directory has 1618 cumulated hitCount in A and 749 in D.

Are new pages more popular than old pages?

Rating:
Strength
Process:
Visualization, size - map extrinsic attribute hitCount to the size of the nodes
Image:

Exploration of LogsA file
Answer:
  • View 1 shows the selection of nodes having a non-zero ctime and a hitCount greater than 444.
  • In view 2, the selection is extracted. hitCount is mapped to the size of the nodes and ctime to the color (i.e. yellow for small values, meaning old files). One can note that old files such as banners have the same popularity as the recent file globe2.
  • In view 3, the newest files are selected.

Which old pages are popular?

Rating:
Strength
Process:
Visualization, size - map extrinsic attribute hitCount to size and ctime to color.
Image:

Exploration of LogsA file
Answer:
In the view, the oldest popular files are selected. The creation time is displayed in the lower-right window.

What proportion of the pages are never used?

Rating:
Strength
Process:
Use the detailed data and expert search menu.
Image:
Answer:
  • There are 20033 html or shtml pages.
  • 8746 are unused (hitCount 0).
  • 9163 have been visited 1 to 5 times.
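The counts above translate into proportions as follows. The sketch only divides the reported numbers; the helper name is hypothetical.

```python
total_pages = 20033
unused = 8746         # hitCount 0
low_traffic = 9163    # visited 1 to 5 times


def percent(part, whole):
    """Share of whole represented by part, in percent, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(percent(unused, total_pages))
print(percent(low_traffic, total_pages))
print(percent(total_pages - unused - low_traffic, total_pages))
```

Roughly 43.7% of the pages are never used, about 45.7% were visited 1 to 5 times, and the remaining 10.6% were visited more often.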

What proportion of the pages are seldom used?

Rating:
Not Available
Process:
Image:
Answer:

Other Strengths of the System

Finding the movement of subtrees within a tree.

Rating:
Strength
Process:
Use fast heuristic
Image:

Exploration of ClassificationA and ClassificationB file
Answer:
  • In view 1, we have extracted Mammalia and applied the fast heuristic. One can see that many subtrees are the same.
  • In view 2, the nodes that were not identified as similar in view 1 are extracted. We choose one of them, Cebidae, in order to show the differences.
  • In view 3, we have extracted the subtree Cebidae and used the fast heuristic. One can then see that the differences in Cebidae come from Callicebus.
  • In view 4, the paths to Callicebus are displayed and are different.
  • In view 5, one can see that in fact the Callicebus subtrees are the same. Thus the difference comes from the position in the tree.
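The conclusion of views 4 and 5, identical subtrees at different positions, can be checked with a canonical signature for unordered subtrees. This is an illustrative technique, not EVAT's fast heuristic; the intermediate node Callicebinae and the species names are hypothetical.

```python
def signature(node, children):
    """Canonical form of the unordered subtree rooted at node."""
    kids = children.get(node, [])
    return (node, tuple(sorted(signature(kid, children) for kid in kids)))


# Hypothetical fragments: same Callicebus subtree, attached at different places.
tree_a = {
    "Cebidae": ["Callicebus"],
    "Callicebus": ["species_1", "species_2"],
}
tree_b = {
    "Cebidae": ["Callicebinae"],
    "Callicebinae": ["Callicebus"],
    "Callicebus": ["species_2", "species_1"],
}

same_subtree = signature("Callicebus", tree_a) == signature("Callicebus", tree_b)
same_context = signature("Cebidae", tree_a) == signature("Cebidae", tree_b)
print(same_subtree, same_context)
```

Sorting the child signatures makes the comparison independent of the order in which children are stored.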
