Fuzzy Tree




Fuzzy Tree

FuzzyTree is a class derived from JTree that shows a partial view of an XML tree. It is primarily intended for viewing large XML documents in a limited display size. The number of elements to display, x, is specified, and the tree displays the "best" x elements in the tree. Determining the best elements to display is done by assigning a score to each node in the document, based on the size of the subtree that the node is the root of, and the number of subtrees isomorphic to the subtree. More details on the algorithm are provided below.

Expanding and collapsing subtrees in the FuzzyTree is accomplished by modifying the scores of nodes so that they will be added to or removed from the tree. As a result, expanding or collapsing in the FuzzyTree does not have the same behavior as in a JTree. In particular, a node may have three states, in contrast with the JTree, where a node has one of two states (open or closed). In the FuzzyTree, a node may be fully open (all of its children are displayed), partially open (some of its children are displayed), or closed (none of its children are displayed).

Sometimes, a node may be in the display tree, but its parent may not be (because it has a much higher score than its parent). In such a case, the node is placed under its nearest ancestor, and is an "orphan," and has a different colored icon.

Fuzzy Tree Viewer


To run the Fuzzy Tree Viewer, use the following command:

java -classpath xerces.jar;. FuzzyTreeViewer <xml document>

java -classpath xerces.jar:. FuzzyTreeViewer <xml document>

<xml document> is the optional name of an XML document to load initially. The -help switch prints a short list of enviroment variables used by the application.

The Xerces library is used for XML parsing.


A screen shot of Fuzzy Tree Viewer is shown below.


The slider allows the user to change the number of nodes that are displayed in the Fuzzy Tree. Alternatively, a new number of nodes can be entered into the text field. The slider is a MovingSlider, and the labels and position of the slider will change as the user moves it right and left. The slider displays a small portion of the range of acceptable values in a larger virtual range. A new XML document can be opened using the Open New File button, and the parameters used by the Fuzzy Tree can be changed using the Change Parameters button. The node's icon or label can be clicked on to expand/collapse. The last node clicked on is highlighted.

Types of clicks in Fuzzy Tree

Image Files

Node icons are loaded from the following files:

Environment variables:
Name Type Description
Default Value
initialDisplayedNodes int Initial number of nodes displayed in the fuzzy tree
minSliderVal int Minimum value for slider
sliderDisplayRange int Range of values displayed by the slider
sliderLeftMargin int The minimum number of values to the left of the slider position that should be displayed
sliderRightMargin int The minimum number of values to the right of the slider position that should be displayed
minExpandPercent int Minimum percentage score increase when expanding
maxCollapsePercent int Maximum percentage score decrease when collapsing
maxExpandPercent int Maximum percentage score increase when expanding
minCollapsePercent int Minimum percentage score decrease when collapsing
childrenToExpand int Approximate number of nodes added in a non-shift click
childrenToCollapse int Approximate number of nodes removed in a non-shift click
increaseReductionPercent int Percentage by which score increase percentage is reduced at each level
decreaseReductionPercent int Percentage by which score decrease percentage is reduced at each level
increaseThresholdPercent int Threshold at which score increase is no longer applied
decreaseThresholdPercent int Threshold at which score decrease is no longer applied
userHistorySize int Number of score changes that are saved before the oldest one is undone
parserDebugLevel int Debug level for parsing functions, 0 prints nothing
debugLevel int Debug level for FuzzyTree, 0 prints nothing (see FuzzyTree documentation)
debugFilename String Filename to output debug statements to. If not specified and debugLevel > 0, output will go to System.out
actionsFilename String Filename to output user action statements to. If not specified and debugLevel > 0, output will go to System.out

Class Relationships

The FuzzyTree uses the FuzzyTreeModel class, and many of its properties are simply delegates to methods in the FuzzyTreeModel. FuzzyTree is responsible for the view of the data, and this class sets up the renderer and implements the mouse click event handler. FuzzyTreeModel implements the Swing TreeModel interface, providing methods for getting tree data (such as getRoot, getChild, getChildCount, etc.). The other two important FuzzyTreeModel methods are userExpandSubtree() and userCollapseSubtree(), which expand/collapse a subtree through score changes. These methods are called by FuzzyTreeModel when the user expands or collapses a node.

The XML data is stored in a static DataGraph member of FuzzyTreeModel. Contained in the DataGraph are DataNodes, which each represent a node in the XML tree. Nodes that are displayed in the tree are each an instance of DisplayNode, which contains a DataNode and some additional data (such as the node's parent in the display tree and node's children in the display tree). The DataNode contained by the DisplayNode has a pointer back to the DisplayNode that references it (for a DisplayNode d, d.dataNode.displayNode == d). A DataNode.displayNode value of null signifies that the node is not in the display tree.

Initial node scores are set by the ScoreGenerator class. The FuzzyTreeParameters class encapsulates the parameters used by FuzzyTreeModel in the score adjusting algorithms, and is used by the setParameters and getParameters() methods.



Each node has a score, which is a measure of its desirability of being displayed in the tree. When x nodes are to be displayed in the tree, it is always the x highest-scoring nodes that are displayed. A list sorted in descending order of score is maintained. The tree root is always displayed and is therefore not stored in the list. Its score may be changed as a result of score change algorithms, but these score changes have no effect.

Initial scores are generated for each node by the ScoreGenerator class. First, for each node, the product of the size of the subtree of which the node is the root and the number of subtrees isomorphic to it are computed.

The isomorphism relation is defined recursively as follows:

Equivalence classes based on this isomorphism relation are generated, and the order of the node it the equivalence class is determined by its preorder index (its position in a preorder traversal of the tree). Node scores are then adjusted so that when the nodes are sorted based on score, the first members of each class are first, then the second members of each class, and so on. Finally, all node scores are incremented so that the minimum score is at least ScoreGenerator.MIN_NODE_SCORE.


When a node is expanded in a normal click, the percentage score increase required to display minChildrenToExpand of its children is computed. If the node is shift-clicked, the percentage score increase required to display all of its children is computed. This value is then adjusted to be between the minimum and maximum percentage score increase, and is applied to the the node's score and the node's children. Then, the node's parent's score is increased by the percent increase times the increaseReductionFactor, its parent is increased by this value times the increaseReductionFactor, and so on, until increaseThresholdValue is reached. The same type of score increase is applied to the node's grandchildren (the score increase is reduced by increaseReductionFactor), great grandchildren have smaller score increase, and so on. So, score increases are reduced at each level down the tree, until increaseThresholdValue is reached.

The case for collapsing is similar, though score decreases are not applied to ancestors. In addition, the main percentage score decrease is applied to all of the node's children, though they might not be its actual children in the real tree (they may be lower descendants). As with expanding, smaller score decreases are applied at each level, until the score decrease reaches decreaseThresholdValue.

In both expand and collapse, the score of the node that was clicked on is artificially increased to be the highest score, so that it is ensured display in the tree. This increase is undone at the next action (expand, collapse, or change number of display nodes), and is stored in _temporaryUserActionList.


A history of score changes resulting from user actions is maintained. The reason for the history is that user actions can greatly inflate the scores of certain nodes, making it almost impossible for other nodes to ever be displayed. Therefore, after the user history is full (the size of the user history can be set using the environment variable userHistorySize, or the setParameters method), the oldest user history item is undone. That is, score changes that had been made are reversed, the sorted node list is reordered, and nodes are added and removed as necessary.

Known Issues

Sometimes mouse clicks are lost. I put a print statement in the mouse listener code, and it seems like sometimes it's not called. I'm not sure why.

XML documents that reference a DTD won't load in Windows. I think this is because the parser can't find the DTD, because of the current path or something like that.

The input XML document must be a tree. Error checking is not done to make sure of this, though I did comment out the code in DataGraph that added extra edges for a graph.

Possible Improvements

The current data structures assume that the XML file will fit in memory. However, this is an unrealistic assumption, and loading very large XML files in FuzzyTree could be very useful. To do this, the DataGraph data structure should be moved to disk. The sorted node list could be stored in a B+ Tree.

DataGraph is a public, static member of FuzzyTreeModel, and is used by DisplayNode. This means that only one DataGraph can be used in the application. Perhaps it could be made a singleton, or the design be changed so that multiple DataGraphs could be used (though I don't know why this would be useful).

The code for userExpandSubtree and userCollapseSubtree in FuzzyTreeModel are fairly long, and have some common code. This could certainly be redesigned.

FuzzyTreeModel.undoAction can be optimized. See documentation in code.

When expanding a partially expanded node, the code merely ensures that minChildrenToExpand nodes will be displayed. But perhaps it would be better to show minChildrenToExpand more node than those already displayed.

A very cool improvement to MovingSlider could be that it could determine its optimal display range itself, based on its size. When its size increases, it could increase its display range.

Web Accessibility