Gunjan Dang, Chris North
Human-Computer Interaction Lab, University of Maryland, College Park, MD 20742
Draft 1.0, 5/4/2000
Introduction:
One of the biggest challenges facing organizations that make ever-increasing amounts of data available to users is the need to represent that data in a comprehensible form.
The Census Bureau is a perfect example of an organization that first collects an enormous amount of data, and then needs to disseminate this information to the public. It is necessary that this information be represented in a way that is both convenient to visualize and easy to analyze.
Most conventional database systems demand that users have the skills to formulate queries, and presume that users are familiar with the structure of the database and other details. These expectations of end users can hardly be considered reasonable.
What is required is a way for users to easily grasp the content of the data, either in its entirety or just the portions that are relevant to them, without a large effort. Users might be looking for answers to specific questions, or might be interested in noting trends or exceptions in the data. Studies have shown that representing data in an animated visual form, and giving users dynamic control over it, is an efficient way to make the data easier to comprehend.
The Census summary data is mainly organized by region, with each region associated with a large amount of data in the form of attributes. For example, data about counties might typically consist of a list of all the 3148 counties of the USA, with related information about each of them such as population, per capita income, distribution of ethnic groups, population distribution among age groups, median rent, median value, area, etc. This kind of data is useful in a wide range of applications. For instance, consider a senior citizen looking for a place to settle in after she retires, a business considering relocation, or a foreigner interested in learning more about the country.
The wide range of applications of the Census data and the wide mix of users that it caters to have been the main motivating factors for the creation of Dynamap, a generalized map-based tool built for the Census Bureau to facilitate convenient representation of its data.
Related work:
The concept of dynamic queries was developed by the Human-Computer Interaction Lab (HCIL) and applied to a number of practical problems. The dynamic query method began with the development of a tool at HCIL called the Dynamic HomeFinder. This tool consisted of a map of DC, with homes displayed as dots on the map. Sliders represented the query graphically: each slider covered the possible range of values of one parameter, and dragging a slider was equivalent to entering that parameter into the query. The results of the query were displayed by filtering the dots representing the houses in or out. The tool thus gave a visual display of both the query formulation and the real-time update of the results. The main limitation of the Dynamic HomeFinder was that it was custom-programmed for a particular database, so it could not easily be scaled to more sliders or attributes.

Fig 1: The HomeFinder tool
An improvement over the HomeFinder was the FilmFinder, a tool designed to explore a film database, which generalized dynamic queries to non-spatial databases. It consisted of a starfield display, a double-box range selector for making dynamic queries, and tight coupling among all the components. The FilmFinder needed to be extended to deal with larger databases and more varied kinds of information.

Fig 2: The Dynamic FilmFinder Tool
The Census web site also presents a number of data access tools, such as the American FactFinder, which supports thematic maps. ESRI ArcView is a popular desktop mapping tool that provides mapping and spatial analysis techniques.

Fig 3: The American FactFinder on the Census website
Other related work includes data exploration using dynamic queries on a health statistics map, developed for the National Center for Health Statistics. The interface consisted of a choropleth map of the US that allowed dynamic querying on the map to filter out areas according to the parameters of the query. Its restrictions were that only a limited number of queries could be made and that it had limited zooming capability.

Fig 4: Dynamic Queries on a Health Statistics Map
Dynamap borrows some of its features from these tools, especially from the health statistics map. It is, however, a generalized, distributable tool that can be used with any kind of data associated with geographic regions (essentially, the elements of a shapefile). Its main feature is dynamic querying on a choropleth map; its other features are described in detail in the next section. It overcomes some of the limitations of the earlier interfaces: the number of sliders is not limited, the attributes they correspond to are not fixed, and there is no constraint on the size of the database that can be loaded using Dynamap.
Dynamap:
Dynamap is a generalized map-based tool designed to allow easier viewing and better analysis of map-related Census summary data. The tool handles two kinds of tasks especially well: open-ended exploratory tasks and specific scenario tasks.
Interface description:
The interface of Dynamap includes a dialog box for loading the shapefile input. As the file is loaded into the tool, the map is displayed, and the attributes of the map elements appear at the side as adjustable slider widgets. Each slider represents the range of values (min. to max.) of one attribute. Moving a slider formulates a query, and the map elements are then filtered in or out depending on the parameters of the query. The real advantage lies in the presence of multiple sliders: the user can formulate complex queries by adjusting more than one slider and view the results on the map. Dynamap can also color the map elements according to any of the available attributes.
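The filtering behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dynamap's Visual Basic code: the attribute table is modelled as a list of dictionaries, the function names `slider_bounds` and `filter_elements` are hypothetical, and the sample values are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the attribute table of a map layer.
Record = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def slider_bounds(records: List[Record], attribute: str) -> Tuple[float, float]:
    """Initial slider range: the min and max of the attribute over all elements."""
    values = [r[attribute] for r in records]
    return min(values), max(values)


def filter_elements(records: List[Record], ranges: Ranges) -> List[int]:
    """Indices of elements whose values lie inside every slider range."""
    return [
        i for i, r in enumerate(records)
        if all(lo <= r[attr] <= hi for attr, (lo, hi) in ranges.items())
    ]


# Made-up sample values, for illustration only.
states = [
    {"MEDIAN_RENT": 300.0, "AGE_65_UP": 900.0},
    {"MEDIAN_RENT": 550.0, "AGE_65_UP": 400.0},
    {"MEDIAN_RENT": 420.0, "AGE_65_UP": 1200.0},
]
selected = filter_elements(
    states, {"MEDIAN_RENT": (250.0, 450.0), "AGE_65_UP": (800.0, 1500.0)}
)
```

Adjusting more than one slider simply adds another range to the conjunction, which is why each additional slider can only narrow the selection.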
For example, consider a senior citizen, about to retire, who is looking for a suitable location to move to. One of her primary concerns could be to find places with low median rent. The map in Dynamap can be colored according to this attribute, as shown.

Fig 6: The Dynamap loaded with the States Map colored according to the attribute Median Rent
The darker regions have lower values of median rent and the lighter regions have higher values.
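One way to picture this coloring is as a mapping from attribute value to lightness. The sketch below assumes a simple linear ramp, which the text does not specify; the function name `lightness` and the 0 to 255 scale are illustrative choices.

```python
def lightness(value: float, lo: float, hi: float) -> int:
    """Map a value in [lo, hi] to 0 (darkest) .. 255 (lightest), linearly.

    Assumption: a linear ramp. The actual color scheme may differ.
    """
    if hi == lo:
        return 255  # all elements share one value; use a single shade
    fraction = (value - lo) / (hi - lo)
    return round(255 * fraction)
```

With this mapping, the region with the lowest median rent is drawn darkest and the region with the highest is drawn lightest.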
Besides low median rent, the user might also want to live where there are more people of her age group. She could then use Dynamap to find the states with a larger number of people of her age group. All she needs to do is adjust the slider to filter for regions with large values of the attribute “AGE_65_UP”, as shown in the next figure.

Fig 7: Dynamap showing the combined results of coloring by MEDIAN RENT and filtering by the attribute AGE_65_UP.
The grayed-out regions are the ones that fall outside the range specified by the slider. The regions that remain colored by the attribute are the ones still under consideration.
As shown, the search has been narrowed down to a few states. Suppose she prefers to live closer to the East Coast; we zoom into the selected state. The tool supports zooming and panning to observe data patterns in smaller or denser regions. Refreshing and coloring are much faster when the user has zoomed into a particular region. We click on the region to see all of its details, which are displayed in the text box at the bottom right-hand corner.

Fig 8: Zooming into the selected state and clicking on it to view its details in the text box.
An additional feature of Dynamap is available when there are multiple datasets in one folder. In that case, a second map, when loaded, zooms in by the same factor as the previous one. So if the user now wants to decide which county to move to in the state she has selected, all she has to do is load the county map in the tool. The attributes of the county map then load as widgets on the side and can be manipulated to further explore county-specific patterns and data.

Fig 6: Counties map opens at the same scale as the States map, with multiple dynamic querying and coloring by attribute on the county map.
Besides handling polygonal geographic regions, Dynamap can also handle map elements of other types, such as lines or points on a map. The next figure shows the Highways map of the US; simply by exploring the patterns, we can note that most of the long highways are in the Central and Western parts of the country, and that most of the highways in the Eastern region are divided. Similarly, Figure 8 tells us that New Mexico is the state in which most females above the age of 65 reside. Hence this tool, besides providing a way to get specific information, also makes casual exploration of data more interesting, owing to the animation that the sliders bring to the dynamic querying process.

Fig 8: Display of map-elements that are in the form of lines

Fig 9: Display of map-elements in the form of points
Design and Implementation:
Dynamap has been built using ESRI MapObjects 2.0 and Visual Basic 6.0. The tool loads map data from a shapefile workspace. A shapefile consists of the following: a main file (.shp) containing GIS spatial vector data, an index file (.shx), and a dBASE file (.dbf) containing data in the form of attributes of the map elements.
The ESRI MapObjects software provides a lot of functionality that can be easily exploited from VB. However, MapObjects focuses mainly on static representation of map data. Extending this functionality to support an efficient dynamic representation was the major challenge in this project.
The first task we faced was to form a composite query every time a slider moved, quickly enough that the update seemed dynamic.
The naïve algorithm simply read all the slider values whenever any slider was scrolled and then formed the query. This method was unacceptably slow, so an event was added to the slider widget to indicate that it had been touched. The new algorithm worked as follows: when a slider was touched for the first time, all the sliders that had been touched were read to form the query. When the same slider was scrolled further, only the part of the query related to that slider was recomputed. This showed considerable improvement in performance.
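The idea of recomputing only the dragged slider's part of the query can be sketched as a per-slider clause cache. The class below is a hypothetical Python illustration; the clause syntax is an assumed SQL-like form and is not taken from the MapObjects API.

```python
from typing import Dict


class CompositeQuery:
    """Caches one clause per touched slider so a drag rebuilds only its own clause."""

    def __init__(self) -> None:
        self._clauses: Dict[str, str] = {}  # attribute name -> cached clause text

    def slider_moved(self, attribute: str, low: float, high: float) -> str:
        # Only the clause of the slider being dragged is recomputed;
        # clauses of the other touched sliders are reused from the cache.
        self._clauses[attribute] = (
            f"({attribute} >= {low} AND {attribute} <= {high})"
        )
        return self.expression()

    def expression(self) -> str:
        # Untouched sliders contribute nothing: they still span their full range.
        return " AND ".join(self._clauses.values())


query = CompositeQuery()
query.slider_moved("MEDIAN_RENT", 250, 450)
query.slider_moved("AGE_65_UP", 800, 1500)
final = query.slider_moved("MEDIAN_RENT", 250, 400)  # only this clause changes
```

Sliders that have never been touched are left out of the expression altogether, which keeps the query short when the user is working with only a few of many attributes.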
With the efficient query formulation algorithm in place, the only remaining performance bottleneck was the map coloring and refresh mechanism. The need to color the map fast enough to keep it in sync with the slider motion, and so maintain the semblance of a dynamic action, led us to try several algorithms, each improving upon the previous one. Initially, two map layers were used: a gray layer to indicate the items that had been filtered out, and a yellow layer on top of it that corresponded to the results of the query. The query acted only on the top (yellow) layer, but all the layers needed to be redrawn, which made the map display flash and seem a little sluggish.
The optimization used to improve this algorithm was to refresh and redraw only what was necessary instead of the entire map. To implement this, the two map layers were used so that one acted as a positive query and the other as a negative query. The positive query selected the elements to be displayed on the map, and the negative query selected the elements that were filtered out, so that together the two layers covered all the elements of the map. The negative query is simply the complement of the positive query. Now, whenever the user tightened the query (moved sliders inwards), only the negative query layer needed to be redrawn, and if the query was loosened (sliders moved outwards), only the positive query layer was redrawn. This led to a significant improvement in performance.
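The redraw rule for the two layers can be expressed with set comparisons. The following Python sketch is illustrative only: element identifiers are modelled as integers, and the function names and layer labels are hypothetical.

```python
from typing import List, Set


def negative_query(all_elements: Set[int], positive: Set[int]) -> Set[int]:
    """The negative query is the complement of the positive query."""
    return all_elements - positive


def layers_to_redraw(previous: Set[int], current: Set[int]) -> List[str]:
    """Decide which layer(s) need redrawing after the positive selection changes."""
    if current == previous:
        return []                      # nothing changed
    if current < previous:
        return ["negative"]            # query tightened: elements only dropped out
    if current > previous:
        return ["positive"]            # query loosened: elements only came back
    return ["positive", "negative"]    # mixed change: both layers are affected
```

Dragging a single slider thumb inwards or outwards always produces one of the two one-layer cases, which is where the saving comes from.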
This approach pointed us to a further optimization: the computation of a differential query. Every time a slider is moved, a query is created that computes the difference between the previous view of the map and the current view. This query lists only those elements that were just filtered (or unfiltered) by the most recent slider change. This change is simply incorporated into the corresponding layer, which is then redrawn. Whenever the user pans or zooms, however, the entire map needs to be refreshed. The coloring-by-attribute implementation also has to be coordinated with the filtering, so that if the user first colors by an attribute and then filters, the elements remain colored by the attribute unless they have been filtered out (in which case they are gray).
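A minimal sketch of the differential query, again in Python and under assumptions: selections are modelled as sets of element identifiers, colors as strings, and the names `differential_query` and `apply_change` are hypothetical rather than taken from the Dynamap source.

```python
from typing import Dict, Set, Tuple

GRAY = "gray"  # the text uses gray for filtered-out elements


def differential_query(
    previous: Set[int], current: Set[int]
) -> Tuple[Set[int], Set[int]]:
    """Return only the elements whose state changed with the last slider move."""
    newly_filtered = previous - current
    newly_unfiltered = current - previous
    return newly_filtered, newly_unfiltered


def apply_change(
    colors: Dict[int, str],
    attribute_colors: Dict[int, str],
    newly_filtered: Set[int],
    newly_unfiltered: Set[int],
) -> None:
    """Update display colors for the changed elements only."""
    for element in newly_filtered:
        colors[element] = GRAY                       # filtered out: gray
    for element in newly_unfiltered:
        colors[element] = attribute_colors[element]  # back in: attribute color
```

Because only the changed elements are touched, the cost of an update depends on how many elements cross a slider boundary rather than on the size of the whole map.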
Performance:
The goal of a dynamic interface is that the results appear in sync with changes to the query. Dynamap showed a considerable improvement in performance with the new algorithms. With the query optimization in place and the naïve redraw algorithm, which manipulated just the top layer of the map, the update speed increased but the display flashed too much. After the algorithm was changed to use two layers, one for the positive query and one for the negative query, the flashing disappeared and the update speed increased two-fold. With the final optimization in place, which required only one layer to be redrawn, updates on the States map (all 51 states visible) were in real time, i.e., in complete sync with the slider movement. However, with the Counties map (over 3000 counties), the update speed drops to about one update per second. Dynamap gives near real-time performance for up to almost 500 counties. These measurements were made on a Gateway machine with a 450 MHz Pentium II processor and 128 MB of RAM.
Limitations and Future Work:
As Dynamap moves closer to becoming a complete working tool, there are a number of enhancements that could make it more practical and efficient.
One of the immediate enhancements is the ability to add more data (in the form of attributes associated with the geographic elements) from other sources, e.g., an Excel database. Since ESRI MapObjects 2.0 supports this functionality, the task is achievable. Other possible enhancements include adding eccentric labeling to Dynamap, adding a search feature, better legends, enabling multi-select, and displaying details of selected regions in a table instead of a text box. Further refining the algorithms to make the update process even faster would also be a plus. It would also be of practical advantage to make the tool portable to the Web.
Usability studies need to be conducted to get feedback on
the efficiency of the tool, i.e., to ascertain whether Dynamap actually
delivers what it promises to.
Conclusions:
Dynamap appears to be a promising tool and might prove to be yet another success story for the dynamic querying approach. It also provides flexibility that most of the previous applications lack. It seems to have enough potential to assist in the Census Bureau’s gargantuan task of convenient data representation for the general public. We are working to put Dynamap on the Census CD-ROM alongside the data to make the data more intelligible to end users. There is also a motivation to put it on the Census website, where it can be accessed by an even greater cross section of the population.
Acknowledgements:
This project is supported by the Census Bureau.