The Current Prototype: Work in Progress - Most recent version at Raytheon STX
Images of Screens
Working Information for the Applet
This applet can be viewed in Netscape or in Internet Explorer. The recommended
version downloads a plug-in and installs it before running the applet. The
plug-in allows Netscape to run the applet, which is written in Java 1.1. It also
allows Internet Explorer users to view the applet as it will appear in a Unix
environment. Since some of the applet's interface features still look slightly
different in Internet Explorer, the plug-in version is recommended for
previewing. The applet can now run on PCs using Netscape and Internet Explorer
and on Suns using
Various options for the interface layout were explored, and a hybrid of the vertical-length and color
models was selected for implementation. One-dimensional variables will have histograms whose bar
lengths are proportional to the number of datasets present, and these bars will also be color coded redundantly. The same color coding scheme will be used for two-dimensional variables by overlaying a colored layer on each grid cell; the color of the overlay will depend on the number of datasets
present in that cell.
Mockups of the three alternative schemes for the Interface
Selected Layouts - using redundant color and length coding for 1D data and color coding for 2D data
Vertical Interface with Annual (thick) Histogram Bars
Transparent Grid - Opaque Map
Multi-valued Attribute Problem:
Query previews present visual representations of data distribution patterns
along certain meaningful parameters. As users manipulate specialized widgets to
select the ranges of the parameters that interest them, the visual
representations change and the user gains a better understanding of the data
distribution patterns. Earlier implementations of query previews ran into the
problem of multi-valued attributes: a situation in which a given attribute has
more than one value, as when a movie has more than one actor. The first
solution to this problem was to duplicate the entries for each of the multiple
values. In some cases this led to a large explosion of the database and
produced noticeably erroneous results (inflated total numbers of hits). The
next solution applied to situations where all the parameters were range
variables, and used Euler's formula to eliminate the duplicates.
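The overcounting produced by duplicated entries can be reproduced with a toy example. In this minimal sketch the granule IDs and month indices are hypothetical; each granule is stored once per month it covers, as in the first solution, and a two-month query then counts rows rather than distinct granules.

```python
# Hypothetical granules: id -> month indices the granule has data for.
granules = {
    "g1": [0, 1],  # spans months 0 and 1
    "g2": [1],
    "g3": [1, 2],
}

# First solution: one duplicated row per (granule, month) pair.
rows = [(gid, m) for gid, months in granules.items() for m in months]


def count_rows(query_months):
    """Counts duplicated rows in the query; multi-month granules are counted repeatedly."""
    return sum(1 for _, m in rows if m in query_months)


def count_distinct(query_months):
    """The correct hit count: distinct granules with data in any queried month."""
    return len({gid for gid, m in rows if m in query_months})


print(count_rows({0, 1}), count_distinct({0, 1}))  # prints: 4 3
```

Three granules already need five rows, and the row count reports four hits where there are only three.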
Euler's formula computes the number of datasets that actually satisfy a query
from a few simple arrays of data. Temporal data is treated as one-dimensional
and has two arrays associated with it: the first gives the count of granules in
each cell, and the second gives the number of granules that cross over from one
cell to the next. For instance, the first array gives the number of granules
that have data for a cell such as March 1979 or April 1979, and the second
gives the number of granules that have data in both March and April of 1979.
If a user asks how many granules have data in the period March through April
1979, Euler's formula gives the answer as the sum of the per-cell counts minus
the sum of the crossovers between adjacent selected cells. This is exact as
long as each granule covers a contiguous run of cells, so that within the
selected range it contributes one more cell than crossovers.
The geographic parameter, which is two-dimensional, needs two more arrays: one
for crossovers between vertically adjacent cells and one for crossovers at the
vertices (corners) shared by four cells. The sums of both edge arrays are
subtracted from the total count, and the sum of the vertex array is added back
in. The appeal of this solution, in addition to the fact that it elegantly
handles the multi-valued attribute problem, is that the client needs only four
arrays of at most 72 × 36 = 2,592 integers each. At 32 bits per integer that is
4 × 2,592 × 32 = 331,776 bits, or about 41 KB. This is the maximum amount of
data that needs to be transferred to the applet for each query.
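Both versions of the formula can be sketched in Python. The sketch assumes each granule's temporal extent is a contiguous run of cells and its geographic footprint an axis-aligned rectangle of grid cells. Under that assumption the selected part of a granule covering a × b cells contributes ab − a(b − 1) − (a − 1)b + (a − 1)(b − 1) = 1 to the total. The array names and the `build_arrays_2d` helper are illustrative, not part of the prototype.

```python
def euler_count_1d(cell_counts, edge_counts, lo, hi):
    """Distinct granules with data in cells lo..hi (inclusive).

    cell_counts[i]: granules with data in cell i.
    edge_counts[i]: granules with data in both cell i and cell i + 1.
    """
    return sum(cell_counts[lo:hi + 1]) - sum(edge_counts[lo:hi])


def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]


def build_arrays_2d(footprints, n_rows, n_cols):
    """Hypothetical helper: the four arrays from footprints (r0, r1, c0, c1), inclusive."""
    cells = zeros(n_rows, n_cols)
    h_edges = zeros(n_rows, n_cols - 1)       # shared by (r, c) and (r, c + 1)
    v_edges = zeros(n_rows - 1, n_cols)       # shared by (r, c) and (r + 1, c)
    vertices = zeros(n_rows - 1, n_cols - 1)  # corner shared by four cells
    for r0, r1, c0, c1 in footprints:
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                cells[r][c] += 1
                if c < c1:
                    h_edges[r][c] += 1
                if r < r1:
                    v_edges[r][c] += 1
                if r < r1 and c < c1:
                    vertices[r][c] += 1
    return cells, h_edges, v_edges, vertices


def box_sum(a, r0, r1, c0, c1):
    """Sum of a[r][c] over r0 <= r <= r1 and c0 <= c <= c1 (0 for an empty range)."""
    return sum(a[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))


def euler_count_2d(arrays, r0, r1, c0, c1):
    """Distinct granules touching the query box: cells - both edge sums + vertices."""
    cells, h_edges, v_edges, vertices = arrays
    return (box_sum(cells, r0, r1, c0, c1)
            - box_sum(h_edges, r0, r1, c0, c1 - 1)
            - box_sum(v_edges, r0, r1 - 1, c0, c1)
            + box_sum(vertices, r0, r1 - 1, c0, c1 - 1))


# Hypothetical indexing: cell 0 = March 1979, cell 1 = April 1979.
# One granule spans both months and two cover April only.
print(euler_count_1d([1, 3], [1], 0, 1))  # prints: 3

grid = build_arrays_2d([(0, 1, 0, 1), (1, 2, 1, 3), (3, 3, 0, 0)], 4, 4)
print(euler_count_2d(grid, 0, 2, 0, 2))  # prints: 2
```

Only edges and vertices with every adjoining cell inside the query are summed, which is why the edge and vertex ranges stop one short of the cell range.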
The interface widgets were developed, and a server-side program was created to
return answers to the queries made by the client-side applet. This paper
briefly describes the techniques used in this server-side application.
The CZCS database, with more than 80,000 granules, was used as a trial dataset.
The initial solutions have been limited to only two parameters, time and
geographic area.
Using this prototype tool, users can preview the data distribution along
prespecified parameters and narrow their queries while minimizing the chance of
zero-hit and million-hit queries. The parameters can be one-dimensional, like
time, or two-dimensional, like geographic area. To select a temporal region of
interest, the user is presented with a screen that has a Range Slider, a
histogram of the data distribution and a logarithmic scale. As users move the
double slider to describe the zone of interest, the length of the bar in the
scale reflects the number of datasets (or granules) that contain data for any
part of that time period. The bar is also color coded according to the amount
of data present. A similar screen is used for all one-dimensional attributes.
The geographic area selection screen uses a rubber-banding box to select an
area of interest. A translucent color grid is overlaid on the map, and the
color of each grid cell depends on the amount of data present in that area. The
total amount of data in the rubber-banding selection box is reflected in the
color and length of the scale.
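Turning a rubber-banding box into the grid cells it covers is a simple coordinate computation. This sketch assumes five-degree cells with row 0 starting at 90° S and column 0 at 180° W; a box whose northern or eastern edge lies exactly on a cell boundary also picks up the cell beyond it. These conventions and the function names are assumptions, not taken from the prototype.

```python
import math

CELL_DEG = 5
N_ROWS, N_COLS = 180 // CELL_DEG, 360 // CELL_DEG  # 36 latitude rows, 72 longitude columns


def cell_index(value, origin, n):
    """Index of the cell containing a coordinate, clamped to the grid."""
    return min(n - 1, max(0, math.floor((value - origin) / CELL_DEG)))


def box_to_cells(lat_min, lat_max, lon_min, lon_max):
    """Inclusive (row0, row1, col0, col1) ranges of cells touched by a lat/lon box.

    Assumes row 0 starts at -90 (90 S) and column 0 at -180 (180 W).
    """
    return (cell_index(lat_min, -90, N_ROWS), cell_index(lat_max, -90, N_ROWS),
            cell_index(lon_min, -180, N_COLS), cell_index(lon_max, -180, N_COLS))


print(box_to_cells(-12, 7, 100, 121))  # prints: (15, 19, 56, 60)
```

The resulting index ranges are the cells, edges and vertices over which the Euler sums for the selection are taken.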
The interface reacts dynamically as the zone of interest is changed with the
Range Slider. The month at which each slider button is positioned is displayed
dynamically.
The selected zone of the Range Slider is colored with the appropriate scale color.
A histogram of the data is presented.
Histogram bars that are not selected are grayed out.
Users can select bars of interest by clicking on one and dragging to the end of the
range of interest. The range slider and the scale update accordingly.
Logarithmic scales were developed and used.
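The logarithmic scale and the redundant color coding of the bars can be sketched as two small mappings from a granule count. The full bar length, the color band limits and the color names are hypothetical; the prototype's actual values are not given here.

```python
import math

BAR_PIXELS = 200  # hypothetical full length of the scale bar
# Hypothetical color bands: (largest count in the band, color name).
COLOR_BANDS = [(10, "blue"), (100, "green"), (1000, "yellow"), (10000, "orange")]
OVERFLOW_COLOR = "red"


def bar_length(count, max_count):
    """Bar length on a log10(count + 1) scale; assumes max_count >= count."""
    if count <= 0:
        return 0
    return round(BAR_PIXELS * math.log10(count + 1) / math.log10(max_count + 1))


def bar_color(count):
    """Redundant color code: the first band whose limit is not exceeded."""
    for limit, color in COLOR_BANDS:
        if count <= limit:
            return color
    return OVERFLOW_COLOR


print(bar_length(9, 99), bar_color(50))  # prints: 100 green
```

The logarithm keeps a selection holding a handful of granules visible on the same scale as one holding tens of thousands, and the +1 gives a single granule a short but nonzero bar.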
In the geographic selection, users can
select an area using a rubber-banding box, and the scale shows the
number of granules in this area. The bounding box is colored according to the amount of data in the area it encloses.
Two types of transparent interfaces were created:
Transparent Grid - Opaque Map
Transparent Map - Opaque Grid
The transparent map interface was selected. Interface layout improvements were made.
Help Screens were developed for the interface:
Temporal Selection Help Screen
Geographical Selection Help Screen
Server Side Database Solutions
Solution 1: The Datacube Table.
Metadata from the 80,000 granules of CZCS data was reduced to a simple cube.
For the two-parameter case, the three dimensions of the cube can be taken to
represent latitude, longitude and time. For a distribution over ten years at
monthly granularity and a geographic cell size of five degrees, the data cube
contains 72 × 36 × 120 = 311,040 cells. Implementing Euler's formula requires
up to four values for each cell (the cell count, two edge counts and a vertex
count), which can be thought of as four data cubes. That is 1,244,160 values;
at 32 bits per value this amounts to about 40 million bits, or roughly 5 MB.
The size of the cube is independent of the size of the dataset but does depend
on the size of the "cells" for the parameters.
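The cube dimensions, and the per-time-bin slice a spatial query reads from it, can be sketched with a single flat array. The layout (value kind, then time, latitude, longitude), the 32-bit value size and the helper names are assumptions for illustration; the prototype's own storage format is not described here.

```python
N_LON, N_LAT, N_TIME = 72, 36, 120  # five-degree cells, ten years of monthly bins
N_KINDS = 4                         # cell count, two edge counts, vertex count
BITS_PER_VALUE = 32                 # assumption: 32-bit integers

n_cells = N_LON * N_LAT * N_TIME
n_values = n_cells * N_KINDS


def flat_index(kind, t, lat, lon):
    """Position of one value in a hypothetical flat array holding the four cubes."""
    return ((kind * N_TIME + t) * N_LAT + lat) * N_LON + lon


def time_slice(cube, kind, t):
    """The N_LAT x N_LON array of one value kind for one time bin."""
    return [[cube[flat_index(kind, t, lat, lon)] for lon in range(N_LON)]
            for lat in range(N_LAT)]


print(n_cells, n_values, n_values * BITS_PER_VALUE, n_values * BITS_PER_VALUE // 8)
# prints: 311040 1244160 39813120 4976640
```

Every query needs the whole cube in memory for random access, which is what makes loading one cube per dataset expensive.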
When the client made a query, the server-side program identified the slice of
the cube relevant to the query and used it to return the arrays. The data cube
was pre-created by an independent program. Every time a client applet made a
preliminary inquiry by supplying a dataset name, the data cube for that dataset
was loaded, a process that takes about three minutes. Preloading was
considered impractical because it is not viable when the tool is used to
preview multiple databases, since multiple cubes would have to be preloaded;
the loading time and the size of each cube are too large. This solution does
not scale up, so a different technique was tried and adopted.
Solution 2: The DataSet Index Table.
The data cube was converted into a tabular format stored in a database accessed
through an ActiveX interface. The three axes of the data cube represent time,
latitude and longitude. The divisions, or cells, along each of these
dimensions can be thought of as "bins" for that dimension.
The table contains a list of all the bin IDs and the array values
associated with each. The table contains the following fields:
DsIndex - An identifier specific to the dataset. The table is expected to
contain data from more than one dataset, and this ID separates the granules of
the dataset of interest.
TimeBin - The bin corresponding to each time-interval cell. (For instance, in
the CZCS case there are 120 TimeBins, twelve for each of the ten years.)
TimeEdgeFlag - If this flag is zero, the array stored in a later field of the
row is the array of granule counts for each cell; if the flag is one, that
array is the array of
ParamBin - This is for future use to specify the value of the bin number of
the third parameter.
LatBin - The identifying number of the latitude bin.
RowCounts - Giving each longitude cell in a row its own bin as well would have
made the database too large, so this hybrid solution was adopted: for every
combination of TimeBin, ParamBin and LatBin there is one row bin. The row bin
holds a string that can be parsed into 288 integers, i.e. four arrays of 72
cells for the five-degree case.
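One row of this table, and the parsing of its RowCounts string into four arrays, can be sketched as follows. The field names follow the text; the whitespace-separated encoding of RowCounts and the `IndexRow` class are assumptions, since the actual string format is not specified.

```python
from dataclasses import dataclass
from typing import List

N_LON = 72    # five-degree longitude cells in one latitude row
N_ARRAYS = 4  # four arrays per row bin


@dataclass
class IndexRow:
    """Hypothetical record for one row of the DataSet Index Table."""
    ds_index: int
    time_bin: int
    time_edge_flag: int
    param_bin: int
    lat_bin: int
    row_counts: str  # 288 integers packed into one string

    def parse_row_counts(self) -> List[List[int]]:
        """Splits row_counts into four arrays of N_LON integers.

        Assumes whitespace-separated integers; the real encoding is not specified.
        """
        values = [int(tok) for tok in self.row_counts.split()]
        expected = N_ARRAYS * N_LON
        if len(values) != expected:
            raise ValueError(f"expected {expected} integers, got {len(values)}")
        return [values[i * N_LON:(i + 1) * N_LON] for i in range(N_ARRAYS)]
```

Packing a whole latitude row into one field keeps the table to one record per (TimeBin, ParamBin, LatBin) combination instead of one per cell.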
When the client sends a spatial query, the temporal and spatial-latitude
information is sent to the database as a query, and the database returns a set
of row counts. These are parsed into four arrays, which are then truncated
using the spatial-longitude part of the query. When a temporal query is sent, a
similar process is followed, except that the TimeEdgeFlag is used to separate
the two arrays.
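The spatial query path can be sketched as follows. The sketch assumes each latitude row's four arrays arrive in the order cells, edges to the next longitude cell, edges to the next latitude row, vertices; `build_rows` is a hypothetical stand-in for the database, used only to produce example data. Truncation to the query's longitudes is the slicing by `lon0` and `lon1`, and the last queried row skips its edges and vertices toward the next row, which lie outside the box.

```python
from typing import Dict, List, Tuple

N_LAT, N_LON = 36, 72  # five-degree grid

Arrays = List[List[int]]  # [cells, lon_edges, lat_edges, vertices], each N_LON long


def spatial_count(rows: Dict[int, Arrays], lat0: int, lat1: int,
                  lon0: int, lon1: int) -> int:
    """Euler count for latitude bins lat0..lat1 and longitude cells lon0..lon1."""
    total = 0
    for lat in range(lat0, lat1 + 1):
        cells, lon_edges, lat_edges, vertices = rows[lat]
        total += sum(cells[lon0:lon1 + 1])   # truncate to the queried longitudes
        total -= sum(lon_edges[lon0:lon1])   # only edges with both cells in the box
        if lat < lat1:                       # edges to the next row stay inside the box
            total -= sum(lat_edges[lon0:lon1 + 1])
            total += sum(vertices[lon0:lon1])
    return total


def build_rows(footprints: List[Tuple[int, int, int, int]]) -> Dict[int, Arrays]:
    """Hypothetical stand-in for the database: rows from rectangular footprints."""
    rows = {lat: [[0] * N_LON for _ in range(4)] for lat in range(N_LAT)}
    for r0, r1, c0, c1 in footprints:
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                rows[r][0][c] += 1
                if c < c1:
                    rows[r][1][c] += 1
                if r < r1:
                    rows[r][2][c] += 1
                if r < r1 and c < c1:
                    rows[r][3][c] += 1
    return rows


rows = build_rows([(2, 3, 5, 7), (3, 4, 6, 6), (10, 10, 0, 71)])
print(spatial_count(rows, 0, N_LAT - 1, 0, N_LON - 1))  # prints: 3
```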