CogBrz-Whiting-MC1
VAST Challenge 2019
Mini-Challenge 1
Team Members:
Mark Whiting, CognitiveBreeze, LLC, cognitivebreeze@gmail.com PRIMARY
Student Team: NO
Tools Used:
PapaParse, SmoothieCharts, jQuery, Tableau
Approximately how many hours were spent working on this submission in total?
< shrug >
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? Yes
Video
https://vimeo.com/346760204/b07a31ed6a
Questions
1 – Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.
The initial emergency response will probably focus on the communities closest to the epicenter of the earthquake, which would likely include Old Town (3), Safe Town (4), Wilson Forest (7), and possibly Cheddarford (13), Easton (14), and Pepper Mill (12). The response would also likely be moderated by factors such as population, buildings, and infrastructure concerns. Old Town would be a high priority because of its water-supply and power works, and Safe Town because of the reactor. No population figures were given for Wilson Forest or Pepper Mill, but they may be of less concern because of their likely lower population and fewer structures. Protecting the two downtown hospitals should also be a priority.
These assessments are derived from reviewing the shake map lightly overlaid on the neighborhood map, plus the description of the island.

The nature of these reports makes it rather difficult to assess priorities and how the response should change. The reported values are not grounded in any baseline, so each rating is purely subjective. It might have suited the island's purpose better to use a single-valued report stating “Yes, we have a problem here.” So instead of struggling with, for example, what a “2” report really means in this context, we looked at aggregations of reports to hopefully reach some consensus on potential problem areas on the island. We also face the problem of small numbers of reports suggesting a big problem, as opposed to large numbers of reports perhaps suggesting a less severe one. For example, if 100 reports from Cheddarford averaged a “4” on roadway problems, how does that compare with 3 reports from Oak Willow rating medical a “10”? (These are issues for Question 2 as well, but they need to be considered when answering this question.)
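One way to put the “100 reports averaging 4” versus “3 reports of 10” comparison on a common footing is to keep the report count and a spread measure next to every summary value. The sketch below is a minimal illustration, not the method used for this submission; the `(location, category, value)` record layout and the `summarize` helper are hypothetical, and blank ratings are assumed to arrive as `None`.

```python
from collections import defaultdict
from statistics import mean, median, stdev

def summarize(reports):
    """Group ratings by (location, category).

    reports: iterable of (location, category, value) tuples (hypothetical layout),
    where value is a 0-10 rating or None when the field was left blank.
    Returns {(location, category): {"n", "mean", "median", "sem"}}, where sem is an
    approximate standard error of the mean, or None when fewer than 2 ratings exist.
    """
    groups = defaultdict(list)
    for loc, cat, value in reports:
        if value is not None:  # skip blank fields instead of treating them as 0
            groups[(loc, cat)].append(value)
    summary = {}
    for key, vals in groups.items():
        n = len(vals)
        sem = stdev(vals) / n ** 0.5 if n >= 2 else None
        summary[key] = {"n": n, "mean": mean(vals), "median": median(vals), "sem": sem}
    return summary
```

A small `n` with a large or undefined `sem` marks a summary that rests on few reports, however alarming its mean looks.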
To start, we look at the sheer number of reports to see what might be happening globally across the island. This bar graph shows the report counts from all of the neighborhoods, aggregated by hour.
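The hourly aggregation behind such a bar graph can be sketched as follows. This is an assumed reconstruction in Python, not the tooling used for the submission; the `(timestamp, location)` input and the `"%Y-%m-%d %H:%M:%S"` timestamp format are assumptions to adjust to the actual report file.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format; change it to match the real data.
TIME_FMT = "%Y-%m-%d %H:%M:%S"

def hourly_counts(rows):
    """Count reports per (location, hour) bucket.

    rows: iterable of (timestamp_string, location) pairs (hypothetical layout).
    Returns a Counter keyed by (location, datetime truncated to the hour).
    """
    counts = Counter()
    for ts, loc in rows:
        t = datetime.strptime(ts, TIME_FMT)
        hour = t.replace(minute=0, second=0, microsecond=0)  # start of the hour
        counts[(loc, hour)] += 1
    return counts
```

Summing the counter over locations for each hour gives the island-wide bar heights; keeping the location in the key lets the bars be stacked by neighborhood.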

Multi-neighborhood events appear to occur around 2 PM on April 6, 8 AM on April 8, and 4 PM on April 9. Each of these generated hundreds to thousands of records per hour. For our initial hypothesis, Safe Town (4) and Old Town (3) have plenty of reports; however, Wilson Forest (7, with 61 at the April 8 peak) and Cheddarford (13, with 173 at the peak) have fewer reports than expected. One surprise is the number of reports coming out of Scenic Vista (8, with 2580 at the April 8 peak), which suggests that additional help may be required in that neighborhood.
Other somewhat anomalous reporting occurs at 11 PM on April 8 in Broadview (9, with 1895 reports), at 4 AM on April 9 in Chaparral (10, with 1375 reports), and in spikes in Old Town (3) at 1 AM on April 9 (4490 reports) and at noon on April 10 (3905 reports). It is difficult to determine a course of action for these anomalous reports, as they do not taper off like the morning event on April 8 and the afternoon event on April 9. One hypothesis is that the main earthquake occurred on April 8, an aftershock on April 9, and a precursor on April 6. Another hypothesis is that the six anomalous report spurts are delayed reporting caused by problems with the system after the earthquake. A third hypothesis is that they were generated by short-term events that were resolved within the one-hour reporting period. A reasonable action would be to check both the areas with larger-than-expected report counts and the anomalous transients.
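Singling out hours like these could be partly automated. The sketch below is one hypothetical heuristic (a count above `k` times the neighborhood's median hourly count), not a method from the original analysis, and the default `k=5.0` is an arbitrary illustrative choice. It also records whether the following hour falls back under the threshold, which loosely separates short transients from events that taper off.

```python
from statistics import median

def flag_spikes(series, k=5.0):
    """Flag hours whose report count exceeds k times the series median.

    series: hourly report counts for one neighborhood, in time order.
    Returns (index, count, transient) tuples. transient is True when the next
    hour drops back to or below the threshold, and False when it stays above
    it or when the flagged hour is the last one in the series.
    """
    # Floor the baseline at 1 so a near-zero median still gives a usable threshold.
    threshold = k * max(median(series), 1)
    flags = []
    for i, count in enumerate(series):
        if count > threshold:
            nxt = series[i + 1] if i + 1 < len(series) else None
            flags.append((i, count, nxt is not None and nxt <= threshold))
    return flags
```

For example, in the toy series `[10, 12, 11, 900, 15, 10, 400, 380, 200]` the median is 15, so the threshold is 75: hour 3 is flagged as a transient, while hours 6 to 8 form a run that stays above the threshold.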
However, these post-event graphs are of limited help to responders, because retrospection makes the task much easier than it would be in the moment. If we were to look at reports as they streamed into an analysis center, there would be even less confidence about reallocating resources. The Flowgraph below shows reports in a streaming format (although faster than real time). The colored lines represent each of the six variables for all of the reports (indicated on the graph). The time at lower left shows the time of the report, and the count below it shows both the total number of records received and the number of records in the current interval (the data is recorded in 5-minute intervals). This Flowgraph is for Old Town only and starts near the beginning of the data.
At 14:40 (2 PM), there is a significant shift in the reports that would be noticeable to human or automated monitors. Initially the reports appear random, but at this time they are followed by values consistently at 4 and below. And instead of the singleton reports seen earlier, the interval counts are slightly higher. Shortly after this time, the reports return to their normal mixed values. It is difficult to know what to do about this change: what accounts for this behavior in the data?
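The interval and running-total counters shown under the Flowgraph can be approximated by a generator that replays time-sorted reports one 5-minute bucket at a time. This Python sketch is only an illustration under assumptions; the submission's own tools were JavaScript libraries and Tableau, and the `replay` function and its `(datetime, payload)` input layout are hypothetical.

```python
from datetime import datetime, timedelta

def replay(reports, interval=timedelta(minutes=5)):
    """Replay reports as a stream, one interval at a time.

    reports: list of (datetime, payload) tuples, already sorted by time.
    Yields (interval_start, interval_count, running_total) each time an interval
    closes. Intervals with no reports produce no row.
    """
    total = 0
    bucket_start, bucket_count = None, 0
    for t, _payload in reports:
        # Align to the interval grid, e.g. 14:37 -> 14:35 for 5-minute intervals.
        start = t - (t - datetime.min) % interval
        if bucket_start is not None and start != bucket_start:
            yield bucket_start, bucket_count, total
            bucket_count = 0
        bucket_start = start
        bucket_count += 1
        total += 1
    if bucket_start is not None:
        yield bucket_start, bucket_count, total
```

Because silent intervals are skipped, gaps in reporting show up as jumps between consecutive `interval_start` values rather than as zero-count rows.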
Turning attention to the (hypothesized) major earthquake, we see many areas with numerous reports between 8 and 11 AM.
Each of the above may indicate the types of help needed by the various neighborhoods, but these indications are quite uncertain because the data are hard to interpret. Broadview might need help in all areas. Scenic Vista might need help in all areas except medical (the blue line stays at the bottom, so no medical reports are coming in). Easton may need help in everything except medical. East Parton may need more help with power and less with water and sewer issues.
The following static depictions show the anomalous reporting surrounding the possible aftershock on April 9 (note the different times on each), with snapshots of the six neighborhoods at the end of the time interval in which most of the reporting occurs (that is, thousands of reports within a five-minute interval). Most of the reporting is consistent across the interval, so you can see, for example, that Chaparral had high power problems, while Old Town had high problems with everything at noon on 4-10:

2 – Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.
There are several kinds of uncertainty to consider in this data. First, as can be seen in the Flowgraphs in Question 1, the subjectivity of the reports makes some of the reporting very difficult to interpret. There are no standards on which to base assessments, and there are no norms for individual reporters (e.g., who is doing the reporting and what their track record is).
Second, the sparsity and timeliness of reporting also make assessment difficult. Several of these aspects can be seen in the Wilson Forest Flowgraph. Only roads, power, and intensity reports appear, and even during the hypothesized earthquake period on 4-8, very few reports come in. Reports stop on the 8th at 17:45 and restart at 15:25. Then there is another big jump, from 18:10 to 23:10, with a restart at 19:40. For a streaming analysis, there may have been enough reporting during the 9 AM hour on 4-8 to alert authorities to problems with power and roads.
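Stops and restarts like these can be listed mechanically by scanning one neighborhood's sorted report times for silences longer than a chosen threshold. The `find_gaps` helper and its one-hour default below are hypothetical choices for illustration, not part of the original analysis.

```python
from datetime import datetime, timedelta

def find_gaps(times, min_gap=timedelta(hours=1)):
    """Return (last_report, next_report) pairs separated by more than min_gap.

    times: report datetimes for a single neighborhood, in any order.
    """
    ordered = sorted(times)
    # Compare each report time with the next one and keep only long silences.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]
```

Run over each neighborhood, the number and total length of such gaps give a rough measure of how sparse its reporting is during the events.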
The following shows all of the neighborhoods, with every indicator shown as medians across hours of the day. They are presented in video format to save page space; scrubbing the video gives quick access to each neighborhood's overview.
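A per-hour-of-day median profile of this kind can be computed by pooling all days and grouping reports by the hour on the clock. The sketch assumes a hypothetical `(datetime, location, category, value)` record layout with blank ratings as `None`; it is an illustration, not the code behind the video.

```python
from collections import defaultdict
from statistics import median

def hour_of_day_medians(reports):
    """Median rating per hour of day for each (location, category).

    reports: iterable of (datetime, location, category, value) tuples, where
    value is None when the field was left blank. All days are pooled, so
    08:xx readings from different days share the same bucket.
    Returns {(location, category): {hour: median_value}} with hours in order.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for t, loc, cat, value in reports:
        if value is not None:  # blanks carry no rating, so leave them out
            buckets[(loc, cat)][t.hour].append(value)
    return {
        key: {hour: median(vals) for hour, vals in sorted(by_hour.items())}
        for key, by_hour in buckets.items()
    }
```

The median is used, as in the figures, because it is less sensitive than the mean to a handful of extreme ratings.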
We interpret “reliability” for this question as the ability to consistently generate reports that may help assess the state of the neighborhood. For this initial assessment, we ignore the volume of reports and just compare the sheer ability to report over time. Eyeballing the graphs, the following appear to have reasonable reporting over time: Palace Hills (1), Southwest (5), Downtown (6), and Southton (16). Others show gaps during event times or are missing reports for a value type (particularly medical).
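The notion of reliability used here, the ability to report consistently over time regardless of volume, could be quantified as the share of time intervals in which a neighborhood supplied at least one value for each category. The `coverage` function below is a hypothetical sketch of that idea; the interval indexing and category names are assumptions, and it was not used to produce the list above.

```python
def coverage(observations, n_intervals, categories):
    """Fraction of time intervals with at least one value, per category.

    observations: iterable of (interval_index, category) pairs, one per
    non-blank value reported by a single neighborhood.
    n_intervals: total number of intervals in the study period.
    Returns {category: fraction}; categories never reported get 0.0.
    """
    if n_intervals <= 0:
        raise ValueError("n_intervals must be positive")
    seen = {cat: set() for cat in categories}
    for idx, cat in observations:
        if cat in seen:
            seen[cat].add(idx)  # a set, so many reports in one interval count once
    return {cat: len(idxs) / n_intervals for cat, idxs in seen.items()}
```

Because repeated reports within an interval count only once, a neighborhood that sends a steady trickle scores higher than one that sends a single large burst.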
3 – How do conditions change over time? How does uncertainty in data change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.
Looking at Old Town as an exemplar, we can see the progression of events: 1) the odd event on 4-6 near 2 PM, with very low-valued reports coming through across all value types; 2) the big event on 4-8 at 8 AM, resulting either in reports vanishing (as shown here) or in many value types showing high-valued reports; 3) a recovery on 4-9, followed by the event at about 3 PM. Reporting from many neighborhoods drops off again at this point, and some do not recover. Reporting then becomes typical for several neighborhoods later on the 10th.

4 – The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.
Our approach to the data in MC1 was to use both static and streaming analyses. Retrospective analysis provides the best insight into the totality of the data; however, it does not adequately address the question of how one would reallocate resources as events unfold over time. That can only be done through a streaming analysis, using tools like the Flowgraphs, as if you were one of the decision makers on the ground. Line graphs like those in Question 2 provide a nice, concise, comprehensive depiction, but the Flowgraphs are more realistic for answering the questions.