CogBrz-Whiting-MC1

VAST Challenge 2019
Mini-Challenge 1

 

 

Team Members:

Mark Whiting, CognitiveBreeze, LLC, cognitivebreeze@gmail.com     PRIMARY


Student Team:  NO

 

Tools Used:

PapaParse, SmoothieCharts, jQuery, Tableau

 

Approximately how many hours were spent working on this submission in total?

< shrug >

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? Yes

 

Video

https://vimeo.com/346760204/b07a31ed6a

 

 

Questions

1 – Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.

 

The initial emergency response will probably focus on the communities closest to the epicenter of the earthquake, which would likely include Old Town (3), Safe Town (4), Wilson Forest (7), and possibly Cheddarford (13), Easton (14), and Pepper Mill (12).  The response would also likely be moderated by factors such as population, buildings, and infrastructure concerns.  Old Town would be a high priority because of its water supply and power infrastructure, as would Safe Town because of its reactor.  No population figures were reported for Wilson Forest and Pepper Mill, but they may be of less concern because of their smaller populations and fewer structures.  Protecting the two downtown hospitals should also be a priority.  These assessments are derived from a review of the shake map lightly overlaid on the neighborhood map, plus the write-up about the island.

 


The nature of these reports makes it rather difficult to prioritize neighborhoods and to decide how the response should change.  The reported values are not grounded in any baseline, so each rating is purely subjective.  A single binary report, stating “Yes, we have a problem here”, might have suited the island’s purpose better.  So instead of struggling with, for example, what a “2” really means in this context, we looked at aggregations of reports in the hope of reaching some consensus on potential problem areas on the island.  We also face the problem of small numbers of reports suggesting a big problem, as opposed to large numbers of reports perhaps suggesting a less severe one.  For example, if 100 reports from Cheddarford averaged a “4” on roadway problems, how does that compare with 3 reports from Oak Willow stating a “10” on medical?  (These issues bear on Question 2 as well, but they need to be considered in answering this question.)
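One way to make the Cheddarford/Oak Willow comparison explicit, rather than reading averages alone, is to carry a report count and a standard error of the mean alongside each average. The Python sketch below is our illustration, not the procedure behind the figures in this submission: the standard-error approach is a swapped-in technique, the function name `summarize` is hypothetical, and the ratings are made up.

```python
import math
from statistics import mean, stdev


def summarize(values):
    """Return (count, mean, standard error) for one neighborhood/variable.

    Missing ratings (None) are dropped. With fewer than two reports the
    standard error is undefined, so math.inf is returned to mark the mean
    as resting on almost no evidence.
    """
    vals = [v for v in values if v is not None]
    n = len(vals)
    if n == 0:
        return 0, None, math.inf
    m = mean(vals)
    se = stdev(vals) / math.sqrt(n) if n > 1 else math.inf
    return n, m, se


# Illustrative ratings only, not values taken from the challenge data.
many_moderate = [4, 3, 5, 4, 4, 5, 3, 4, None]
few_severe = [10, 9, 10]
print(summarize(many_moderate))
print(summarize(few_severe))
```

In this toy example the three severe reports carry a wider standard error than the eight moderate ones, which is the “few reports, big problem” ambiguity expressed as a number. It does not resolve the ambiguity; it only makes it visible.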

To start, we look at the sheer number of reports to see what might be happening across the island as a whole.  This bar graph shows the report counts from all of the neighborhoods, aggregated by hour.
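A minimal sketch of the aggregation behind such a bar graph, assuming each report has already been read from the CSV into a dict. The field names `time` and `location`, the `YYYY-MM-DD HH:MM:SS` timestamp format, and both function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(reports):
    """Count reports per (hour, neighborhood) pair.

    Each report is assumed to be a dict with an ISO-style 'time' string
    ('YYYY-MM-DD HH:MM:SS') and an integer 'location' id.
    """
    counts = Counter()
    for r in reports:
        t = datetime.fromisoformat(r["time"])
        # Truncate to the start of the hour to form the bin key.
        hour = t.replace(minute=0, second=0, microsecond=0)
        counts[(hour, r["location"])] += 1
    return counts


def hourly_totals(counts):
    """Collapse per-neighborhood counts into island-wide totals per hour."""
    totals = Counter()
    for (hour, _location), n in counts.items():
        totals[hour] += n
    return totals
```

Keeping the per-neighborhood counts separate from the island-wide totals lets the same pass feed both a stacked bar graph and the per-neighborhood peak figures quoted below.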
   


Multi-neighborhood events appear to occur around 2 PM on April 6, 8 AM on April 8, and 4 PM on April 9.  Each of these generated hundreds to thousands of records per hour.  Consistent with our initial hypothesis, Safe Town (4) and Old Town (3) have plenty of reports; however, Wilson Forest (7, with 61 at the April 8 peak) and Cheddarford (13, with 173 at the peak) have fewer reports than expected.  One surprise is the number of reports coming out of Scenic Vista (8, with 2580 at the April 8 peak), which suggests that additional help may be required in that neighborhood.

Other somewhat anomalous reporting occurs at 11 PM on April 8 in Broadview (9, with 1895 reports), at 4 AM on April 9 in Chaparral (10, with 1375 reports), and in spikes in Old Town (3) at 1 AM on April 9 (4490 reports) and at noon on April 10 (3905 reports).  It is difficult to determine a course of action for these anomalous reports, as they do not taper off like the morning event on April 8 and the afternoon event on April 9.  One hypothesis about the events overall is that the main earthquake occurred on April 8, an aftershock occurred on April 9, and a precursor occurred on April 6.  Another hypothesis is that the six anomalous bursts of reports reflect delayed reporting caused by earthquake damage to the reporting system.  A third is that they were generated by short-term events that were resolved within the hourly reporting period.  A reasonable action would be to investigate both the areas with larger-than-expected numbers of reports and the anomalous transients.
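The distinction drawn above, between events that hit many neighborhoods at once and bursts confined to a single neighborhood, can be roughed out automatically. The sketch below takes a mapping from `(hour, neighborhood)` to report count and compares each count with that neighborhood’s own median. It is a hypothetical rule of ours, not the analysis behind this write-up; the function name and both thresholds (`factor`, `min_neighborhoods`) are arbitrary illustrative choices.

```python
from statistics import median


def flag_spikes(counts, factor=5.0, min_neighborhoods=3):
    """Split hourly spikes into multi-neighborhood events and isolated bursts.

    `counts` maps (hour, neighborhood) -> number of reports. A neighborhood
    spikes in an hour when its count exceeds `factor` times its own median
    hourly count. Hours with no reports are absent from `counts`, so the
    median is taken over non-empty hours only.
    """
    by_hood = {}
    for (_hour, hood), n in counts.items():
        by_hood.setdefault(hood, []).append(n)
    baseline = {hood: median(ns) for hood, ns in by_hood.items()}

    spiking = {}
    for (hour, hood), n in counts.items():
        if n > factor * baseline[hood]:
            spiking.setdefault(hour, set()).add(hood)

    multi = {h: s for h, s in spiking.items() if len(s) >= min_neighborhoods}
    isolated = {h: s for h, s in spiking.items() if len(s) < min_neighborhoods}
    return multi, isolated
```

A rule like this only separates the two kinds of spike; deciding whether an isolated burst is a real local event or delayed reporting still needs the judgment described above.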

However, these post-event graphs are of limited practical help, because hindsight makes the task much easier.  If we were looking at reports as they streamed into an analysis center, we would have even less confidence in reallocating resources.  The Flowgraph below shows reports in a streaming format (although faster than real time).  The colored lines represent each of the six variables across all of the reports (as labeled on the graph).  The time at the lower left shows the time of the report, and the counts below it show both the total number of records received and the number of records in the current interval (the data is recorded in 5-minute intervals).  The Flowgraph below is for Old Town only and starts near the beginning of the data.
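Independent of the charting tools listed above, the bookkeeping behind the two counts under the Flowgraph can be sketched as a generator that consumes a time-ordered feed and emits each 5-minute interval once it closes. This is a minimal Python illustration under stated assumptions (time-ordered input, `time` already parsed to a `datetime`, hypothetical function name), not the code behind the Flowgraph.

```python
from datetime import datetime


def stream_intervals(reports, width_minutes=5):
    """Yield (interval_start, interval_count, running_total) for a live feed.

    Reports are assumed to arrive in time order with 'time' already parsed
    into a datetime. An interval is emitted as soon as a report from a later
    interval arrives; intervals that receive no reports are skipped rather
    than emitted with a zero count. width_minutes should divide 60.
    """
    current = None
    interval_count = 0
    total = 0
    for r in reports:
        t = r["time"]
        start = t.replace(minute=t.minute - t.minute % width_minutes,
                          second=0, microsecond=0)
        if current is not None and start != current:
            # The previous interval is complete: report it before counting on.
            yield current, interval_count, total
            interval_count = 0
        current = start
        interval_count += 1
        total += 1
    if current is not None:
        yield current, interval_count, total


# Illustrative, made-up arrival times for a single neighborhood.
feed = [{"time": datetime(2020, 4, 6, 14, m)} for m in (36, 38, 41, 52)]
for start, n, total in stream_intervals(feed):
    print(f"{start:%H:%M}  interval={n}  total={total}")
# 14:35  interval=2  total=2
# 14:40  interval=1  total=3
# 14:50  interval=1  total=4
```

Emitting an interval only when the next one begins mirrors what a live monitor can know: it cannot give a final count for the current interval until that interval has passed.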



 

At 14:40 (2 PM), there is a significant shift in the reports that would be noticeable to human or automated monitors.  Reports initially appear random, but are then consistently 4 or below.  And instead of the singleton reports seen at earlier times, the interval counts are slightly higher.  Shortly after this time, the reports return to their normal mixed values.  It is difficult to know what to do about this change: what accounts for this behavior in the data?
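An automated monitor could watch for the 14:40 pattern described above, where ratings drop to 4 or below while the per-interval count rises above the earlier singleton level. The sketch below is one hypothetical rule for that; the thresholds (`ceiling`, `window`, `ratio`) and the function name are illustrative assumptions, and the rule says nothing about what causes the pattern.

```python
def flag_low_value_bursts(intervals, ceiling=4, window=6, ratio=1.5):
    """Flag intervals whose ratings are all low while their count jumps.

    `intervals` is a time-ordered list of (start, values) pairs, where
    `values` holds every non-missing rating received in that interval.
    An interval is flagged when every rating is <= `ceiling` and its
    report count exceeds `ratio` times the mean count of the previous
    `window` intervals.
    """
    flagged = []
    for i, (start, values) in enumerate(intervals):
        history = [len(v) for _, v in intervals[max(0, i - window):i]]
        if not values or not history:
            continue  # nothing to judge, or no baseline yet
        baseline = sum(history) / len(history)
        if max(values) <= ceiling and len(values) > ratio * baseline:
            flagged.append(start)
    return flagged
```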

Turning attention to the (hypothesized) major earthquake, we see many areas with numerous reports between 8 and 11 AM.









 

Each of the above may indicate the types of help needed by the various neighborhoods.  These readings are highly uncertain, however, because the report values are so hard to interpret.  Broadview might need help in all areas.  Scenic Vista might need help in all areas except medical (the blue line stays at the bottom, so no medical reports are coming in).  Easton may need help in everything except medical.  East Parton may need more help with power and less with water and sewer issues.

The following static depictions show the anomalous reporting surrounding the possible aftershock on April 9 (note the different times on each), with snapshots of the six neighborhoods taken at the end of the interval in which most of the reporting occurs (that is, thousands of reports within a five-minute interval).  Most of the reporting is consistent across the interval, so you can see that, for example, Chaparral reported severe power problems, while Old Town reported high values in every category at noon on 4-10:

 

 

 

 

 

2 – Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.

 

There are several kinds of uncertainty to consider in this data.  First, as can be seen in the Flowgraphs in Question 1, the subjectivity of the reports makes some of the reporting very difficult to interpret.  There are no standards on which to base assessments, and there are no norms for individual reports (e.g., who is doing the reporting and what their track record is).  Second, the sparsity and timeliness of reporting also make assessment difficult.  Several of these aspects can be seen in the Wilson Forest Flowgraph:




First, only roads, power, and intensity reports appear.  Next, even during the hypothesized earthquake period on 4-8, very few reports come in.  Reports stop on the 8th at 17:45 and restart at 15:25.  Then there is another big jump, from 18:10 to 23:10, with a restart at 19:40.  In a streaming analysis, the reporting during the 9 AM hour on 4-8 may have been sufficient to alert authorities to problems with power and roads.
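Gaps like these can be found mechanically by sorting one neighborhood’s report timestamps and keeping consecutive pairs that are further apart than some minimum. The one-hour threshold and the function name below are assumptions for illustration, and the timestamps in the usage lines are made up.

```python
from datetime import datetime, timedelta


def reporting_gaps(times, min_gap=timedelta(hours=1)):
    """Return (last_report, next_report) pairs at least `min_gap` apart.

    `times` holds one neighborhood's report timestamps as datetimes. They
    are sorted first, so the order in which they arrived does not matter.
    """
    ordered = sorted(times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Made-up timestamps: reporting pauses between 10:05 and 13:00.
times = [datetime(2020, 4, 8, 13, 0), datetime(2020, 4, 8, 10, 0),
         datetime(2020, 4, 8, 10, 5), datetime(2020, 4, 8, 13, 10)]
for last, nxt in reporting_gaps(times):
    print(f"silent from {last:%m-%d %H:%M} to {nxt:%m-%d %H:%M}")
```

In a streaming setting the same check can run incrementally: a gap is known to be open as soon as the time since the last report exceeds the threshold, long before reporting resumes.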

The following shows all of the neighborhoods, with every indicator shown as medians across hours of the day.  They are presented in video format to save page space.  Scrubbing through the video gives quick access to each neighborhood’s overview.
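A minimal sketch of the per-neighborhood medians behind such line graphs. We read “medians across hours of the day” here as one median per hourly bin along the timeline; that reading, the six column names in `VARIABLES`, and the function name are all assumptions. Pooling by clock hour (0 to 23) instead would only change the bin key to `r["time"].hour`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Assumed column names for the six report variables.
VARIABLES = ["sewer_and_water", "power", "roads_and_bridges",
             "medical", "buildings", "shake_intensity"]


def hourly_medians(reports):
    """Median rating per (neighborhood, variable, hourly bin).

    Reports are dicts with a datetime 'time', an integer 'location', and
    optional ratings. Missing ratings (None or absent keys) are skipped, so
    a bin with no ratings for a variable gets no entry at all; a line graph
    drawn from this would show that as a gap.
    """
    groups = defaultdict(list)
    for r in reports:
        hour = r["time"].replace(minute=0, second=0, microsecond=0)
        for var in VARIABLES:
            value = r.get(var)
            if value is not None:
                groups[(r["location"], var, hour)].append(value)
    return {key: median(vals) for key, vals in groups.items()}


# Illustrative reports for a single neighborhood (made-up values).
sample = [
    {"time": datetime(2020, 4, 8, 9, 5), "location": 7, "power": 8},
    {"time": datetime(2020, 4, 8, 9, 40), "location": 7, "power": 6, "medical": None},
    {"time": datetime(2020, 4, 8, 9, 50), "location": 7, "power": 9},
]
print(hourly_medians(sample))
```

Medians rather than means keep a handful of extreme ratings from dominating a bin, which matters when the values themselves are as subjective as discussed in Question 1.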




We interpret “reliability” for this question as the ability to consistently generate reports that may help assess the state of the neighborhood.  For this initial assessment, we ignore the volume of reports and compare only the ability to report over time.  Eyeballing the graphs, the following appear to have reasonable reporting over time: Palace Hills (1), Southwest (5), Downtown (6), and Southton (16).  Others show gaps during event times or lack reports for one or more value types (particularly medical).
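The eyeball judgment above could be approximated with an hourly coverage score: for each value type, the fraction of hours in a window that contain at least one report. This is a hypothetical metric, not the one used for the assessment above; the column names in `VARIABLES` and the function name are assumptions, and, in line with the paragraph above, it ignores report volume entirely.

```python
from datetime import datetime, timedelta

# Assumed column names for the six report variables.
VARIABLES = ["sewer_and_water", "power", "roads_and_bridges",
             "medical", "buildings", "shake_intensity"]


def coverage(reports, start, end):
    """Fraction of hourly bins in [start, end) with at least one rating, per variable.

    `reports` holds one neighborhood's reports (dicts with a datetime 'time').
    `start` should fall on the hour so that bins line up with clock hours.
    """
    n_hours = int((end - start) / timedelta(hours=1))
    seen = {var: set() for var in VARIABLES}
    for r in reports:
        t = r["time"]
        if not start <= t < end:
            continue
        hour_index = int((t - start) / timedelta(hours=1))
        for var in VARIABLES:
            if r.get(var) is not None:
                seen[var].add(hour_index)
    return {var: len(hours) / n_hours for var, hours in seen.items()}


# Made-up reports over a four-hour window.
start, end = datetime(2020, 4, 8, 0), datetime(2020, 4, 8, 4)
sample = [
    {"time": datetime(2020, 4, 8, 0, 10), "power": 3},
    {"time": datetime(2020, 4, 8, 1, 20), "power": 5, "medical": None},
    {"time": datetime(2020, 4, 8, 3, 59), "power": 2, "medical": 7},
]
cov = coverage(sample, start, end)
print(cov["power"], cov["medical"], min(cov.values()))  # 0.75 0.25 0.0
```

Taking the minimum across value types, as in the last line, is one possible design choice: it scores a neighborhood that never reports a value type (such as medical) at zero, which reflects the emphasis above on missing value types.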

 

3 – How do conditions change over time? How does uncertainty change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

 

Looking at Old Town as an exemplar, we can see the progression of events: 1) the odd event on 4-6 near 2 PM, with very low value reports coming through across all value types; 2) the big event on 4-8 at 8 AM, resulting either in reports vanishing (as shown here) or in many value types showing high values; 3) a recovery on 4-9, followed by the event at about 3 PM, after which reporting from many neighborhoods drops off again, with some not recovering.  Reporting then returns to typical levels for several neighborhoods later on the 10th.

 


 

 

4 – The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.


Our approach to the data in MC1 was to use both static and streaming analyses. Retrospective analysis provides the best insight into the totality of the data; however, it does not adequately address how one would reallocate resources as events unfolded. That can only be done through a streaming analysis, using tools like the Flowgraphs, as if you were one of the decision makers on the ground. Line graphs like those in Question 2 provide a concise, comprehensive depiction, but the Flowgraphs are more realistic for answering the questions.