Entry Name: "UofC-Bayes-MC1"

VAST Challenge 2019
Mini-Challenge 1

 

Team Members:

Haysn Hornbeck, University of Calgary, hhornbec@ucalgary.ca, PRIMARY
Usman Alim, University of Calgary, ualim@ucalgary.ca

Student Team: NO

Figure 1 - Our full visualization of the dataset for Mini-Challenge 1.

Districts can be selected on the map by clicking them, or by using checkboxes (these were cropped out of the bottom of the image). The line chart represents the best guess for the current message rate in the selected districts, with light blue bands marking the 16th to 84th percentile credible interval. The gray area represents the likelihood that the current message rate estimate is above or below the nominal rate. A subset of time can be selected by clicking and dragging over the desired range, and the full view restored by double-clicking. The Raw Feedback section charts the reports from RUMBLE for the selected districts over the given timeframe. The Relative Repair Focus section attempts to prioritize the emergency response, again over the selected districts and timeframe. The metric used to assign those priorities can be changed via the radio buttons above the graphs. All charts except the map have additional controls hidden in their upper-right corner.

Tools Used:

Approximately how many hours were spent working on this submission in total? 250.

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES

Video: TBD

Website: http://uofc-bayes.ca


Questions

1 - Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?

Figure 2 - Our visualization of the messages being sent per minute, for all districts. Four events in three clusters are visible.

Figure 2 reveals three event clusters in the dataset. The first was a pair of small shocks at approximately 2:30pm on April 6th. Damage was minimal, and primarily concentrated in buildings.

We developed several metrics to assess repair priorities. The most successful of these is

where is the district, is the Dirichlet [1] hyperparameter for response , is , and is a scalar parameter. We found conformed best with intuition.

Excluding high-uncertainty districts and focusing on the twenty-four hour window starting minutes before the event, the above metric suggests prioritizing the following districts in the aftermath of that event:

Infrastructure     Focus Districts
Sewer / Water      Oak Willow, Palace Hills, Northwest, Weston, Easton
Power              Oak Willow, Terrapin Springs, Palace Hills, Downtown, Pepper Mill
Roads / Bridges    Oak Willow, Palace Hills, Southton, Weston, Downtown
Medical            Palace Hills, Southwest, Southton, Downtown, Old Town
Buildings          Palace Hills, Northwest, Oak Willow, Weston, Downtown, Pepper Mill

Figure 3 - The data from Terrapin Springs, starting shortly before the main earthquake and extending to the end of the dataset.

The second event was the primary earthquake, at approximately 8:35am on April 8th. Nearly all infrastructure suffered some level of damage, most notably the power systems. Four spikes in message rate occurring well after the event are due to delayed messages being delivered all at once, and signal the time four districts regained communication. The outage in Terrapin Springs was relatively short, as shown in Figure 3, so it was masked by the flood of citizen reports in other districts.

District            Outage Window
Chapparal           April 8th, 8:35am - April 9th, 4:40am
Old Town            April 8th, 9:15am - April 9th, 1:00am
Terrapin Springs    April 8th, 9:25am - April 8th, 11:30am
Broadview           April 8th, 12:00pm - April 9th, 11:45pm
Scenic Vista        April 8th, 12:10pm - April 9th, 9:15am
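
Outage windows like those above can be estimated by looking for unusually long silences in a district's report timestamps, which end when the delayed messages arrive in a burst. The following is a minimal sketch of that idea and not the code used for this submission; the function name `longest_gap` and the sample timestamps (including the year) are hypothetical.

```python
from datetime import datetime

def longest_gap(timestamps):
    """Return (start, end) of the longest silence between consecutive reports."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    # Pair each report with its successor and keep the pair furthest apart.
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])

# Hypothetical report times for one district (not taken from the dataset).
reports = [
    datetime(2020, 4, 8, 9, 0),
    datetime(2020, 4, 8, 9, 20),
    datetime(2020, 4, 8, 9, 25),
    datetime(2020, 4, 8, 11, 30),  # burst of delayed messages arrives here
    datetime(2020, 4, 8, 11, 30),
    datetime(2020, 4, 8, 11, 35),
]
start, end = longest_gap(reports)
print(start, end, end - start)
```

In a sparsely populated district a long silence can be normal, so a gap found this way would still need to be compared against that district's nominal message rate before being called an outage.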

Figure 4 - The relative repair focus for all districts, between April 8th at 8:00am and April 9th at 9:50am.

The following table is based on Figure 4, our visualization of infrastructure priorities that begins just before this event and ends just before the next. It excludes districts with large uncertainties.

Infrastructure     Focus Districts
Sewer / Water      Old Town, Broadview, Scenic Vista, Palace Hills, Terrapin Springs
Power              Old Town, Terrapin Springs, Chapparal, Scenic Vista, Wilson Forest
Roads / Bridges    Scenic Vista, Old Town, Broadview, Easton, Chapparal
Medical            Old Town, Palace Hills, Downtown, Broadview, Southwest
Buildings          Palace Hills, Chapparal, Broadview, Downtown, Old Town

Figure 5 - Reports for the third event cluster from Easton.

The third event was an aftershock of nearly the same magnitude as the original quake, at approximately 3:00pm on April 9th. There's evidence of catastrophic failure within the sewer, power, and road systems. Two post-event message spikes are visible, indicating more communication outages. All these happened hours after the event, however, so the majority of reports had already been sent. Easton seems to have suffered a partial loss, with Figure 5 showing that at least one report got through during the window.

District            Outage Window
Scenic Vista        April 9th, 5:25pm - April 10th, 2:30am
Old Town            April 9th, 6:10pm - April 10th, 12:00pm
Easton              April 9th, 10:50pm - April 10th, 8:00am
Safe Town           April 10th, 4:50am - April 10th, 3:05pm
Oak Willow          April 10th, 9:25am - April 10th, 7:00pm
Pepper Mill         April 10th, 9:55am - April 10th, 7:05pm

The outages on the 10th may be due to a failure originating in Safe Town, with little-to-no connection to the earthquake. Excluding high-uncertainty districts, our metric suggests prioritizing the following districts in the aftermath of the earthquake and secondary event.

Infrastructure     Focus Districts
Sewer / Water      Old Town, Scenic Vista, Broadview, Terrapin Springs, Chapparal
Power              Old Town, Scenic Vista, Chapparal, Palace Hills, Southwest
Roads / Bridges    Old Town, Scenic Vista, Chapparal, Oak Willow, Palace Hills
Medical            Old Town, Downtown, Southwest, Broadview, Southton
Buildings          Scenic Vista, Old Town, Palace Hills, Chapparal, Broadview, Downtown
=====

2 - Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports?

Figure 6 - All reports from Wilson Forest.

Some city districts are sparsely populated, such as Cheddarford and Wilson Forest. As Figure 6 shows, the normal message rate for Wilson Forest is a few per day; compare this with Figure 1, which summarizes all districts.

The level of uncertainty for Wilson Forest is indicated in several ways. Each feedback graph has error bars associated with it, thanks to the underlying Dirichlet distribution [1]; the district's entry in the Relative Repair Focus box-and-whisker plot has wide error bars, as seen in Figure 4; the message rate graph is dominated by spikes followed by exponential decay, and has a wide credible band; and the colour of the district on the map has been desaturated. Nonetheless, there is good feedback for power systems and road infrastructure, so this district cannot be ignored entirely.
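
The error bars on the feedback graphs can be illustrated with a short sketch. It assumes a flat Dirichlet prior, a handful of hypothetical response counts, and the same 16th/84th percentile band used on the message rate chart; the function name `dirichlet_intervals` and its parameters are illustrative only and are not taken from our code.

```python
import random

def dirichlet_intervals(counts, prior=1.0, draws=4000, seed=0):
    """16th/84th percentile credible interval for each response's share."""
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]  # conjugate update: prior + counts
    samples = [[] for _ in alphas]
    for _ in range(draws):
        # A Dirichlet draw is a set of independent Gamma draws, normalized.
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(gammas)
        for i, g in enumerate(gammas):
            samples[i].append(g / total)
    intervals = []
    for s in samples:
        s.sort()
        intervals.append((s[int(0.16 * draws)], s[int(0.84 * draws)]))
    return intervals

# Hypothetical counts of three response levels within one district.
counts = [50, 300, 150]
for count, (low, high) in zip(counts, dirichlet_intervals(counts)):
    print(f"{count:4d} reports -> share between {low:.3f} and {high:.3f}")
```

Sampling is used here instead of a closed-form interval because it extends directly to derived quantities, such as a priority metric computed from several shares at once.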

Figure 7 - The raw feedback for Palace Hills, spanning twelve hours and beginning at the main earthquake.

While the feedback from most districts is unimodal, there are exceptions such as Cheddarford, Southton, and Safe Town. Palace Hills is the most extreme of these; during the primary earthquake, the feedback is bimodal for all infrastructure types as Figure 7 shows. The divergence is easy to spot in the feedback graphs. While this could be a sign of dirty data, perhaps via a few people entering inaccurate information into RUMBLE, the robust Gaussian behaviour of each peak suggests instead that Palace Hills has heterogeneous infrastructure.

The state of medical infrastructure is reported inconsistently across districts. Districts without a hospital have almost no reporting on the state of their medical facilities. The "About Our City" document only mentions hospitals, so the most likely hypothesis is that there are no clinics or other medical infrastructure. This also means medical facility reports from those areas are likely bad data. Rather than filter them out, we rely on the wide error bars of both the Raw Feedback and Repair Focus graphs to flag these districts. Figure 4 demonstrates what this looks like in practice.

=====

3 - How do conditions change over time? How does uncertainty change over time? Describe the key changes you see.

Figure 8 - Raw feedback at the very beginning and end of the dataset.

The citizens of Saint Himark only appear to use the phone application to file reports during a disaster, as demonstrated by Figure 8, so there is no reliable information on the progress of repairs. However, this also makes isolating each event for analysis much easier.

Figure 9 - Raw feedback from each of the three event clusters, ordered chronologically from left to right.

Figure 9 reproduces the raw feedback from each event cluster. The first cluster has an unusual number of extreme damage reports, in light of the feedback on the severity of the earthquake. This is more likely to be unreliable data than legitimate reports, as the same pattern occurs in Figure 8. Examining each district individually shows the same pattern across all of them, for all types of infrastructure.

Between the second and third event clusters, there is also an uptick in reports of extreme damage, while the severity of the earthquake has slightly decreased. Since the increased reports primarily apply to sewer, power, and transportation infrastructure; only a day has passed since the primary shock; and a significant majority of the extreme reports originate from Old Town and Scenic Vista, this increase is more likely to be legitimate than unreliable data.

The uncertainty of the metrics behaves as expected. Narrowing the time interval reduces the credence given to the data presented, and widening it does the opposite. Districts with fewer messages have larger error bars than those with more. A loss of communication does increase uncertainty, but the drop in message frequency on the timeline makes those scenarios easy to recognize.
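
This behaviour follows from the Dirichlet posterior itself, and a small worked example makes it concrete. The sketch below uses the standard closed-form standard deviation of a Dirichlet marginal; the flat prior, the counts, and the function name `dirichlet_std` are assumptions made for illustration.

```python
import math

def dirichlet_std(counts, prior=1.0):
    """Posterior standard deviation of each response's share."""
    alphas = [c + prior for c in counts]
    a0 = sum(alphas)
    return [math.sqrt(a * (a0 - a) / (a0 ** 2 * (a0 + 1))) for a in alphas]

# Hypothetical counts: the same mix of responses at three different volumes,
# as if the selected time window were progressively narrowed.
for label, scale in [("wide window", 100), ("medium window", 25), ("narrow window", 4)]:
    counts = [1 * scale, 3 * scale, 2 * scale]
    print(label, [round(s, 3) for s in dirichlet_std(counts)])
```

The proportions are the same in all three cases, so the growth in the standard deviation comes only from the smaller number of reports.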

=====

4 - The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis?

While the visualization treated the dataset as static, the code was designed to be easily adapted to real-time analysis. All algorithms may only access past data, to remain compatible with real-time operation. Conjugate priors [2] are used where possible, via the Dirichlet and Gamma distributions. Preprocessing of the dataset is limited to calculating the nominal message rate and generating a cumulative total of reports at each moment in time. The cumulative totals speed up summing reports over a window, since a window's total is the difference between the cumulative totals at its two ends.
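
The cumulative-total idea can be sketched in a few lines. This is an illustration under assumed names (`build_cumulative`, `window_total`) and hypothetical per-minute counts, not an excerpt from our code.

```python
from itertools import accumulate

def build_cumulative(per_minute_counts):
    """Running total of reports, with a leading zero for easy differencing."""
    return [0] + list(accumulate(per_minute_counts))

def window_total(cumulative, start, end):
    """Number of reports in minutes start..end-1, found with one subtraction."""
    return cumulative[end] - cumulative[start]

counts = [0, 2, 5, 1, 0, 7, 3]  # hypothetical reports per minute
cumulative = build_cumulative(counts)
print(window_total(cumulative, 2, 6))  # reports in minutes 2, 3, 4 and 5
```

Because each new minute only appends one value to the running total, the same structure also works when reports arrive as a stream.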

Determining the likelihood that the message rate is above or below the nominal rate and calculating the repair focus metric are both done via statistical sampling, as methods involving integrals took much too long and did not produce better results. Processed data was also cached to minimize recalculation. Plotly allows for data streams, so the charts could be converted to update in real time.
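
As an illustration of the sampling approach, the sketch below estimates the probability that a message rate exceeds a nominal rate. It assumes messages arrive as a Poisson process with a Gamma prior on the rate; the prior values, the example numbers, and the function name `prob_above_nominal` are hypothetical and chosen only to keep the example short.

```python
import random

def prob_above_nominal(messages, minutes, nominal_rate,
                       prior_shape=1.0, prior_rate=1.0, draws=10000, seed=0):
    """Estimate P(rate > nominal_rate) by sampling the Gamma posterior."""
    rng = random.Random(seed)
    shape = prior_shape + messages  # Gamma-Poisson conjugate update
    rate = prior_rate + minutes
    # random.gammavariate takes a scale parameter, which is 1 / rate.
    hits = sum(rng.gammavariate(shape, 1.0 / rate) > nominal_rate
               for _ in range(draws))
    return hits / draws

# Hypothetical windows of 30 minutes against a nominal 2 messages per minute.
print(prob_above_nominal(messages=90, minutes=30, nominal_rate=2.0))
print(prob_above_nominal(messages=30, minutes=30, nominal_rate=2.0))
```

The first window averages three messages per minute and yields a probability near one, while the second averages one per minute and yields a probability near zero.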


Citations

[1] Frigyik, Bela A., et al. “Introduction to the Dirichlet Distribution and Related Processes.” Department of Electrical Engineering, University of Washington, UWEETR-2010-0006, no. 0006, 2010, pp. 1–27.

[2] Fink, Daniel. “A Compendium of Conjugate Priors.” Pdf, 1997, 46.