Haysn Hornbeck, University of Calgary, hhornbec@ucalgary.ca, PRIMARY
Usman Alim, University of Calgary, ualim@ucalgary.ca
Student Team: NO
Figure 1 - Our full visualization of the dataset for Mini-Challenge 1.
Districts can be selected on the map by clicking them, or by using checkboxes (these were cropped out of the bottom of the image). The line chart represents the best guess for the current message rate in the selected districts, with light blue bands representing 16/84 credible intervals. The gray area represents the likelihood that the current packet rate estimate is above or below the nominal rate. A subset of time can be selected by clicking and dragging on the desired range, and the full view restored by double-clicking. The Raw Feedback section charts the reports from RUMBLE for the selected districts and over the given timeframe. The Relative Repair Focus section attempts to prioritize the emergency response, again over the selected districts and timeframe. The metric used to assign those priorities can be changed via the radio buttons above the graphs. All charts have additional controls hidden in their upper-right, except for the map.
Approximately how many hours were spent working on this submission in total? 250.
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES
Video: TBD
Website: http://uofc-bayes.ca
1 - Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit?
Figure 2 - Our visualization of the messages being sent per minute, for all districts. Four events in three clusters are visible.
Figure 2 reveals three event clusters in the dataset. The first was a pair of small shocks at approximately 2:30pm on April 6th. Damage was minimal, and primarily concentrated in buildings.
We developed several metrics to assess repair priorities. The most successful of these is
where is the district, is the Dirichlet [1] hyperparameter for response , is , and is a scalar parameter. We found conformed best with intuition.
Excluding high-uncertainty districts and focusing on the twenty-four hour window starting minutes before the event, the above metric suggests prioritizing the following districts in the aftermath of that event:
| Infrastructure | Focus Districts |
|---|---|
| Sewer / Water | Oak Willow, Palace Hills, Northwest, Weston, Easton |
| Power | Oak Willow, Terrapin Springs, Palace Hills, Downtown, Pepper Mill |
| Roads / Bridges | Oak Willow, Palace Hills, Southton, Weston, Downtown |
| Medical | Palace Hills, Southwest, Southton, Downtown, Old Town |
| Buildings | Palace Hills, Northwest, Oak Willow, Weston, Downtown, Pepper Mill |
Figure 3 - The data from Terrapin Springs, starting shortly before the main earthquake and extending to the end of the dataset.
The second event was the primary earthquake, at approximately 8:35am on April 8th. Nearly all infrastructure suffered some level of damage, most notably the power systems. Four spikes in message rate occurring well after the event are due to delayed messages being delivered all at once, and signal the time four districts regained communication. The outage in Terrapin Springs was relatively short, as shown in Figure 3, so it was masked by the flood of citizen reports in other districts.
| District | Outage Window |
|---|---|
| Chapparal | April 8th, 8:35am - April 9th, 4:40am |
| Old Town | April 8th, 9:15am - April 9th, 1:00am |
| Terrapin Springs | April 8th, 9:25am - April 8th, 11:30am |
| Broadview | April 8th, 12:00pm - April 9th, 11:45pm |
| Scenic Vista | April 8th, 12:10pm - April 9th, 9:15am |
Figure 4 - The relative repair focus for all districts, between April 8th at 8:00am and April 9th at 9:50am.
The following table is based on Figure 4, our visualization of infrastructure priorities that begins just before this event and ends just before the next. It excludes districts with large uncertainties.
| Infrastructure | Focus Districts |
|---|---|
| Sewer / Water | Old Town, Broadview, Scenic Vista, Palace Hills, Terrapin Springs |
| Power | Old Town, Terrapin Springs, Chapparal, Scenic Vista, Wilson Forest |
| Roads / Bridges | Scenic Vista, Old Town, Broadview, Easton, Chapparal |
| Medical | Old Town, Palace Hills, Downtown, Broadview, Southwest |
| Buildings | Palace Hills, Chapparal, Broadview, Downtown, Old Town |
Figure 5 - Reports for the third event cluster from Easton.
The third event was an aftershock of nearly the same magnitude as the original quake, at approximately 3:00pm on April 9th. There's evidence of catastrophic failure within the sewer, power, and road systems. Two post-event message spikes are visible, indicating more communication outages. All these happened hours after the event, however, so the majority of reports had already been sent. Easton seems to have suffered a partial loss, with Figure 5 showing that at least one report got through during the window.
| District | Outage Window |
|---|---|
| Scenic Vista | April 9th, 5:25pm - April 10th, 2:30am |
| Old Town | April 9th, 6:10pm - April 10th, 12:00pm |
| Easton | April 9th, 10:50pm - April 10th, 8:00am |
| Safe Town | April 10th, 4:50am - April 10th, 3:05pm |
| Oak Willow | April 10th, 9:25am - April 10th, 7:00pm |
| Pepper Mill | April 10th, 9:55am - April 10th, 7:05pm |
The outages on the 10th may be due to a failure originating in Safe Town, with little-to-no connection to the earthquake. Excluding high-uncertainty districts, our metric suggests prioritizing the following districts in the aftermath of the earthquake and secondary event.
| Infrastructure | Focus Districts |
|---|---|
| Sewer / Water | Old Town, Scenic Vista, Broadview, Terrapin Springs, Chapparal |
| Power | Old Town, Scenic Vista, Chapparal, Palace Hills, Southwest |
| Roads / Bridges | Old Town, Scenic Vista, Chapparal, Oak Willow, Palace Hills |
| Medical | Old Town, Downtown, Southwest, Broadview, Southton |
| Buildings | Scenic Vista, Old Town, Palace Hills, Chapparal, Broadview, Downtown |
2 - Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports?
Figure 6 - All reports from Wilson Forest.
Some city districts are sparsely populated, such as Cheddarford and Wilson Forest. As Figure 6 shows, the normal message rate for Wilson Forest is a few per day; compare this with Figure 1, which summarizes all districts.
For Wilson Forest, the level of uncertainty is indicated in several ways. Each feedback graph has error bars associated with it, thanks to the underlying Dirichlet distribution [1]; the district's entry in the Relative Repair Focus box-and-whisker plot has wide error bars, as seen in Figure 4; the message rate graph is dominated by spikes followed by exponential decay, as well as a wide credible band; and the colour of the district on the map has been desaturated. Nonetheless, there is good feedback for power systems and road infrastructure, so this district cannot be ignored entirely.
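As a minimal sketch of how Dirichlet-based error bars of this kind could be produced: the code below assumes a symmetric prior, a hypothetical five-level damage scale, and made-up report counts. The function name `dirichlet_error_bars` and its parameters are illustrative and are not taken from the submission's code.

```python
import numpy as np

def dirichlet_error_bars(counts, prior=1.0, n_samples=20_000, seed=0):
    """Return (low, median, high) proportions for each damage level.

    counts : report counts per damage level (hypothetical input layout)
    prior  : symmetric Dirichlet pseudo-count (an assumption, not from the source)
    """
    rng = np.random.default_rng(seed)
    # Conjugate update: posterior parameters are prior pseudo-counts plus observed counts.
    alpha = np.asarray(counts, dtype=float) + prior
    # Each row is one plausible set of proportions; shape is (n_samples, levels).
    samples = rng.dirichlet(alpha, size=n_samples)
    # 16th/84th percentiles give a roughly one-sigma credible interval per level.
    low, mid, high = np.percentile(samples, [16, 50, 84], axis=0)
    return low, mid, high

# A sparsely reporting district versus a busy one with the same mix of responses.
sparse = dirichlet_error_bars([1, 0, 2, 0, 1])
dense = dirichlet_error_bars([100, 0, 200, 0, 100])
sparse_width = sparse[2] - sparse[0]
dense_width = dense[2] - dense[0]
```

Under these assumptions the sparse district's intervals come out several times wider than the busy district's, which matches the wide error bars described for low-population districts.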
Figure 7 - The raw feedback for Palace Hills, spanning twelve hours and beginning at the main earthquake.
While the feedback from most districts is unimodal, there are exceptions such as Cheddarford, Southton, and Safe Town. Palace Hills is the most extreme of these; during the primary earthquake, the feedback is bimodal for all infrastructure types as Figure 7 shows. The divergence is easy to spot in the feedback graphs. While this could be a sign of dirty data, perhaps via a few people entering inaccurate information into RUMBLE, the robust Gaussian behaviour of each peak suggests instead that Palace Hills has heterogeneous infrastructure.
The state of medical infrastructure is reported inconsistently across districts. Districts without a hospital have almost no reporting on the state of their medical facilities. The "About Our City" document only mentions hospitals, so the most likely hypothesis is that there are no clinics or other medical infrastructure. This also means medical facility reports from those areas are likely bad data. Rather than filter them out, the wide error bars of both the Raw Feedback and Repair Focus graphs should flag these districts. Figure 4 demonstrates what this looks like in practice.
3 - How do conditions change over time? How does uncertainty in the data change over time? Describe the key changes you see.
The citizens of Saint Himark only appear to use the phone application to file reports during a disaster, as demonstrated by Figure 8, so there is no reliable information on the progress of repairs. However, this also makes isolating each event for analysis much easier.
Figure 9 - Raw feedback from each of the three event clusters, ordered chronologically from left to right.
Figure 9 reproduces the raw feedback from each event cluster. The first cluster has an unusual number of extreme damage reports, in light of the feedback on the severity of the earthquake. This is more likely to be unreliable data than legitimate reports, as the same pattern occurs in Figure 8. Examining each district individually shows the same pattern across all of them, for all types of infrastructure.
Between the second and third event clusters, there is also an uptick in reports of extreme damage, while the severity of the earthquake has slightly decreased. Since the increased reports primarily apply to sewer, power, and transportation infrastructure; only a day has passed since the primary shock; and a significant majority of the extreme reports originate from Old Town and Scenic Vista, this increase is more likely to be legitimate than unreliable data.
The uncertainty of the metrics behaves as expected. Narrowing the time interval results in less credence in the data presented, and vice versa. Districts with fewer messages have larger error bars than those with more. The loss of communication does cause an increase in uncertainty, but the drop in message frequency on the timeline makes those scenarios easy to recognize.
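One way to see why fewer reports produce wider error bars, assuming the bars come from a Dirichlet posterior whose parameters grow by one per report. The symbols $\alpha_k$, $\alpha_0$, and $m_k$ are notation introduced here for illustration, not taken from the submission.

```latex
\mathbb{E}[p_k] = \frac{\alpha_k}{\alpha_0} = m_k, \qquad
\operatorname{Var}[p_k] = \frac{\alpha_k(\alpha_0 - \alpha_k)}{\alpha_0^2(\alpha_0 + 1)}
                        = \frac{m_k(1 - m_k)}{\alpha_0 + 1}, \qquad
\alpha_0 = \sum_{j} \alpha_j
```

For a fixed mix of responses, the standard deviation shrinks roughly as $1/\sqrt{\alpha_0}$. A shorter time window or a quieter district both lower the report count, and therefore $\alpha_0$, which widens the bars.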
4 - The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis?
While the visualization treated the dataset as static, the code was designed to be easily adapted to real-time analysis. All algorithms are only allowed to access past data, to remain compatible with real-time operation. Conjugate priors [2] are used where possible, via the Dirichlet and Gamma distributions. Preprocessing of the dataset is limited to calculating the nominal message rate and generating a cumulative total of reports at each moment of time. The cumulative totals allow reports to be totaled over a window quickly.
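The cumulative-total preprocessing can be illustrated with a short sketch. The per-minute counts and the helper name `window_total` are hypothetical; the point is only that a window total becomes a difference of two running totals rather than a fresh sum over the window.

```python
import numpy as np

# Hypothetical per-minute report counts for one district.
per_minute = np.array([0, 2, 5, 1, 0, 3, 4])

# Running total with a leading zero, so cumulative[i] is the
# number of reports received before minute i.
cumulative = np.concatenate(([0], np.cumsum(per_minute)))

def window_total(cumulative, start, stop):
    """Reports in minutes [start, stop), from two lookups instead of a loop."""
    return int(cumulative[stop] - cumulative[start])

total = window_total(cumulative, 1, 4)  # covers minutes 1, 2, and 3
```

Because a running total only ever appends new values, the same structure also works when reports arrive as a stream.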
Determining the likelihood that the packet rate is above or below the nominal rate and calculating the repair focus metric are both done via statistical sampling, as methods involving integrals took much too long and did not produce better results. Processed data is also cached to minimize recalculation. Plotly supports data streams, so the charts could be converted to update in real time.
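A minimal sketch of the sampling approach for the rate comparison, assuming a Poisson likelihood for message counts with a Gamma conjugate prior. The submission mentions the Gamma distribution but does not spell out the model, so the likelihood, the prior values, and the name `rate_posterior_summary` are all assumptions made for illustration.

```python
import numpy as np

def rate_posterior_summary(count, minutes, nominal_rate,
                           prior_shape=1.0, prior_rate=1.0,
                           n_samples=50_000, seed=0):
    """Summarize the posterior message rate (messages per minute).

    Returns the 16th and 84th percentiles of the rate and the estimated
    probability that the rate exceeds nominal_rate.
    """
    rng = np.random.default_rng(seed)
    # Gamma-Poisson conjugate update (assumed model, not confirmed by the source).
    shape = prior_shape + count
    rate = prior_rate + minutes
    # NumPy parameterizes the Gamma by scale, which is 1 / rate.
    samples = rng.gamma(shape, 1.0 / rate, size=n_samples)
    low, high = np.percentile(samples, [16, 84])
    # The fraction of samples above nominal replaces an explicit integral.
    p_above = float(np.mean(samples > nominal_rate))
    return low, high, p_above

# A burst of 120 messages in 10 minutes against a nominal 2 per minute.
low, high, p_above = rate_posterior_summary(count=120, minutes=10, nominal_rate=2.0)
# A silent 10 minutes, as might occur during a communication outage.
_, _, p_quiet = rate_posterior_summary(count=0, minutes=10, nominal_rate=2.0)
```

Sampling costs the same regardless of how the threshold is chosen, which is one plausible reason it would outperform numerical integration here.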
[1] Frigyik, Bela A., et al. “Introduction to the Dirichlet Distribution and Related Processes.” Department of Electrical Engineering, University of Washington, UWEETR-2010-0006, no. 0006, 2010, pp. 1–27.
[2] Fink, Daniel. “A Compendium of Conjugate Priors.” Pdf, 1997, 46.