Entry Name: "UofC-Bayes-MC2"

VAST Challenge 2019
Mini-Challenge 2

 

Team Members:

Haysn Hornbeck, University of Calgary, hhornbec@ucalgary.ca, PRIMARY
Usman Alim, University of Calgary, ualim@ucalgary.ca

Student Team: NO


Figure 1 - The two visualizations we generated for Mini-Challenge 2.

The line chart at the top of the heat map visualizes the mean cpm (clicks per minute) for each grid square, as well as the 16th/84th percentiles and extremes. A subset can be selected by clicking and dragging, while the full view can be restored by double-clicking. More controls are available in the upper-right. The horizontal slider controls the current moment visualized along the timeline subset. The graph itself has tool-tips which provide additional information.

The line chart at the top of the sensor viewer behaves similarly to the heat map's, and has the same controls. Outlier values are highlighted in red. The controls on the right select the sensor to be displayed, choose whether to display all the data or focus only on anomalies, and set a threshold that automatically marks values as anomalous. There is also an option to display the anomalies from all sensors, but be aware that this performs poorly. The scatter plot has tool-tips that provide additional information.

Tools Used:

Approximately how many hours were spent working on this submission in total? 250.

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES

Video: TBD

Website: http://uofc-bayes.ca


Questions

Your task, as supported by visual analytics that you apply, is to help St. Himark's emergency management team combine data from the government-operated stationary monitors with data from citizen-operated mobile sensors to help them better understand conditions in the city and identify likely locations that will require further monitoring, cleanup, or even evacuation. Will data from citizen scientists clarify the situation or make it more uncertain? Use visual analytics to develop responses to the questions below. Novel visualizations of uncertainty are especially interesting for this mini-challenge.

1 - Visualize radiation measurements over time from both static and mobile sensors to identify areas where radiation over background is detected. Characterize changes over time.

Two visualizations were used for this challenge (see Figure 1). The heatmap quantizes and consolidates all sensor readings into a grid with squares one kilometre a side, while the sensor view allows a single sensor's data to be examined. The former provides a broad overview of radiation levels, while the latter focuses on details. The location data was changed from latitude/longitude to Universal Transverse Mercator, which uses metres as the underlying unit.
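Once coordinates are in UTM metres, quantizing readings into one-kilometre grid squares reduces to integer arithmetic. A minimal sketch (function names are ours, not the submission's):

```python
def grid_square(easting: float, northing: float, cell_m: int = 1000) -> tuple:
    """Map a UTM coordinate (metres) to the grid square containing it."""
    return (int(easting // cell_m), int(northing // cell_m))

def grid_centroid(col: int, row: int, cell_m: int = 1000) -> tuple:
    """Centroid of a grid square, matching the (...500, ...500) UTM
    coordinates quoted throughout this document."""
    return (col * cell_m + cell_m // 2, row * cell_m + cell_m // 2)
```

For example, a reading at UTM (175732, 12499) falls in square (175, 12), whose centroid is (175500, 12500).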

The dataset can be divided into two parts, with the split at approximately 12:30pm on April 8th. Before then, there are only a few grid squares with a mean clicks-per-minute (cpm) measurement above 45. There are numerous "hot spots" with a mean cpm of approximately 40, however. Bridges and the area around them seem to have elevated levels of radiation. Two notable hot spots are in Southwest and Scenic Vista, at approximate UTM coordinates (175500,12500) and (190500,3500) respectively.

After that time, there is a strong increase in radiation levels at the Always Safe plant, and in the grid squares two kilometres South and one kilometre Southwest of it. At approximately 5:15pm that day an abnormally large hot spot appears at UTM (186500,23500), sometimes with readings higher than those at the plant.

At 10:00am on April 9th, two more anomalous hot spots appear, the lesser at the North end of Jade Bridge and the more extreme along Wilson Forest Highway at UTM (198500,6500); for reference, radiation levels are dropping at the nuclear plant but remain extremely high at the earlier anomaly near the South end of Jade Bridge. At 4:35pm that day, near UTM (191500,3500), another large hot spot arises in Scenic Vista. At 7:45pm, a second extreme anomaly appears along Wilson Forest Highway. Readings there are greater than at the South Jade Bridge anomaly, and both are much larger than at the Always Safe plant. Downtown at 9:15pm, the mean cpm at UTM (174500,14500) rises to 66, creating a new anomaly.

Figure 2 - General trends in radiation levels across the city.

April 10th sees the West Wilson Forest Highway anomaly reach a mean cpm of over 1000, by far the highest on the map. Both the South Jade Bridge and Scenic Vista anomalies have readings twice as large as the nuclear plant's, though by the end of the day the former drops to the same level as the plant.

After the cutoff date, both Palace Hills and East Parton show a gradual increase in radiation levels; see Figure 2. The evidence for an overall increase is mixed; between 8:00am on April 6th and midnight on April 11th, the median radiation level for all grid squares increased a negligible amount, from 28 to 30.5, yet during the same span the 16th percentile jumped from 14.5 to 19.5 and the 84th percentile climbed from 37 to 42.

2 - Use visual analytics to represent and analyze uncertainty in the measurement of radiation across the city.

a. Compare uncertainty of the static sensors to the mobile sensors. What anomalies can you see? Are there sensors that are too uncertain to trust?

Several types of noise are present in the sensor data. Some sensors get "stuck" and repeat the same value rather than measure anything in the environment. All sensors also experience salt-and-pepper noise, where one sensor reading will be several orders of magnitude larger than readings immediately before or after. It was necessary to scrub both types of noise in order to analyze the underlying data. All sensors also exhibited Gaussian-like fluctuations in their readings, which are visible in Figure 3 as "fuzzy caterpillars".
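The two scrubbing steps described above can be sketched as follows; the run-length and spike thresholds here are illustrative assumptions, not the submission's actual parameters:

```python
def scrub(readings):
    """Flag stuck runs and salt-and-pepper spikes in a time-ordered list of
    cpm values. Returns a boolean mask, True where the reading is kept."""
    keep = [True] * len(readings)
    # Stuck sensor: three or more identical consecutive values (threshold
    # is our assumption).
    run = 1
    for i in range(1, len(readings)):
        run = run + 1 if readings[i] == readings[i - 1] else 1
        if run >= 3:
            for j in range(i - run + 1, i + 1):
                keep[j] = False
    # Salt-and-pepper: a value orders of magnitude above both neighbours
    # (here, 100x the smaller neighbour).
    for i in range(1, len(readings) - 1):
        lo = max(min(readings[i - 1], readings[i + 1]), 1e-9)
        if readings[i] > 100 * lo:
            keep[i] = False
    return keep
```

A production version would operate on timestamps rather than indices, since sensor packets arrive at uneven rates.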

Figure 3 - Gradual spikes, plateaus, and missing data. All examples are from mobile sensor #9.

Other anomalies were worth preserving. Several sensors show a "gradual spike", where sensor readings rise significantly higher than background levels over the course of approximately a minute. Sensors will exhibit "plateaus," where their radiation readings will jump from varying around one value to a different one. This is usually found in mobile sensors, though static sensor #11 also has this behaviour. Many sensors also experience missing data, excluding the scrubbed noise mentioned before. Figure 3 provides examples of all three of these anomalies.

Sensors vary in quality; in particular, the static sensors have less variation than the mobile ones, but no sensor's data was completely ignored. The heatmap uses a conjugate prior [1], specifically the Gaussian inverse-gamma [2]. This helps smooth the underlying data and consolidate the readings from many different sensors. If a sensor is equally likely to skew upwards as downwards, and the magnitudes of the two skews are equivalent, the skew will tend to cancel when the readings of multiple sensors are combined. This limits the impact of poor-quality sensors.

Conjugate priors assume the underlying distribution is constant, a poor assumption here, so we relaxed it by introducing exponential decay; the prior's weight is attenuated by

w = exp(-λ(t - t₀)),

where t₀ is the time of the last observation that contributed to the prior and t is the current time. λ was chosen so that prior values from twelve hours ago retain only a small fraction of their original weight. This is an approximation of the ideal approach, where the posterior [3] is generated by weighting each prior datapoint individually, but for large datasets with steady packet rates the difference between the two approaches is negligible.
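The decayed update can be sketched as below, assuming a normal-inverse-gamma parameterisation (mu, kappa, alpha, beta) and a twelve-hour half-life; the exact attenuation constant and the decision to attenuate alpha and beta alongside kappa are our assumptions:

```python
import math

# Decay constant: the prior's weight halves every twelve hours (an
# assumption; the submission attenuates over the same timescale).
LAM = math.log(2) / (12 * 3600)  # per second

def update(state, x, t):
    """Fold reading x at time t (seconds) into the decayed posterior.
    state = (mu, kappa, alpha, beta, t_last)."""
    mu, kappa, alpha, beta, t_last = state
    w = 1.0 if t_last is None else math.exp(-LAM * (t - t_last))
    kappa, alpha, beta = kappa * w, alpha * w, beta * w  # attenuate the prior
    # Standard normal-inverse-gamma single-observation update.
    mu_n = (kappa * mu + x) / (kappa + 1.0)
    beta_n = beta + kappa * (x - mu) ** 2 / (2.0 * (kappa + 1.0))
    return (mu_n, kappa + 1.0, alpha + 0.5, beta_n, t)
```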

b. Which regions of the city have greater uncertainty of radiation measurement? Use visual analytics to explain your rationale.

Figure 4 - Hover labels, animated averages.

Uncertainty in the heatmap is presented in two ways. Hovering over a grid square will reveal a tool-tip that describes the UTM coordinates of its centroid, the mean reading as drawn from the posterior, and the standard error of the mean. That last value is calculated by taking the mean of the inverse-gamma posterior as a variance estimate, then converting that variance to a standard error of the mean. Unfortunately, this discards the uncertainty information carried by the inverse-gamma itself. To give a qualitative sense of that uncertainty, each grid square is animated by sampling a variance from the inverse-gamma, converting it to a standard error, sampling a mean from the Gaussian posterior, and converting that mean to a colour. As a result, grid squares vary in colour over time according to their level of uncertainty, with more uncertainty corresponding to more variation. Figure 4 summarizes these methods.
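One animation frame for a grid square can be sketched as follows; the parameter names assume a normal-inverse-gamma posterior, and the colour mapping itself is left to the caller:

```python
import math
import random

def sample_colour_value(mu, kappa, alpha, beta, rng=random):
    """Sample a plausible mean reading for one animation frame: draw a
    variance from the inverse-gamma posterior, convert it to a standard
    error, then draw a mean from the Gaussian posterior."""
    # Inverse-gamma(alpha, beta) sample: reciprocal of Gamma(alpha, scale=1/beta).
    var = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    se = math.sqrt(var / kappa)  # standard error of the posterior mean
    return rng.gauss(mu, se)
```

A square with a tight posterior (large kappa and alpha) barely flickers between frames, while a poorly observed square visibly shimmers.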

According to those tools, a grid square located in the North of Old Town, at UTM (181500,24500), consistently shows high uncertainty. At 9am of April 9th, a grid square in the North of the Northwest district, UTM (175500,21500), exhibits high uncertainty due to a lack of sensor readings. Other examples become more plentiful after that point, such as the Eastern Wilson Forest anomaly, but for the majority of grid squares the standard error of the mean is small.

c. What effects do you see in the sensor readings after the earthquake and other major events? What effect do these events have on uncertainty?

There is no direct change in radiation levels due to earthquakes. It is plausible that the events may have changed traffic patterns, and thus the data collected by mobile sensors, but the data provides no strong evidence for that.

3 - Given the uncertainty you observed in question 2, are the radiation measurements reliable enough to locate areas of concern?

a. Highlight potential locations of contamination, including the locations of contaminated cars. Should St. Himark officials be worried about contaminated cars moving around the city?

Figure 5 - A simultaneous view of all anomalies, over the entire dataset.

The sensor display is capable of showing all anomalies simultaneously, as well as marking any reading above a specific value as an anomaly, as demonstrated in Figure 5. Many of these are false alarms, but some are due to mobile sensors driving near the Always Safe plant or a contaminated car passing near a sensor. The Scenic Vista anomaly can be isolated to UTM (191547,3690), an improvement over the heatmap's UTM (191500,3500). A new anomaly is revealed in Northwest, at 4:35pm on April 8th, located at UTM (176314,21316), and its shape suggests it is a contaminated car. It reappears with less intensity on April 9th and April 10th at 6:52am. A partial list of other potential anomalies follows.

Location (UTM) | Time | Sensor | Type
Safe Town (189733,17439) | April 8th, 1:34pm | Mobile #9 | Plateau
Easton (181278,16957) | April 8th, 1:49pm | Mobile #8 | Gradual Spike
Old Town (186573,22978) | April 8th, 4:39pm | Static #12 | Gradual Spike
Old Town (186573,22978) | April 8th, 4:52pm | Static #12 | Gradual Spike
Old Town (186573,22978) | April 8th, 5:03pm | Static #12 | Gradual Spike
Old Town (184214,21551) | April 8th, 5:07pm | Mobile #8 | Gradual Spike
Old Town (185427,21925) | April 8th, 9:42pm | Mobile #8 | Gradual Spike
Old Town (179826,20011) | April 8th, 10:23pm | Mobile #9 | Gradual Spike
Downtown (177105,15770) | April 9th, 6:33am | Mobile #19 | Gradual Spike
Southton (178317,15052) | April 9th, 6:38am | Mobile #19 | Gradual Spike
Downtown (177040,14658) | April 9th, 6:46am | Mobile #19 | Gradual Spike
Downtown (177041,15711) | April 9th, 6:49am | Mobile #19 | Gradual Spike
Easton (181250,15052) | April 9th, 6:49am | Mobile #19 | Gradual Spike
West Parton (180316,12388) | April 9th, 6:53am | Mobile #4 | Gradual Spike
West Parton (180314,12384) | April 9th, 6:53am | Mobile #31 | Gradual Spike
Jade Bridge (187187,25113) | April 9th, 9:49am | Mobile #46 | Gradual Spike
Downtown (174326,14781) | April 9th, 6:14pm | Mobile #2 | Gradual Spike
Wilson Forest (196349,6065) | April 9th, 9:02pm | Mobile #24 | Plateau
West Parton (180316,12388) | April 10th, 6:56am | Mobile #4 | Gradual Spike
West Parton (180301,12376) | April 10th, 6:56am | Mobile #31 | Gradual Spike
Old Town (186573,22978) | April 10th, 7:49pm | Static #12 | Gradual Spike
Old Town (186573,22978) | April 10th, 8:51pm | Static #12 | Gradual Spike

The anomalies outlined above are compatible with car contamination, and as such officials should be worried.

b. Estimate how many cars may have been contaminated when coolant leaked from the Always Safe plant. Use visual analysis of radiation measurements to determine if any have left the area.

Figure 6 - The anomalies of static sensor #15, likely evidence of contaminated cars leaving the nuclear plant.

Static sensor #15 is located right near the entrance to the Always Safe nuclear plant. According to Figure 6, anywhere from 13 to 15 spikes occurred between 4:10 and 4:43pm on April 8th, all of which are consistent with a contaminated car leaving the plant. Numerous gradual spikes are present in the data, most of which occur after noon of April 8th, consistent with contaminated cars traveling well outside the nuclear plant grounds. The anomalies along Jade Bridge and Wilson Forest suggest there is a non-trivial chance at least one of these cars has left the city.

c. Indicate where you would deploy more sensors to improve radiation monitoring in the city. Would you recommend more static sensors, more mobile sensors, or both? Use your visualization of radiation measurement uncertainty to justify your recommendation.

Figure 2 reveals some gaps in sensor coverage. The largest three are at the South ends of Cheddarford and Broadview, and near the 12th of July bridge, all of which could easily be remedied by adding a static sensor. Adding sensors to bridges would make it easier to determine if any contamination left the city.

4 - Summarize the state of radiation measurements at the end of the available period. Use your novel visualizations and analysis approaches to suggest a course of action for the city. Use visual analytics to compare the static sensor network to the mobile sensor network. What are the strengths and weaknesses of each approach? How do they support each other?

The heatmap is remarkably complete by the end of the dataset. Some of the data could be considered stale, for instance the last measurement near UTM (171500,19500), in Palace Hills, was 3.3 days before the end of the dataset. At the same time, it is also unlikely that local radiation levels have changed much in that time period, so a new measurement would likely agree with the old. More sensors would help fill in these gaps, but those should not be a short-term priority.

Instead, the first action for the city is to ensure the original spill is cleaned. As of the end of the data, several square kilometres to the South and Southwest of the nuclear plant show elevated radiation readings. The second is to track down every car that could plausibly have been contaminated during the accident and ensure it is clean. Anyone in or near cars found to be contaminated needs medical treatment, as they likely absorbed a significant dose of radiation. Finally, areas those cars have visited need to be checked to ensure they too are free of contamination.

Longer term, more sensors should be added in the places outlined in the prior question. Sensors near the nuclear plant need to be tied to the entry system, so that contaminated items cannot leave the plant. Some areas of the city show elevated levels of radiation, so a team should sweep those to ensure that is not due to stationary contamination. The city should ensure it has enough funds to calibrate and maintain the sensors it has, as this would help minimize noise issues and reduce the number of blind spots due to stuck sensors.

5 - The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis?

Our visualization for this challenge is less suited to dynamic streaming than the one for challenge 1, as the noise removal algorithms use a window that looks ahead of the current time. Nonetheless, the algorithm can easily be modified to either use a trailing window, or be permitted to alter the designation of data as noise or signal after the initial pass.

With that change in place, the only obstacle to real-time usage is the size of the dataset. This can be eased by archiving data that's more than a few hours old. Our visualization used a frequentist metric combined with conjugate priors to keep the computational cost to a minimum. The latter has already been outlined, and the former is simply a modified Z-score,

Z(t) = (x(t) - median(W(t, w))) / σ(W(t, w)),

where x(t) is the sensor reading at time t, W(t, w) is a window of observations centred at time t and of width w, and σ(W(t, w)) is the standard deviation of that window. The median is used instead of the mean to mitigate the effect of outliers.
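The modified Z-score described above can be sketched directly; the end-of-series clamping and the zero-variance guard are our additions:

```python
import statistics

def modified_z(readings, t, w):
    """Modified Z-score of the reading at index t, using a window of width w
    centred at t. The median replaces the mean to resist outliers; the
    window clamps at the ends of the series."""
    half = w // 2
    window = readings[max(0, t - half):t + half + 1]
    sigma = statistics.stdev(window)
    if sigma == 0:
        return 0.0  # a flat window carries no evidence of an outlier
    return (readings[t] - statistics.median(window)) / sigma
```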


Citations

[1] Fink, Daniel. "A Compendium of Conjugate Priors." Technical report, 1997.

[2] Murphy, Kevin P. "Conjugate Bayesian Analysis of the Gaussian Distribution." Technical note, 2007.

[3] Gelman, Andrew, and Cosma Rohilla Shalizi. "Philosophy and the practice of Bayesian statistics." British Journal of Mathematical and Statistical Psychology 66.1 (2013): 8-38.