Entry Name:  TJU-Jia-MC1

VAST Challenge 2019
Mini-Challenge 1

 

 

Team Members:

Shichao Jia, Tianjin University, jsc_se@tju.edu.cn     PRIMARY
Jiaqi Wang, Tianjin University, qimelbourne@gmail.com 

Zeyu Li, Tianjin University, lzytianda@tju.edu.cn

Jiawan Zhang, Tianjin University, jwzhang@tju.edu.cn   SUPERVISOR



Student Team:  YES

 

Tools Used:

Python

D3.js

 

Approximately how many hours were spent working on this submission in total?

About 150 hours (30 days, 5 hours/day)

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete?

YES

 

Video

https://youtu.be/l7HYLCFERDo  

 

 

Questions

1. Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.

We use a heatmap to summarize the multivariate time series. Each cell stands for the uncertainty or the mean value of one dimension in each hour. Users can select any range of time in the line chart to filter the data. The rows of the heatmap are then reordered based on the overall severity. We use the following metric to measure the total damage:

Damage = Σ_d (1/|T|) Σ_{t∈T} m_{d,t}

in which d stands for one dimension (shake_intensity, sewer_and_water, power, roads_and_bridges, medical, or buildings), m_{d,t} is the mean value of each cell, T is the selected time range, and |T| is the set cardinality.
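The metric above can be sketched in Python with pandas. This is a hedged sketch, not our exact implementation: the column names `location` and `time`, the six dimension columns, and the function name are assumptions based on the challenge data.

```python
import pandas as pd

DIMENSIONS = ["shake_intensity", "sewer_and_water", "power",
              "roads_and_bridges", "medical", "buildings"]

def total_damage(reports: pd.DataFrame, t_start, t_end) -> pd.Series:
    """Damage = sum_d (1/|T|) * sum_{t in T} m_{d,t} per neighborhood,
    where m_{d,t} is the hourly mean of dimension d."""
    # Keep only reports inside the selected time range T.
    selected = reports[(reports["time"] >= t_start) & (reports["time"] <= t_end)]
    # Hourly mean per neighborhood and dimension: m_{d,t}.
    hourly_mean = (selected
                   .groupby(["location", pd.Grouper(key="time", freq="h")])[DIMENSIONS]
                   .mean())
    # Average the hourly means over T, then sum across the six dimensions.
    return hourly_mean.groupby("location").mean().sum(axis=1).sort_values(ascending=False)
```

The returned series is sorted in descending order, matching the reordered heatmap rows.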

The color of the hexagons on the map encodes the severity measurement. Hexagons aligned with the heatmap show the data distribution of each dimension: higher damage levels are laid out at the outer ring of the hexagons, and lower levels toward the center. The color encodes the frequency of each level for each dimension. This visualization provides a compact depiction of both value and uncertainty. If the data are distributed evenly, the uncertainty is high; conversely, a centralized distribution means low uncertainty. Besides, severe damage shows up at the outer ring of the hexagons.
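The per-level distribution drawn on a hexagon's rings can be derived as follows. This is a sketch of the data-preparation step only (the D3.js rendering is not shown), and the `location` column name is an assumption:

```python
import pandas as pd

def level_frequencies(reports: pd.DataFrame, dimension: str) -> pd.DataFrame:
    """Relative frequency of each damage level (0-10) per neighborhood,
    i.e. the distribution shown on one hexagon's rings for one dimension."""
    counts = (reports.groupby("location")[dimension]
              .value_counts(normalize=True)   # relative frequency per level
              .unstack(fill_value=0.0))
    # Ensure every level 0..10 appears as a column, even if never reported.
    return counts.reindex(columns=range(11), fill_value=0.0)
```

Each row sums to 1; a row concentrated on a few levels yields a hexagon with a sharp ring pattern (low uncertainty), while an even row spreads color across all rings.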

q1-1

Figure 1

In Figure 1, we notice three salient peaks in the line chart, which depicts how many citizens in different neighborhoods upload data through the app every hour. We select the two major peaks as examples, since they stand for two earthquakes in St. Himark. Notice that there are several delays in the receipt of reports due to the power outages after each earthquake.

We first select a time range during the major earthquake in which the number of reporting citizens is above forty. This interaction filters out time intervals that do not belong to the earthquake while keeping the delayed intervals that do. In general, we suggest that emergency responders pay most attention to the neighborhoods closest to the regions where the earthquake happened (such as Old Town, Easton, and Safe Town) and to the southern regions of St. Himark (such as Scenic Vista, Broadview, and Wilson Forest). Besides, the hexagons beside the heatmap provide a sense of uncertainty: the more evenly the data are distributed, the more uncertain they are. Therefore, although the top three neighborhoods are heavily damaged, their reports may not be reliable, whereas the following five neighborhoods are both heavily damaged and reliable. We provide more analysis in Question 2.
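The volume-based brushing can be approximated in code. A sketch under assumptions: the `time` column name and the helper `busy_hours` are ours, not part of the system:

```python
import pandas as pd

def busy_hours(reports: pd.DataFrame, t_start, t_end,
               min_reports: int = 40) -> pd.DatetimeIndex:
    """Hours inside [t_start, t_end] with more than `min_reports` reports,
    mimicking the brushing interaction on the line chart."""
    window = reports[(reports["time"] >= t_start) & (reports["time"] <= t_end)]
    per_hour = window.set_index("time").resample("h").size()
    return per_hour[per_hour > min_reports].index
```

The returned hours can then be used to filter the heatmap and hexagon views to the earthquake intervals, including the delayed report bursts.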

q1

Figure 2

More specifically, by toggling different dimensions, emergency responders can respond to different targeted events. We show the result in Figure 2 using one response map per dimension. For shake intensity, the damage reports are consistent with the shake map: Old Town, Wilson Forest, Pepper Mill, and Safe Town are the neighborhoods that shook most. For sewer and water, the result is slightly different: Scenic Vista, Broadview, Old Town, Easton, and Terrapin Springs are top-ranked. For other events, readers can refer to Figure 2.

q1-8

Figure 3

Then, we select a time interval during the last earthquake with the number of citizens above forty so as to include the two delays (Figure 3). We also show the different dimensions in Figure 4. Overall, Old Town, Scenic Vista, and Broadview are consistently ranked at the top across the different aspects.

q2

Figure 4

2. Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.

 

We use normalized information entropy to evaluate the uncertainty of the neighborhood reports, because it is well suited to ordinal survey data. The normalized entropy is calculated as follows:

H_d = - Σ_{i=0}^{10} p_i · log_11(p_i)

We choose base 11 (there are 11 measurement levels in total, from 0 to 10) to normalize the entropy so that its range is [0, 1], in which d stands for one dimension (shake_intensity, sewer_and_water, power, roads_and_bridges, medical, or buildings) and p_i is the proportion of reports at level i. The higher the entropy is, the more uncertain the reports are, meaning citizens give less unified or consistent reports. We use blue to encode entropy: dark blue means high entropy and light blue means low entropy.

We select a time range that includes the last two major earthquakes. Rows in the heatmap are then reordered based on the mean entropy:

H̄ = (1/|T|) Σ_{t∈T} H_t

in which T is the selected time range and |T| is the set cardinality. The mean entropy evaluates the overall uncertainty of the selected data for each neighborhood.
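Both formulas can be sketched together in Python. As before, this is a hedged sketch: the `location` and `time` column names and the function names are assumptions, not our exact implementation.

```python
import numpy as np
import pandas as pd

LEVELS = 11  # damage levels 0..10

def normalized_entropy(values) -> float:
    """Shannon entropy of the level distribution, normalized by log(11)
    so the result lies in [0, 1]."""
    counts = np.bincount(np.asarray(values, dtype=int), minlength=LEVELS)
    p = counts / counts.sum()
    p = p[p > 0]                       # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(LEVELS))

def mean_entropy_order(reports, t_start, t_end, dimensions):
    """Rank neighborhoods by mean hourly entropy over the selected range T,
    averaged across the given dimensions."""
    window = reports[(reports["time"] >= t_start) & (reports["time"] <= t_end)]
    hourly = (window
              .groupby(["location", pd.Grouper(key="time", freq="h")])[dimensions]
              .agg(normalized_entropy))          # H_{d,t} per neighborhood
    return hourly.groupby("location").mean().mean(axis=1).sort_values(ascending=False)
```

A neighborhood whose citizens all report the same level gets entropy 0; one whose reports are spread evenly across all 11 levels gets entropy 1.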

q2-4

Figure 5

The result is shown in Figure 5. Notice that the data are distributed more evenly in the hexagons at the top than in those at the bottom. Therefore, we can conclude that Southton, Cheddarford, Palace Hills, etc. provide more uncertain reports than others. In contrast, neighborhoods at the bottom of the heatmap, including West Parton, Oak Willow, and Pepper Mill, provide more reliable reports.

3. How do conditions change over time? How does uncertainty in the data change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

 

In Figure 6 we notice three major peaks over the whole week: citizens post a lot during Monday afternoon, Wednesday morning, and Thursday afternoon. Looking at the heatmap, we find that the uncertainty during these peaks decreases at the same time.

q2-1

Figure 6

We select only the last two earthquakes for more details (Figure 7). Notice that some neighborhoods have more uncertainty in their medical reports: Scenic Vista, Weston, Easton, Northwest, East Parton, Pepper Mill, and Chapparal. These neighborhoods happen to be the regions without hospitals.

q3-3

Figure 7

Besides, we find several breakpoints after each major earthquake. Supposing these are the delays caused by power outages, we select each time range for more details. We first select a time range after the first major earthquake (Figure 8). Notice that the power in the four selected neighborhoods (Broadview, Old Town, Scenic Vista, Chapparal) has been severely damaged. Moreover, Broadview and Scenic Vista report severe problems with sewer, water, power, roads, and bridges, although their shake intensity is no higher than Old Town's. This implies that these neighborhoods may have more vulnerable utilities.

q3-6

Figure 8

Then we move the time range to the last two breakpoints (Figure 9). We find that Old Town and Scenic Vista are both damaged hardest in sewer and water, power, and roads and bridges.

q3-7

Figure 9

4. The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.

We currently analyze the data as a static collection, but our system can also be applied to a stream. Analyzing the data as a dynamic stream is suitable in a real emergency, whereas analyzing it as a static collection provides a whole picture of the events. In this scenario, most decisions may not differ much between the two modes. However, it is different when data are delayed. For example, after each major earthquake, several neighborhoods cannot upload reports in time due to power outages. In a streaming setting we receive no data at that time, so we have no idea what situation these neighborhoods are in, and the system suddenly cannot help users make decisions. In contrast, we can certainly analyze the data post hoc and understand the situation, but those insights may arrive too late, after the earthquakes have already happened. It is therefore a real dilemma in this scenario, and more work is needed on it.