Entry Name:  "NUIG-Khawaja-MC1"

VAST Challenge 2019
Mini-Challenge 1

 

 

Team Members:

Waqas Khawaja, Data Science Institute, NUI Galway, Ireland  waqas.khawaja@insight-centre.org PRIMARY

Heike Vornhagen, Data Science Institute, NUI Galway, Ireland heike.vornhagen@insight-centre.org

Student Team:  Yes

 

Tools Used:

Tableau

Excel

 

Approximately how many hours were spent working on this submission in total?

60

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? Yes

 

Video

https://drive.google.com/file/d/1bIQmSmmCoa6Si26DQan-R5l4yUjpDaFC/view?usp=sharing

 

 

 

 

Questions

1 – Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.

 

We assume that the initial response of the rescue workers would be based on the earthquake shake map. Looking at the provided shake intensity map, the following areas appear to be hit hardest by the earthquake.

      Northwest

      Old Town

      Safe Town

      Pepper Mill

 

Figure 1: Provided Shake Intensity Map

 

The provided shake intensity map does not indicate a specific time, so we assume that it represents the whole period.

 

We then look at the user-submitted reports, starting with the 6th of April (Figure 2). Based on the high average of medical reports for the first day, we would prioritize Pepper Mill, Easton and Chapparal. We also note that the areas needing medical attention differ from the affected areas in the shake intensity map. The reported shake intensity (shown by the dots) is relatively low.

 

 

 

Figure 2: User Damage Reports for 6th April

 

 

For the 7th of April, the damage reports are fairly consistent across all areas, except for a few areas where medical reports are missing, as shown in Figure 3. The average of the damage reports has increased, but the shake intensity is still low. For the 7th, we prioritize Northwest, Palace Hills and Old Town because of their higher reported medical damage. We also see that a lot of data is missing for Wilson Forest.

 

 

 

Figure 3: Average Damage reports on 7th April

 

We then see that the shake intensity suddenly increases on the 8th, accompanied by a significant increase in the reported damage. While both are evident from the visualization we have been using so far, it is difficult to point out the exact areas, as the damage reports are not aggregated.

 

Figure 4: Average damage reports for 8th April

 

For prioritizing rescue efforts on the 8th, we then use the following visualization which combines the overall damage reports and orders them. We conclude that Wilson Forest, Broadview, Scenic Vista, Old Town, Chapparal, and Easton are the areas in need of immediate attention.
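
Outside the tools listed above, the same kind of ordering could be sketched in Python. This is only an illustration of combining damage reports per area and sorting them: the sample records, the area labels and the function name are hypothetical and are not taken from the challenge data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample reports: (area, damage category, rating 0-10).
# These values are invented for illustration only.
REPORTS = [
    ("Area A", "buildings", 8),
    ("Area A", "medical", 6),
    ("Area B", "power", 9),
    ("Area B", "buildings", 9),
    ("Area C", "medical", 2),
    ("Area C", "power", 4),
    ("Area C", "buildings", None),  # category left blank in the report
]

def rank_by_average_damage(reports):
    """Return (area, mean rating) pairs, highest mean first."""
    ratings = defaultdict(list)
    for area, _category, rating in reports:
        if rating is not None:  # skip categories that were left blank
            ratings[area].append(rating)
    averages = {area: mean(values) for area, values in ratings.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_average_damage(REPORTS)
for area, average in ranking:
    print(f"{area}: {average:.1f}")
```

Blank ratings are skipped rather than counted as zero; treating them as zero would lower the averages of areas with many incomplete reports.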

 

 

 

 

 

Figure 5: Overall damage area wise on 8th April

 

 

 

 

For the 9th, we see almost the same level of shake intensity but lower damage reported. The lower damage reported may be attributed to the fact that it is a continuation from the previous day. We conclude that Scenic Vista, Old Town, Wilson Forest and Chapparal should be prioritized based on the responses from citizens.

 

 

 

 

Figure 6: Overall Damage Area Wise 9th April

 

 

 

We then look at the data for the 10th of April. We note that shake intensity is low overall, except in Old Town, but the damage reports are high.

 

 

Figure 7: Average Damage Types on 10 April

 

 

We again prioritize areas based on the overall reported damage. This shows that Old Town and Scenic Vista are the areas most in need of rescue efforts.

 

 

Figure 8: Overall Damage Area Wise on April 10th

 

 

 

Based on the following visualization, we can say that the areas reporting the highest average damage are Old Town, Scenic Vista, Wilson Forest and Chapparal. These are obtained by averaging the total damage reported over the complete period of 5 days (6th to 10th April).

 

 

Figure 9: Worst Affected Areas

 

 

2 – Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.

 

1.    Uncertainty

In order to pinpoint uncertainty in the dataset we looked at a number of different graphs to see if any events occurred that did not match our expectations.

 

Figure 1: Total amount of damage per hour/day compared to maximum shake intensity per hour/day for all areas.

 

 

Figure 1 shows the overall damage reports for all locations suddenly increasing as the earthquake hits on April 8. However, although the shake intensity remains high for the rest of the day, there is a sudden drop in reports. This pattern is repeated on April 9 and 10: sudden spikes of tremors are accompanied by spikes in damage reports, which immediately drop off again. We had expected the damage reports to remain reasonably high, as we thought that people would continue reporting damage as they encountered it and for as long as it was not attended to. We therefore decided to look more closely at which neighbourhoods reported more damage and which did not report any, to see if we could find explanations.

 

Figure 2: Heatmap showing number of records (colour) and average damage reported

 

 

From Figure 2 it becomes apparent that a lot of data is missing in certain locations, a pattern that is repeated on April 9th. This is different from data not being reported (null values – see Figure 3) and may be related to power cuts, some of which may have been caused by the earthquake. While we looked into averaging out the sum of damage reports / shake intensity after any such ‘missing data block’, we cannot know how accurate these averages are in relation to the actual numbers that are stored and submitted as a block once power is restored. Some of the power outages last more than 10 hours, which would allow for plenty of variation. Furthermore, the late submission of data blocks may skew the data for the respective day, as we have no way to determine which parts to include in a day’s overall data.

As to the ‘null values’, i.e. data that is not being reported, we can presume that the reason may be that there was nothing to report and people just didn’t record 0. However, people may also be suffering from shock, or find themselves under pressure to report on all issues and miss some of them. Without further data (e.g. tweets or news reports), this cannot be explained satisfactorily.
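
The distinction between a ‘missing data block’ (no reports at all for a stretch of time) and ‘null values’ (a report arrives but a field is empty) can be illustrated with a short Python sketch. The field names, timestamps, ratings and the one-hour gap threshold below are assumptions made for illustration and are not taken from the challenge data.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive reports are at least min_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]

def count_nulls(rows):
    """Count fields that were submitted but left empty (None)."""
    return sum(value is None for row in rows for value in row.values())

# Hypothetical reports from a single location; all values are invented.
rows = [
    {"time": datetime(2020, 4, 8, 8, 0), "buildings": 3, "medical": None},
    {"time": datetime(2020, 4, 8, 8, 5), "buildings": 4, "medical": 2},
    {"time": datetime(2020, 4, 8, 8, 10), "buildings": None, "medical": None},
    {"time": datetime(2020, 4, 8, 19, 0), "buildings": 7, "medical": 5},
    {"time": datetime(2020, 4, 8, 19, 5), "buildings": 7, "medical": 6},
]

gaps = find_gaps([row["time"] for row in rows])
nulls = count_nulls(rows)

for start, end in gaps:
    print(f"No reports between {start:%H:%M} and {end:%H:%M} ({end - start})")
print(f"Empty fields in submitted reports: {nulls}")
```

A gap found this way only says that no reports were received; it cannot say whether the cause was a power cut or simply nobody reporting.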

Uncertainty also arises because we have no further details regarding who is submitting the data: we have no identification apart from location. Nor do we have numbers for how many people have the necessary sensors in each location, which may also have an effect on reliability.

2.    Reliability

Our investigation of reliability is framed by the descriptive text about St Himark and the submitted data readings. Our main measure of reliability is based on the volatility of the data submitted (taking into account uncertainty issues outlined above) using standard deviation.
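
As a rough illustration of using standard deviation as a volatility measure, the following Python sketch computes the mean and standard deviation of damage ratings per area. The area labels and ratings are hypothetical, and the choice of the sample (rather than population) standard deviation is an assumption.

```python
from statistics import mean, stdev

# Hypothetical damage ratings (0-10) for two made-up areas.
ratings_by_area = {
    "Area A": [2, 2, 3, 2, 3],
    "Area B": [1, 9, 4, 10, 0],
}

def volatility(ratings):
    """Return {area: (mean, sample standard deviation)} for areas with 2+ ratings."""
    return {
        area: (mean(values), stdev(values))
        for area, values in ratings.items()
        if len(values) >= 2  # stdev is undefined for fewer than two values
    }

summary = volatility(ratings_by_area)
most_consistent = min(summary, key=lambda area: summary[area][1])

for area, (average, deviation) in summary.items():
    print(f"{area}: mean={average:.2f}, std={deviation:.2f}")
print(f"Most consistent reports: {most_consistent}")
```

A low standard deviation only shows that reports agree with each other; it does not by itself show that they are accurate.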

 

Figure 3: Average shake intensity vs. standard deviation of damage reported

 

From Figure 3 we can see that overall, standard deviation of the damage reported is comparatively low whenever there is a spike in the shake intensity. From this we conclude that the damage data at those points is reliable. However, this visualisation does not allow us to compare specific neighbourhoods.

 

Figure 4: Average shake intensity reported (purple) vs standard deviation (pink)

 

To begin with we looked at the average of all reported data regarding shake intensity and compared this to the standard deviation for each neighbourhood (Figure 4). From this we conclude that Palace Hills and Southwest have the most reliable data. However, they are further away from the epicenter.

 

Figure 5: Average damage reported vs standard deviation

 

 

When it comes to damage reported, those areas further away from the epicenter have the highest standard deviation (Figure 5). Of those closer to the epicenter, Wilson Forest seems to be the most reliable, with a standard deviation of 1.6. This changes considerably if we just look at the damage reported for buildings (Figure 6) – Wilson Forest has the second highest volatility. Considering that it is sparsely populated, this could indicate highly unreliable data.

 

Figure 6: Average building damage reported vs standard deviation for select locations.

 

 

3.    Reliability of different neighbourhoods

In order to take a closer look at the reliability of different neighbourhoods we again looked at both damage and shake intensity reports.

 

Figure 7: Shake Intensity vs Damage Reports per neighbourhood. Number of records = size of circle.

 

 

From Figure 7 it is clear that Old Town has reported the highest average damage and shake intensity – as it is closest to the epicenter, this is expected. But areas such as Broadview and Scenic Vista reported very high damage but little shake intensity. Looking more closely at Scenic Vista, we discovered that the average damage reported remains roughly the same throughout the week but the number of records increases dramatically on April 9 and 10 (Figure 8).

 

Figure 8: Average strength of damage and number of records reported for Scenic Vista.

 

 

This seems to indicate that the increase in the number of records is just caused by the power cuts (areas of no data) and does not necessarily indicate an increase in damage. So data coming from Scenic Vista might not be unreliable, but it might not relate to the earthquake. The same pattern can be observed for Broadview, which supports our hypothesis that something is happening in these areas that is probably not related to the earthquake.

 

Figure 9: Standard deviation for damage and shake intensity reported, Old Town.

 

 

Looking more closely at Old Town, shake intensity is quite low except on April 9th, whereas the deviation in damage reports is relatively high, which may indicate unreliability in these reports.

 

Figure 10: Strength of damage reports and average of shake intensity for Wilson Forest

 

Wilson Forest has a lot of missing data, especially before the earthquake (Figure 10). Once the shake intensity increases, an increase in the data being reported can be observed (thickness of blue bars). However, as the standard deviation (saturation of blue colour) for these increases is quite low, we surmise that people may simply not report anything before the earthquake hits, but that their reports are then reliable.

 

4.    Conclusion

As outlined above, we think that uncertainty in the data is caused by a mix of power cuts (no data), missing values (‘Null’ recordings) and a lack of complementary data. This in turn affects the determination of the reliability of various neighbourhoods, but we feel confident that our analysis can help in making decisions as to which reports to attend to.

 

 

 

 

3 – How do conditions change over time? How does uncertainty in the data change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

 

As discussed in our answer to question 1, the initial damage reports bring areas such as Scenic Vista and Broadview to attention even though they are not highlighted in the shake intensity map.

 

We see uncertainty from the beginning, as high medical damage is reported despite the low shake intensity, although only in some areas. We also see that the medical reports are in line with the other types of damage reported on the 7th. The shake intensity increases suddenly on the 8th, and there is a large number of high building damage reports, indicating earthquake damage.

 

We observe changing conditions with respect to two aspects.

 

 

1.     Damage Reports

 

Figure 1: Damage Type Over Time

 

 

Looking at the average of the different types of damage reports by day, we see low overall damage reported on the 6th. On the next day, the damage is spread almost evenly across types at a little over five. On the 8th, there is a decrease in reported damage to buildings, medical, and sewer and water, but increased damage is reported in the power sector. From our earlier observations, we know that the shake intensity suddenly increased on the 8th, but this is not reflected well in the overall damage reports. We then see an overall increase in the reports, particularly in the water and sewer sector. There is again a cumulative increase on the 9th, with sudden spikes in power, roads and bridges, and sewer and water.

 

2.     Geographical Locations

     

Figure 2: Damage Type Over Time for Worst Hit Areas

 

 

Following our answer to question 1, we take the top five worst-affected areas and examine their reports. The visualization above shows the names of the neighborhoods on the left, with bars showing the average damage reported. The colors of the bars represent the different damage types. The horizontal line in each box represents the average shake intensity for that area, and each column represents one day.

 

One thing that immediately stands out is that Old Town (middle row) is probably the only area where the damage reports and the shake intensity seem to be somewhat consistent: damage increases as shake intensity increases.

 

With respect to uncertainty, Wilson Forest is probably the most uncertain area. High shake intensity is reported for the 8th and 9th, but a lot of data is missing for all days except the 8th.

 

 

4 – The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.

We analysed the data as a static collection. This allowed us to make comparisons between different neighbourhoods and types of damage report without real-life time constraints. It affected our analysis in that we were able to discuss possible reasons for unexpected events, which helped in making sense of the data. For example, the rise in the number of damage reports coming in from Scenic Vista on April 9 might have been seen as important if the previous ‘missing data’ had not been properly logged and accounted for.

Using standard deviation for gauging the reliability of data would be harder with streaming data, as reliability is also related to time. For example, in Figure 1 it appears that data from Wilson Forest is highly unreliable, as it has a high standard deviation (2.6) for that particular recording.

 

Figure 1: Comparing average shake intensity reported with standard deviation, April 8, 1pm

 

 

 

 

However, looking at even that one day as a whole shows an overall standard deviation of a low 1.3, and hence we are quite confident in the reliability of the data (Figure 2).

 

Figure 2:

 

The other area affected by the choice of static versus dynamic data is of course the power cuts. There are a number of ongoing power cuts of between 60 and 90 minutes, as well as those of longer duration caused by the earthquake. It would be very difficult at the beginning of such a power cut to determine whether it is a scheduled outage or not, and hence to decide which action to take, if any.

 

Figure 3: Missing data for each location

 

 

Figure 3, for example, gives an overview of power cuts, the longer of which are accompanied by a rise in the number of reports submitted (with the exception of Wilson Forest). But there are also a number of recurring outages in nearly all areas; with dynamic data it would be harder to tell the difference.
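
The difficulty with streaming data can be made concrete with a small Python sketch: while a silence is still shorter than the longest recurring power cut, it cannot be told apart from a longer outage. The 90-minute bound comes from the description above; the 15-minute "quiet" threshold, the labels and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Upper bound of the recurring power cuts described above.
SCHEDULED_MAX = timedelta(minutes=90)

def classify_silence(last_report, now, quiet_threshold=timedelta(minutes=15)):
    """Label how long a location has been silent, as seen from a live stream.

    quiet_threshold is an assumed cut-off below which silence is treated as normal.
    """
    silence = now - last_report
    if silence < quiet_threshold:
        return "reporting"
    if silence <= SCHEDULED_MAX:
        # Could be a recurring cut or the start of a longer outage.
        return "undetermined"
    return "extended outage"

# Invented timestamps for illustration.
last_seen = datetime(2020, 4, 8, 9, 0)
for minutes in (10, 60, 120):
    label = classify_silence(last_seen, last_seen + timedelta(minutes=minutes))
    print(f"{minutes:>3} min of silence: {label}")
```

With a static collection the full length of each gap is known in advance, so this "undetermined" state never arises.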

We conclude that we would want to use a range of other data sources to corroborate dynamic data in order to ensure the accuracy of the analysis.