Entry
Name: "NUIG-Khawaja-MC1"
VAST Challenge 2019
Mini-Challenge 1
Team Members:
Waqas Khawaja, Data Science Institute, NUI Galway,
Ireland waqas.khawaja@insight-centre.org
PRIMARY
Heike Vornhagen, Data Science Institute, NUI Galway,
Ireland heike.vornhagen@insight-centre.org
Student Team: Yes
Tools Used:
Tableau
Excel
Approximately how many hours were spent working on this
submission in total?
60
May we post your submission in the Visual Analytics
Benchmark Repository after VAST Challenge 2019 is complete? Yes
Video
https://drive.google.com/file/d/1bIQmSmmCoa6Si26DQan-R5l4yUjpDaFC/view?usp=sharing
Questions
1 – Emergency responders will base their initial response on
the earthquake shake map. Use visual analytics to determine how their response
should change based on damage reports from citizens on the ground. How would
you prioritize neighborhoods for response? Which parts of the city are hardest
hit? Limit your response to 1000 words and 10 images.
We assume that the rescue workers' initial response would be based on the
earthquake shake map. According to the provided shake intensity map, the
following areas appear to be hit hardest by the earthquake.
● Northwest
● Old Town
● Safe Town
● Pepper Mill
Figure 1: Provided Shake Intensity Map
The provided shake intensity map does not indicate a specific time, so we
assume that it represents the whole period.
We then look at the citizen-submitted reports, starting with the 6th
(Figure 2). Based on the high average of medical reports for the first day,
we would prioritize Pepper Mill, Easton and Chapparal. The areas needing
medical attention differ from the affected areas in the shake intensity map,
and the reported shake intensity (shown by the dots) is relatively low.
Figure 2: User Damage Reports for 6th April
For the 7th of April, the damage reports are broadly consistent across all
areas, except for a few areas with missing medical reports (Figure 3). The
average reported damage has increased, although the shake intensity is still
low. For the 7th, we again prioritize by medical reports, which puts
Northwest, Palace Hills and Old Town first. We also see that a lot of data is
missing for Wilson Forest.
Figure 3: Average Damage reports on 7th April
The shake intensity then increases suddenly on the 8th, and the reported
damage increases significantly at the same time. While this is evident from
the visualization we have been using so far, it is difficult to pinpoint the
exact areas because the damage reports are not aggregated.
Figure 4: Average damage reports for 8th April
To prioritize rescue efforts on the 8th, we therefore use the following
visualization, which combines the damage reports into an overall figure per
area and orders the areas by it. We conclude that Wilson Forest, Broadview,
Scenic Vista, Old Town, Chapparal, and Easton are the areas in need of
immediate attention.
Figure 5: Overall damage area wise on 8th April
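A minimal Python sketch of this kind of ordering is given here for readers who
want to reproduce it outside Tableau. It is not the workbook behind Figure 5:
the area names, the ratings and the helper `rank_by_damage` are hypothetical,
and pooling all non-missing ratings into one mean per area is an assumption
about how "overall damage" is combined.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical report rows: (day, area, {damage category: rating 0-10}).
# None marks a category the citizen left blank. Values are illustrative only.
REPORTS = [
    ("Apr 8", "Area A", {"buildings": 9, "medical": None, "power": 8}),
    ("Apr 8", "Area A", {"buildings": 8, "medical": 7, "power": 9}),
    ("Apr 8", "Area B", {"buildings": 7, "medical": 6, "power": 8}),
    ("Apr 8", "Area C", {"buildings": 2, "medical": 1, "power": 3}),
    ("Apr 9", "Area C", {"buildings": 9, "medical": 9, "power": 9}),
]


def rank_by_damage(reports, day):
    """Return (area, mean rating) pairs for `day`, worst-hit area first."""
    ratings = defaultdict(list)
    for report_day, area, damage in reports:
        if report_day != day:
            continue
        # Pool every non-missing rating so blank fields do not count as zero.
        ratings[area].extend(v for v in damage.values() if v is not None)
    scores = {area: mean(vals) for area, vals in ratings.items() if vals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_damage(REPORTS, "Apr 8")
for area, score in ranking:
    print(f"{area}: {score:.1f}")
```

Skipping blank fields rather than treating them as 0 keeps an area with few
answered categories from looking artificially undamaged.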
For the 9th, we see almost the same level of shake intensity but lower
reported damage. The lower figure may be because the reports are a
continuation from the previous day. We conclude that Scenic Vista, Old Town,
Wilson Forest and Chapparal should be prioritized based on the responses
from citizens.
Figure 6: Overall Damage Area Wise 9th April
We then start looking at data for the 10th of April. We
note that we have overall low shake intensity except for Old Town but the
damage reports are high.
Figure 7: Average Damage Types on 10 April
We again prioritize areas based on the overall reported
damage. This shows that Old Town and Scenic Vista are the areas needing
attention with rescue efforts.
Figure 8: Overall Damage Area Wise on April 10th
Based on the following visualization, we can say that the areas reporting
the highest average damage are Old Town, Scenic Vista, Wilson Forest and
Chapparal. These are obtained by averaging the total damage reported over
the complete period of 5 days (6th April to 10th April).
Figure 9: Worst Affected Areas
2 – Use visual analytics to show uncertainty in the data.
Compare the reliability of neighborhood reports. Which neighborhoods are
providing reliable reports? Provide a rationale for your response. Limit your
response to 1000 words and 10 images.
1. Uncertainty
In
order to pinpoint uncertainty in the dataset we looked at a number of
different graphs to see if any events occurred that did not match our
expectations.
Figure 1: Total amount of damage per hour/day compared to maximum shake
intensity per hour/day for all areas.
Figure 1 shows the overall damage reports for all locations suddenly
increasing as the earthquake hits on April 8. However, although the shake
intensity remains high for the rest of the day, there is a sudden drop in
reports. This pattern is repeated on April 9 and 10: sudden spikes of tremors
are accompanied by spikes in damage reports, which immediately drop off
again. We had expected the damage reports to remain reasonably high, as we
thought that people would continue reporting damage as they encountered it
and for as long as it was not attended to. We therefore decided to look more
closely at which neighbourhoods reported more damage and which did not report
any, to see if we could find explanations.
Figure 2: Heatmap showing number of records (colour) and average damage
reported
From Figure 2 it becomes apparent that a lot of data is missing in certain
locations, and this pattern is repeated on April 9th. This is different from
data not being reported (null values – see Figure 3) and may be related to
power cuts, some of which may be caused by the earthquake. While we looked
into averaging out the sum of damage reports / shake intensity after any such
‘missing data block’, we cannot know how accurate these are in relation to
the actual numbers that are stored and submitted as a block once power is
restored. Some of the power outages last more than 10 hours, which would
allow for plenty of variation. Furthermore, the late submission of data
blocks may skew the data for the respective day, as we have no way to
determine which parts to include in a day’s overall data.
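One way to locate such ‘missing data blocks’ programmatically is to look for
long silences between consecutive report timestamps in a single location. The
sketch below is illustrative only: the timestamps are invented, `find_gaps` is
a hypothetical helper, and the one-hour threshold is an assumption rather than
a value taken from the analysis.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, min_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive reports are more than min_gap apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > min_gap
    ]


# Hypothetical report times for one neighbourhood (not from the challenge data).
times = [
    datetime(2020, 4, 8, 8, 0),
    datetime(2020, 4, 8, 8, 5),
    datetime(2020, 4, 8, 8, 10),
    datetime(2020, 4, 8, 19, 0),   # first report after a long silence
    datetime(2020, 4, 8, 19, 5),
]

gaps = find_gaps(times)
for start, end in gaps:
    print(f"no reports from {start:%H:%M} to {end:%H:%M} ({end - start})")
```

A gap found this way only says that nothing arrived; it cannot tell a power
cut from a period in which nobody had anything to report.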
As to the ‘null values’, i.e. data that is not being reported, we can presume
that the reason may be that there was no damage to report and people simply
did not record 0. However, people may also be suffering from shock, or find
themselves under pressure to report on all issues and so miss some. Without
further data (e.g. tweets or news reports), this cannot be explained
satisfactorily.
Uncertainty
also arises as we have no further details regarding who is submitting the data
as we have no identification apart from location. Neither do we have numbers
for how many people have the necessary sensors in each location which also may
have an effect on reliability.
2. Reliability
Our
investigation of reliability is framed by the descriptive text about St Himark
and the submitted data readings. Our main measure of reliability is based on
the volatility of the data submitted (taking into account uncertainty issues
outlined above) using standard deviation.
Figure 3: Average shake intensity vs. standard deviation of damage reported
From
Figure 3 we can see that overall, standard deviation of the damage reported is
comparatively low whenever there is a spike in the shake intensity. From this
we conclude that the damage data at those points is reliable. However, this
visualisation does not allow us to compare specific neighbourhoods.
Figure 4: Average shake intensity reported (purple) vs standard deviation
(pink)
To
begin with we looked at the average of all reported data regarding shake
intensity and compared this to the standard deviation for each neighbourhood
(Figure 4). From this we conclude that Palace Hills and Southwest have the
most reliable data. However, they are further away from the epicenter.
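The comparison of mean against standard deviation per neighbourhood can be
sketched in a few lines of Python. This is an illustration under assumptions,
not the Tableau calculation: the readings are invented, the area names are
placeholders, and the sample standard deviation (`statistics.stdev`) is
assumed as the volatility measure.

```python
from statistics import mean, stdev

# Hypothetical shake-intensity readings per area (illustrative values only).
readings = {
    "Area A": [1, 1, 2, 1, 1],
    "Area B": [2, 7, 3, 8, 1],
    "Area C": [0, 1, 0, 1, 0],
}


def volatility_table(data):
    """Return (area, mean, sample std dev) rows, least volatile area first."""
    rows = [
        (area, mean(vals), stdev(vals))
        for area, vals in data.items()
        if len(vals) > 1  # stdev needs at least two readings
    ]
    return sorted(rows, key=lambda row: row[2])


table = volatility_table(readings)
for area, avg, sd in table:
    print(f"{area}: mean={avg:.2f} sd={sd:.2f}")
```

A low standard deviation here only means that reports agree with each other;
it says nothing about whether they agree with the actual shaking.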
Figure 5: Average damage reported vs standard deviation
When
it comes to damage reported, those areas further away from the epicenter have
the highest standard deviation (Figure 5). Of those closer to the epicenter,
Wilson Forest seems to be most reliable with a standard deviation of 1.6. This
changes considerably if we just look at the damage reported for buildings
(Figure 6) – Wilson Forest has the second highest volatility. Considering that
it is sparsely populated this could indicate highly unreliable data.
Figure 6: Average building damage reported vs standard deviation for select
locations.
3. Reliability of different neighbourhoods
In
order to take a closer look at the reliability of different neighbourhoods we
again looked at both damage and shake intensity reports.
Figure 7: Shake Intensity vs Damage Reports per neighbourhood. Number of
records = size of circle.
From
Figure 7 it is clear that Old Town has reported the highest average damage and
shake intensity – as they are closest to the epicenter, this is expected. But
areas such as Broadview and Scenic Vista reported very high damage but little
shake intensity. Looking more closely at Scenic Vista we discovered that the
average damage reported remains roughly the same throughout the week but the
number of records increases dramatically on April 9 and 10 (Figure 8).
Figure 8: Average strength of damage and number of records reported for
Scenic Vista.
This seems to indicate that the increase in the number of records is caused
simply by the power cuts (areas of no data) and does not necessarily indicate
an increase in damage. So data coming from Scenic Vista might not be
unreliable, but it might not relate to the earthquake. The same pattern can
be observed for Broadview, which supports our hypothesis that something is
happening in these areas that is probably not related to the earthquake.
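The distinction between "more records" and "more damage" can be made explicit
by summarising each day as a pair of report count and mean rating. The sketch
below uses invented ratings; `daily_summary`, `volume_only_surge` and both
thresholds are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (day, damage rating) pairs for a single neighbourhood.
reports = [
    ("Apr 8", 7), ("Apr 8", 8),
    ("Apr 9", 7), ("Apr 9", 8), ("Apr 9", 7),
    ("Apr 9", 8), ("Apr 9", 8), ("Apr 9", 7),
]


def daily_summary(rows):
    """Map each day to (number of reports, mean rating)."""
    by_day = defaultdict(list)
    for day, rating in rows:
        by_day[day].append(rating)
    return {day: (len(vals), mean(vals)) for day, vals in by_day.items()}


def volume_only_surge(summary, before, after, count_ratio=2.0, mean_tol=0.5):
    """True when the report count jumps but the mean rating barely moves."""
    n0, m0 = summary[before]
    n1, m1 = summary[after]
    return n1 >= count_ratio * n0 and abs(m1 - m0) <= mean_tol


summary = daily_summary(reports)
flag = volume_only_surge(summary, "Apr 8", "Apr 9")
print(summary, flag)
```

A day flagged this way is a candidate for delayed block submissions rather
than new damage, which is the reading suggested for Scenic Vista above.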
Figure 9: Standard deviation for damage and shake intensity reported, Old
Town.
Looking more closely at Old Town, shake intensity is quite low except on
April 9th, whereas the deviation in damage reports is relatively high, which
may indicate unreliability in these reports.
Figure 10: Strength of damage reports and average of shake intensity for
Wilson Forest
Wilson Forest has a lot of missing data, especially before the earthquake
(Figure 10). Once the shake intensity increases, an increase in data being
reported can be observed (thickness of blue bars). However, as the standard
deviation (saturation of blue colour) during these increases is quite low, we
surmise that people may simply not report anything before the earthquake
hits, but that their reports are then reliable.
4. Conclusion
As outlined above, we think that uncertainty in the data is caused by a mix
of power cuts (no data), missing values (‘Null’ recordings) and a lack of
complementary data. This in turn affects how reliably the various
neighbourhoods can be assessed, but we feel confident that our analysis can
help in deciding which reports to attend to.
3 – How do conditions change over time? How does uncertainty
in change over time? Describe the key changes you see. Limit your response to
500 words and 8 images.
As discussed in our answer to question 1, the initial damage reports bring
areas such as Scenic Vista and Broadview to attention even though they are
not covered in the shake intensity map.
We see uncertainty from the beginning, as high medical damage is reported
despite the low shake intensity. This is specific to some areas only, though.
We also see that the medical reports are in sync with the other types of
damage reported on the 7th. The shake intensity increases suddenly on the
8th, and there is a large number of high building damage reports, indicating
earthquake damage.
We observe changing conditions with respect to two aspects.
1. Damage Reports
Figure 1: Damage Type Over Time
Looking at the average of the different types of damage reports by day, we
see that the overall damage reported on the 6th is low. On the next day,
damage is spread almost evenly across types at a little over five. For the
8th, there is a decrease in reported damage to buildings, medical, and sewer
and water, but increased damage is reported in the power sector. From our
earlier observations, the shake intensity suddenly increased on the 8th, but
this is not reflected in the overall damage reports. We then see an overall
increase in the reports, particularly in the water and sewer sector. There is
again a cumulative increase on the 9th, with sudden spikes in power, roads
and bridges, and sewer and water.
2. Geographical Locations
Figure 2: Damage Type Over Time for Worst Hit Areas
From our earlier answer to question 1, we take the five worst-affected areas
and look at their reports. The visualization above shows the names of the
neighborhoods on the left, with bars showing the average damage reported. The
bar colors represent the different damage types. The horizontal line in each
box represents the average shake intensity for that area, and each column
represents one day.
One thing that immediately stands out is that Old Town (middle row) is
probably the only area where the damage reports and the shake intensity seem
to be somewhat consistent: damage increases with shake intensity.
With respect to uncertainty, Wilson Forest is probably the highest. High
shake intensity is reported for the 8th and 9th, but a lot of data is missing
for all days except the 8th.
4 – The data for this challenge can be analyzed either as a
static collection or as a dynamic stream of data, as it would occur in a real
emergency. Describe how you analyzed the data - as a static collection or a
stream. How do you think this choice affected your analysis? Limit your
response to 200 words and 3 images.
We analysed the data as a static collection. This allowed us to make
comparisons between different neighbourhoods and types of damage report
without real-life time constraints. It affected our analysis in that we were
able to discuss possible reasons for unexpected events, which helped us make
sense of the data. For example, the rise in the number of damage reports
coming in from Scenic Vista on April 9 might have been seen as important if
the preceding ‘missing data’ had not been properly logged and taken into
account.
Using standard deviation to gauge the reliability of data would be harder
with a dynamic stream, as reliability is also related to time. For example,
in Figure 1 it appears that data from Wilson Forest is highly unreliable, as
it has a high standard deviation (2.6) for that particular recording.
Figure 1: Comparing average shake intensity reported with standard
deviation, April 8, 1pm
However, looking at even that one day as a whole shows an overall standard
deviation of a low 1.3, and hence we are quite confident in the reliability
of the data (Figure 2).
Figure 2:
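The effect of the time window on the standard deviation can be illustrated
with a small Python sketch. The readings are invented and do not reproduce
the 2.6 and 1.3 quoted above; `hourly_sd` is a hypothetical helper and the
sample standard deviation is an assumed choice.

```python
from statistics import stdev

# Hypothetical (hour of day, shake intensity) readings for one area on one day.
day_readings = [
    (12, 3), (12, 3), (12, 3),
    (13, 1), (13, 6),          # only two reports in this hour, and they disagree
    (14, 3), (14, 3), (14, 4),
]


def hourly_sd(rows, hour):
    """Sample standard deviation of the readings received during `hour`."""
    vals = [v for h, v in rows if h == hour]
    return stdev(vals) if len(vals) > 1 else None


snapshot_sd = hourly_sd(day_readings, 13)             # what a live view sees at 1pm
whole_day_sd = stdev([v for _, v in day_readings])    # what the static view sees
print(f"1pm only: {snapshot_sd:.2f}, whole day: {whole_day_sd:.2f}")
```

With few reports in a single hour, one disagreement dominates the figure,
whereas the full day dilutes it.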
The other area affected by the choice between static and dynamic data is of
course the power cuts. There are a number of ongoing power cuts of between 60
and 90 minutes, as well as those of longer duration caused by the earthquake.
At the beginning of such a power cut it would be very difficult to determine
whether it is a scheduled outage or not, and hence to decide which action to
take, if any.
Figure 3: Missing data for each location
Figure 3, for example, gives an overview of power cuts, the longer of which
are accompanied by a rise in the number of reports submitted (with the
exception of Wilson Forest). But there are also a number of recurring outages
in nearly all areas; with dynamic data it would be harder to tell the two
apart.
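The difficulty with a live stream can be shown with a small sketch: until a
silence has lasted longer than the longest routine cut, an operator cannot
label it. The 90-minute bound comes from the range quoted above; the
function name, labels and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: routine cuts last at most 90 minutes, per the range quoted above.
MAX_ROUTINE_CUT = timedelta(minutes=90)


def assess_silence(last_report, now, max_routine=MAX_ROUTINE_CUT):
    """Label an area's current silence from a live operator's point of view."""
    silent_for = now - last_report
    if silent_for <= max_routine:
        return "undetermined"      # still consistent with a routine cut
    return "extended outage"       # longer than any routine cut


last = datetime(2020, 4, 8, 9, 0)  # hypothetical time of the last report received
early = assess_silence(last, datetime(2020, 4, 8, 9, 45))
late = assess_silence(last, datetime(2020, 4, 8, 11, 0))
print(early, late)
```

In a static collection the full length of every gap is known in advance, so
this waiting period does not arise.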
We conclude that we would want to use a lot
of other data sources to corroborate dynamic data in order to ensure accuracy of analysis.