Wentao Zhang, Peking University, zhwt@pku.edu.cn PRIMARY
Yifei Xia, Peking University, yfxia@pku.edu.cn
Yikai Ding, Peking University, 1600012703@pku.edu.cn
Xiaoru Yuan, Peking University, xiaoru.yuan@pku.edu.cn
Student
Team: YES
Tableau
Excel
D3.js
EarthquakeAware-Community, developed by
the Visualization and Visual Analytics Group of Peking University.
Approximately how many hours were spent working on
this submission in total?
100h.
May we post your submission in the Visual Analytics
Benchmark Repository after VAST Challenge 2019 is complete? YES
Video
Questions
1 – Emergency responders will
base their initial response on the earthquake shake map. Use visual analytics
to determine how their response should change based on damage reports from
citizens on the ground. How would you prioritize neighborhoods for response?
Which parts of the city are hardest hit? Limit your response to 1000 words and
10 images.
First, our visualization system gives an overview of the earthquakes.
Overview of the prequake:

The number of reports per 5 minutes rose twice. According to the figure above,
at 14:30 on April 6 it rose from 10 to 50 and stayed at that level for 1.5
hours. At 16:00 it rose again to 150, then decreased to a normal level. The
whole process lasted about 5.5 hours.
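The per-5-minute counts discussed above can be reproduced with a simple binning step. The following is a minimal sketch, not the code of our system: the function name `reports_per_bin` and the sample timestamps (including the year) are hypothetical and only illustrate the idea.

```python
from collections import Counter
from datetime import datetime, timedelta


def reports_per_bin(timestamps, minutes=5):
    """Count reports in fixed-width time bins; keys are bin start times."""
    width = minutes * 60
    counts = Counter()
    for ts in timestamps:
        # Floor each timestamp to the start of its bin, measured from midnight.
        day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        offset = int((ts - day_start).total_seconds()) // width * width
        counts[day_start + timedelta(seconds=offset)] += 1
    return counts


# Hypothetical sample: three reports shortly after 14:30.
sample = [
    datetime(2020, 4, 6, 14, 31),
    datetime(2020, 4, 6, 14, 33),
    datetime(2020, 4, 6, 14, 36),
]
bins = reports_per_bin(sample)
```

A sudden jump between consecutive bins (for example from about 10 to about 50) is what marks the start of a quake in the timeline.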
Most of the increase came from Districts 3, 8 and 9. In more detail, the
radial plot at the bottom left corner shows that the damage in 3 and 9 was
quite high in all 5 dimensions, while the medical reports in 8 were almost 0.
We find that reported medical damage depends on the existence of hospitals:
there are hospitals in 3 and 9, but not in 8.
There is also a straight line between the buildings and roads dimensions,
which we highlight in the figure below. The reports on this line were mainly
about damage to buildings and roads, and most of them came from 8, 9, 10 and
13. This prequake damaged the buildings and roads in those districts and also
affected the traffic there, which will be shown in MC3.

After lassoing the other dimensions, we find that road damage is more serious
in the lower part of the map (8, 9, 10, 13). For power, buildings and water,
the damage is most serious in neighborhood 3.
Overview of the major quake:

At 8:40 on April 8, the number of records per 5 minutes increased dramatically
to 1500 and then generally decreased. It rose again at 11:30 and 12:00, and
finally returned to the normal level after 17:00.
From the scatter plot we can see that the data with higher shake intensity
values lie close to a circle. Using the lasso, we can see that the data
outside this circle basically come from other regions, indicating that
neighborhood 3 has indeed suffered great damage.

The black parts, 3, 8, 9 and 14, had a very large number of reports and showed
very high damage. More specifically, look at the extra bars between the major
quake and the aftershock: those are reconnections from 3, 8, 9 and 10. The
reports from these neighborhoods disappeared for more than 10 hours after this
quake and then suddenly showed up in those bars. This might be a loss of
association caused by damage to the power system.
We picked those districts (with golden borders) to show the details in the
following 4 figures.

Moreover, power outages have also been detected in other areas (4, 7, 12, 14,
17), but they either did not occur at the time of the earthquake or did not
last long.
We can also use the lasso to analyze the specifics of each dimension. As shown
in the figure, neighborhood 8 was mainly damaged in the roads dimension; its
water and power damage is 2 and its buildings damage is 1. Neighborhoods 3 and
9 are seriously damaged in all dimensions.

Overview of the aftershock:

At 15:00 on April 9, the number of reports rose to about 500 and fell back to
the normal level after about 4 hours. This aftershock is similar to the major
quake in shake_intensity but different in its effects. Districts 8 and 9 were
hurt most in this quake, but reports disappeared only in neighborhoods 3 and
8.

The projection scatter plot shows that the reports from neighborhood 9 are
more concentrated at the center, which means that 9 suffered more damage than
8, while the medical damage in 8 is lower.
Using the lasso, we can also find that road damage mainly occurs in the lower
part of the city, while water and power damage mainly occurs in the upper part
of the city.
The loss of association in 3 and 8 (2 groups, each containing 2 figures
showing the start and end time of the loss):
neighborhood 3: 15:00 on April 9 to 24:00 on April 10
neighborhood 8: 17:00 on April 9 to 2:00 on April 10
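Silent periods like these can also be located programmatically by looking for long gaps between consecutive reports from one neighborhood. The snippet below is a sketch under assumptions: `find_gaps` and the sample timestamps (including the year) are hypothetical, and the minimum gap length is a free parameter.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, min_gap=timedelta(hours=10)):
    """Return (start, end) pairs where consecutive reports are at least min_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Hypothetical report times for one neighborhood.
times = [
    datetime(2020, 4, 9, 14, 50),
    datetime(2020, 4, 9, 17, 0),
    datetime(2020, 4, 10, 2, 0),
    datetime(2020, 4, 10, 2, 5),
]
gaps = find_gaps(times, min_gap=timedelta(hours=8))
```

Running this per neighborhood gives candidate start and end times of a loss, which can then be checked against the figures.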

In fact, many surprising phenomena are hidden in the data of MC1. For example,
we generated the distribution of records in each time period (30 minutes or 1
hour wide): it is approximately normal during the shaking and random the rest
of the time. Beyond that, we found 2 peaks in districts 1 and 16 during the
major quake and the aftershock, while the other 17 districts have only 1 peak.
According to the data, 2 peaks correspond to 2 different clusters in the
5-dimensional space (without shake_intensity), as shown in the figure below by
2 highlighted circles.

2 – Use visual analytics to
show uncertainty in the data. Compare the reliability of neighborhood reports.
Which neighborhoods are providing reliable reports? Provide a rationale for
your response. Limit your response to 1000 words and 10 images.
First we assume that, for one specific aspect of damage, people’s report
values follow a normal distribution in each period, i.e.
X ~ N(μ, σ²)
where μ is the corresponding mean damage value and σ is the standard
deviation, computed with the unbiased sample estimator. For simplicity, we
define values more than 1σ away from μ as unreliable data. Hence, one user
report (containing 6 damage dimensions) is considered unreliable if most of
its dimensions are individually judged unreliable.
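The 1σ rule can be sketched as follows. This is an illustrative sketch, not the code of our system: the function name `flag_unreliable` is hypothetical, the reports are assumed to come from one area and one period, and reading "most of its dimensions" as at least 4 of 6 is an assumption exposed as the `min_bad_dims` parameter.

```python
from statistics import mean, stdev

DIMENSIONS = [
    "sewer_and_water", "power", "roads_and_bridges",
    "medical", "buildings", "shake_intensity",
]


def flag_unreliable(reports, min_bad_dims=4):
    """Return one bool per report: True if it is considered unreliable.

    reports: list of dicts keyed by DIMENSIONS (missing values may be None).
    """
    # Per-dimension mean and sample standard deviation (n - 1 denominator).
    stats = {}
    for dim in DIMENSIONS:
        values = [r[dim] for r in reports if r.get(dim) is not None]
        if len(values) >= 2:
            stats[dim] = (mean(values), stdev(values))

    flags = []
    for r in reports:
        # Count dimensions whose value lies more than 1 sigma from the mean.
        bad = sum(
            1
            for dim, (mu, sigma) in stats.items()
            if r.get(dim) is not None and abs(r[dim] - mu) > sigma
        )
        flags.append(bad >= min_bad_dims)  # assumption: "most" means >= 4 of 6
    return flags
```

The share of `False` entries in the result is then the proportion of reliable reports for that area and period.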
The assumption above is supported by the following evidence:
1. For each of the three quakes, we observe the distribution of reports from
each area at a time resolution of 30 minutes. As shown in the graph below,
damage to buildings in districts 3, 4, 9, 10 and 11 follows a normal
distribution quite well, despite the discrete possible values.
2. The distribution departs from a normal distribution when there is no
earthquake. However, given the low proportion of such data, it is acceptable
to ignore their effect.

For a detailed analysis, we choose a time resolution of 30 minutes and a
threshold of 4 to 6. Any report with more unreliable dimensions than the
threshold is considered unreliable as a whole. We then quantify the
reliability of a whole district by one parameter: the proportion of reliable
reports in that area. The chart below shows the reliability of all 19
districts when the threshold is set to 4, 5 and 6 respectively.

| Tolerance | Neighborhoods | Number |
| High | (except 7, 8, 10) | 17 |
| Mid | 1, 2, 4, 13, 15 | 6 |
| Low | 4, 18 | 2 |
Furthermore, while the analysis above reflects the quality of reports, the
overall reliability of a district lies primarily in its report density, that
is, the number of reports per unit area (rather than the total number of
reports). This is not difficult to understand: the more data reported per unit
area, the more reliable the district’s reports will be.
Specifically, we now analyze a snapshot around the major quake on the morning
of April 8 to find the most reliable districts. First, we discard districts
with fewer than 100 reports, because a sufficient number of reports is a
prerequisite for reliability. Districts 3, 14, 9, 18, 2, 11 and 8 have more
than 100 reports during this period.
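The filtering and density ranking can be sketched in a few lines. The function name `rank_by_density` is hypothetical, and the report counts and district areas in the example are made-up numbers, not values from the challenge data.

```python
def rank_by_density(report_counts, areas, min_reports=100):
    """Drop districts with fewer than min_reports reports, then sort the
    rest by reports per unit area, densest first."""
    kept = {d: n for d, n in report_counts.items() if n >= min_reports}
    return sorted(
        ((d, n / areas[d]) for d, n in kept.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical counts and areas, for illustration only.
counts = {3: 400, 7: 50, 9: 300}
areas = {3: 4.0, 7: 1.0, 9: 2.0}
ranking = rank_by_density(counts, areas)
```

Here district 7 is discarded for having too few reports, and district 9 ranks above district 3 even though it has fewer reports in total.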
Districts we think most reliable:

(The periods selected in the charts below are the same as above unless stated
otherwise.)
These two districts also uploaded reliable data around the aftershock on the
afternoon of April 9.

Unreliable districts are listed below.
District 1 & 7:

As the two districts are relatively distant from the quake center, their
reported shake intensity values are severely polarized.
The other, unlisted districts fall between reliable and unreliable: either
they have only a few reports, or their reports are polarized.

Notice that districts 3, 8, 9, 10 and 11 had power cuts, resulting in short
bursts of reports afterwards.

We can conclude that reliability cannot be deduced merely from the normal
distribution test and the mean report density. It is also important to take
details such as power cuts into account, especially when the number of reports
increases.
3 – How do conditions change
over time? How does uncertainty in change over time? Describe the key changes
you see. Limit your response to 500 words and 8 images.
Generally speaking, damage conditions fall into two parts: quake time and
normal time. We focus on how conditions change over time in the periods around
the quakes.

First of all, the number of records per 5 minutes after an earthquake.
During the prequake, the reported damage values were lower than the noise, but
the reports were more numerous, as shown above. That is why the average damage
dropped. In contrast, during the other 2 earthquakes, the average values rose
very quickly.

The correlation between different dimensions.
Because of the noise, it is interesting to look at the correlation. For
instance, we pick buildings and power, group the time into 1-hour slots, and
calculate the covariance and rel (the correlation coefficient) to explore how
the earthquakes affected it. The top half of the figure below is a rel-t
graph: |rel| < 0.2 when there is no earthquake, but rel takes a high positive
value during shaking. To check the independence, we also draw the curve of the
integral of rel, which is almost horizontal most of the time. In fact, these
dimensions (buildings, power, sewer_and_water, roads_and_bridges, medical) are
all nearly independent unless an earthquake occurs. This supports using their
sum as a combined dimension, a comprehensive coefficient.
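The rel-t curve and its running integral can be computed roughly as follows. This is a sketch under assumptions: the helper names `pearson`, `hourly_correlation` and `cumulative` are hypothetical, reports are assumed to arrive as (timestamp, values) pairs, and slots with fewer than 2 reports or zero variance yield no coefficient.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def hourly_correlation(reports, dim_a="buildings", dim_b="power"):
    """Group (timestamp, values) reports into 1-hour slots and correlate
    dim_a with dim_b inside each slot."""
    slots = defaultdict(list)
    for ts, values in reports:
        a, b = values.get(dim_a), values.get(dim_b)
        if a is not None and b is not None:
            slots[ts.replace(minute=0, second=0, microsecond=0)].append((a, b))
    return {
        hour: pearson([a for a, _ in pairs], [b for _, b in pairs])
        for hour, pairs in sorted(slots.items())
    }


def cumulative(values):
    """Running sum of the coefficients; undefined slots count as 0."""
    total, out = 0.0, []
    for v in values:
        total += v or 0.0
        out.append(total)
    return out
```

A flat stretch of the cumulative curve corresponds to hours where the coefficient stays near 0, and a steep rise marks the hours of shaking.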

The last one is uncertainty and reliability.
As we analyzed before, the proportion of reliable reports is the reliability
of a district, which obviously changes over time. The figure below shows how
reliability changes over time for each district.
Red: reliability < 70%
Green: reliability > 70%
We can find the approximate time of each earthquake in this figure, because
the reliability changed a lot at each of those points. One thing worth
mentioning is that the change at the third quake (the aftershock) is not as
significant as at the first 2 quakes. In fact, the aftershock is very similar
to the major quake and so close to it in time that citizens were not able to
return to a safe life before the third one occurred. The two seem to be
successive events. This figure provides a new tool to compare these districts.
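The red/green encoding of reliability over time can be sketched as below. The function name `reliability_by_slot` and the input layout (district, time slot, reliability flag) are hypothetical. Since the legend does not say how a value of exactly 70% is colored, this sketch assigns it to red, which is an assumption.

```python
from collections import defaultdict


def reliability_by_slot(flagged_reports, threshold=0.70):
    """Map (district, slot) to (ratio of reliable reports, color).

    flagged_reports: iterable of (district, slot, is_reliable) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # [reliable count, total count]
    for district, slot, is_reliable in flagged_reports:
        totals[(district, slot)][0] += 1 if is_reliable else 0
        totals[(district, slot)][1] += 1

    result = {}
    for key, (good, n) in totals.items():
        ratio = good / n
        # Assumption: a ratio of exactly the threshold is drawn in red.
        result[key] = (ratio, "green" if ratio > threshold else "red")
    return result


# Hypothetical flags for two districts in one 30-minute slot.
flags = [
    (3, "08:30", True), (3, "08:30", True), (3, "08:30", True), (3, "08:30", False),
    (1, "08:30", True), (1, "08:30", False),
]
cells = reliability_by_slot(flags)
```

Each resulting cell corresponds to one colored mark in the per-district timeline.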

Besides, we are concerned with the change of damage in each aspect. It is easy
to find changes in damage conditions using our system. Take sewer and power as
an example: select report values 6 to 10 to focus on greater damage, then
select the points close to the “sewer_and_water” button, and the strong report
values for sewer and power are shown on the map. During the major quake,
districts 14, 12, 4 and 18 are clearly the hardest hit.

When it comes to the aftershock, as shown below, severe reports have greatly
decreased. Sewer and water in districts 4 and 16 are still hit hard, while
those in districts 14, 18 and 12 may have been repaired.

4 – The data for this challenge can be analyzed either as a static
collection or as a dynamic stream of data, as it would occur in a real
emergency. Describe how you analyzed the data - as a static collection or a
stream. How do you think this choice affected your analysis? Limit your
response to 200 words and 3 images.
Generally we analyze the data statically, while the system also supports
dynamic data processing and can raise an alert when abnormal reports flood in.
In a static view, we can analyze past events carefully, such as exploring the
three shocks’ effects on different dimensions of the city. When the data is
treated as a stream, it is more important to raise an alarm promptly when
unusual conditions occur.
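One simple way to raise such an alert on a stream is a sliding-window counter. The sketch below is illustrative only and is not the alerting logic of our system: the class name `FloodAlert`, the 5-minute window and the limit are hypothetical choices, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from datetime import datetime, timedelta


class FloodAlert:
    """Sliding-window counter that signals when reports arrive faster than a limit."""

    def __init__(self, window=timedelta(minutes=5), limit=100):
        self.window = window
        self.limit = limit  # hypothetical alert level, to be tuned on real data
        self.recent = deque()

    def push(self, ts):
        """Add one report timestamp; return True if the window holds more
        than `limit` reports."""
        self.recent.append(ts)
        # Evict reports that have fallen out of the window.
        while self.recent and ts - self.recent[0] >= self.window:
            self.recent.popleft()
        return len(self.recent) > self.limit


# Hypothetical stream with a small limit, for illustration.
alert = FloodAlert(window=timedelta(minutes=5), limit=2)
t0 = datetime(2020, 4, 8, 8, 40)
signals = [alert.push(t0 + timedelta(minutes=m)) for m in (0, 1, 2, 10)]
```

In the example, the third report within the window triggers the alert, and the alert clears once the older reports have expired.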