Entry Name:  "PKU-Yuan-MC1"

VAST Challenge 2019
Mini-Challenge 1

 

 

Team Members:

Wentao Zhang, Peking University, zhwt@pku.edu.cn     PRIMARY

Yifei Xia, Peking University,  yfxia@pku.edu.cn

Yikai Ding, Peking University, 1600012703@pku.edu.cn

Xiaoru Yuan, Peking University, xiaoru.yuan@pku.edu.cn

Student Team:  YES

 

Tools Used:

Tableau

Excel

D3.js

EarthquakeAware-Community, developed by the Visualization and Visual Analytics Group of Peking University.

 

Approximately how many hours were spent working on this submission in total?

100h.

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES

 

Video

https://youtu.be/1ijmNJedsYY

 

 

 

Questions

1. Emergency responders will base their initial response on the earthquake shake map. Use visual analytics to determine how their response should change based on damage reports from citizens on the ground. How would you prioritize neighborhoods for response? Which parts of the city are hardest hit? Limit your response to 1000 words and 10 images.

We first give an overview of the earthquakes using our visualization system.

Overview of the prequake:

https://lh4.googleusercontent.com/L3hQmBCRELqET53F0PfgHpImF2WYbSYUujaUlsUgg_2GtVbFZnM0I-_Ute0wcsP5Z3JkSyZ9uNSV-VDaHdvwStg_zE0JvPW5uBBAOPcLwngCc9cEleMO-CMH2vYxh6k5cZSJEjIy

The number of reports per 5 minutes rose twice. According to the figure above, at 14:30 on April 6 it rose from 10 to 50 and stayed at that level for 1.5 hours. At 16:00 it rose again to 150, then decreased to a normal level. The whole process lasted about 5.5 hours.
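The 5-minute report counts discussed above can be reproduced with a simple binning step. The following is a minimal sketch, not the code used in our system; the function name, the sample timestamps, and the year are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def reports_per_bin(timestamps, bin_minutes=5):
    """Count reports in fixed-width time bins; keys are bin start times."""
    counts = Counter()
    for ts in timestamps:
        # Floor each timestamp to the start of its bin within the day.
        minutes = (ts.hour * 60 + ts.minute) // bin_minutes * bin_minutes
        start = ts.replace(hour=minutes // 60, minute=minutes % 60,
                           second=0, microsecond=0)
        counts[start] += 1
    return dict(sorted(counts.items()))

# Hypothetical timestamps, for illustration only.
sample = [
    datetime(2020, 4, 6, 14, 31, 10),
    datetime(2020, 4, 6, 14, 33, 59),
    datetime(2020, 4, 6, 14, 35, 0),
]
bins = reports_per_bin(sample)
```

Plotting the resulting counts against the bin start times gives the kind of timeline on which the two rises are visible.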

Most of the increase came from Districts 3, 8 and 9. In more detail, the radial plot at the bottom left corner shows that the damage in 3 and 9 was quite high in all 5 dimensions, while the medical reports in 8 were almost 0. We find that medical damage depends on the existence of hospitals: there are hospitals in 3 and 9, but not in 8.

There is also a straight line between the buildings and roads dimensions, which we highlight in the figure below for more detail. The reports on this line were mainly about damage to buildings and roads, and most of them also came from 8, 9, 10 and 13. The prequake damaged the buildings and roads in those districts and also affected the traffic there, which will be shown in MC3.

After lasso-selecting the other dimensions, we find that the road damage in the lower part of the map (8, 9, 10, 13) is more serious. For power, buildings, and water, the damage in neighborhood 3 is the most serious.

Overview of the majorquake:

https://lh3.googleusercontent.com/snE7Nw1IsK9Rr89RhHuw7tHYRIr0G7bsOAO5DWKeTSBG65z4cutxRe0-pvIQI8tzAktDvIO4bHI-dLDe0az5G-ia0RnHoSK5-sUKVWJNV_hVviMiHO8MsB-GU9ILF4nIX8urrEEa

At 8:40 on April 8, the number of records per 5 minutes increased dramatically to 1500 and then gradually decreased. It rose again at 11:30 and 12:00, and finally returned to the normal level after 17:00.

From the scatter plot we can see that the data with higher shake intensity values lie close to a circle. Using the lasso, we can see that the data outside this circle basically come from other regions, indicating that neighborhood 3 has indeed suffered great damage.

The black areas, 3, 8, 9 and 14, had a very large number of reports and showed very high damage. More specifically, look at the extra bars between the majorquake and the aftershock: those are reconnections from 3, 8, 9 and 10. The reports from these neighborhoods disappeared for more than 10 hours after this quake and then suddenly showed up in those bars. This might be a loss of connection caused by the damaged power system.

We picked those districts (with golden borders) to show the details in the following 4 figures.

Moreover, power outages have also been detected in other areas (4, 7, 12, 14, 17), but they may not have occurred at the time of the earthquake, or the outages did not last long.

We can also use the lasso to analyze the specifics of each dimension. As shown in the figure, neighborhood 8 was mainly damaged in the roads dimension; its damage to water and power is 2, and to buildings is 1. Neighborhoods 3 and 9 are seriously damaged in all dimensions.

Overview of the aftershock:

https://lh4.googleusercontent.com/tR9lZYL9D-3Fr5YxJkxuARww78oaYB5ZXQOtdLfLafrEc1Nzqh7nXXTcWuoD5CHVlfbuCH499CCytokf71LX1Bbes_8bT6fAKCjYPvSHJlwgs8j_olA0OTYiYfZ20cRX1iR_vfYp

At 15:00 on April 9, the number of reports rose to about 500 and fell back to the normal level after about 4 hours. This aftershock is similar to the majorquake in shake_intensity but different in its effects. Districts 8 and 9 were hurt most in this quake, but the reports only disappeared in neighborhoods 3 and 8.

We can see in the projection scatter plot that the reports from neighborhood 9 are more concentrated at the center, which means that 9 suffered more damage than 8, while the medical damage in 8 is lower.

Using the lasso, we can also find that road damage mainly occurs in the lower part of the city, while water and power damage mainly occurs in the upper part of the city.

The loss of connection in 3 and 8 (2 groups, each containing 2 figures showing the start and end time of the loss):

Neighborhood 3: 15:00 on April 9 to 24:00 on April 10

Neighborhood 8: 17:00 on April 9 to 2:00 on April 10
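Silent periods like these can also be located automatically by looking for long gaps between consecutive reports from one neighborhood. The sketch below is only an illustration under assumed inputs: the function name, the 10-hour default, and the sample times are hypothetical and are not taken from the data.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap=timedelta(hours=10)):
    """Return (last_seen, next_seen) pairs with no report for at least min_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]

# Hypothetical report times for a single neighborhood.
times = [
    datetime(2020, 4, 8, 8, 40),
    datetime(2020, 4, 8, 8, 45),
    datetime(2020, 4, 8, 20, 0),
    datetime(2020, 4, 8, 20, 5),
]
gaps = find_gaps(times)
```

Running this per neighborhood yields candidate start and end times of a loss, which can then be checked against the figures.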

In fact, there are many surprising phenomena hidden in the data of MC1. For example, we generated the distribution of records in each time period (width 30 minutes or 1 hour): it is approximately a normal distribution during the shaking time and random the rest of the time. Beyond that, we found 2 peaks in districts 1 and 16 during the majorquake and the aftershock, while the other 17 districts have only 1 peak.

According to the data, 2 peaks mean 2 different clusters in the 5-dimensional space (without shake_intensity), shown in the figure below by 2 highlighted circles.

https://lh3.googleusercontent.com/j7FHxGp8J8Spdgk0bD_8eE2DcBY7RFlqGwv0DZO5Cpg3dxGU398O9VcESU75SGIJq1ikmIvG_0W3fFp3eaPoI_SuESVXWeNLkWAOr9Q6c5sdjFiVP6oa7m-MMpu-H8Jwsjb0LNjZ

2. Use visual analytics to show uncertainty in the data. Compare the reliability of neighborhood reports. Which neighborhoods are providing reliable reports? Provide a rationale for your response. Limit your response to 1000 words and 10 images.

First we assume that, for one specific aspect of damage, people’s report values follow a normal distribution in each period, i.e.

X ~ N(μ, σ²)

where X is the reported value, μ is the corresponding mean damage value, and σ is the standard deviation estimated from the unbiased sample variance. For simplicity, we define values more than 1σ away from μ as unreliable data. Hence, one user report (containing 6 damage dimensions) is considered unreliable if most of its dimensions are individually unreliable.
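The 1σ rule can be sketched as follows. This is a minimal illustration rather than our actual implementation: the function name and the sample values are hypothetical, and reports are assumed to be dictionaries keyed by the six field names.

```python
from statistics import mean, stdev

DIMENSIONS = ["sewer_and_water", "power", "roads_and_bridges",
              "medical", "buildings", "shake_intensity"]

def unreliable_dimensions(report, period_reports):
    """Count dimensions of `report` lying more than 1 sigma from the period mean."""
    count = 0
    for dim in DIMENSIONS:
        values = [r[dim] for r in period_reports if r.get(dim) is not None]
        if report.get(dim) is None or len(values) < 2:
            continue  # missing value, or too few samples to estimate sigma
        mu, sigma = mean(values), stdev(values)  # stdev uses the n-1 denominator
        if abs(report[dim] - mu) > sigma:
            count += 1
    return count

# Hypothetical reports from one district in one 30-minute period.
period = [{dim: v for dim in DIMENSIONS} for v in (4, 5, 5, 5, 6)]
mixed = {"sewer_and_water": 5, "power": 9, "roads_and_bridges": 5,
         "medical": 5, "buildings": 0, "shake_intensity": 5}
n_bad = unreliable_dimensions(mixed, period)
```

Note that a 1σ band is strict: under an exactly normal distribution roughly a third of the values fall outside it, so the per-report count matters more than any single flagged dimension.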

The assumption above is supported by the following evidence:

1.  For each of the three quakes, we observe the distribution of reports of each area at a time resolution of 30 minutes. As shown in the graph below, building damage in districts 3, 4, 9, 10 and 11 follows a normal distribution quite well, despite the discrete possible values.

2.  The distribution departs from a normal distribution when there is no earthquake. However, considering the low proportion of those data, it is adequate to ignore their effects.

https://lh4.googleusercontent.com/WeO17TDAumD49-yJsAuxDSPOE6462MFHbJa5hKzMOQSSiNLPoPN6FX7hR9PX2pGdqiUFjh2Q2l-dgB3_AsqwzBBsCCs58re8BWJM4WWaLbooEPGwtW5Ay4XVV_uKrPRJLR84tgsB

For a detailed analysis, we choose a time resolution of 30 minutes and a threshold of 4 to 6. Any report whose number of unreliable dimensions exceeds the threshold is considered unreliable as a whole. We then quantify the reliability of a whole district by the proportion of reliable reports in that district. The chart below shows the reliability of all 19 districts when the threshold is set to 4, 5 and 6 respectively.
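The district-level measure can be sketched as below. This is an assumption-laden illustration: the function name and sample data are hypothetical, and because a report has only 6 dimensions, the sketch reads the threshold inclusively (a report is unreliable once its number of unreliable dimensions reaches the threshold), which is a guess at the intended comparison.

```python
from collections import defaultdict

def district_reliability(flag_counts, threshold=4):
    """flag_counts: iterable of (district, n_unreliable_dims).

    Returns {district: share of reports considered reliable}.
    """
    total = defaultdict(int)
    reliable = defaultdict(int)
    for district, n_bad in flag_counts:
        total[district] += 1
        # Assumption: reaching the threshold makes the whole report unreliable.
        if n_bad < threshold:
            reliable[district] += 1
    return {d: reliable[d] / total[d] for d in total}

# Hypothetical districts "A" and "B" with made-up per-report counts.
sample = [("A", 0), ("A", 1), ("A", 5), ("A", 2), ("B", 4), ("B", 6)]
strict = district_reliability(sample, threshold=4)
loose = district_reliability(sample, threshold=6)
```

Raising the threshold makes the test more tolerant, so the reliability of every district can only stay the same or increase.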

https://lh3.googleusercontent.com/RjPnlYrWAjazSG89fIk2FdAADBve0NKWKfx2olFv7TA8nw_oyyoCgqhAx6WZMtw-UFyp-BcDlpcRUBWMbcsucMH8oiwGmr7AdKlPcmWs6ZwanId9MZY65US4QZedf_AjC6Lj-V1h

 

Tolerance    Neighborhoods        Number
High         (except 7,8,10)      17
Mid          1,2,4,13,15          6
Low          4,18                 2

Furthermore, while the analysis above reflects the quality of reports, the overall reliability of a district primarily depends on the report density (rather than the total number of reports), that is, the number of reports per unit area. This is not difficult to understand: the more data are reported per unit area, the more reliable the district’s reports will be.

Specifically, we now analyze a snapshot around the major quake on the morning of April 8 to find the most reliable districts. First, we discard districts with fewer than 100 reports, because a sufficient number of reports is a prerequisite of reliability. Districts 3, 14, 9, 18, 2, 11 and 8 have more than 100 reports during this period.

Districts we consider most reliable:
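The two criteria above (a minimum report count, then report density) can be combined in a few lines. This is only a sketch: the function name, the district labels, the counts, and the areas are all hypothetical, since district areas are not part of the report data.

```python
def rank_by_density(report_counts, areas, min_reports=100):
    """Keep districts with at least min_reports, then sort by reports per unit area."""
    density = {d: n / areas[d] for d, n in report_counts.items()
               if n >= min_reports and d in areas}
    return sorted(density.items(), key=lambda kv: kv[1], reverse=True)

# Made-up counts and areas (arbitrary area units).
counts = {"A": 400, "B": 150, "C": 60}
areas = {"A": 8.0, "B": 2.0, "C": 1.0}
ranking = rank_by_density(counts, areas)
```

In this made-up example the smaller district "B" ranks above "A" despite having fewer reports, which is the reason for preferring density over the raw count.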

https://lh4.googleusercontent.com/GUa2JKDDLfiK_XWty8iiM1zW_IOQ40DNVV6CmWYvX7DxMSkmuUT8SwcZHffGPOv51nOKumn-hQw1-MUR_qhgq-zBJoxiYa5KHixUU4n8bZYRgpyMR9TWG0N3_oO3gudzRb-0ZL6O

(The periods selected in the charts below are the same as above unless otherwise stated.)

These two districts also uploaded reliable data around the aftershock on the afternoon of April 9.

https://lh6.googleusercontent.com/uPvhUabv_WgBUttWWQwlylwsycX43cEfrXqv2rvMJCgX7Z_94qZ2RB7bLD48lDIeJiRcM7CNTk_1YyMXHsmVakYVhjzLuJH4s1X2d90lTWbCbnXycoMiyp2HezPiscDTNXcdT4sB

Unreliable districts are listed below.

Districts 1 & 7:

https://lh5.googleusercontent.com/KsumdDaOvg3FvC-74IRu1gQZRVv9_5QVm_MwC14MfJCsBLx78zKqhH3CNrRDx0QuCrJGTE_jPmPLhdwodc8ShJLy9z5SWF0JkeHawSDhSLOrezNMAQB6exEmSyCt3ZUMzP-kbX0s

As these two districts are relatively far from the quake center, their reported shake intensity values are severely polarized.

The other unlisted districts are ordinary ones, between reliable and unreliable: either they have only a few reports, or their reports are polarized.

https://lh3.googleusercontent.com/GdcwF9mCHt5B4DXSjUU7LAWU5qN6UmzWQ05-8g1xrWCVB-Gb9-jvB8wj-RKVoDCJ4rUj8dFYzkVPjwcq9JR3u6XqD4zN_jB_eMAqaSr2mmVSfm28BpA9wEm_9hq_q9zzjqtWpkQO

Note that districts 3, 8, 9, 10 and 11 had power cuts, resulting in short bursts of reports afterwards.

https://lh4.googleusercontent.com/skEoQc90iUfnWzQOb565xFft7F4r0Vajl3E3F20ma-QhA2P7ne6vWeAXNTsYnUS_9dEdOKyP7QZsJeycvaQi6musGLQYMIh18W1krD9ySN5fD3ZcB9oO50eMPeKyox3KQP4qs_1z

We can conclude that reliability cannot be deduced merely from the normal distribution test and the mean report density. It is also important to take details such as power cuts into account, especially when the number of reports increases.

3. How do conditions change over time? How does uncertainty change over time? Describe the key changes you see. Limit your response to 500 words and 8 images.

Generally speaking, damage conditions fall into two parts: quake time and normal time. We focus on how conditions change over time during the periods around the quakes.

https://lh3.googleusercontent.com/BWCvqIHnAgse7HfnrU5vZU7qr9KeamWwJsxP41pyNiGhgQOUWJFDaki4d6w65LhuYLUpLk9o58hyBBy0LZYzS2n6ebgyIUfXt5e_fhetccJLGr2MNOl8KXlfrEGoWqyuNckkmQia

The first aspect is the number of records per 5 minutes after an earthquake, as shown above. During the prequake, the reported damage is lower in value than the noise, but the reports are more numerous. That is why the average damage dropped. In contrast, during the other 2 earthquakes, the average values rose very quickly.

https://lh3.googleusercontent.com/m0ESTvC1UavpowY4oTT-NFDaHVNtE6PE3a7jHT7-Awp0NeoCrz5ILIo7KjFPObIAZhSzXH3fNCKZW-rULKwHHNHAmjSKO4knyynfYolF1Ik7hTGam2rfGRvkgy9eDuxLolf8r2EY

The second aspect is the correlation between different dimensions. Because of the noise, it is interesting to look at the correlation. For instance, we pick buildings and power, group the time into 1-hour slots, and calculate the covariance and rel (the correlation coefficient) to explore how the earthquakes affected it. The top half of the figure below is a rel-t graph: |rel| < 0.2 when there is no earthquake, but rel takes a high positive value during shaking. To check the independence, we also draw the integral curve, which is almost horizontal most of the time. In fact, the dimensions buildings, power, sewer_and_water, roads_and_bridges and medical are all largely independent unless an earthquake occurs. This supports using their sum as a combined dimension that serves as a comprehensive coefficient.
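The per-slot coefficient rel can be computed as in the sketch below. It is an illustration under assumptions: reports are taken to be dictionaries with a "time" key plus the damage fields, and the function names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(vx * vy)

def hourly_correlation(reports, dim_a="buildings", dim_b="power"):
    """Group reports into 1-hour slots and correlate two dimensions per slot."""
    slots = defaultdict(list)
    for r in reports:
        if r.get(dim_a) is None or r.get(dim_b) is None:
            continue  # skip reports missing either value
        slot = r["time"].replace(minute=0, second=0, microsecond=0)
        slots[slot].append((r[dim_a], r[dim_b]))
    return {s: pearson([a for a, _ in v], [b for _, b in v])
            for s, v in sorted(slots.items())}

# Hypothetical reports: (hour, minute, buildings, power).
rows = [(8, 5, 1, 2), (8, 20, 2, 4), (8, 50, 3, 6),
        (9, 5, 1, 5), (9, 30, 2, 5), (9, 45, 3, 5)]
reports = [{"time": datetime(2020, 4, 8, h, m), "buildings": b, "power": p}
           for h, m, b, p in rows]
rel = hourly_correlation(reports)
```

Slots where either dimension is constant return None rather than 0, so that "no information" is not confused with "uncorrelated" on the rel-t graph.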

https://lh6.googleusercontent.com/tXuTgGRrRb3WkOW6JDBcRXAgJ9qbeIB10jHRnBgHn1dk8QXtk_GALiK5qivpiCs1h0z1DuYCt8JntZUUby83Otbm6soDD1u8xTZVaN0lJa6Ua6HW6PVvWjyr5efFF5IWpyCojKPF

The last aspect is uncertainty and reliability. As analyzed before, we take the proportion of reliable reports as the reliability of a district, which obviously changes over time. The figure below shows how reliability changes over time for each district:

Red: reliability < 70%

Green: reliability > 70%

We can find the approximate time of each earthquake in this figure, because the reliability changed a lot at each of those points. One more thing is worth mentioning: the change at the third quake (the aftershock) is not as significant as at the first 2 quakes. In fact, the aftershock is very similar to the majorquake and so close to it in time that citizens were not able to return to a safe life before the third quake occurred. The two appear to be successive events. This figure provides a new tool for comparing these districts.

https://lh5.googleusercontent.com/jognF_ayJyAGUVR_nFwMr3ucdPT1UR5iropnoUz7xkF5AwOJAn5mcopk6NZwLqy_tgqfiMOUA4tR_Mn-3KfRGWmM9OHf2rWv1QmjRTpH-bLK3uSJGKRSVJr_SKyF5Gun0dFolzbi
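The red/green coding of such a figure can be derived from per-report reliability flags as sketched here. The function name and sample data are hypothetical, and since the legend does not say how exactly 70% is colored, this sketch assumes it is shown as red.

```python
from collections import defaultdict

def reliability_timeline(flags, cutoff=0.70):
    """flags: iterable of (district, slot, is_reliable).

    Returns {(district, slot): (share_of_reliable_reports, color)}.
    """
    total = defaultdict(int)
    good = defaultdict(int)
    for district, slot, is_reliable in flags:
        total[(district, slot)] += 1
        good[(district, slot)] += bool(is_reliable)
    timeline = {}
    for key in total:
        share = good[key] / total[key]
        # Assumption: a share of exactly `cutoff` is colored red.
        timeline[key] = (share, "green" if share > cutoff else "red")
    return timeline

# Hypothetical flags for district "A" in two consecutive time slots.
flags = [("A", 0, True)] * 3 + [("A", 0, False),
                                ("A", 1, True), ("A", 1, False)]
timeline = reliability_timeline(flags)
```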

In addition, we are concerned with the change of damage in each aspect. It is easy to find how damage conditions change using our system. Take sewer and power as an example: select report values 6 to 10 to focus on greater damage, then select the points that are close to the “sewer_and_water” button, and the strong report values of sewer and power are shown on the map. During the major quake, districts 14, 12, 4 and 18 are clearly the hardest hit.

https://lh3.googleusercontent.com/qUwG5PvrdsrCyAPIyj8YqwmKJ4aIWGjHugOvNK-Yh1N5rhZwUYwv7_h_7BOW6QYuiY4D43p96GQBexvZV4BL565Gnc7vMzBMJNK2v5tsgNcwzl8kuwPPk59kye4KKt8SbdlKck4W

When it comes to the aftershock, as shown below, severe reports have greatly decreased. Sewer and water in districts 4 and 16 are still strongly hit, while those in districts 14, 18 and 12 may have been repaired.

https://lh5.googleusercontent.com/808k67OmTv8TGqW5lmfexOe_iiLja_HiUABQru-ovsF5cGVP6A6aHKFtBPFBzPObCrkc0wzXTCLO_jXHv7i8G_wB7gdOqB4SyOt6MbF1OZr9R_cfsRB_ZorAgBa_1cqX551-H6TT

4. The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.

In general we analyze the data statically, while the system also supports dynamic data processing and can raise an alert when abnormal reports flood in. With a static view, we can analyze past events carefully, for example by exploring the effects of the three shocks on different dimensions of the city. When the data is treated as a stream, it is more important to raise an alarm in time when unusual conditions occur.
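A stream alert of this kind can be sketched with a sliding time window. This is not the alerting logic of our system; the class name, window length, and limit are hypothetical choices for illustration, and report times are assumed to arrive in non-decreasing order.

```python
from collections import deque
from datetime import datetime, timedelta

class ReportRateAlert:
    """Sliding-window counter that flags bursts of incoming reports."""

    def __init__(self, window=timedelta(minutes=5), max_reports=100):
        self.window = window
        self.max_reports = max_reports
        self._times = deque()

    def add(self, ts):
        """Record one report time; return True if the window exceeds the limit."""
        self._times.append(ts)
        # Drop reports that have fallen out of the window ending at ts.
        while ts - self._times[0] >= self.window:
            self._times.popleft()
        return len(self._times) > self.max_reports

# Hypothetical stream with a deliberately tiny limit.
alert = ReportRateAlert(window=timedelta(minutes=5), max_reports=2)
t0 = datetime(2020, 4, 8, 8, 40)
results = [alert.add(t0 + timedelta(minutes=m)) for m in (0, 1, 2, 10)]
```

In practice the limit would be set relative to the normal report level, so that the alert fires on the kind of jump seen at the start of each quake.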