Entry Name: “UBA-Pustilnik-MC3”

VAST Challenge 2019
Mini-Challenge 3

 

 

Team Members:

 Martín Pustilnik, University of Buenos Aires, mpustil@gmail.com PRIMARY

Mariano Besio, University of Buenos Aires, marianobesio@gmail.com


Student Team:

YES

Tools Used:



SQLServerExpress

R

Wordcloud2

Python

NLTK

VADERSentiment

D3.js

Bootstrap

Custom data visualization developed by the student team for the challenge

Approximately how many hours were spent working on this submission in total?

480 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete?

YES 

Video

 uba-pustilnik-mc3-video.wmv

Questions

1 – Using  visual analytics, characterize conditions across the city and recommend how resources should be allocated at 5 hours and 30 hours after the earthquake.  Include evidence from the data to support these recommendations. Consider how to allocate resources such as road crews, sewer repair crews, power, and rescue teams. Limit your response to 1000 words and 12 images.

We consider that the earthquake took place 56 hours after the initial data timestamp, that is, on April 8th between 8 a.m. and 9 a.m. To support this, Fig. 1 shows a visualization of the social media message stream. The plots display the hourly message flow grouped by category and augmented with localized sentiment information. Entries were classified into categories related to each available resource: “HELP” for rescue teams, “SEWER” for repair crews, “FIRE DAMAGE” for firefighters, “POWER” for power repair teams and “EARTHQUAKE” for earthquake detection.

fig1
Fig. 1
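
The keyword-based classification and hourly grouping described above can be illustrated with a minimal sketch. The keyword lists, the function names and the year of the start date are assumptions made for illustration only; the words actually selected for each category are not listed in this submission.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical keyword lists (assumption): the real word lists are not given.
CATEGORY_KEYWORDS = {
    "HELP": {"help", "trapped", "rescue"},
    "SEWER": {"sewer", "sewage", "water"},
    "FIRE DAMAGE": {"fire", "smoke", "burning"},
    "POWER": {"power", "outage", "electricity"},
    "EARTHQUAKE": {"earthquake", "quake", "shaking"},
}


def classify(message):
    """Return the sorted categories whose keywords appear in the message."""
    words = {w.strip(".,!?;:").lower() for w in message.split()}
    return sorted(c for c, kws in CATEGORY_KEYWORDS.items() if words & kws)


def hourly_counts(messages, start):
    """Count messages per hour index and category; hour 0 begins at `start`."""
    counts = defaultdict(Counter)
    for timestamp, text in messages:
        hour = int((timestamp - start).total_seconds() // 3600)
        for category in classify(text):
            counts[hour][category] += 1
    return counts


# Invented example messages; the start date (year included) is an assumption.
start = datetime(2020, 4, 6, 0, 0)
messages = [
    (datetime(2020, 4, 8, 8, 15), "Earthquake! Power is out"),
    (datetime(2020, 4, 8, 8, 40), "Is that smoke near the bridge?"),
]
counts = hourly_counts(messages, start)
```

A message can fall into several categories at once, which is one possible reading of the stacked bars; a single-label scheme would need a tie-breaking rule.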


To allocate resources 5 hours after the earthquake, we analyzed hour 61 in Fig. 2, with special emphasis on the message categories and the most significant variations over the last hour. The stacked bar plot shows that the largest variation belongs to the "POWER" category, followed by "SEWER". Additionally, the word clouds associated with these categories mention bridges, water, closed routes, streets and requests for action in the Downtown and Scenic Vista neighborhoods. Therefore, we recommend allocating rescue teams and repair crews to Downtown and power repair teams to Scenic Vista.

fig2
Fig. 2
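
Ranking categories by their change over the last hour can be sketched as follows. This is an illustration under assumptions: the counts and the threshold value are invented, and the function names are hypothetical.

```python
def hourly_variation(counts, hour):
    """Change in message count per category between `hour - 1` and `hour`."""
    current = counts.get(hour, {})
    previous = counts.get(hour - 1, {})
    categories = set(current) | set(previous)
    return {c: current.get(c, 0) - previous.get(c, 0) for c in categories}


def rank_changes(counts, hour, threshold=0):
    """Categories whose increase exceeds `threshold`, largest increase first."""
    deltas = hourly_variation(counts, hour)
    flagged = [(c, d) for c, d in deltas.items() if d > threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Invented counts for two consecutive hours (not taken from the data set).
example_counts = {
    60: {"POWER": 10, "SEWER": 4, "FIRE DAMAGE": 3},
    61: {"POWER": 40, "SEWER": 20, "HELP": 6, "FIRE DAMAGE": 3},
}
ranking = rank_changes(example_counts, 61, threshold=5)
```

With these invented numbers, "POWER" ranks first and "SEWER" second, and "FIRE DAMAGE" is not flagged because it did not change.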


To allocate resources 30 hours after the earthquake, we analyzed hour 86 in Fig. 3 and noticed that the situation had changed little from the previous hour. In this scenario, the recommended action is to keep the resource allocation from the previous hour as seen in Fig. 4, or to disengage resources (i.e., rescue teams), since the situation seems to have normalized.

fig3
Fig. 3
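
The hour indices used in this answer follow from simple date arithmetic, sketched below. The start of the data at midnight on April 6th is inferred from "hour 56 is April 8th, 8 a.m. to 9 a.m."; the year is a placeholder assumption, and the names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: hour 0 starts at midnight on April 6th (year is a placeholder).
DATA_START = datetime(2020, 4, 6, 0, 0)
QUAKE_HOUR = 56  # hour index of the earthquake, as stated in the text


def hour_to_interval(hour):
    """Return the (start, end) datetimes of a one-hour frame."""
    start = DATA_START + timedelta(hours=hour)
    return start, start + timedelta(hours=1)


five_hours_after = hour_to_interval(QUAKE_HOUR + 5)     # hour 61
thirty_hours_after = hour_to_interval(QUAKE_HOUR + 30)  # hour 86
```

Under these assumptions, hour 61 is April 8th between 1 p.m. and 2 p.m., and hour 86 is April 9th between 2 p.m. and 3 p.m.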


2 – Identify at least 3 times when conditions change in a way that warrants a re-allocation of city resources.  What were the conditions before and after the inflection point? What locations were affected? Which resources are involved? Limit your response to 1000 words and 10 images.



We identified the following times when conditions change in a way that warrants a re-allocation of resources:



April 8th between 8 a.m. and 9 a.m. - 56 hours after the initial timestamp

As shown in Fig. 1 (hour 55), conditions were stable, with some messages in the “POWER” category but written in a polite tone and good mood. Then, as depicted in Fig. 2 (hour 56), an earthquake is detected and a mood change is visible in the background color of the plot. To help identify mood changes, we use a green-to-white-to-red color scale in the plot background: the more positive the messages in the time frame, the greener the background; the more negative the messages, the redder the background. Conditions change and people start sending messages in the “EARTHQUAKE” category, especially in the Downtown and Cheddarford areas. The resources involved in this situation should be rescue teams and firefighters. Also, as can be seen in the word cloud, people are asking for information about what is going on.

fig1
Fig. 1
fig2
Fig. 2
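
The green-to-white-to-red background scale can be approximated with a linear mapping like the one below. This is a sketch, not the visualization's actual code: it assumes the mean sentiment lies in [-1, 1] and interpolates linearly in RGB, and the function name is hypothetical.

```python
def sentiment_to_rgb(score):
    """Map a mean sentiment score in [-1, 1] to an (r, g, b) background color."""
    s = max(-1.0, min(1.0, score))     # clamp out-of-range scores
    fade = round(255 * (1 - abs(s)))   # 255 at neutral, 0 at the extremes
    if s >= 0:
        return (fade, 255, fade)       # white towards green
    return (255, fade, fade)           # white towards red


neutral = sentiment_to_rgb(0.0)    # white
positive = sentiment_to_rgb(1.0)   # pure green
negative = sentiment_to_rgb(-1.0)  # pure red
```

A neutral hour is drawn in white, so even small departures from neutrality show up as a faint green or red tint.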


April 8th between 1 p.m. and 2 p.m. - 61 hours after the initial timestamp

Fig. 3 (hour 60) shows that conditions were related to issues in the “POWER” and “FIRE DAMAGE” categories. Then, as shown in Fig. 4 (hour 61), a problem related to roads, water and bridges is detected in the vicinity of Downtown and Scenic Vista, and the categories “HELP”, “SEWER” and “POWER” are triggered in the visualization. The resources involved in this situation include rescue teams and road and sewer repair crews.

fig3
Fig. 3
fig4
Fig. 4


April 8th between 8 p.m. and 9 p.m. - 68 hours after the initial timestamp

As shown in Fig. 5 (hour 67), conditions were related to “HELP” and “POWER” issues, mostly in the Downtown and Old Town areas. Then, in Fig. 6 (hour 68), messages stop matching the words selected for each category, but the mood gets worse. In this scenario, the recommendation is to take the opportunity to let the different resource teams regroup and rest.

fig5
Fig. 5
fig6
Fig. 6


April 9th between 8 a.m. and 9 a.m. - 80 hours after the initial timestamp

As shown in Fig. 7 (hour 79), conditions were stable, with some messages in the “POWER” category but no changes surpassing the established threshold. Then, at hour 80 (Fig. 8), messages in the “POWER” category that mention the power plant, food and water are detected in the word clouds of Scenic Vista and Southwest. In this scenario, the recommendation is to send power teams to those areas.

fig7
Fig. 7
fig8
Fig. 8


3 – Take the pulse of the community.  How has the earthquake affected life in St. Himark? What is the community experiencing outside the realm of the first two questions? Show decision makers summary information and relevant/characteristic examples. Limit your response to 800 words and 8 images.



To take the pulse of the community, we analyzed the sentiment of messages posted on St. Himark's Y*INT. Even if not all messages are reliable, the average sentiment can be calculated to characterize the community at a given moment in time. Fig. 1 displays all the data and the mood changes over the full crisis period. The mood was noticeably good before the crisis and was disrupted when earthquake-related problems arose. We complement this analysis with word clouds, which help to understand the overall sentiment at a given time. For example, in Fig. 2, after 14 hours of data, we can see a minor earthquake report that was not taken seriously, given that many words in the cloud relate to activities like dancing and joking about earthquakes.

fig1
Fig. 1
fig2
Fig. 2
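
Averaging sentiment per hour can be sketched as below. The sketch assumes each message has already been scored with a single value in [-1, 1], such as the compound score that VADER produces; which score was actually used is not stated here, and the example values are invented.

```python
from collections import defaultdict


def hourly_mean_sentiment(scored_messages):
    """Average sentiment per hour from (hour, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of scores, count]
    for hour, score in scored_messages:
        totals[hour][0] += score
        totals[hour][1] += 1
    return {hour: total / n for hour, (total, n) in sorted(totals.items())}


# Invented scores: a positive hour followed by a negative one.
scored = [(55, 0.5), (55, 0.25), (56, -0.5), (56, -1.0)]
means = hourly_mean_sentiment(scored)
```

Because the mean is taken over all messages in the hour, a few unreliable messages have limited effect on the hourly value, which matches the reasoning given above.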


After the earthquake, we detect growing negativity as the first hours pass. This can be seen in the word clouds after 58 hours of data (Fig. 3), where many messages relate to the nuclear plant, bridges and the earthquake. It can also be seen in Fig. 4 (after 59 hours of data), where the word clouds highlight “inspection”, “precautions” and “actions”.

fig3
Fig. 3
fig4
Fig. 4


Moreover, there seem to be several requests for information from people on social media, which suggests that communication from the authorities could be improved (Fig. 5, hour 68).

fig5
Fig. 5


4 – The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency.  Describe how you analyzed the data - as a static collection or a stream. How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.



We analyzed the data as a dynamic stream, as it would occur in a real emergency. This decision required a way to simulate time so that we could see a summary of all the information available at each moment. We solved this with a slide bar that allows moving back and forth in time, at the cost of distorting early data. The distortion comes from the lack of context when only the first available messages are shown. In a real-life scenario, the data stream and the time frame length would be fixed, so the analyst could either move to a specific time in the past or let the stream update the current view.
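
The effect of the slide bar on early data can be illustrated with a small sketch. It assumes a fixed look-back window ending at the selected hour; the window length of 24 hours and the function name are hypothetical.

```python
def visible_messages(messages, selected_hour, window_hours=24):
    """Messages whose hour index falls in the window ending at `selected_hour`."""
    first_hour = max(0, selected_hour - window_hours + 1)
    return [(h, text) for h, text in messages if first_hour <= h <= selected_hour]


# Invented stream of (hour index, text) pairs.
stream = [(0, "good morning"), (2, "nice weather"), (30, "what was that?"), (56, "earthquake!")]
early_view = visible_messages(stream, selected_hour=2)
later_view = visible_messages(stream, selected_hour=56)
```

At hour 2 the window can only contain three hours of data, which is the lack of context mentioned above; at hour 56 it covers a full 24 hours.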



Working with dynamic streams required using pre-trained models for sentiment analysis and choosing beforehand the words associated with each category. Additionally, dynamic re-scaling was used to display information as data arrives. Other challenges when dealing with dynamic streams are data bursts and processing times. To manage them, we decided to sacrifice precision and discretize time into one-hour time frames.
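
A minimal sketch of one-hour discretization with dynamic re-scaling is shown below. It is an illustration under assumptions, not the submitted implementation: the class name, the plot height and the choice of re-scaling against the largest bin seen so far are all hypothetical.

```python
class HourlyStream:
    """Bins incoming messages into one-hour frames and re-scales bar heights."""

    def __init__(self):
        self.counts = {}     # hour index -> number of messages in that hour
        self.max_count = 0   # largest bin seen so far, used as the axis maximum

    def add(self, seconds_since_start):
        """Register one message by its arrival time in seconds."""
        hour = int(seconds_since_start // 3600)
        self.counts[hour] = self.counts.get(hour, 0) + 1
        self.max_count = max(self.max_count, self.counts[hour])

    def bar_heights(self, plot_height=100.0):
        """Bar height per hour, scaled so the largest bin fills the plot."""
        if self.max_count == 0:
            return {}
        return {h: plot_height * c / self.max_count
                for h, c in sorted(self.counts.items())}


stream = HourlyStream()
for t in (10, 20, 3700):          # two messages in hour 0, one in hour 1
    stream.add(t)
before_burst = stream.bar_heights()
for t in (3800, 3900):            # a burst arrives in hour 1
    stream.add(t)
after_burst = stream.bar_heights()
```

Each message costs a constant amount of work, so a burst only changes the counts and the scale; the price is that anything happening within the hour is merged into a single bar.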