VAST Challenge 2019
Mini-Challenge 3
No
80 hours
Yes
The City has been using Y*INT to communicate with its citizens, even post-earthquake. However, City officials need additional information to determine the best way to allocate emergency resources across all neighborhoods of St. Himark. Your task, using your visual analytics on the community Y*INT data, is to determine the types of problems that are occurring across St. Himark. Then, advise the City on how to prioritize the distribution of resources. Keep in mind that not all sources on Y*INT are reliable, and that priorities may change over time as the state of neighborhoods also changes.
Limit your response to 1000 words and 12 images.
Plotting the frequency of messages that mention quaking, shaking, or otherwise moving objects shows three distinct times when an earthquake of some form likely occurred. The foreshock occurred at 2:33PM on April 6th 2020, the mainshock at 8:36AM on April 8th 2020, and the aftershock at 3:05PM on April 9th 2020. The mainshock is assumed to be the event referred to as "the earthquake".
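The frequency counts behind such a plot can be approximated with a simple keyword filter binned by hour. The following is a minimal sketch, not the rule set actually used in the analysis: the keyword list, the `(timestamp, text)` input shape, and the timestamp format are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword list; the actual extraction rules are not given in the text.
SHAKE_TERMS = ("quake", "quaking", "shaking", "shake", "rumbling", "tremor")


def hourly_mention_counts(messages, terms=SHAKE_TERMS):
    """Count messages per hour whose text contains any of the terms.

    messages: iterable of (timestamp_string, text) pairs, where timestamps
    look like 'YYYY-MM-DD HH:MM:SS' (an assumed format).
    """
    counts = Counter()
    for stamp, text in messages:
        lowered = text.lower()
        if any(term in lowered for term in terms):
            moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            # Truncate to the start of the hour so messages share a bin.
            counts[moment.replace(minute=0, second=0)] += 1
    return counts


def top_hours(counts, n=3):
    """Return the n hourly bins with the most matching messages."""
    return [hour for hour, _ in counts.most_common(n)]


# Invented sample messages, for illustration only.
sample = [
    ("2020-04-06 14:33:10", "Did anyone feel that shaking?"),
    ("2020-04-06 14:40:00", "Lunch was great today"),
    ("2020-04-08 08:36:05", "Huge quake, everything is moving!"),
    ("2020-04-08 08:50:41", "Still shaking here in Old Town"),
]
counts = hourly_mention_counts(sample)
```

Peaks in `counts` would then be candidate event times to confirm by reading the underlying messages.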
The foreshock is first announced by "Earthquake Prediction Center" and is described as being minor.
The mainshock is first announced by "EarthquakeSeers" and is identified to be quite severe, with reports of structural damage and both volunteer and official responders dispatched.
The aftershock is not officially announced by any source and there is no indication of its severity.
The Old Town area has the highest frequency of message activity on Y*INT at the 5-hour mark after the earthquake, whereas Downtown typically has the most activity. The biggest issues facing the city at this time are water contamination from sewer line breaks and hospital closures. Many bridges are still out of service and there are scattered reports of power outages and structurally compromised buildings.
In northern locations, including Old Town, the predominant complaints involve water contamination, likely caused by broken sewer lines. Any sewer repair crews and clean water delivery efforts would best be directed there.
In southern locations, such as Scenic Vista, there seems to be more of an issue with power outages and fewer mentions of water contamination issues, indicating that any additional power utility repair services should be directed there.
At 30 hours after the earthquake there are still ongoing discussions of water contamination and power outages but at a lower frequency. There are more seemingly official announcements of water distribution stations by 30 hours. There are also more mentions of water in messages that are actually about potential flooding conditions which weren't as prevalent at 5 hours after the earthquake.
In more northwestern areas, such as Downtown, there are still discussions ongoing about dealing with the contaminated drinking water supply, but fewer mentions of power outages. The contaminated water issue doesn't seem resolved but it does seem that citizens have largely found alternative solutions.
In more southeastern areas, such as Broadview, there are messages related to water that indicate potential flooding. The problem caused by broken sewer lines is becoming less an issue of smell than one of potentially contaminated flood waters. Having responders investigate flood control measures and redirecting sewer maintenance to these regions may be timely.
Limit your response to 1000 words and 10 images.
The news station announced that heavy rain was forecast for the morning of April 10th. Counts of messages from users talking about current rain, as opposed to the future chance of rain, confirm that rains started around 4:30 in the morning.
A comparison of the locations of messages reporting flooding or landslides before and after the rains began shows that the focus moves from the northwest to the southwest neighborhoods. Additionally, many of the messages before the rains that mention flooding clarify that their basement is flooding despite not having any rain, implying that the flooding there may have occurred due to broken water lines. Therefore, responders addressing the increase in flooding reports in the southwest should assume that people need to be removed from at-risk areas before significant landslides occur, and there is likely no need to redirect any personnel responsible for working on the water lines.
Plotting utility outage mentions over time reveals that there is a significant drop in discussion related to utility outages in all neighborhoods that coincides exactly with the timing of the aftershock. It seems unlikely that the aftershock helped to resolve any of the utility outage problems so a more likely explanation may be that conditions worsened due to the aftershock and other issues became the focal point.
The most likely explanation is that collapsing buildings shifted everyone's priorities to finding shelter or evacuating. The frequency of messages about trying to escape did not subside following the aftershock, but it did not increase sharply either. However, this marks a point at which any city resources working on restoring infrastructure may want to switch attention to encouraging evacuation and helping with search and rescue efforts if able.
Starting around 2:30PM after the earthquake, it was announced that the HSS team would be deployed, and discussion about the nuclear power plant increased. There are messages that inquire or speculate about a leak at the plant, but there are no confirmation messages from any potential source or discussion about radiation in general. However, given that the location of the discussion regarding the nuclear plant begins to focus on the Always Safe neighborhood after the earthquake, it could be an indication that previously idle discussion about the safety of nuclear power is shifting to more legitimate concerns. Given that the Always Safe plant isn't inviting or asking for outside help, it may be worth the city independently investigating the plant to make sure it doesn't become an additional problem to deal with.
Limit your response to 800 words and 8 images.
There are a few main figures in the social media scene of St. Himark that account for a lot of the high profile discussion (frequently reposted) on the platform. Their interactions were examined by exploring the network of mentions they were involved in and viewing their message streams. Top players include:
The primary sources of information on the Y*INT platform were determined by looking for highly reposted messages around the time of determined events or involved in particular topics.
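One way to surface highly reposted messages is to count how often each original text reappears as a repost. The sketch below assumes, purely for illustration, that a repost is marked by a leading `re: ` prefix. The text does not describe how reposts are encoded, so `REPOST_PREFIX` and the sample messages are hypothetical.

```python
from collections import Counter

# Assumed repost marker; the source text does not confirm this convention.
REPOST_PREFIX = "re: "


def repost_counts(texts, prefix=REPOST_PREFIX):
    """Count how many times each original message text was reposted."""
    counts = Counter()
    for text in texts:
        stripped = text.strip()
        if stripped.lower().startswith(prefix):
            # Drop the marker so reposts group under the original text.
            original = stripped[len(prefix):].strip()
            counts[original] += 1
    return counts


# Invented sample messages, for illustration only.
texts = [
    "Water station open at the school",
    "re: Water station open at the school",
    "re: Water station open at the school",
    "re: Bridge closed",
]
most_reposted = repost_counts(texts).most_common(1)
```

Restricting `texts` to a time window around a detected event would narrow the ranking to messages relevant to that event.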
Additional networks, one among marketing campaign posts and one of DOT replies to citizens, were found but did not reveal actionable patterns.
Determining sentiment in this type of data set was not very useful because of the predominantly negative polarity of the topics being posted. But there is a very general pattern of sentiment dropping at the time of the earthquake on Wednesday and staying low from then on. Morale in the city was definitely low, with only a handful of topics on social media causing positive bumps, including Lacki Dasical's musical performance and the circus elephants being used to help clear rubble.
Interestingly, even a major disaster and substantial disruption does not appear to have diminished the amount of angry insults and commercial offers that tend to be posted to the Y*INT platform in the early morning hours (around 1AM), the result of which can be seen as dips in the sentiment plot above.
Generally speaking, the community of St. Himark was largely able to use the social network to connect people in need with resources even though official channels were not as present on the Y*INT system. Examples include the school announcing that it had shelter available, some individuals offering their own homes as shelter, and some citizens responding to the local museum's call for help in saving artifacts from flooding.
Limit your response to 200 words and 3 images.
The data was analyzed as a static collection. Because of this approach, we were able to determine that most of the messages in the collection were gibberish and focus our analysis on patterns exhibited in non-gibberish messages. The process of information extraction relied on both out-of-the-box rules in our software and custom rules that were written by a linguist. Identifying topics relied on machine learning and the two processes were integrated in the identification of categories. With a model created in this manner, we could now apply it to comparable streaming data with reasonable expectations of performance of similar quality.
It would have been more difficult to find inflection points in metrics such as utility outages or topics over time using only time series visualization. To address streaming data, we would have monitored a window of each metric and flagged times when the metric deviated sufficiently from its typical historic values. This would result in a higher false positive rate and so would require more humans in the loop to evaluate whether an issue was real.
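The windowed monitoring described above could be sketched as a trailing-window deviation check. This is an illustrative assumption about how such a monitor might work, not the team's implementation: the window length, the threshold, and the sample counts are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(values, window=6, threshold=3.0):
    """Return indices whose value deviates from the trailing window mean
    by more than `threshold` population standard deviations."""
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(values):
        # Only score once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # A flat window has no spread, so any change counts.
                deviates = value != mu
            else:
                deviates = abs(value - mu) / sigma > threshold
            if deviates:
                flagged.append(i)
        history.append(value)
    return flagged


# Invented hourly counts of outage mentions with a sudden drop at index 7.
outage_mentions = [40, 42, 39, 41, 43, 40, 41, 5, 6, 4]
flagged = flag_anomalies(outage_mentions)
```

Because the anomalous values enter the window after the first flag, the spread widens and later points are not flagged again. A lower threshold would catch more of them at the cost of more false positives, which is the trade-off that calls for human review.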
If the data were analyzed as a stream, we would not have had a metric of how many times a message was ultimately reposted. To compensate for this, we would have built a model of the typical influence level of users to predict how likely a message was to be reposted, and used that prediction to identify higher profile information.
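A very simple form of such an influence model is a per-user average of past repost counts, shrunk toward the overall average so that users with little history are not over- or under-rated. The sketch below is one possible approach under that assumption. The function names, the `prior_weight` parameter, and the sample history are hypothetical.

```python
from collections import defaultdict


def build_influence_model(history, prior_weight=5.0):
    """Build a predictor of expected reposts per user.

    history: iterable of (user, repost_count) pairs for past messages.
    prior_weight: how many "virtual" messages at the global mean are
    blended into each user's average (a smoothing assumption).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, reposts in history:
        totals[user] += reposts
        counts[user] += 1
    n_messages = sum(counts.values())
    global_mean = sum(totals.values()) / n_messages if n_messages else 0.0

    def expected_reposts(user):
        # Unknown users fall back to the global mean.
        numerator = totals.get(user, 0.0) + prior_weight * global_mean
        return numerator / (counts.get(user, 0) + prior_weight)

    return expected_reposts


# Invented history, for illustration only.
history = [("A", 10), ("A", 20), ("B", 0)]
predict = build_influence_model(history, prior_weight=1.0)
```

In a streaming setting, a new message from a user with a high `predict(user)` value could be prioritized for review before its actual repost count is known.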