Chen Guo, James Madison University guo4cx@jmu.edu PRIMARY
Xiang Liu, Purdue University xiang35@purdue.edu
Evie Cai, West Lafayette High School eviecai03@gmail.com
Yingjie Victor Chen, Purdue University victorchen@purdue.edu
Zhenyu Cheryl Qian, Purdue University qianz@purdue.edu
Rui Li, Jiangnan University lrcoolb@jiangnan.edu.cn
Student Team: NO
For spelling correction: Bing Spell Check, SymSpellpy.
The backend: Python, Gensim, Stanford NLP Tools http://www-nlp.stanford.edu/
The front end was built upon HTML5, D3.js, pyLDAvis.js, textplorer.js
Approximately how many hours were spent working on this submission in total?
200 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES
Video
1) YouTube Link: https://youtu.be/fOFR_BT7_74
2) Download from TopicInk: Visualizing Disaster Textual Data using LDA Topic Modeling
Questions
The City has been using Y*INT to communicate with
its citizens, even post-earthquake. However, City officials need additional
information to determine the best way to allocate emergency resources across
all neighborhoods of St. Himark. Your task, using
your visual analytics on the community Y*INT data, is to determine the types of
problems that are occurring across St. Himark.
Then, advise the City on how to prioritize the distribution of resources. Keep in mind that not all sources on Y*INT
are reliable, and that priorities may change over time as the state of
neighborhoods also changes.
MC 3.1 – Using visual analytics, characterize conditions
across the city and recommend how resources should be allocated at 5 hours and
30 hours after the earthquake. Include
evidence from the data to support these recommendations. Consider how to allocate resources such as
road crews, sewer repair crews, power, and rescue teams. Limit your response to
1000 words and 12 images.
The earthquake happened at 8:36 AM on 4/8/20. EarthQuakeSeers posted the message “ALERT: A 6.7 earthquake just occurred off the NE shore of the town of St. Himark. This could be severe. Expect heavy damage.” The earthquake caused major damage in Old Town and Safe Town. People in other neighborhoods, such as Scenic Vista, Terrapin Springs, Easton, West Parton, Palace Hills, and Southton, noticed things moving and felt sharp shaking.

How to allocate resources at 5 hours: We filtered the messages posted at 5 hours after the earthquake. The word cloud view shows how frequently each keyword occurs. The map view displays the spatial distribution of messages; the size of each circle corresponds to the number of messages posted in that location. We found that people were talking about water, bridge, safe, power, inspection, nuclear, and fire.

We adopted LDA topic modeling to identify 12 distinct content topics in the messages. After thorough subjective evaluation of different models, we chose a model with a coherence score of 0.58 for the current analysis. After examining the salient terms, mentioners, hashtags, and NER entities, and qualitatively analyzing the dominant replies in each topic, we found the following 12 frames, ranked by their prevalence in the messages (Fig. 1):

Topic 1: House status and information during the earthquake, such as communication lines, apps, stations, the emergency communication network, and rescues.
Topic 2: People need help finding shelters, cats, ferrets, etc. Shelters were crowded. Bottled water, blankets, first aid, and food were needed.
Topic 3: Transportation information. Bridges collapsed and were closed for safety inspection.
Topic 4: Building status. The shelters, Lacki’s building, some houses, etc. had collapsed. People were trapped in collapsed buildings and were not answering their phones.
Topic 5: The nuclear power plant and the Always Safe company. The Always Safe nuclear power plant was shut down for inspection after the earthquake. Power went out after the quake, and nuclear power was restored in many spots one day later.
Topic 6: The distribution, quantity, and storage of food, clothes, tents, and other relief supplies; reserve aid manufacturers and their daily production capacity.
Topic 7: People were running out of supplies. They needed to stock up on meds, food, water, gas, etc. Many people got lost and some were missing, notably a singer named Lacki Dasical.
Topic 8: Rumors spread after the earthquake. People heard that the city would evacuate. They were worried that there would be a tsunami. This topic also covers water contamination and broken sewer problems.
Topic 9: What was happening during the earthquake? Things moved like crazy and sirens went off from the power plant.
Topic 10: The rescue teams. Cooperation between SHM and HSS makes the city safer. The city is getting better, with roads and bridges re-opened.
Topic 11: Food and water are needed. Information on the fire department, library, and local news.
Topic 12: Many people panicked, and rumors of fatalities spread among the crowd. They were frustrated about not being able to enter their red-tagged homes to access their valuables.
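As an illustration of how topics can be ranked by prevalence, the sketch below counts how often each topic is the dominant one across messages. It assumes each message already has a topic distribution from the LDA model; the function name `rank_topics_by_prevalence`, the input layout, and the toy values are hypothetical and not taken from our system.

```python
from collections import Counter

def rank_topics_by_prevalence(doc_topic_probs):
    """Rank topics by the share of messages in which each topic is dominant.

    doc_topic_probs: one dict per message mapping topic id -> probability
    (a hypothetical stand-in for per-message LDA output).
    """
    dominant = Counter()
    for probs in doc_topic_probs:
        if not probs:
            continue  # skip messages without any topic assignment
        top_topic = max(probs, key=probs.get)
        dominant[top_topic] += 1
    total = sum(dominant.values())
    # most_common() orders topics from most to least prevalent
    return [(topic, count / total) for topic, count in dominant.most_common()]

# Toy distributions, invented for illustration only.
toy = [
    {1: 0.7, 2: 0.3},
    {1: 0.6, 3: 0.4},
    {2: 0.8, 1: 0.2},
    {3: 0.1, 1: 0.9},
]
ranking = rank_topics_by_prevalence(toy)
print(ranking)
```

Counting dominant topics is only one possible definition of prevalence; summing the topic probabilities over all messages would be an equally reasonable alternative.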
Our suggestion: The government should distribute bottled water in all heavily affected neighborhoods and repair the broken water and sewer pipes as soon as possible.
Evidence: A representative message is as follows:
4/8/20 13:05, Water is contaminated. Serious reactions reported in the following neighborhoods Old Town,Safe Town,Scenic Vista,Broadview,Chapparal, 10, TVHostBrad, Downtown, 20339
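The time filtering, keyword counting, and per-location counting behind the word cloud and map views can be sketched as follows. This is a minimal sketch under stated assumptions: the one-hour window width, the message dictionary layout, the tiny stopword list, and every toy message except the one quoted above are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

QUAKE_TIME = datetime(2020, 4, 8, 8, 36)  # earthquake time stated in the text
# Tiny illustrative stopword list (assumption, not the list used in the system).
STOPWORDS = {"the", "is", "a", "in", "on", "of", "and", "to", "are", "for"}

def messages_in_window(messages, hours_after, width_hours=1):
    """Keep messages posted in the `width_hours` leading up to quake + hours_after."""
    end = QUAKE_TIME + timedelta(hours=hours_after)
    start = end - timedelta(hours=width_hours)
    return [m for m in messages if start <= m["time"] <= end]

def keyword_counts(messages):
    """Word frequencies that would drive the word cloud."""
    words = Counter()
    for m in messages:
        for w in re.findall(r"[a-z']+", m["text"].lower()):
            if w not in STOPWORDS:
                words[w] += 1
    return words

def location_counts(messages):
    """Message counts per location that would drive the circle sizes on the map."""
    return Counter(m["location"] for m in messages)

toy = [
    {"time": datetime(2020, 4, 8, 13, 5), "location": "Downtown",
     "text": "Water is contaminated. Serious reactions reported in the following neighborhoods"},
    {"time": datetime(2020, 4, 8, 13, 20), "location": "Old Town",
     "text": "Bridge is closed for inspection"},   # invented example
    {"time": datetime(2020, 4, 8, 9, 0), "location": "Easton",
     "text": "Power is out"},                      # invented example, outside the window
]

window = messages_in_window(toy, hours_after=5)
print(keyword_counts(window).most_common(5))
print(location_counts(window))
```

Calling `messages_in_window(toy, hours_after=30)` selects the corresponding window for the 30-hour analysis.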

We also filtered and analyzed the messages posted at 30 hours after the earthquake and found the following patterns:

MC 3.2 – Identify at least 3 times when
conditions change in a way that warrants a re-allocation of city
resources. What were the conditions
before and after the inflection point?
What locations were affected?
Which resources are involved? Limit your response to 1000 words and 10
images.
To determine when resources should be reallocated, we use a vertical stream graph to analyze how message volume changes over time. Spikes in the stream graph mark increases in message volume around that time, which suggest that an emergency occurred. We also examined the keywords in context in the message display. We noticed a small trend in the stream graph of topics/keywords: during hours with a bottleneck effect (for example, 12 PM, where there are significantly fewer messages than in the hours before or after), the messages sent were more urgent and often indicated a period when resources needed to be reallocated.
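The spike and bottleneck reading of the stream graph can be approximated numerically. The sketch below flags hours whose message count is much higher (spike) or much lower (bottleneck) than both neighboring hours. The function name, the 1.5 ratio threshold, and the toy counts are assumptions chosen for illustration; in our analysis these patterns were identified visually.

```python
def find_spikes_and_dips(hourly_counts, ratio=1.5):
    """Return (spikes, dips) as lists of indices into hourly_counts.

    A spike is an hour with at least `ratio` times the count of both neighbors.
    A dip (bottleneck) is an hour whose count times `ratio` is still no more
    than either neighbor's count.
    """
    spikes, dips = [], []
    for i in range(1, len(hourly_counts) - 1):
        prev_c, cur, next_c = hourly_counts[i - 1], hourly_counts[i], hourly_counts[i + 1]
        larger_neighbor = max(prev_c, next_c)
        smaller_neighbor = min(prev_c, next_c)
        if cur > 0 and cur >= ratio * larger_neighbor:
            spikes.append(i)
        elif smaller_neighbor > 0 and cur * ratio <= smaller_neighbor:
            dips.append(i)
    return spikes, dips

# Toy hourly message counts, invented for illustration only.
counts = [40, 45, 120, 50, 48, 20, 52]
spikes, dips = find_spikes_and_dips(counts)
print("spike hours:", spikes)
print("bottleneck hours:", dips)
```

Hours flagged this way are only candidates; the message contents still need to be read to confirm that a reallocation is warranted.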
In addition, many residential buildings around the Southwest region have structural damage and need support. Many users, such as JFleetChambers from Southwest, retweeted the message “Our neighborhood has been hit hard. All the old brick buildings have collapsed or are heavily damaged. # neighborhood.” Rescue teams and repair crews need to be sent to these locations around Southwest. However, a portion of the firefighters should be reallocated to West Parton. DixonWhale31 and many others sent out messages about being trapped in elevators. “St Himark Fire Departhgementhge : If you are thgerapped in an elevator wrokait for us to come rescue you , do nothge athgethgeempthge to climb outhge on your owrokn . Ithge may wrokork in the movies , buthge in real life it is very dangerous.” The last resource that needs to be reallocated is transportation. C15Davis, from Chapparal, sent out the message “Department of Transportation : We need your help . Dachsunds are blocking the main road in Neighborhood 10 . Pick up the dachsunds if you see them so we can clear rubble. Bring them to the Galactic Truth Church at 2nd and Main.” While there is already a rescue team there, the Department of Transportation is needed to clean up the debris.



MC 3.3 – Take the pulse of the
community. How has the earthquake
affected life in St. Himark? What is the community
experiencing outside the realm of the first two questions? Show decision makers
summary information and relevant/characteristic examples. Limit your response
to 800 words and 8 images.
The system contains a social network view and a frequency bar chart of the most frequently mentioned individuals and companies. We also use Stanford NER to identify the common named entities. The social network highlights a few active users who are influential figures in the disaster discussion: @AlwaysSafePowerCompany, @ChloeJohnson, and @TVHostBrad. They posted and retweeted many messages. Some users asked for help on the platform, such as @DerekNolan and @VanessaCorwin.
Clicking a label on the mentioner frequency bar chart shows the messages related to that mentioner in the message view. The map, word cloud, and social network views are updated for the selection as well. Using this information, we analyzed how the earthquake affected life in St. Himark and what the community is experiencing, as described below. The mentioner network is constructed from 2590 unique Y*INT accounts connected through mentions.
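A mention network of this kind can be built by extracting @mentions from each message and linking the author to every account mentioned. The sketch below is a minimal illustration; the regular expression, the function names, and the toy message texts are assumptions (only the account names come from the text above).

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")  # assumes account names are word characters only

def build_mention_network(messages):
    """Return weighted directed edges (author, mentioned) -> number of mentions."""
    edges = Counter()
    for author, text in messages:
        for mentioned in MENTION_RE.findall(text):
            if mentioned != author:  # ignore self-mentions
                edges[(author, mentioned)] += 1
    return edges

def mention_frequency(edges):
    """How often each account is mentioned (the values behind the bar chart)."""
    freq = Counter()
    for (_, mentioned), weight in edges.items():
        freq[mentioned] += weight
    return freq

# Toy messages, invented for illustration only.
toy = [
    ("DerekNolan", "@AlwaysSafePowerCompany when will the power be back?"),
    ("VanessaCorwin", "@AlwaysSafePowerCompany we need water @TVHostBrad"),
    ("ChloeJohnson", "re: @AlwaysSafePowerCompany please send help"),
]

edges = build_mention_network(toy)
freq = mention_frequency(edges)
nodes = {account for edge in edges for account in edge}
print(freq.most_common(3))
print(len(nodes), "accounts in the network")
```

The edge weights can feed the link thickness in a network view, and `freq` gives the ordering for a mentioner frequency bar chart.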

@AlwaysSafePowerCompany: Based on the frequency map, the Always Safe Power Company is the most frequently mentioned name after the disaster. Even though the company announces that citizens are safe and that the situation is being handled very well, the messages regarding @AlwaysSafePowerCompany contain many complaints from survivors, especially about the company's rescue strategy. Based on the messages (see the message view in Fig. 17), the company did not apply a thoughtful strategy or allocate sufficient resources to rescue survivors after the disaster. It also did not set up enough water stations, which caused a water shortage. In addition, people in danger could not reach the emergency number.
The water shortage can also be verified from the word cloud view, where water and help are the most prominent words. The word cloud and the corresponding message view also indicate that food is in short supply. All of this information reflects the survivors' complaints.


After the earthquake, rumors caused panic among local people. Even small rumors can make things worse. Topic 8 is all about rumors and things people heard from friends or neighbors. By clicking on Topic 8, we can see how this topic changes over time in the stream graph, all the messages belonging to this topic in the message view, the spatial distribution of the messages, and the word co-occurrences in the word cloud. We can see that many people were saying "the city is evacuating". Some heard from friends that there were 100 fatalities or even 548 fatalities. Some were worried that there would be a tsunami. No officials came out to provide accurate information or stop the rumors. The government should provide accurate reports of disaster-related information through websites, apps, SMS messages, radio broadcasts, etc. to reduce ambiguity, so that the public will not panic and will feel safer.

MC 3.4 – The data for this
challenge can be analyzed either as a static collection or as a dynamic stream
of data, as it would occur in a real emergency.
Describe how you analyzed the data - as a static collection or a
stream. How do you think this choice
affected your analysis? Limit your response to 200 words and 3 images.
We mainly analyzed the data as a static collection. A streaming view was also created to help analysts explore the dynamic changes of keywords, hashtags, mentioners, and NER entities (Fig. 20). It is still a challenge for us to display the number of topics or sub-topics in real time, and the temporal LDA model we use makes the animation very slow. The advantage of the streaming view is that it maintains situation awareness and provides valuable insights shortly after the emergency has happened. The disadvantages of stream processing are that the system only shows a fixed number of variables from the stream and may lose some temporal information about each topic. Additionally, we used machine learning algorithms to fix typos in the messages and to train models that identify the topics in the textual data. Since stream processing makes a single pass over the data, streaming analysis is not a good fit for model training, and static analysis can be an effective complement to it. Therefore, we developed two views: the streaming view, which shows the dynamic changes of keywords, hashtags, mentioners, and NER entities in real time, and the analytic view, which is used to explore the topics and discover interesting patterns in the text data. In the analytic view, analysts can also see how topics, keywords, mentioners, and NER entities change over time by brushing the timeline in the stream graph (Fig. 20).
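To illustrate the single-pass nature of the streaming view, the sketch below keeps keyword counts over a sliding time window and reports the current top keywords. It is a minimal sketch and not the implementation behind our streaming view; the class name, the one-hour window, and the toy events are assumptions.

```python
from collections import Counter, deque
from datetime import datetime, timedelta

class SlidingKeywordCounter:
    """Single-pass keyword counts over the most recent `window` of a message stream."""

    def __init__(self, window=timedelta(hours=1)):
        self.window = window
        self.events = deque()    # (timestamp, keywords) in arrival order
        self.counts = Counter()  # keyword counts inside the current window

    def add(self, timestamp, keywords):
        keywords = list(keywords)
        self.events.append((timestamp, keywords))
        self.counts.update(keywords)
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that are older than the window relative to the newest message.
        while self.events and now - self.events[0][0] > self.window:
            _, old_keywords = self.events.popleft()
            self.counts.subtract(old_keywords)
        self.counts = +self.counts  # unary plus removes zero and negative counts

    def top(self, k=10):
        return self.counts.most_common(k)

# Toy stream, invented for illustration only.
t0 = datetime(2020, 4, 8, 9, 0)
stream = SlidingKeywordCounter()
stream.add(t0, ["power", "shaking"])
stream.add(t0 + timedelta(minutes=30), ["water", "power"])
stream.add(t0 + timedelta(minutes=90), ["water", "bridge"])
print(stream.top(3))
```

Because old events are discarded as the window moves, this kind of counter shows only a fixed, recent slice of the stream, which matches the limitation noted above: it cannot retrain a topic model or recover temporal detail that has already expired.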
