PhD Proposal: Local News and Events Detection using Tweets in Twitter
Twitter, one of the most popular micro-blogging services, allows users to publish short messages, called tweets, on a wide variety of subjects such as news, events, stories, ideas, and opinions. The popularity of Twitter arises, to some extent, from how promptly and conveniently it lets users contribute tweets that convey diverse information. Because people post tweets about what is happening in the real world, Twitter captures invaluable information about real-world news and events, ranging from large national or international stories, such as a presidential election, to small local stories, such as a local farmers market. Detecting and extracting small news and events for a local place is a challenging problem and the focus of this proposal. We will explore several directions for detecting and extracting local news and events from tweets: a) how to identify locally influential people on Twitter as potential news seeders; b) how to recognize unusualness in tweet volume as a signal of potential local events; c) how to overcome the data sparsity of local tweets in order to detect more, and smaller, ongoing local news and events.
In the first part, we investigate how to measure the spatial influence of Twitter users from their interactions and thereby identify the locally influential users, who, we found, are usually good news and event seeders in practice. To do this, we built a large-scale directed interaction graph of Twitter users. Such a graph allows us to use PageRank-based ranking procedures to select the top locally influential users, after incorporating geographical distance into the transition matrix used for the random walk.
In the second part, we study how to recognize unusualness in tweet volume at a local place as a signal of potential ongoing local events. The intuition is that a sudden abnormal change in the number of tweets at a location (e.g., a significant increase) may indicate a local event. We therefore present DeLLe, a methodology for automatically Detecting Latest Local Events from geotagged tweet streams (i.e., tweets that contain GPS points). With the help of novel spatiotemporal tweet count prediction models, DeLLe first finds unusual locations that have accumulated an unexpected number of tweets in the latest time period. It then calculates, for each such unusual location, a ranking score that accounts for temporal burstiness, spatial burstiness, and topical coherence, in order to identify the locations most likely to have ongoing local events.
In the third part, we explore how to overcome the data sparsity of local tweets when trying to discover more, and smaller, local news or events. Local tweets are those whose locations fall inside a local place. They are very sparse in Twitter, which hinders the detection of small local news or events that have only a handful of tweets. We propose a system, called Firefly, that enhances the local live tweet stream by tracking the tweets of a large body of local people and then performs locality-aware keyword-based clustering for event detection. The intuition is that local tweets are published by local people, so tracking their tweets naturally yields a source of local tweets. In practice, however, only 20% of Twitter users provide information about where they come from. We therefore also propose a social network-based geotagging procedure to estimate locations for Twitter users whose locations are missing.
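As a rough illustration of the social network-based geotagging idea, the sketch below assigns a user with no stated location the most common location among the users they interact with. The majority-vote rule, the function name `estimate_locations`, and the toy data are assumptions made for illustration only; the abstract does not specify the actual procedure.

```python
from collections import Counter


def estimate_locations(friends, known):
    """Guess a location for users who do not report one.

    friends: dict mapping a user to the users they interact with
             (hypothetical input format).
    known:   dict mapping a user to a self-reported location label.
    Returns a dict with estimates only for users that had at least
    one located neighbor.
    """
    estimated = {}
    for user, neighbors in friends.items():
        if user in known:
            continue  # keep the self-reported location as-is
        # Count the known locations among this user's neighbors.
        votes = Counter(known[n] for n in neighbors if n in known)
        if votes:
            # Simple majority vote (an assumed rule, not the proposal's).
            estimated[user] = votes.most_common(1)[0][0]
    return estimated


# Toy data: user names and places are invented for the example.
friends = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["alice"],
    "erin": ["frank"],
}
known = {"bob": "College Park", "carol": "College Park", "dave": "Baltimore"}
estimated = estimate_locations(friends, known)
print(estimated)
```

Here "alice" is placed in "College Park" because two of her three located neighbors are there, while "erin" receives no estimate because none of her neighbors has a known location. A real procedure would likely weight neighbors by interaction strength or distance rather than count them equally.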
Chair: Dr. Hanan Samet
Dept. rep: Dr. Udaya Shankar
Members: Dr. Larry Davis, Dr. David Mount