TwitterStand: Separating the Wheat from the Chaff in Breaking News

Twitter is an electronic medium that allows a large user populace to communicate with each other simultaneously. Inherent to Twitter is an asymmetrical relationship between friends and followers, which gives the users of Twitter an interesting social network-like structure. Twitter messages, called tweets, are restricted to 140 characters and thus are usually very focused. We investigate the use of Twitter to build a news processing system from tweets. The idea is to capture tweets that correspond to late breaking news. The result is analogous to a distributed news wire service, with the difference that the identities of the contributors/reporters are not known in advance and there may be many of them. The tweets are not sent according to a schedule: they occur as news is happening, are noisy, and usually arrive at a high throughput rate. Some of the issues include removing the noise, determining tweet clusters of interest while bearing in mind that the methods must be online, and determining the relevant location associated with the tweets (and accessing it with a map query interface) rather than the locations from which the tweets are sent.
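
To make the "online" requirement concrete, the following is a minimal sketch of one possible single-pass clustering scheme, in which each incoming tweet either joins the most similar existing cluster or starts a new one. It is an illustration only and is not taken from the description above: the class name OnlineClusterer, the bag-of-words representation, the cosine similarity measure, and the 0.3 threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical similarity cutoff; a real system would tune this on data.
SIMILARITY_THRESHOLD = 0.3


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class OnlineClusterer:
    """Assigns each tweet to a cluster as it arrives, in a single pass."""

    def __init__(self, threshold=SIMILARITY_THRESHOLD):
        self.threshold = threshold
        self.centroids = []  # one Counter of summed term counts per cluster
        self.members = []    # the tweets assigned to each cluster

    def add(self, tweet):
        """Place one tweet and return the index of its cluster."""
        vec = Counter(tokenize(tweet))
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Similar enough: fold the tweet into the existing cluster.
            self.centroids[best].update(vec)
            self.members[best].append(tweet)
            return best
        # Otherwise the tweet starts a new cluster of its own.
        self.centroids.append(vec)
        self.members.append([tweet])
        return len(self.centroids) - 1
```

Because each tweet is examined once and never reassigned, the cost per tweet depends only on the number of clusters currently held, which is what allows a scheme of this kind to keep up with a stream. Noise removal and the determination of the locations associated with the tweets are separate problems that this sketch does not address.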

NSF Grant IIS-09-48548

System Site: See also

Relevant Publications:

  1. J. Sankaranarayanan, H. Samet, B. Teitler, M. D. Lieberman, J. Sperling
    TwitterStand: News in tweets.
    In D. Agrawal, W. G. Aref, C.-T. Lu, M. F. Mokbel, P. Scheuermann, C. Shahabi, and O. Wolfson, editors, Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42-51, Seattle, WA, November 2009.
    Categories: [spatio-textual search engine, Twitter]

  2. G. Quercini, H. Samet, J. Sankaranarayanan, M. D. Lieberman
    Determining the spatial reader scopes of news sources using local lexicons.
    In A. El Abbadi, D. Agrawal, M. Mokbel, and P. Zhang, editors, Proceedings of the 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 43-52, San Jose, CA, November 2010.
    Categories: [spatio-textual search engine]