KeyGraph
A Graph Analytical Approach for Fast Topic Detection
Scalability and accuracy challenges when processing large noisy data
collections have led to a new generation of approaches to solve Topic
Detection and Tracking (TDT) tasks. Traditional TDT approaches process
the data at the document (data record) level. The next generation
operates at the keyword (feature) level and exploits relationships such
as word correlation, word co-occurrence, and word temporal
distribution. LDA topic models are an exemplar of this approach. We
present KeyGraph, a novel and efficient method that improves on this
next generation of topic detection methods. KeyGraph applies graph
analytical methods to efficiently discover topics and their features
(representative keywords). Constellations of keywords are then used to
cluster related documents. We show that KeyGraph achieves accuracy
similar to that of the gold standard approaches for topic detection.
Further, KeyGraph:
- Successfully filters noise and identifies events in noisy social
media.
- Runs significantly faster than other keyword-based approaches, such
as LDA topic models, on large collections.
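The pipeline described above (build a keyword graph, find groups of related keywords, then use those groups to cluster documents) can be illustrated with a minimal sketch. This is not the released KeyGraph code: the class name KeyGraphSketch, its method names, and the minCooccur threshold are hypothetical, and plain connected components stand in for whatever graph analytical method the actual implementation uses to separate topics.

```java
import java.util.*;

// Illustrative sketch only; not the KeyGraph source code.
final class KeyGraphSketch {

    /** Undirected keyword graph: an edge joins two keywords that
     *  appear together in at least minCooccur documents. */
    static Map<String, Set<String>> buildGraph(List<Set<String>> docs, int minCooccur) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String a : doc) {
                for (String b : doc) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        // Sorted maps and sets keep the output order deterministic.
        Map<String, Set<String>> graph = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            for (Map.Entry<String, Integer> f : e.getValue().entrySet()) {
                if (f.getValue() >= minCooccur) {
                    graph.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                         .add(f.getKey());
                }
            }
        }
        return graph;
    }

    /** Assumption: each connected component of the graph is one topic. */
    static List<Set<String>> topics(Map<String, Set<String>> graph) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : graph.keySet()) {
            if (!seen.add(start)) continue;   // already placed in a component
            Set<String> component = new TreeSet<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String word = stack.pop();
                component.add(word);
                for (String next : graph.get(word)) {
                    if (seen.add(next)) stack.push(next);
                }
            }
            result.add(component);
        }
        return result;
    }

    /** Index of the topic sharing the most keywords with doc, or -1 if none overlap. */
    static int assign(Set<String> doc, List<Set<String>> topics) {
        int best = -1;
        int bestOverlap = 0;
        for (int i = 0; i < topics.size(); i++) {
            int overlap = 0;
            for (String w : topics.get(i)) {
                if (doc.contains(w)) overlap++;
            }
            if (overlap > bestOverlap) {
                bestOverlap = overlap;
                best = i;
            }
        }
        return best;
    }
}
```

A caller would pass each document as a set of keywords to buildGraph, feed the resulting graph to topics, and then call assign for every document. Raising minCooccur drops rarely co-occurring keyword pairs, which is one simple way to suppress noise in this sketch.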
Download source code in Java:
http://keygraph.codeplex.com
Publication:
- H. Sayyadi, M. Hurst, and A. Maykov. "Event
Detection and Story Tracking in Social Streams". To appear in
Proceedings of the 3rd Int'l AAAI Conference on Weblogs and Social Media
(ICWSM09), May 17 - 20, 2009, San Jose, California. (pdf)
- H. Sayyadi and L. Raschid. "A Graph Analytical Approach for
Fast Topic Detection". In preparation.
An example of KeyGraph and extracted topics/events:
Here
is also the number of documents per day for the topic US Presidential
Election found by KeyGraph (each color shows a subevent) versus Google
Trends (here) for the query "2008 Presidential Election":