KeyGraph

A Graph Analytical Approach for Fast Topic Detection

People:

Scalability and accuracy challenges when processing large, noisy data collections have led to a new generation of approaches to Topic Detection and Tracking (TDT) tasks. Traditional TDT approaches process the data at the document (data record) level. The next generation operates at the keyword (feature) level and exploits relationships such as word correlation, word co-occurrence, and word temporal distribution. LDA topic models are an exemplar of this approach. We present KeyGraph, a novel and efficient method that improves on this next generation of topic detection methods. KeyGraph applies graph analytical methods to efficiently discover topics and their features (representative keywords). Constellations of keywords are then used to cluster related documents. We show that KeyGraph achieves accuracy similar to that of the gold standard approaches for topic detection. Further, KeyGraph:
  1. Successfully filters noise and identifies events in noisy social media.
  2. Runs significantly faster than other keyword-based approaches, such as LDA topic models, on large collections.
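The pipeline described above (keyword graph, keyword constellations, document clustering) can be illustrated with a minimal Java sketch. This is not the released KeyGraph code: the class name `KeyGraphSketch`, its methods, and the `minCooccurrence` threshold are hypothetical, and connected components are used here as a simple stand-in for the graph analytical step, which the text above does not specify.

```java
import java.util.*;

// Illustrative sketch only; all names are hypothetical, not the KeyGraph API.
public class KeyGraphSketch {

    // Build an undirected keyword graph: two keywords are linked when they
    // co-occur in at least minCooccurrence documents (assumed threshold).
    public static Map<String, Set<String>> buildGraph(List<Set<String>> docs, int minCooccurrence) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String a : doc) {
                for (String b : doc) {
                    if (a.compareTo(b) < 0) { // count each unordered pair once
                        counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        Map<String, Set<String>> graph = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            for (Map.Entry<String, Integer> f : e.getValue().entrySet()) {
                if (f.getValue() >= minCooccurrence) {
                    graph.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(f.getKey());
                    graph.computeIfAbsent(f.getKey(), k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return graph;
    }

    // Stand-in for community detection: each connected component of the
    // keyword graph is treated as one topic (a constellation of keywords).
    public static List<Set<String>> findTopics(Map<String, Set<String>> graph) {
        List<Set<String>> topics = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : graph.keySet()) {
            if (!seen.add(start)) continue;
            Set<String> component = new TreeSet<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String node = stack.pop();
                component.add(node);
                for (String next : graph.get(node)) {
                    if (seen.add(next)) stack.push(next);
                }
            }
            topics.add(component);
        }
        return topics;
    }

    // Assign a document to the topic sharing the most keywords with it;
    // returns -1 when no topic keyword appears (the document is treated as noise).
    public static int assign(Set<String> doc, List<Set<String>> topics) {
        int best = -1;
        int bestOverlap = 0;
        for (int i = 0; i < topics.size(); i++) {
            int overlap = 0;
            for (String word : doc) {
                if (topics.get(i).contains(word)) overlap++;
            }
            if (overlap > bestOverlap) {
                bestOverlap = overlap;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy documents, already reduced to keyword sets (made-up data).
        List<Set<String>> docs = List.of(
            Set.of("election", "obama", "vote"),
            Set.of("election", "obama", "mccain"),
            Set.of("hurricane", "storm", "coast"),
            Set.of("hurricane", "storm", "flood"),
            Set.of("lunch", "coffee"));
        List<Set<String>> topics = findTopics(buildGraph(docs, 2));
        System.out.println("Topics: " + topics);
        for (Set<String> doc : docs) {
            System.out.println(doc + " -> topic " + assign(doc, topics));
        }
    }
}
```

In this toy run, only keyword pairs that co-occur in at least two documents survive as edges, so the one-off document about lunch matches no topic and is left unassigned. That is a rough analogue of the noise filtering mentioned above, under the stated simplifications.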
Download source code in Java:
http://keygraph.codeplex.com

Publication:


An example of a KeyGraph and the extracted topics/events:
[Figure: KeyGraph]


The number of documents per day for the topic "US Presidential Election" found by KeyGraph (each color shows a subevent), compared with Google Trends for the query "2008 Presidential Election":
[Figure: KeyGraph vs. Google Trends]