A Graph Analytical Approach for Fast Topic Detection


Topic detection in large and noisy data collections such as social media must address both scalability and accuracy challenges. KeyGraph is an efficient method that improves on current solutions by considering keyword co-occurrence. We show that KeyGraph achieves accuracy comparable to state-of-the-art approaches on small, well-annotated collections, and that it can successfully filter irrelevant documents and identify events in large and noisy social media collections. An extensive evaluation using Amazon's Mechanical Turk demonstrated the increased accuracy and high precision of KeyGraph, as well as superior runtime performance compared to other solutions.
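To make the idea of keyword co-occurrence concrete, here is a minimal sketch of how a co-occurrence graph could be built from tokenized documents. It is not the KeyGraph implementation: the class name `CooccurrenceSketch`, the methods `countPairs` and `filterEdges`, the `"a|b"` edge encoding, and the `minDocs` threshold are all assumptions made for illustration, and how topics are then derived from the graph is not covered here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch only; names and thresholds are hypothetical, not taken from KeyGraph.
public class CooccurrenceSketch {

    /** Counts, for each unordered keyword pair, how many documents contain both keywords. */
    public static Map<String, Integer> countPairs(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            // Sort and deduplicate so each pair is counted once per document
            // and always encoded in the same order ("a|b", never "b|a").
            List<String> words = new ArrayList<>(new TreeSet<>(doc));
            for (int i = 0; i < words.size(); i++) {
                for (int j = i + 1; j < words.size(); j++) {
                    counts.merge(words.get(i) + "|" + words.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Keeps only edges whose keywords co-occur in at least minDocs documents (assumed cutoff). */
    public static Map<String, Integer> filterEdges(Map<String, Integer> counts, int minDocs) {
        Map<String, Integer> edges = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minDocs) {
                edges.put(e.getKey(), e.getValue());
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Toy documents, already tokenized into keywords.
        List<List<String>> docs = List.of(
            List.of("election", "obama", "vote"),
            List.of("election", "obama", "debate"),
            List.of("hurricane", "storm", "vote"));
        // Only the pair that appears together in two documents survives the filter.
        System.out.println(filterEdges(countPairs(docs), 2)); // prints {election|obama=2}
    }
}
```

In this sketch, filtering out rarely co-occurring pairs is one simple way to reduce noise; the pairwise loop is quadratic in the number of distinct keywords per document, which is acceptable for short texts such as social media posts.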

Download source code in Java:


An example of a KeyGraph and the extracted topics/events:

Below is the number of documents per day for the topic "US Presidential Election" found by KeyGraph (each color shows a subevent), versus Google Trends for the query "2008 Presidential Election":