Context-Aware Voice Search with Hierarchical Recurrent Neural Networks [PDF]

We tackle the novel problem of navigational voice queries posed on the XFINITY entertainment system, where viewers interact with a voice-enabled remote controller to specify the program (e.g., Game of Thrones) to watch. This is a difficult problem for two reasons: such queries are short, which offers few opportunities for deciphering user intent, and the resulting ambiguity is exacerbated by underlying speech recognition errors. We address these challenges by modeling voice search sessions to capture the contextual dependencies in query sequences, using a probabilistic framework in which recurrent and feedforward neural network modules are organized hierarchically. On a large-scale real dataset, our context-aware model significantly outperforms models without context as well as the currently deployed product.


Deep Learning Approaches for Question Answering

In this project, we propose a novel pairwise learning-to-rank approach with neural networks for question answering. The approach is flexible in that independent pointwise neural network models can be used as underlying plug-in components. We examine its effectiveness on the two main categories of deep learning models: a convolutional neural network based model and an LSTM-based model. Experiments on both the TrecQA and WikiQA datasets show that our pairwise ranking approach achieves state-of-the-art performance without the need for external knowledge sources or feature engineering.
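The idea of wrapping a pointwise scorer in a pairwise ranking objective can be sketched as follows. This is only a minimal illustration under stated assumptions: the word-overlap `overlap_scorer` stands in for the CNN or LSTM model, and the hinge loss with a `margin` parameter is one common pairwise objective, not necessarily the one used in the project. All function names are hypothetical.

```python
from typing import Callable, List

# A pointwise scorer maps a (question, answer) pair to a relevance score.
Scorer = Callable[[str, str], float]


def overlap_scorer(question: str, answer: str) -> float:
    """Toy stand-in for a neural pointwise model (assumption):
    fraction of question words that also appear in the answer."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


def pairwise_hinge_loss(scorer: Scorer, question: str, positive: str,
                        negative: str, margin: float = 1.0) -> float:
    """Loss is zero only when the positive answer outscores the
    negative answer by at least `margin`."""
    s_pos = scorer(question, positive)
    s_neg = scorer(question, negative)
    return max(0.0, margin - s_pos + s_neg)


def rank_answers(scorer: Scorer, question: str,
                 candidates: List[str]) -> List[str]:
    """At test time, candidates are sorted by the pointwise score alone."""
    return sorted(candidates, key=lambda a: scorer(question, a), reverse=True)


question = "who wrote hamlet"
positive = "shakespeare wrote hamlet"
negative = "paris is in france"
loss = pairwise_hinge_loss(overlap_scorer, question, positive, negative)
ranking = rank_answers(overlap_scorer, question, [negative, positive])
```

Because the loss only touches the scorer through its scores, any pointwise model with the same interface could be swapped in for `overlap_scorer`.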

Our team's related papers: SIGIR'17, CIKM'16, EMNLP'15, SemEval'16

Mining Temporal Characteristics for Tweet Search

There is an emerging consensus that time is an important indicator of relevance for the task of searching a stream of social media posts. In this project, we studied two types of temporal signals for Tweet search: the first infers the distribution of relevant documents from the distribution of document timestamps in the results of an initial query; the second estimates the distribution of relevance directly from term statistics along the time dimension.
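The first signal, inferring a temporal distribution from the timestamps of an initial result list, might be sketched like this. It is a rough illustration only: the Gaussian kernel density estimate, the `bandwidth` parameter, and the linear interpolation with `weight` are assumptions chosen for brevity, not the estimators studied in the papers, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def temporal_density(timestamps: Sequence[float],
                     bandwidth: float) -> Callable[[float], float]:
    """Return a Gaussian kernel density estimate over the timestamps
    of the initially retrieved documents."""
    n = len(timestamps)

    def density(t: float) -> float:
        if n == 0:
            return 0.0
        norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
        return norm * sum(
            math.exp(-0.5 * ((t - ts) / bandwidth) ** 2) for ts in timestamps
        )

    return density


def rerank(results: List[Tuple[str, float, float]], bandwidth: float,
           weight: float = 0.5) -> List[str]:
    """Rerank (doc_id, text_score, timestamp) triples by interpolating the
    text score with the temporal density, scaled so its maximum is 1."""
    density = temporal_density([ts for _, _, ts in results], bandwidth)
    dens = {doc_id: density(ts) for doc_id, _, ts in results}
    max_dens = max(dens.values(), default=0.0) or 1.0
    combined = {
        doc_id: (1.0 - weight) * score + weight * dens[doc_id] / max_dens
        for doc_id, score, _ in results
    }
    return sorted(combined, key=combined.get, reverse=True)


# Three documents cluster around t = 11; one outlier sits at t = 50.
results = [("a", 1.0, 50.0), ("b", 0.9, 11.0), ("c", 0.8, 10.0), ("d", 0.7, 12.0)]
density = temporal_density([ts for _, _, ts in results], bandwidth=2.0)
order = rerank(results, bandwidth=2.0)
```

In this toy run the outlier `a` has the best text score but falls to the bottom, because the other three documents agree on when the relevant burst happened.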

Related Papers: ICTIR'16, ECIR'16, ECIR'15, Our team's other paper: SIGIR'14

Infrastructure for Supporting Exploration and Discovery in Web Archives

We present an open-source platform, Warcbase, for managing web archives built on the distributed datastore HBase. Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge. Tight integration with Hadoop provides powerful tools for analytics and data processing. Relying on HBase for storage infrastructure simplifies the development of scalable and responsive applications.

Project Homepage: Warcbase, Related Paper: TempWeb'14

Spatio-Textual Similarity Join

Given a collection of geo-tagged objects with associated textual descriptors (e.g., tweets), we study the problem of the spatio-textual similarity join (STJoin), which identifies all pairs of textually similar objects that are also close in distance. We propose two approaches to tackle this problem. The first starts with a spatial data structure, traverses its regions, and applies All-Pairs, an existing algorithm for identifying similar pairs of textual documents. The second constructs a global index but partitions postings spatially and modifies the All-Pairs algorithm to prune candidates based on distance. We evaluate these approaches on two real-world datasets and observe surprising performance gains in a multi-threaded setting.
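To make the join definition concrete, here is a small sketch in the spirit of the first approach: partition objects spatially, then compare texts only within nearby regions. Several simplifications are assumed: a uniform grid replaces the spatial data structure, a brute-force Jaccard comparison replaces All-Pairs, and distances are planar Euclidean rather than geographic. The names `st_join`, `max_dist`, and `min_sim` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (object id, x, y, set of tokens)
GeoObject = Tuple[int, float, float, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def st_join(objects: List[GeoObject], max_dist: float,
            min_sim: float) -> List[Tuple[int, int]]:
    """Return id pairs that are within max_dist (> 0) of each other
    and whose token sets have Jaccard similarity >= min_sim."""
    # Cells of side max_dist: any qualifying pair lies in the same
    # cell or in two adjacent cells.
    grid: Dict[Tuple[int, int], List[GeoObject]] = defaultdict(list)
    for obj in objects:
        cell = (math.floor(obj[1] / max_dist), math.floor(obj[2] / max_dist))
        grid[cell].append(obj)

    pairs = set()
    for (cx, cy), bucket in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbours = grid.get((cx + dx, cy + dy), ())
                for a in bucket:
                    for b in neighbours:
                        if a[0] >= b[0]:
                            continue  # report each unordered pair once
                        close = math.hypot(a[1] - b[1], a[2] - b[2]) <= max_dist
                        if close and jaccard(a[3], b[3]) >= min_sim:
                            pairs.add((a[0], b[0]))
    return sorted(pairs)


objects = [
    (1, 0.0, 0.0, {"coffee", "shop", "open"}),
    (2, 0.5, 0.5, {"coffee", "shop", "closed"}),
    (3, 10.0, 10.0, {"coffee", "shop", "open"}),  # similar text, too far away
    (4, 0.2, 0.1, {"rain", "today"}),             # nearby, dissimilar text
]
matches = st_join(objects, max_dist=1.0, min_sim=0.5)
```

Only objects 1 and 2 satisfy both the distance and the similarity threshold here; object 3 is pruned spatially before its text is ever compared.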

NewsStand System (Developed by Prof. Samet), Related Paper: SIGSPATIAL'14