Multi-Task Learning with Neural Networks for Context-aware Voice Search

We tackle the challenge of understanding voice queries posed on an entertainment platform, where consumers direct speech input at their “voice remotes”. Such queries range from specific program navigation (e.g., watching a movie) to requests with vague intents and even queries that have nothing to do with watching TV. We present successively richer neural network architectures with multi-task learning for general query understanding.


Related Papers: KDD'18, CIKM'17, SIGIR'18


Temporal Context Modeling for Tweet Search

We explore techniques to model two different types of temporal signals for tweet search: pseudo trend and query trend. The pseudo trend estimates the temporal distribution of relevant documents from the timestamps of an initial list of retrieved documents, while the query trend estimates that distribution directly from the collection statistics of the query terms, obviating the need for an initial retrieval. We propose various methods, including continuous HMMs, RNN-based models, and regression models, to model and combine these two types of signals to improve document ranking.
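
As a rough illustration of the pseudo trend idea, the sketch below bins the timestamps of an initial retrieved list into fixed-width intervals and normalizes the counts into a distribution. The function name `pseudo_trend`, the one-day bin width, and the use of a plain histogram are assumptions made for illustration; the models in the papers (HMM, RNN, regression) are considerably richer.

```python
from collections import Counter


def pseudo_trend(timestamps, bin_seconds=86400):
    """Estimate a temporal distribution from retrieved-document timestamps.

    timestamps: iterable of POSIX timestamps (seconds) of the initially
    retrieved documents. bin_seconds is an assumed bin width (one day).
    Returns a dict mapping bin index -> probability mass.
    """
    counts = Counter(int(ts // bin_seconds) for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_id: count / total for bin_id, count in counts.items()}


if __name__ == "__main__":
    # Six hypothetical documents spread over three days.
    print(pseudo_trend([0, 10, 86400, 86401, 86402, 200000]))
```

A document's ranking score could then be combined with the mass of the bin its timestamp falls into, although how the signals are combined in practice is a modeling choice not shown here.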

Related Papers: ICTIR'17, SIGIR NeuIR'17, ICTIR'16, ECIR'16, ECIR'15, SIGIR'14

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Tweet Search

Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have only been applied to standard ad hoc retrieval tasks over web documents. We propose a novel neural ranking model specifically designed for tweet search, where we identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain.

Related Papers: arXiv'18


Multi-Perspective Semantic Matching for Question Answering

We propose a novel pairwise learning-to-rank approach with neural networks for question answering. Our approach is flexible in that independent pointwise neural network models can be used as underlying plug-in components. We examine its effectiveness on the two main categories of deep learning models: a convolutional neural network based model and an LSTM-based model. Experiments on both the TrecQA and WikiQA datasets show that our pairwise ranking approach achieves state-of-the-art performance without the need for external knowledge sources or feature engineering.
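
The plug-in idea can be sketched as follows: any pointwise scorer that maps a (question, answer) pair to a number can be wrapped in a pairwise margin loss over a correct and an incorrect answer. The hinge loss, the margin value, and the toy word-overlap scorer `overlap_score` are assumptions chosen for illustration; they stand in for the neural models and the actual training objective, which are not specified here.

```python
def overlap_score(question, answer):
    """Toy pointwise scorer (hypothetical stand-in for a CNN or LSTM model):
    fraction of question words that also appear in the answer."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / max(len(q_words), 1)


def pairwise_hinge_loss(score_fn, question, positive, negative, margin=1.0):
    """Pairwise margin loss: zero once the positive answer outscores
    the negative answer by at least `margin`."""
    pos = score_fn(question, positive)
    neg = score_fn(question, negative)
    return max(0.0, margin - pos + neg)


if __name__ == "__main__":
    loss = pairwise_hinge_loss(
        overlap_score,
        "who wrote hamlet",
        "shakespeare wrote hamlet",
        "paris is in france",
    )
    print(loss)
```

Because `score_fn` is passed in as an argument, swapping one pointwise model for another requires no change to the pairwise wrapper.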

Related Papers: SIGIR'17, CIKM'16, EMNLP'15, SemEval'16



Infrastructure for Supporting Exploration and Discovery in Web Archives

We present an open-source platform, Warcbase, for managing web archives built on the distributed datastore HBase. Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge. Tight integration with Hadoop provides powerful tools for analytics and data processing. Relying on HBase for storage infrastructure simplifies the development of scalable and responsive applications.

Project Homepage: Warcbase, Related Paper: TempWeb'14

Spatio-Textual Similarity Join

Given a collection of geo-tagged objects with associated textual descriptors (e.g., tweets), we study the problem of the spatio-textual similarity join (STJoin), which identifies all pairs of textually similar objects that are also close in distance. We propose two approaches to tackle this problem. The first starts with a spatial data structure, traverses its regions, and applies All-Pairs, an existing algorithm for identifying similar pairs of textual documents, within each region. The second constructs a global index but partitions postings spatially and modifies the All-Pairs algorithm to prune candidates based on distance. We evaluate these approaches on two real-world datasets and observe surprising performance gains in a multi-threaded setting.
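
To make the join definition concrete, here is a deliberately simplified sketch: objects are hashed into a uniform grid whose cell side equals the distance threshold, and only objects in the same or adjacent cells are compared. It assumes planar coordinates, Jaccard similarity over token sets, and exhaustive comparison within neighboring cells; it does not implement All-Pairs or either of the indexing strategies described above, and the names `st_join`, `max_dist`, and `min_sim` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import hypot


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def st_join(objects, max_dist, min_sim):
    """objects: list of (x, y, token_set) with planar coordinates.

    Returns sorted index pairs (i, j), i < j, whose distance is at most
    max_dist and whose Jaccard similarity is at least min_sim.
    """
    # Cells have side max_dist, so any qualifying pair lies in the
    # same cell or in two adjacent cells.
    grid = defaultdict(list)
    for idx, (x, y, _) in enumerate(objects):
        grid[(int(x // max_dist), int(y // max_dist))].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            neighbors = grid.get((cx + dx, cy + dy), ())
            for i in members:
                xi, yi, tokens_i = objects[i]
                for j in neighbors:
                    if i >= j:
                        continue
                    xj, yj, tokens_j = objects[j]
                    close = hypot(xi - xj, yi - yj) <= max_dist
                    if close and jaccard(tokens_i, tokens_j) >= min_sim:
                        pairs.add((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    data = [
        (0.0, 0.0, {"coffee", "shop", "open"}),
        (0.5, 0.5, {"coffee", "shop", "closed"}),
        (0.4, 0.1, {"rain", "today"}),
        (10.0, 10.0, {"coffee", "shop", "open"}),
    ]
    print(st_join(data, max_dist=1.0, min_sim=0.5))
```

For real latitude and longitude data, the Euclidean distance would need to be replaced with a geodesic distance, and the grid cell size adjusted accordingly.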

NewsStand System (developed by Prof. Samet), Related Paper: SIGSPATIAL'14