Elsayed, T., Ture, F., Lin, J. (October 2010)
Brute-Force Approaches to Batch Retrieval: Scalable Indexing with MapReduce, or Why Bother?

Modern information retrieval research has evolved a standard workflow that involves first indexing a document collection and then running ad hoc queries sequentially to evaluate retrieval effectiveness using standard test collections. This paper explores how aspects of this workflow might change in a MapReduce cluster-based environment. First, we present and evaluate two algorithms for inverted indexing that take advantage of the programming model's sorting mechanism to different extents. The running times of both algorithms scale linearly in terms of collection size up to 102 million web pages. Second, we show that it is possible to efficiently perform batch query evaluation with MapReduce by scanning all postings lists in parallel, as opposed to sequentially accessing each postings list. Third, we explore an approach that forgoes inverted indexing altogether and simply computes all query-document scores from document vectors themselves. Experimental results challenge us to think differently about previous assumptions in information retrieval, and show that brute-force approaches are surprisingly compelling under certain circumstances: parallel scan of postings can effectively take advantage of large clusters and parallel scan of documents fits naturally with ranking functions that use document-level features.
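
The third approach in the abstract, scoring every query against every document without an inverted index, can be illustrated with a small single-machine sketch. This is not the paper's implementation: the function names (`map_document`, `reduce_query`, `batch_retrieve`), the toy summed-term-frequency score, and the in-memory stand-in for the MapReduce shuffle are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import heapq


def map_document(doc_id, text, queries):
    """Mapper (hypothetical): score one document against every query in the batch."""
    tf = Counter(text.lower().split())
    for qid, terms in queries.items():
        # Toy score: summed term frequency of the query terms (an assumption,
        # not the ranking function used in the paper).
        score = sum(tf[t] for t in terms)
        if score > 0:
            yield qid, (score, doc_id)


def reduce_query(qid, scored_docs, k=2):
    """Reducer (hypothetical): keep the k best-scoring documents for one query."""
    return qid, heapq.nlargest(k, scored_docs)


def batch_retrieve(docs, queries, k=2):
    """Simulate the shuffle locally by grouping mapper output per query."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for qid, pair in map_document(doc_id, text, queries):
            grouped[qid].append(pair)
    return dict(reduce_query(qid, pairs, k) for qid, pairs in grouped.items())


docs = {"d1": "map reduce index", "d2": "index index query", "d3": "cats"}
queries = {"q1": ["index"], "q2": ["query", "map"]}
results = batch_retrieve(docs, queries)
```

Each document is read once and scored against the whole query batch, so the work parallelizes over documents and no index has to be built first. On a real cluster the grouping step would be handled by the framework's shuffle rather than a dictionary.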
