


HCIL-2008-28

Lin, J. (July 2008)
Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), pages 419-428, October 2008, Honolulu, Hawaii. [Published Version]

This paper explores the challenge of scaling up language processing algorithms to increasingly large datasets. While cluster computing has been available in industrial environments for several years, academic researchers have fallen behind in their ability to work on large datasets. We discuss two challenges contributing to this problem: lack of a suitable programming model for managing concurrency and difficulty in obtaining access to hardware. Hadoop, an open-source implementation of Google’s MapReduce framework, provides a compelling solution to both issues. Its simple programming model hides system-level details from the developer, and its ability to run on commodity hardware puts cluster computing within reach of many academic research groups. This paper illustrates these points with a case study on building word co-occurrence matrices from large corpora. We conclude with an analysis of an alternative computing model based on renting instead of buying computer clusters.
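
To make the case study concrete, the following is a minimal single-process sketch of how word co-occurrence counting can be phrased as a map step and a reduce step. It is not the paper's Hadoop implementation. The function names (`map_pairs`, `reduce_counts`, `run_job`), the whitespace tokenizer, and the fixed token window are all assumptions made for illustration, and the sort-and-group step only imitates the shuffle that a MapReduce framework would perform across machines.

```python
from itertools import groupby


def map_pairs(sentence, window=2):
    """Mapper (hypothetical): emit ((word, neighbor), 1) for every
    neighbor within `window` tokens of each word in the sentence."""
    tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield (word, tokens[j]), 1


def reduce_counts(key, values):
    """Reducer (hypothetical): sum the partial counts for one word pair."""
    return key, sum(values)


def run_job(sentences, window=2):
    """Run map, a simulated shuffle (sort + group by key), then reduce."""
    emitted = sorted(kv for s in sentences for kv in map_pairs(s, window))
    matrix = {}
    for key, group in groupby(emitted, key=lambda kv: kv[0]):
        pair, total = reduce_counts(key, (count for _, count in group))
        matrix[pair] = total
    return matrix


if __name__ == "__main__":
    counts = run_job(["a b c", "a b"], window=1)
    for pair in sorted(counts):
        print(pair, counts[pair])
```

The mapper and reducer see only one sentence or one key at a time and contain no concurrency logic, which is the sense in which the programming model hides system-level details: in a real cluster the framework, not the developer, would handle partitioning, shuffling, and fault tolerance.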


