Hu, C., Rose, A., Bederson, B. (March 2009)
Text location in scanned documents is important for selection, search, and other interactions with visual presentations of scanned books. In this paper, we describe a workflow for extracting and verifying text locations that combines commercial software, free software products, and human proofing. Our method uses Adobe Acrobat’s OCR functionality but can be easily adapted to other OCR software products. To help mid-sized digital libraries, we are making our solution available as open-source software.