You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format. However, this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the Department of Computer Science of the University of Maryland at College Park under terms that include this permission. All other rights are reserved by the author(s).
Keywords: Document understanding, logos, document databases, similarity invariants
The problem of logo recognition is of great interest in the document domain, especially for document databases. By recognizing the logo we obtain semantic information about the document, which may be useful in deciding whether or not to analyze the textual components. Given a logo-like region from a document image and a logo database, we would like to determine whether the region corresponds to a logo in the database. Similarly, given a logo-like region and a document database, we wish to determine whether the database contains any documents of similar origin. Both problems require indexing into a possibly large model space.
In this paper, we present a multi-level approach to logo recognition which uses text and contour features to prune the database and similarity invariants to obtain a more refined match. We outline our methods for page segmentation, feature extraction and indexing and demonstrate our approach on a database of approximately sixty logos.
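The coarse-to-fine idea described above can be illustrated with a minimal sketch. The abstract does not specify the actual text features, contour features, or invariants used, so everything here is an assumption made for illustration: the record fields `n_contours`, `has_text`, and `contour`, the function names, and the choice of sorted, mean-normalized centroid distances as a stand-in similarity invariant (a quantity unchanged by translation, rotation, and uniform scaling). The sketch also assumes contours have already been resampled to the same number of points.

```python
import math

def similarity_signature(points):
    """Sorted centroid distances divided by their mean.

    Hypothetical stand-in for a similarity invariant: the result does not
    change when the points are translated, rotated, or uniformly scaled.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in points)
    mean = sum(dists) / n
    if mean == 0:
        # Degenerate contour: all points coincide.
        return [0.0] * n
    return [d / mean for d in dists]

def match_logo(query, database):
    """Two-stage match: cheap pruning, then ranking by invariant distance."""
    # Stage 1: prune the database with coarse features (assumed fields).
    candidates = [
        m for m in database
        if m["n_contours"] == query["n_contours"]
        and m["has_text"] == query["has_text"]
    ]
    # Stage 2: rank the surviving models by distance between signatures.
    q_sig = similarity_signature(query["contour"])
    ranked = []
    for m in candidates:
        m_sig = similarity_signature(m["contour"])
        if len(m_sig) != len(q_sig):
            continue  # sketch assumes equally resampled contours
        ranked.append((math.dist(q_sig, m_sig), m["name"]))
    ranked.sort()
    return [(name, d) for d, name in ranked]
```

Calling `match_logo(query, database)` returns the surviving model names with their signature distances, best match first. The pruning stage keeps the more expensive comparison off most of the database, which is the point of a multi-level scheme.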
The PostScript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu, or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
Keywords: Document Understanding, Handwriting, Recovery
THESIS - Advisor: Dr. Azriel Rosenfeld
Many document image understanding problems require a more comprehensive examination of document features than is typically deemed necessary for recognition tasks. We believe that these problems require a detailed analysis of stroke and sub-stroke features in the document image with the goal of obtaining information about the environment or process which created the document and establishing a context for understanding.
We introduce the concept of recovery into the document domain. We provide a ``stroke platform'' representation which establishes a verifiable ``link to the pixels'' and demonstrate its usefulness for recovery tasks. This representation allows us to overcome many of the problems caused by the rapid, irreversible abstraction typical of traditional document processing methods and provides the basic framework for our analysis of handwritten documents. By obtaining a detailed description of the document and its properties, we are able to establish a context for analysis and validate assumptions about the domain.
This dissertation presents our work on several document image understanding problems, including: 1) demonstrating the successful use of the stroke platform for the problem of interpreting and reconstructing junctions and endpoints; 2) exploring the effects of the handwriting process on the document by developing a model for instrument grasp and studying its effects on pressure features; 3) posing and providing an approach to the problem of recovering temporal information from static images of handwriting; 4) addressing various sub-tasks of the problem of processing form documents; and 5) extending the detailed analysis philosophy to demonstrate its feasibility in related document domains.