
Information Retrieval

The semi-discrete decomposition, developed with Shmuel Peleg for image compression, has proved quite useful in latent semantic indexing, a method of document retrieval [C20] [J48].
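For readers unfamiliar with the idea: a semi-discrete decomposition approximates a matrix A by a sum of k terms d_i x_i y_i^T, where each d_i is a scalar and the entries of x_i and y_i are restricted to -1, 0, or 1, so the factors are very cheap to store. The following is only a minimal sketch of a greedy alternating scheme of this kind, written in plain Python. The function names (sdd, best_discrete_vector, residual_norm), the choice of starting vector, and the iteration cap are assumptions made for this illustration; this is not the code used in [C20] or [J48].

```python
def best_discrete_vector(s):
    """Return x with entries in {-1, 0, 1} maximizing (x . s)^2 / (x . x)."""
    # Sort indices by decreasing |s_i|; the optimum keeps the J largest entries.
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 0, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        val = running * running / j
        if val > best_val:
            best_j, best_val = j, val
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x


def sdd(A, k, inner_iters=10):
    """Greedy k-term approximation A ~ sum_i D[i] * outer(X[i], Y[i])."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]  # residual, updated after every term
    X, D, Y = [], [], []
    for _ in range(k):
        # Assumed starting vector: unit vector at the residual column
        # with the largest norm.
        col = max(range(n), key=lambda j: sum(R[i][j] ** 2 for i in range(m)))
        y = [0] * n
        y[col] = 1
        for _ in range(inner_iters):
            # Fix y and pick the best discrete x, then fix x and pick y.
            s = [sum(R[i][j] * y[j] for j in range(n)) for i in range(m)]
            x = best_discrete_vector(s)
            t = [sum(R[i][j] * x[i] for i in range(m)) for j in range(n)]
            y_new = best_discrete_vector(t)
            if y_new == y:
                break
            y = y_new
        # Optimal scalar for the chosen x and y.
        xRy = sum(x[i] * R[i][j] * y[j] for i in range(m) for j in range(n))
        d = xRy / (sum(v * v for v in x) * sum(v * v for v in y))
        for i in range(m):
            for j in range(n):
                R[i][j] -= d * x[i] * y[j]
        X.append(x)
        D.append(d)
        Y.append(y)
    return X, D, Y


def residual_norm(A, X, D, Y):
    """Frobenius norm of A minus the current approximation."""
    total = 0.0
    for i in range(len(A)):
        for j in range(len(A[0])):
            approx = sum(d * x[i] * y[j] for x, d, y in zip(X, D, Y))
            total += (A[i][j] - approx) ** 2
    return total ** 0.5


X, D, Y = sdd([[1.0, 1.0], [1.0, 1.0]], k=1)
# X == [[1, 1]], D == [1.0], Y == [[1, 1]]: one term reproduces this matrix.
```

In a latent semantic indexing setting, A would play the role of the term-document matrix, and queries would be scored against the low-storage approximation rather than against A itself. Because each scalar is chosen optimally for its pair of vectors, every added term leaves the residual norm no larger than before.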

Methods for document summarization based on hidden Markov models and matrix decompositions are studied in [J62]. We demonstrated the success of these methods for summarizing medical documents in [C31]. Our methods have been quite successful in the DUC (Document Understanding Conference) and TREC competitions [C25], [C26], [C27], [C30], [C32], [C33]. More recently, they performed as well as human summarizers in an evaluation of multilingual document-set summarization [C34]; this shows that our summarizer is quite good, but also that the evaluation metrics are quite primitive!

A full retrieval system that processes a query, clusters the resulting documents, and creates a summary of each cluster is presented in [T22] and is available at http://stiefel.cs.umd.edu:8080/qcs/.

[J48]
Tamara G. Kolda and Dianne P. O'Leary, "A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval," ACM Transactions on Information Systems, 16 (1998) 322-346.

[J62]
Judith D. Schlesinger, John M. Conroy, Mary Ellen Okurowski, and Dianne P. O'Leary, "Machine and Human Performance for Single- and Multi-Document Summarization," IEEE Intelligent Systems (special issue on Natural Language Processing) 18(1), 2003, 46-54.

[C20]
Tamara G. Kolda and Dianne P. O'Leary, "Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition," in The Mathematics of Information Coding, Extraction and Distribution, George Cybenko, Dianne P. O'Leary, and Jorma Rissanen, eds., IMA Volumes in Math. and Its Applics., Springer-Verlag, New York, 1999, 73-80.

[C25]
J. M. Conroy, J. D. Schlesinger, D. P. O'Leary, and M. E. Okurowski, "Using HMM and Logistic Regression to Generate Extract Summaries for DUC," DUC 01 Conference Proceedings, 2001.

[C26]
Lynn Carlson, John M. Conroy, Daniel Marcu, Dianne P. O'Leary, Mary Ellen Okurowski, Anthony Taylor, and William Wong, "An Empirical Study of the Relation between Abstracts, Extracts, and the Discourse Structure of Texts," DUC 01 Conference Proceedings, 2001.

[C27]
J. D. Schlesinger, M. E. Okurowski, J. M. Conroy, D. P. O'Leary, A. Taylor, J. Hobbs, and H. T. Wilson, "Understanding Machine Performance in the Context of Human Performance for Multi-document Summarization," DUC 02 Conference Proceedings, 2002. http://duc.nist.gov/

[C30]
Daniel M. Dunlavy, John M. Conroy, Judith D. Schlesinger, Jade Goldstein, Sarah A. Goodman, Mary Ellen Okurowski, Dianne P. O'Leary, and Hans van Halteren, "Performance of a Three-Stage System for Multi-Document Summarization," DUC 03 Conference Proceedings, 2003.

[C32]
John M. Conroy, Judith D. Schlesinger, Jade Goldstein, and Dianne P. O'Leary, "Left-Brain/Right-Brain Multi-Document Summarization," DUC 04 Conference Proceedings, 2004.

[C33]
John M. Conroy, Judith D. Schlesinger, Dianne P. O'Leary, and Jade Goldstein, "Back to Basics: CLASSY 2006," DUC 06 Conference Proceedings, 2006. http://duc.nist.gov/

[C34]
John M. Conroy, Dianne P. O'Leary, and Judith D. Schlesinger, "CLASSY Arabic and English Multi-Document Summarization," in Multi-Lingual Summarization Evaluation 2006.
http://www.isi.edu/~cyl/MTSE2006/MSE2006/papers/index.html

[C35]
John M. Conroy, Judith D. Schlesinger, and Dianne P. O'Leary, "Topic-Focused Multi-Document Summarization Using an Approximate Oracle Score," Proceedings of the ACL'06/COLING'06, 2006.

[T22]
Daniel M. Dunlavy, Dianne P. O'Leary, John M. Conroy, and Judith D. Schlesinger, "QCS: A System for Querying, Clustering, and Summarizing Documents," SANDIA Technical Report, July 2006.


Dianne O'Leary 2006-10-09