Information Retrieval

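The semi-discrete decomposition (SDD), developed with Shmuel Peleg for image compression, approximates a matrix by a sum of weighted rank-one terms whose vectors have entries restricted to -1, 0, and 1. It has proved quite useful in latent semantic indexing (LSI), a method of document retrieval [C17], [J48].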
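To make the decomposition concrete: the SDD approximates a matrix A by a sum of k terms d_t x_t y_t^T, where d_t is a scalar and every entry of x_t and y_t is -1, 0, or 1, so each vector entry needs only about two bits of storage. The Python sketch below builds the terms greedily, one at a time, by alternately choosing the best ternary x for a fixed y and the best ternary y for a fixed x on the current residual. It is an illustrative sketch under stated assumptions, not the code behind [C17] or [J48]: the starting vector (a unit vector on the residual column of largest norm), the fixed number of inner iterations, and the names `best_ternary` and `sdd` are hypothetical choices made here.

```python
def best_ternary(s):
    """Return x in {-1, 0, 1}^n maximizing (x . s)^2 / ||x||^2 for a fixed s.

    With a support of size J, the best x puts sign(s_i) on the J entries of
    largest |s_i|, giving (sum of those |s_i|)^2 / J; we scan every J.
    """
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 1, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        val = running * running / j
        if val > best_val:
            best_j, best_val = j, val
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x


def sdd(A, k, inner_iters=10):
    """Greedy k-term SDD sketch: A ~ sum_t d_t * x_t y_t^T with ternary x_t, y_t.

    A is a list of m rows of length n. Returns (terms, residual), where
    terms is a list of (d, x, y) triples and residual = A - sum of terms.
    """
    m, n = len(A), len(A[0])
    R = [list(row) for row in A]  # residual, updated after each term
    terms = []
    for _term in range(k):
        # Assumed start: unit vector on the residual column of largest norm.
        col_norms = [sum(R[i][j] ** 2 for i in range(m)) for j in range(n)]
        y = [0] * n
        y[col_norms.index(max(col_norms))] = 1
        for _it in range(inner_iters):
            # Alternate: best ternary x for fixed y, then best y for fixed x.
            x = best_ternary([sum(R[i][j] * y[j] for j in range(n)) for i in range(m)])
            y = best_ternary([sum(R[i][j] * x[i] for i in range(m)) for j in range(n)])
        # Optimal scalar for this (x, y) pair: d = x^T R y / (||x||^2 ||y||^2).
        xRy = sum(x[i] * R[i][j] * y[j] for i in range(m) for j in range(n))
        d = xRy / (sum(v * v for v in x) * sum(v * v for v in y))
        for i in range(m):
            for j in range(n):
                R[i][j] -= d * x[i] * y[j]
        terms.append((d, x, y))
    return terms, R
```

Because each d_t is the best scalar for its pair (x_t, y_t), the squared Frobenius norm of the residual can only decrease as terms are added; in SDD-based LSI these ternary terms take the place of the leading singular triplets used by SVD-based LSI.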

Methods for document summarization based on hidden Markov models and matrix decompositions are studied in [J62], and we demonstrated their success in summarizing medical documents in [C26]. Our methods have been quite successful in the DUC (Document Understanding Conference) and TREC (Text REtrieval Conference) competitions [C20], [C21], [C22], [C25], [C27], [C28], [C29], and recently they performed as well as human summarizers in an evaluation on summarizing multi-lingual document sets [C30]. This shows that our summarizer is quite good, but also that the evaluation metrics are quite primitive! Further information about our summarization work is available in [C31], [C32], [C33], [C34].
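One way a matrix decomposition can help choose sentences is by avoiding redundancy: represent each candidate sentence as a column of a term-by-sentence matrix and pick columns in the order chosen by QR factorization with column pivoting, so that each new sentence contributes the most content not already covered by the ones selected. The Python sketch below shows only that selection step. It is a hypothetical illustration, not the summarizer from [J62]; it assumes the columns have already been scaled by some per-sentence importance score (for instance, one derived from a hidden Markov model), and the function name `pivoted_qr_select` is ours.

```python
import math


def pivoted_qr_select(A, k):
    """Return indices of up to k columns of A, in column-pivoted QR order.

    A is a list of m rows, one per term; column j is the (weighted) term
    vector of sentence j. Each step picks the column with the largest norm
    after removing its components along the columns already chosen
    (modified Gram-Schmidt), then orthogonalizes the rest against it.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # working copies
    chosen = []
    for _ in range(min(k, n)):
        norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
        for j in chosen:
            norms[j] = -1.0  # never pick the same sentence twice
        p = max(range(n), key=lambda j: norms[j])
        if norms[p] <= 1e-12:
            break  # remaining sentences add no new content
        chosen.append(p)
        q = [v / norms[p] for v in cols[p]]
        for j in range(n):
            if j not in chosen:
                proj = sum(a * b for a, b in zip(q, cols[j]))
                cols[j] = [a - proj * b for a, b in zip(cols[j], q)]
    return chosen
```

For example, with the three sentence columns [2, 0, 0], [2, 0, 0], and [0, 1, 1], the selection returns sentences 0 and 2: once the first copy is chosen, the duplicate has nothing left after projection and is never selected.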

The QCS system, a full retrieval system that processes a query, clusters the resulting documents, and creates a summary of each cluster, is presented in [J82] and available at http://stiefel.cs.umd.edu:8080/qcs/
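The clustering stage of such a pipeline groups the retrieved documents by the direction of their term vectors. The Python sketch below uses plain spherical k-means (cosine similarity with unit-length centroids) as an assumed stand-in; it does not claim to reproduce the clustering algorithm used in [J82], and the function names, the use of the first k documents as initial centroids, and the iteration cap are hypothetical choices made for a small deterministic example.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(docs, k, max_iters=20):
    """Cluster term-weight vectors by cosine similarity; return one label per doc.

    Hypothetical initialization: the first k (normalized) documents serve as
    the starting centroids, which keeps the sketch deterministic.
    """
    X = [normalize(d) for d in docs]
    centroids = [x[:] for x in X[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each document joins the centroid with largest cosine.
        new_labels = [max(range(k), key=lambda c: dot(x, centroids[c])) for x in X]
        if new_labels == labels:
            break  # assignments unchanged, so recomputed centroids would be too
        labels = new_labels
        # Update step: centroid = normalized sum of the member unit vectors.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels
```

Working with unit vectors makes the dot product equal to the cosine of the angle between documents, so a long and a short document on the same topic land in the same cluster.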

[C17]
Tamara G. Kolda and Dianne P. O'Leary, ``Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition," in The Mathematics of Information Coding, Extraction and Distribution, George Cybenko, Dianne P. O'Leary, and Jorma Rissanen, eds., IMA Volumes in Math. and Its Applics., Springer-Verlag, New York, 1999, 73-80.

[C20]
J. M. Conroy, J. D. Schlesinger, D. P. O'Leary, and M. E. Okurowski, ``Using HMM and Logistic Regression to Generate Extract Summaries for DUC," DUC 01 Conference Proceedings, 2001. http://duc.nist.gov/
[C21]
Lynn Carlson, John M. Conroy, Daniel Marcu, Dianne P. O'Leary, Mary Ellen Okurowski, Anthony Taylor, and William Wong, ``An Empirical Study of the Relation between Abstracts, Extracts, and the Discourse Structure of Texts," DUC 01 Conference Proceedings, 2001. http://duc.nist.gov/
[C22]
J. D. Schlesinger, M. E. Okurowski, J. M. Conroy, D. P. O'Leary, A. Taylor, J. Hobbs, H. T. Wilson, ``Understanding Machine Performance in the Context of Human Performance for Multi-document Summarization," DUC 02 Conference Proceedings, 2002. http://duc.nist.gov/
[C25]
Daniel M. Dunlavy, John M. Conroy, Judith D. Schlesinger, Jade Goldstein, Sarah A. Goodman, Mary Ellen Okurowski, Dianne P. O'Leary, and Hans van Halteren, ``Performance of a Three-Stage System for Multi-Document Summarization," DUC 03 Conference Proceedings, 2003.
[C26]
Daniel M. Dunlavy, John M. Conroy, Timothy J. O'Leary, and Dianne P. O'Leary, ``Clustering and Summarizing Medline Abstracts," BISTI 2003 Symposium on Digital Biology: The Emerging Paradigm, National Institutes of Health Biomedical Information Science and Technology Initiative (BISTI), 2003.
[C27]
John M. Conroy, Judith D. Schlesinger, Jade Goldstein, and Dianne P. O'Leary, ``Left-Brain/Right-Brain Multi-Document Summarization," DUC 04 Conference Proceedings, 2004.
[C28]
John M. Conroy, Judith D. Schlesinger, Dianne P. O'Leary, and Jade Goldstein, ``Back to Basics: CLASSY 2006," DUC 06 Conference Proceedings, 2006. http://duc.nist.gov/

[C30]
John M. Conroy, Dianne P. O'Leary, and Judith D. Schlesinger, ``CLASSY Arabic and English Multi-Document Summarization," in Multi-Lingual Summarization Evaluation 2006. http://www.isi.edu/~cyl/MTSE2006/MSE2006/papers/index.html

[C31]
John M. Conroy, Judith D. Schlesinger, and Dianne P. O'Leary, ``Topic-Focused Multi-Document Summarization Using an Approximate Oracle Score," Proceedings of the ACL'06/COLING'06, 2006.
[C32]
John M. Conroy, Judith D. Schlesinger, and Dianne P. O'Leary, ``CLASSY 2007 at DUC 2007," Document Understanding Conference DUC 2007, HLT-NAACL, Rochester, NY, April 26, 2007.

[C33]
Nitin Madnani, Rebecca Passonneau, Necip Fazil Ayan, John M. Conroy, Bonnie J. Dorr, Judith L. Klavans, Dianne P. O'Leary, and Judith D. Schlesinger, ``Measuring Variability in Sentence Ordering for News Summarization," 11th European Workshop on Natural Language Generation (ENLG07), Schloss Dagstuhl, Germany, June 17-20, 2007.

[C34]
Judith D. Schlesinger, Dianne P. O'Leary, and John M. Conroy, ``Arabic/English Multi-document Summarization with CLASSY - The Past and the Future," CICLing Conference on Intelligent Text Processing and Computational Linguistics, Haifa, Israel, February 17-23, 2008. In Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Volume 4919, Springer, Berlin, 2008, 568-581. http://dx.doi.org/10.1007/978-3-540-78135-6_49

[J48]
Tamara G. Kolda and Dianne P. O'Leary, ``A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval," ACM Transactions on Information Systems, 16 (1998) 322-346.
[J62]
Judith D. Schlesinger, John M. Conroy, Mary Ellen Okurowski, and Dianne P. O'Leary, ``Machine and Human Performance for Single- and Multi-Document Summarization," IEEE Intelligent Systems (special issue on Natural Language Processing) 18(1), 2003, 46-54.
[J82]
Daniel M. Dunlavy, Dianne P. O'Leary, John M. Conroy, and Judith D. Schlesinger, ``QCS: A System for Querying, Clustering, and Summarizing Documents," Information Processing and Management 43(6), 2007, 1588-1605. DOI: 10.1016/j.ipm.2007.01.003

Dianne O'Leary 2010-06-16