
Events

February 15, 2008
Semantic-based Language Modeling Approach to Biomedical Literature Retrieval and Mining


Xiaohua (Davis) Zhou
College of Information Science and Technology, Drexel University

Library and Information Science Colloquium
Noon - 1:30pm
Faculty Lounge, Room 323, SCILS, Rutgers University

 

Abstract:

Statistical language modeling is one of the state-of-the-art approaches to information retrieval and text mining. Because training data are sparse, the trained language models must be smoothed. Biomedical literature contains a large number of biological and medical terms, which often have many synonyms and can be highly ambiguous without context. Thus, applying language modeling directly to biomedical text may cause serious sparsity problems and hurt performance. This talk discusses a context-sensitive semantic smoothing (CSSS) method. CSSS automatically extracts meaningful topic signatures (e.g., ontological concepts and multiword phrases) from documents and then statistically maps them to traditional keywords for smoothing purposes. A formal evaluation on the TREC Genomics collection and the OHSUMED collection showed that CSSS could significantly improve the effectiveness of traditional language models in biomedical literature retrieval, classification, and clustering. The maximum improvements reached 19.9% (average precision), 17.3% (micro-F1), and 135% (NMI score) for retrieval, classification, and clustering, respectively.
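
To make the idea of mapping topic signatures to keywords concrete, here is a minimal Python sketch. It assumes the smoothed document model is a linear mixture of a background-smoothed unigram model and a "translation" model that spreads each topic signature's probability over related keywords. This mixture form, the function names (ml_model, smoothed_prob), the weights (alpha, lam), and all toy probabilities are illustrative assumptions, not necessarily the formulation presented in the talk.

```python
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model: relative frequency of each token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def smoothed_prob(word, doc_words, doc_signatures, background, translation,
                  alpha=0.5, lam=0.4):
    """Probability of `word` under a (hypothetical) semantically smoothed model.

    p(w|d) = (1 - lam) * p_base(w|d) + lam * p_trans(w|d)
    p_base  = (1 - alpha) * p_ml(w|d) + alpha * p(w|background)
    p_trans = sum over signatures t of p(w|t) * p_ml(t|d)
    """
    p_ml = ml_model(doc_words).get(word, 0.0)
    p_base = (1 - alpha) * p_ml + alpha * background.get(word, 0.0)

    sig_model = ml_model(doc_signatures)
    p_trans = sum(
        translation.get(sig, {}).get(word, 0.0) * p_sig
        for sig, p_sig in sig_model.items()
    )
    return (1 - lam) * p_base + lam * p_trans


# Toy data (all values invented for illustration).
doc_words = ["aspirin", "reduces", "fever"]
doc_signatures = ["aspirin"]  # topic signatures extracted from the document
background = {"aspirin": 0.1, "reduces": 0.2, "fever": 0.2,
              "acetylsalicylic": 0.05, "acid": 0.45}
# p(keyword | signature): the signature "aspirin" also explains its synonym.
translation = {"aspirin": {"aspirin": 0.6, "acetylsalicylic": 0.4}}

with_semantic = smoothed_prob("acetylsalicylic", doc_words, doc_signatures,
                              background, translation)
without_semantic = smoothed_prob("acetylsalicylic", doc_words, doc_signatures,
                                 background, translation, lam=0.0)
```

In this toy setup the synonym "acetylsalicylic" never occurs in the document, yet it receives more probability mass with the signature-based component than with background smoothing alone, which is the kind of effect the abstract attributes to semantic smoothing.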

Biographical Sketch:

Davis Zhou is a PhD candidate in the College of Information Science and Technology at Drexel University, where he has been studying since 2003. He has a BS in Automatic Control, a BA in International Trade, and an MS in Management Science and Engineering, all from Shanghai Jiao Tong University. His dissertation research is concerned with semantic-based language modeling, and he has been involved in a variety of research projects in this general domain while at Drexel. He is the co-author of four refereed journal articles and fourteen refereed conference proceedings papers, and is the implementer of the Dragon Toolkit (http://www.dragontoolkit.org), a comprehensive Information Retrieval/Text Mining research tool.

