Changes

Enhancing cluster labeling using Wikipedia

149 bytes added, 20:25, January 30, 2014
Text replace - "|collected_datatype=" to "|data_source="
{{Publication
|type=Conference paper
|title=Enhancing cluster labeling using Wikipedia
|authors=David Carmel, Haggai Roitman, Naama Zwerdling
|published_in=SIGIR '09 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
|peer_reviewed=yes
|language=English
|month=July
|year=2009
|dates=19-23
|pages=139-146
|conference_location=Boston, MA, United States
|publisher=Association for Computing Machinery
|url=http://dl.acm.org/citation.cfm?id=1571967
|added_by_wikilit_team=Added on initial load
|article_language=English
|abstract=This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The labeling quality of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster much more than with significant terms that are extracted directly from the text. We show that in most cases, even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection the manual label (or an inflection or a synonym of it) appears in the top five labels recommended by our system. Copyright 2009 ACM.
|doi=10.1145/1571941.1571967
|gscites=14099639641125736698
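|research_questions=This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The labeling quality of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling.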
|topics=Ranking and clustering systems
|domains=Computer science
|theory_type=Design and action
|wikipedia_coverage=Sample data
|theories=Undetermined
|research_design=Experiment
|data_source=Archival records, Experiment responses, Wikipedia pages
|collected_data_time_dimension=Cross-sectional
|unit_of_analysis=Article
|wikipedia_data_extraction=CloneDump
|wikipedia_page_type=Article
|wikipedia_language=Not specified