Difference between revisions of "Enhancing cluster labeling using Wikipedia"

From WikiLit
m (gscites field added)
Line 15:
|abstract=This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster much more than with significant terms that are extracted directly from the text. We show that in most cases, even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection or a synonym of it) appears in the top five labels recommended by our system. Copyright 2009 ACM.
|doi=10.1145/1571941.1571967
+ |gscites=14099639641125736698
|topics=Ranking and clustering systems
|domains=Computer science

Revision as of 16:31, September 5, 2012

Publication
Enhancing cluster labeling using Wikipedia
Authors: David Carmel, Haggai Roitman, Naama Zwerdling
Citation: SIGIR '09 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval: 139-146. 2009 July 19-23. Boston, MA, United States. Association for Computing Machinery.
Publication type: Conference paper
Peer-reviewed: Yes
Database(s):
DOI: 10.1145/1571941.1571967.
Link(s): http://dl.acm.org/citation.cfm?id=1571967
Added by Wikilit team: Added on initial load
"Enhancing cluster labeling using Wikipedia" is a publication by David Carmel, Haggai Roitman, and Naama Zwerdling.


Abstract

This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster much more than with significant terms that are extracted directly from the text. We show that in most cases, even when human's associated label appears in the text, pure statistical methods have difficulty in identifying them as good descriptors. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection or a synonym of it) appears in the top five labels recommended by our system. Copyright 2009 ACM.

Research questions

"This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The “labeling quality” of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling."

Research details

Topics: Ranking and clustering systems
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source:
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Clone
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"Cluster labeling with Wikipedia is extremely successful, as shown by our results, especially in collections of documents whose topics are covered well by Wikipedia concepts. For domain specific collections, with topics that are not completely covered by Wikipedia, the proposed candidates may hurt the system’s performance due to their irrelevance to the documents’ topics. For such collections, an intelligent decision should be made regarding the use of Wikipedia or another external resource; alternatively, a choice could be made to focus only on inner terms for labeling. The decision should be made by analyzing the given collection with respect to Wikipedia. Developing such a collection specific decision making as part of the labeling framework is left for further research."

Comments

"Cluster labeling with Wikipedia is extremely successful, as shown by our results, especially in collections of documents whose topics are covered well by Wikipedia concepts." (p. 146)


Further notes

Facts about "Enhancing cluster labeling using Wikipedia"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Conference location: Boston, MA, United States
Dates: 19-23
Doi: 10.1145/1571941.1571967
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Enhancing%2Bcluster%2Blabeling%2Busing%2BWikipedia%22
Has author: David Carmel, Haggai Roitman and Naama Zwerdling
Has domain: Computer science
Has topic: Ranking and clustering systems
Month: July
Pages: 139-146
Peer reviewed: Yes
Publication type: Conference paper
Published in: SIGIR '09 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Publisher: Association for Computing Machinery
Research design: Experiment
Revid: 7,625
Theories: Undetermined
Theory type: Design and action
Title: Enhancing cluster labeling using Wikipedia
Unit of analysis: Article
Url: http://dl.acm.org/citation.cfm?id=1571967
Wikipedia coverage: Sample data
Wikipedia data extraction: Clone
Wikipedia language: Not specified
Wikipedia page type: Article
Year: 2009