Using Wikipedia knowledge to improve text classification
Abstract Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the "Bag of Words" (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. The achieved improvement is unfortunately very limited, due to the poor coverage capability of the dictionary, and to the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm
Conclusion Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm
Data source Experiment responses  + , Wikipedia pages  +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Using%2BWikipedia%2Bknowledge%2Bto%2Bimprove%2Btext%2Bclassification%22  +
Has author Pu Wang + , Jian Hu + , Hua-Jun Zeng + , Zheng Chen +
Has domain Computer science +
Has topic Text classification + , Semantic relatedness +
Issue 3  +
Pages 265-281  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Knowledge and Information Systems +
Research design Experiment  +
Research questions We automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification.
Revid 11,019  +
Theories Undetermined
Theory type Design and action  +
Title Using Wikipedia knowledge to improve text classification
Unit of analysis Article  +
Url http://www.springerlink.com/content/26l87nxu72024625/fulltext.pdf  +
Volume 19  +
Wikipedia coverage Other  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:32:27  +
Categories Text classification  + , Semantic relatedness  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:32:09  +
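
The abstract describes expanding a Bag of Words (BOW) representation with semantic relations drawn from a thesaurus. The following is a minimal sketch of that general idea only, not the authors' framework: the toy thesaurus, the relation weights, and the function names `bag_of_words` and `expand_bow` are all hypothetical and chosen for illustration. It performs a single expansion step, adding each related concept with a weight that depends on the relation type.

```python
from collections import Counter

# Hypothetical toy thesaurus: term -> list of (related concept, relation type).
# The paper builds its thesaurus from Wikipedia; these entries are invented.
TOY_THESAURUS = {
    "puma": [("cougar", "synonymy"), ("feline", "hyponymy")],
    "cougar": [("puma", "synonymy"), ("feline", "hyponymy")],
    "feline": [("mammal", "hyponymy")],
}

# Assumed weights per relation type; the values are arbitrary placeholders.
RELATION_WEIGHTS = {"synonymy": 1.0, "hyponymy": 0.5, "associative": 0.25}


def bag_of_words(text):
    """Count whitespace-separated, lowercased terms."""
    return Counter(text.lower().split())


def expand_bow(bow, thesaurus, weights):
    """Add related concepts to a BOW vector, scaled by relation weight.

    Only one hop is followed: concepts related to the original terms are
    added, but concepts related to those additions are not.
    """
    expanded = {term: float(count) for term, count in bow.items()}
    for term, count in bow.items():
        for concept, relation in thesaurus.get(term, []):
            expanded[concept] = expanded.get(concept, 0.0) + count * weights[relation]
    return expanded


doc_a = expand_bow(bag_of_words("puma sighting"), TOY_THESAURUS, RELATION_WEIGHTS)
doc_b = expand_bow(bag_of_words("cougar sighting"), TOY_THESAURUS, RELATION_WEIGHTS)
shared_terms = sorted(set(doc_a) & set(doc_b))
print(shared_terms)
```

In plain BOW the two example documents share only the term "sighting"; after expansion they also share "puma", "cougar", and "feline", which is the kind of semantic overlap the abstract says term frequency alone misses.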