Difference between revisions of "Using Wikipedia knowledge to improve text classification"

m (added_by_wikilit_team field added)
m (Text replace - "([ ][|]research_design=)([^ ]*Experiment)([^ ]*[ ][|]collected_datatype=)([^ ]*)([^ ]*[ ])" to "\1\2\3Experiment responses, \4\5")
Line 21:
 |theories=Undetermined
 |research_design=Experiment
-|collected_datatype=Wikipedia pages
+|collected_datatype=Experiment responses, Wikipedia pages
 |collected_data_time_dimension=Cross-sectional
 |unit_of_analysis=Article

Revision as of 18:43, October 18, 2013

Publication
Using Wikipedia knowledge to improve text classification
Authors: Pu Wang, Jian Hu, Hua-Jun Zeng, Zheng Chen
Citation: Knowledge and Information Systems 19 (3): 265-281. 2009.
Publication type: Journal article
Peer-reviewed: Yes
Database(s):
Link(s): http://www.springerlink.com/content/26l87nxu72024625/fulltext.pdf
Added by Wikilit team: Added on initial load
Using Wikipedia knowledge to improve text classification is a publication by Pu Wang, Jian Hu, Hua-Jun Zeng, Zheng Chen.


Abstract

Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the "Bag of Words" (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. The achieved improvement is unfortunately very limited, due to the poor coverage capability of the dictionary, and to the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.
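
The expansion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy thesaurus, the relation weights, and the function names (`bag_of_words`, `expand_bow`, `cosine`) are all assumptions made for illustration, and the paper's thesaurus is built automatically from Wikipedia rather than written by hand. The sketch only shows how adding weighted related concepts to a term-frequency vector lets two documents with no shared words become comparable.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: term -> list of (related concept, relation type).
# The relation names follow the abstract; the entries are invented.
TOY_THESAURUS = {
    "puma": [("cougar", "synonymy"), ("cat", "hyponymy")],
    "cougar": [("puma", "synonymy"), ("cat", "hyponymy")],
    "jaguar": [("cat", "hyponymy"), ("rainforest", "associative")],
}

# Assumed weights per relation type; these values are not from the paper.
RELATION_WEIGHTS = {"synonymy": 1.0, "hyponymy": 0.5, "associative": 0.25}


def bag_of_words(text):
    """Return a plain term-frequency vector over lowercase whitespace tokens."""
    return Counter(text.lower().split())


def expand_bow(bow, thesaurus=TOY_THESAURUS, weights=RELATION_WEIGHTS):
    """Return a copy of `bow` with weighted counts added for related concepts."""
    expanded = Counter()
    for term, count in bow.items():
        expanded[term] += float(count)
    for term, count in bow.items():
        for concept, relation in thesaurus.get(term, []):
            expanded[concept] += weights[relation] * count
    return expanded


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(value * b[term] for term, value in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = bag_of_words("puma tracks")
    doc_b = bag_of_words("cougar prints")
    print("plain BOW similarity:", cosine(doc_a, doc_b))
    print("expanded similarity:", cosine(expand_bow(doc_a), expand_bow(doc_b)))
```

In this toy setting the two documents share no terms, so the plain BOW similarity is zero, while the expanded vectors overlap on the synonym and the shared broader concept. A classifier trained on the expanded vectors would see that overlap; how the paper weights and selects relations is not specified in this record.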

Research questions

"we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification."

Research details

Topics: Text classification, Semantic relatedness
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Other
Theories: "Undetermined"
Research design: Experiment
Data source:
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Clone
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm"

Comments

"Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm"


Further notes

Facts about "Using Wikipedia knowledge to improve text classification"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Using%2BWikipedia%2Bknowledge%2Bto%2Bimprove%2Btext%2Bclassification%22
Has author: Pu Wang, Jian Hu, Hua-Jun Zeng, Zheng Chen
Has domain: Computer science
Has topic: Text classification, Semantic relatedness
Issue: 3
Pages: 265-281
Peer reviewed: Yes
Publication type: Journal article
Published in: Knowledge and Information Systems
Research design: Experiment
Revid: 9,922
Theories: Undetermined
Theory type: Design and action
Title: Using Wikipedia knowledge to improve text classification
Unit of analysis: Article
Url: http://www.springerlink.com/content/26l87nxu72024625/fulltext.pdf
Volume: 19
Wikipedia coverage: Other
Wikipedia data extraction: Clone
Wikipedia language: English
Wikipedia page type: Article
Year: 2009