A negative category based approach for Wikipedia document classification

From WikiLit
Authors: Meenakshi Sundaram Murugeshan, K. Lakshmi, Saswati Mukherjee
Citation: International Journal of Knowledge Engineering and Data Mining 1: 84-97. April 2010.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1504/IJKEDM.2010.032582
Link(s): http://inderscience.metapress.com/content/m538150712242802/
Added by Wikilit team: Added on initial load
A negative category based approach for Wikipedia document classification is a publication by Meenakshi Sundaram Murugeshan, K. Lakshmi, Saswati Mukherjee.


Abstract

Profile based methods have been successfully used for the classification of unstructured texts. This paper presents a profile based method for Wikipedia XML document classification. We have used profiles built using negative category information. Our approach exploits the structure of Wikipedia documents to build profiles. Two class profiles are built; one based on the whole content and the other based on the initial description of the Wikipedia documents. In addition, we have also explored the option of using the terms in the section and subsection titles. The effectiveness of cosine and fractional similarity measures in classifying XML documents is compared. The importance of combining two profile based classifiers is experimentally shown to have worked better than individual classifiers.
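The abstract names the main ingredients: one profile per category, a similarity measure between a document and each profile, and a combination of a whole-content classifier with an initial-description classifier. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation. The negative-category weighting (subtracting a term's mean frequency in the other categories) and the weighted-sum combination with parameter `alpha` are assumptions made for illustration; the abstract does not specify either, and all function names are hypothetical. The fractional similarity measure is omitted because it is not defined here.

```python
import math
from collections import Counter


def term_counts(text):
    """Bag-of-words term frequencies (whitespace tokenisation, lowercased)."""
    return Counter(text.lower().split())


def build_profiles(docs_by_category):
    """Build one raw term-frequency profile per category."""
    profiles = {}
    for category, docs in docs_by_category.items():
        profile = Counter()
        for doc in docs:
            profile.update(term_counts(doc))
        profiles[category] = profile
    return profiles


def apply_negative_information(profiles):
    """ASSUMED weighting, for illustration only: subtract the mean frequency
    a term has in the other categories and drop non-positive weights."""
    adjusted = {}
    for category, profile in profiles.items():
        others = [p for c, p in profiles.items() if c != category]
        weights = {}
        for term, freq in profile.items():
            negative = sum(p[term] for p in others) / len(others) if others else 0.0
            if freq - negative > 0:
                weights[term] = freq - negative
        adjusted[category] = weights
    return adjusted


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(full_text, intro_text, full_profiles, intro_profiles, alpha=0.5):
    """Combine the two profile-based scores with a weighted sum (ASSUMED rule)."""
    full_vec, intro_vec = term_counts(full_text), term_counts(intro_text)
    scores = {
        c: alpha * cosine(full_vec, full_profiles[c])
        + (1 - alpha) * cosine(intro_vec, intro_profiles[c])
        for c in full_profiles
    }
    return max(scores, key=scores.get), scores


# Toy data, invented for this example.
full_docs = {
    "music": ["guitar band album song", "band tour album"],
    "sport": ["football team match goal", "team match season"],
}
intro_docs = {"music": ["band album"], "sport": ["team match"]}

full_profiles = apply_negative_information(build_profiles(full_docs))
intro_profiles = apply_negative_information(build_profiles(intro_docs))
label, scores = classify(
    "the band released an album", "band album", full_profiles, intro_profiles
)
print(label, scores)
```

Setting `alpha` to 1.0 or 0.0 reduces the sketch to either individual classifier, which is one simple way to compare the combined classifier against each one on its own.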

Research questions

"This paper presents a profile based method for Wikipedia XML document classification. This research aims on exploiting profile-based classification. The focus of the work is on improving the profile creation thereby improving the performance of classification."

Research details

Topics: Text classification
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"This paper presents a method of Wikipedia classification. Since NCD based profile creation proved to perform well for non-overlapping categories, we have experimented with this method, coupled with the method that exploits IDES and title terms for profile creation. The IDES of the Wikipedia documents which contain domain specific terms helped to improve the performance of overall classification. Combination of two classifiers has shown better results than any of the classifiers taken individually. We also plan to extend this method, by exploring more Wikipedia specific structures such as links in a document."

Comments

"Secondary (INEX: Wikipedia articles)"


Further notes

Facts about "A negative category based approach for Wikipedia document classification"
Added by Wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Comments: Secondary (INEX: Wikipedia articles)
Data source: Experiment responses, Wikipedia pages
DOI: 10.1504/IJKEDM.2010.032582
Google Scholar URL: http://scholar.google.com/scholar?ie=UTF-8&q=%22A%2Bnegative%2Bcategory%2Bbased%2Bapproach%2Bfor%2BWikipedia%2Bdocument%2Bclassification%22
Has author: Meenakshi Sundaram Murugeshan, K. Lakshmi, Saswati Mukherjee
Has domain: Computer science
Has topic: Text classification
Month: April
Pages: 84-97
Peer reviewed: Yes
Publication type: Journal article
Published in: International Journal of Knowledge Engineering and Data Mining
Research design: Experiment
Revid: 10,637
Theories: Undetermined
Theory type: Design and action
Title: A negative category based approach for Wikipedia document classification
Unit of analysis: Article
URL: http://inderscience.metapress.com/content/m538150712242802/
Volume: 1
Wikipedia coverage: Sample data
Wikipedia data extraction: Dump
Wikipedia language: Not specified
Wikipedia page type: Article
Year: 2010