A negative category based approach for Wikipedia document classification
|Authors:||Meenakshi Sundaram Murugeshan, K. Lakshmi, Saswati Mukherjee|
|Citation:||International Journal of Knowledge Engineering and Data Mining 1: 84-97. April 2010.|
|Publication type:||Journal article|
|Added by Wikilit team:||Added on initial load|
Profile-based methods have been successfully used for the classification of unstructured texts. This paper presents a profile-based method for Wikipedia XML document classification. We have used profiles built using negative category information. Our approach exploits the structure of Wikipedia documents to build profiles. Two class profiles are built: one based on the whole content and the other based on the initial description of the Wikipedia documents. In addition, we have explored the option of using the terms in the section and subsection titles. The effectiveness of cosine and fractional similarity measures in classifying XML documents is compared. Combining the two profile-based classifiers is shown experimentally to work better than either classifier used individually.
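The general idea of profile-based classification with two profiles per category can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain term-frequency profiles rather than the paper's negative category weighting, it covers only the cosine measure (the fractional similarity measure is not defined on this page), and the equal-weight linear combination of the two classifiers is an assumption. The field names `intro` and `content`, the toy training data, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()


def build_profile(texts):
    """Sum term frequencies over all training texts of one category."""
    profile = Counter()
    for text in texts:
        profile.update(tokenize(text))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(text, profiles):
    """Score a text against every category profile."""
    vector = Counter(tokenize(text))
    return {cat: cosine(vector, prof) for cat, prof in profiles.items()}


def classify(doc, content_profiles, intro_profiles, weight=0.5):
    """Combine the whole-content and initial-description classifiers.

    The weighted sum is an assumed combination rule, not the paper's.
    """
    content_scores = score(doc["content"], content_profiles)
    intro_scores = score(doc["intro"], intro_profiles)
    combined = {
        cat: weight * content_scores[cat] + (1 - weight) * intro_scores[cat]
        for cat in sorted(content_profiles)
    }
    return max(combined, key=combined.get)


# Hypothetical toy training set: one document per category.
training = {
    "physics": [{"intro": "quantum particle theory",
                 "content": "quantum particle theory energy wave experiment"}],
    "music": [{"intro": "guitar melody song",
               "content": "guitar melody song rhythm band energy"}],
}

content_profiles = {cat: build_profile(d["content"] for d in docs)
                    for cat, docs in training.items()}
intro_profiles = {cat: build_profile(d["intro"] for d in docs)
                  for cat, docs in training.items()}

doc = {"intro": "particle theory", "content": "energy of a particle wave"}
predicted = classify(doc, content_profiles, intro_profiles)
print(predicted)
```

In this sketch the initial-description profile contributes a strong signal because its terms are concentrated on the domain, while the whole-content profile shares the generic term "energy" across both categories; the `weight` parameter controls how much each classifier contributes.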
"This paper presents a profile based method for Wikipedia XML document classification. This research aims on exploiting profile-based classification. The focus of the work is on improving the profile creation thereby improving the performance of classification."
|Theory type:||Design and action|
|Wikipedia coverage:||Sample data|
|Data source:||Experiment responses, Wikipedia pages|
|Collected data time dimension:||Cross-sectional|
|Unit of analysis:||Article|
|Wikipedia data extraction:||Dump|
|Wikipedia page type:||Article|
|Wikipedia language:||Not specified|
"This paper presents a method of Wikipedia classification. Since NCD based profile creation proved to perform well for non-overlapping categories, we have experimented with this method, coupled with the method that exploits IDES and title terms for profile creation. The IDES of the Wikipedia documents which contain domain specific terms helped to improve the performance of overall classification. Combination of two classifiers has shown better results than any of the classifiers taken individually. We also plan to extend this method, by exploring more Wikipedia specific structures such as links in a document."
"Secondary (INEX: Wikipedia articles)"