Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge
Abstract When humans approach the task of text categorization, they interpret the specific wording of the document in the much larger context of their background knowledge and experience. On the other hand, state-of-the-art information retrieval systems are quite brittle--they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. For instance, given the sentence "Wal-Mart supply chain goes real time", how can a text categorization system know that Wal-Mart manages its stock with RFID technology? And having read that "Ciprofloxacin belongs to the quinolones group", how on earth can a machine know that the drug mentioned is an antibiotic produced by Bayer? In this paper we present algorithms that can do just that. We propose to enrich document representation through automatic use of a vast compendium of human knowledge--an encyclopedia. We apply machine learning techniques to Wikipedia, the largest encyclopedia to date, which surpasses in scope many conventional encyclopedias and provides a cornucopia of world knowledge. Each Wikipedia article represents a concept, and documents to be categorized are represented in the rich feature space of words and relevant Wikipedia concepts. Empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets.
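The idea in the abstract, representing a document by its words plus relevant encyclopedia concepts, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-entry CONCEPTS dictionary, its invented article texts, the IDF-weighted overlap score, and the function names (tokenize, build_idf, concept_scores, enrich) are all assumptions chosen only to show the shape of the technique.

```python
import math
import re
from collections import Counter

# Hypothetical miniature "encyclopedia": concept title -> article text.
# The texts are invented for illustration and are not Wikipedia content.
CONCEPTS = {
    "RFID": "rfid radio tags track stock inventory supply chain retail",
    "Antibiotic": "antibiotic drug treats bacterial infection quinolones",
    "Football": "football team match goal league player",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(concepts):
    """Inverse document frequency of each word over the concept articles."""
    n = len(concepts)
    df = Counter()
    for text in concepts.values():
        df.update(set(tokenize(text)))
    return {word: math.log(1 + n / count) for word, count in df.items()}

def concept_scores(doc_tokens, concepts, idf):
    """Score each concept by its IDF-weighted word overlap with the document."""
    doc_counts = Counter(doc_tokens)
    scores = {}
    for title, text in concepts.items():
        art_counts = Counter(tokenize(text))
        score = sum(
            doc_counts[w] * art_counts[w] * idf[w] ** 2
            for w in doc_counts
            if w in art_counts
        )
        if score > 0:
            scores[title] = score
    return scores

def enrich(text, concepts=CONCEPTS, top_k=2):
    """Return a bag of words extended with the top-k matching concept features."""
    tokens = tokenize(text)
    idf = build_idf(concepts)
    scores = concept_scores(tokens, concepts, idf)
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    features = Counter(tokens)
    for title in top:
        features["CONCEPT:" + title] += 1
    return features

if __name__ == "__main__":
    print(enrich("Wal-Mart supply chain goes real time"))
```

The enriched feature bag could then be passed to any standard text classifier in place of the plain bag of words; with a toy concept set this small, only literal word overlap triggers a concept, whereas a full encyclopedia would supply far richer associations.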
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments "Empirical evaluation definitively confirmed the value of encyclopedic knowledge for text categorization across a range of datasets." p. 1306
Conclusion We succeeded to make use of an encyclopedia without deep language understanding and without relying on additional common-sense knowledge bases. This was made possible by applying standard text classification techniques to match document texts with relevant Wikipedia articles. Empirical evaluation definitively confirmed the value of encyclopedic knowledge for text categorization across a range of datasets. Recently, the performance of the best text categorization systems became similar, as if a plateau has been reached, and previous work mostly achieved improvements of up to a few percentage points. Using Wikipedia allowed us to reap much greater benefits, with double-digit improvements observed on a number of datasets.
Conference location Menlo Park, CA, USA +
Data source Experiment responses  + , Wikipedia pages  +
Dates 16-20 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Overcoming%2Bthe%2Bbrittleness%2Bbottleneck%2Busing%2BWikipedia%3A%2Benhancing%2Btext%2Bcategorization%2Bwith%2Bencyclopedic%2Bknowledge%22  +
Has author Evgeniy Gabrilovich + , Shaul Markovitch +
Has domain Computer science +
Has topic Text classification +
Month July  +
Pages 1301-6  +
Peer reviewed Yes  +
Publication type Conference paper  +
Published in AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2 +
Publisher AAAI Press +
Research design Experiment  +
Research questions State-of-the-art information retrieval systems are quite brittle--they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. In this paper we present algorithms that can resolve that. We propose to enrich document representation through automatic use of a vast compendium of human knowledge--an encyclopedia.
Revid 10,898  +
Theories Undetermined
Theory type Design and action  +
Title Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge
Unit of analysis Article  +
Url http://www.citeulike.org/group/382/article/2157092  +
Volume 2  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Dump  +
Wikipedia language Not specified  +
Wikipedia page type Article  +
Year 2006  +
Creation date 15 March 2012 20:29:57  +
Categories Text classification  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:30:15  +