Wikipedia-based semantic interpretation for natural language processing
Abstract Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
Added by wikilit team Added on initial load
Collected data time dimension Longitudinal
Conclusion We succeeded in making automatic use of an encyclopedia without deep language understanding, specially crafted inference rules, or reliance on additional common-sense knowledge bases. This was made possible by applying standard text classification techniques to match document texts with relevant Wikipedia articles. Empirical evaluation confirmed the value of Explicit Semantic Analysis for two common tasks in natural language processing. Compared with the previous state of the art, using ESA results in significant improvements in automatically assessing semantic relatedness of words and texts. Specifically, the correlation of computed relatedness scores with human judgements increased from r = 0.56 to 0.75 (Spearman) for individual words and from r = 0.60 to 0.72 (Pearson) for texts. In contrast to existing methods, ESA offers a uniform way of computing relatedness of both individual words and arbitrarily long text fragments. Using ESA to perform feature generation for text categorization yielded consistent improvements across a diverse range of datasets. Recently, the performance of the best text categorization systems has become similar, and previous work mostly achieved only small improvements. Using Wikipedia as a source of external knowledge allowed us to improve the performance of text categorization across a diverse collection of datasets.
Data source Archival records
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Wikipedia-based%2Bsemantic%2Binterpretation%2Bfor%2Bnatural%2Blanguage%2Bprocessing%22
Has author Evgeniy Gabrilovich, Shaul Markovitch
Has domain Computer science
Has topic Semantic relatedness
Pages 443-498
Peer reviewed Yes
Publication type Journal article
Published in Journal of Artificial Intelligence Research
Research design Design science
Research questions Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts.
Revid 11,074
Theories Undetermined
Theory type Design and action
Title Wikipedia-based semantic interpretation for natural language processing
Unit of analysis Article
Url http://dl.acm.org/citation.cfm?id=1622728
Volume 34
Wikipedia coverage Sample data
Wikipedia data extraction Dump
Wikipedia language Not specified
Wikipedia page type Article
Year 2009
Creation date 15 March 2012 20:36:42
Categories Semantic relatedness, Computer science, Publications with missing comments, Publications
Modification date 30 January 2014 20:32:40
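The abstract describes ESA only at a high level: a text is mapped to a weighted vector over Wikipedia concepts, and semantic relatedness is computed in that concept space. A minimal sketch of the idea follows, assuming a tiny hypothetical four-article "encyclopedia" in place of a Wikipedia dump, TF-IDF weights for word-to-concept association, raw term counts as the input text's word weights, and cosine similarity between concept vectors. The names TOY_CONCEPTS, build_inverted_index, interpret, and relatedness are illustrative and not taken from the paper, and the sketch leaves out refinements a full-scale implementation would need, such as normalizing and pruning concept vectors.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy "encyclopedia": concept (article title) -> article text.
# A real ESA model would be built from a full Wikipedia dump.
TOY_CONCEPTS = {
    "Cat": "cat feline pet whiskers mouse hunting pet",
    "Dog": "dog canine pet bark leash walking pet",
    "Computer mouse": "mouse computer pointer click device usb",
    "Stock market": "stock market shares trading investor price",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(concepts):
    """Map each word to {concept: TF-IDF weight of that word in the concept's article}."""
    n = len(concepts)
    counts = {title: Counter(tokenize(body)) for title, body in concepts.items()}
    df = Counter()  # number of articles containing each word
    for c in counts.values():
        df.update(c.keys())
    index = defaultdict(dict)
    for title, c in counts.items():
        for word, tf in c.items():
            weight = tf * math.log(n / df[word])
            if weight > 0:  # words found in every article carry no signal
                index[word][title] = weight
    return index


def interpret(text, index):
    """Concept vector of a text: sum of its words' concept vectors, weighted by term count."""
    vector = Counter()
    for word, tf in Counter(tokenize(text)).items():
        for title, weight in index.get(word, {}).items():
            vector[title] += tf * weight
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def relatedness(text_a, text_b, index):
    """Semantic relatedness as the cosine of the two texts' concept vectors."""
    return cosine(interpret(text_a, index), interpret(text_b, index))


index = build_inverted_index(TOY_CONCEPTS)
print(round(relatedness("mouse", "cat", index), 3))  # → 0.707
print(round(relatedness("mouse", "usb", index), 3))  # → 0.707
print(round(relatedness("cat", "usb", index), 3))    # → 0.0
```

Because "mouse" occurs in both the "Cat" and "Computer mouse" articles, its concept vector overlaps with those of both "cat" and "usb", while "cat" and "usb" share no concepts at all. The concepts behind each score remain inspectable, which echoes the abstract's point that the ESA model is easy to explain to human users.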
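The conclusion reports word-level results as a Spearman correlation and text-level results as a Pearson correlation with human judgements. To show how the two measures differ, the sketch below computes both in plain Python on invented scores: Pearson's r measures linear agreement between the raw scores, while Spearman's coefficient is Pearson's r applied to ranks, so it only asks whether the system orders the pairs the way human judges do. The function names and the four score pairs are hypothetical and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance of xs and ys divided by the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for four word pairs (invented, not data from the paper).
human = [9.0, 7.5, 6.0, 2.0]
system = [0.80, 0.30, 0.25, 0.05]
print(round(spearman(human, system), 2))  # → 1.0
print(round(pearson(human, system), 2))   # → 0.86
```

The invented system scores preserve the human ordering exactly, so Spearman's coefficient is 1.0; because the scores are not linearly related to the human ratings, Pearson's r is lower.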