Wikipedia-based semantic interpretation for natural language processing

From WikiLit

Revision as of 23:49, October 16, 2013

Publication
Wikipedia-based semantic interpretation for natural language processing
Authors: Evgeniy Gabrilovich, Shaul Markovitch
Citation: Journal of Artificial Intelligence Research 34: 443-498. 2009.
Publication type: Journal article
Peer-reviewed: Yes
Database(s):
DOI:
Link(s): http://dl.acm.org/citation.cfm?id=1622728
Added by Wikilit team: Added on initial load
Wikipedia-based semantic interpretation for natural language processing is a publication by Evgeniy Gabrilovich and Shaul Markovitch.


Abstract

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users. © 2009 AI Access Foundation. All rights reserved.

Research questions

"Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts."
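To make the quoted idea concrete: ESA treats each Wikipedia article as one concept, builds an inverted TF-IDF index from words to articles, interprets a text as a weighted vector of concepts, and measures semantic relatedness as the cosine between two such vectors. The sketch below is a toy illustration of that scheme only; the three-article "corpus", function names, and weights are invented for the example and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Toy stand-in corpus; real ESA uses the full set of Wikipedia articles.
articles = {
    "Computer science": "algorithm computation data program software computer",
    "Music": "melody rhythm song instrument sound concert",
    "Jazz": "jazz improvisation saxophone swing music song",
}

def build_interpreter(articles):
    """Build an inverted index: term -> {concept: TF-IDF weight}."""
    n = len(articles)
    df = Counter()                      # in how many articles each term occurs
    tfs = {}
    for concept, text in articles.items():
        tf = Counter(text.split())
        tfs[concept] = tf
        df.update(tf.keys())
    index = defaultdict(dict)
    for concept, tf in tfs.items():
        for term, count in tf.items():
            index[term][concept] = count * math.log(n / df[term])
    return index

def interpret(text, index):
    """Map a text fragment to a weighted vector of Wikipedia concepts."""
    vec = Counter()
    for term in text.split():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec

def relatedness(a, b, index):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(a, index), interpret(b, index)
    dot = sum(va[c] * vb[c] for c in va)
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

index = build_interpreter(articles)
```

At Wikipedia scale the paper's pipeline additionally prunes low-weight index entries and interprets long texts via sliding windows, details this toy omits.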

Research details

Topics: Semantic relatedness
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Design science
Data source:
Collected data time dimension: Longitudinal
Unit of analysis: Article
Wikipedia data extraction: Clone
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"We succeeded to make automatic use of an encyclopedia without deep language understanding, specially crafted inference rules or relying on additional common-sense knowledge bases. This was made possible by applying standard text classification techniques to match document texts with relevant Wikipedia articles. Empirical evaluation confirmed the value of Explicit Semantic Analysis for two common tasks in natural language processing. Compared with the previous state of the art, using ESA results in significant improvements in automatically assessing semantic relatedness of words and texts. Specifically, the correlation of computed relatedness scores with human judgements increased from r = 0.56 to 0.75 (Spearman) for individual words and from r = 0.60 to 0.72 (Pearson) for texts. In contrast to existing methods, ESA offers a uniform way for computing relatedness of both individual words and arbitrarily long text fragments. Using ESA to perform feature generation for text categorization yielded consistent improvements across a diverse range of datasets. Recently, the performance of the best text categorization systems became similar, and previous work mostly achieved small improvements. Using Wikipedia as a source of external knowledge allowed us to improve the performance of text categorization across a diverse collection of datasets."
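The conclusion also credits ESA-based feature generation for the text-categorization gains: a document's bag-of-words features are augmented with its highest-scoring Wikipedia concepts, so a classifier can generalize beyond surface vocabulary. A minimal sketch of that idea, assuming an ESA interpreter has already scored the document against concepts; the scores and concept names below are made up for illustration:

```python
from collections import Counter

def generate_features(text, concept_scores, k=2):
    """Bag-of-words features plus the k highest-scoring concepts.

    concept_scores is assumed to come from an ESA interpreter;
    word counts and concept weights are mixed here only for brevity.
    """
    features = Counter(text.split())
    top = sorted(concept_scores.items(), key=lambda kv: -kv[1])[:k]
    for concept, score in top:
        features["CONCEPT:" + concept] = score
    return dict(features)

doc = "the quartet played a swing standard"
scores = {"Jazz": 3.2, "Music": 1.9, "Computer science": 0.0}  # assumed ESA output
feats = generate_features(doc, scores, k=2)
```

A downstream classifier then sees "CONCEPT:Jazz" as evidence even when no genre word appears in the document itself.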

Comments


Further notes

Facts about "Wikipedia-based semantic interpretation for natural language processing" (RDF feed)

Abstract: Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users. © 2009 AI Access Foundation. All rights reserved.
Added by wikilit team: Added on initial load
Collected data time dimension: Longitudinal
Conclusion: We succeeded to make automatic use of an encyclopedia without deep language understanding, specially crafted inference rules or relying on additional common-sense knowledge bases. This was made possible by applying standard text classification techniques to match document texts with relevant Wikipedia articles. Empirical evaluation confirmed the value of Explicit Semantic Analysis for two common tasks in natural language processing. Compared with the previous state of the art, using ESA results in significant improvements in automatically assessing semantic relatedness of words and texts. Specifically, the correlation of computed relatedness scores with human judgements increased from r = 0.56 to 0.75 (Spearman) for individual words and from r = 0.60 to 0.72 (Pearson) for texts. In contrast to existing methods, ESA offers a uniform way for computing relatedness of both individual words and arbitrarily long text fragments. Using ESA to perform feature generation for text categorization yielded consistent improvements across a diverse range of datasets. Recently, the performance of the best text categorization systems became similar, and previous work mostly achieved small improvements. Using Wikipedia as a source of external knowledge allowed us to improve the performance of text categorization across a diverse collection of datasets.
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Wikipedia-based%2Bsemantic%2Binterpretation%2Bfor%2Bnatural%2Blanguage%2Bprocessing%22
Has author: Evgeniy Gabrilovich, Shaul Markovitch
Has domain: Computer science
Has topic: Semantic relatedness
Pages: 443-498
Peer reviewed: Yes
Publication type: Journal article
Published in: Journal of Artificial Intelligence Research
Research design: Design science
Research questions: Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts.
Revid: 9,690
Theories: Undetermined
Theory type: Design and action
Title: Wikipedia-based semantic interpretation for natural language processing
Unit of analysis: Article
Url: http://dl.acm.org/citation.cfm?id=1622728
Volume: 34
Wikipedia coverage: Sample data
Wikipedia data extraction: Clone
Wikipedia language: Not specified
Wikipedia page type: Article
Year: 2009