Wisdom of crowds versus wisdom of linguists - measuring the semantic relatedness of words
Abstract In this article, we present a comprehensive study aimed at computing semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’ based resources are not superior to ‘wisdom of linguists’ based resources. We also find that using the first paragraph of a Wikipedia article as opposed to the whole article leads to better precision, but decreases recall.
Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, a software to semi-automatically construct corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Conclusion Correlation with human judgments: Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that (i) ‘wisdom of crowds’ based resources are not generally superior to ‘wisdom of linguists’ based resources. We further find that (ii) concept vector based measures consistently display superior performance compared to other measure types, and (iii) that the results on German datasets confirm the results for English. The restored competitiveness of ‘wisdom of linguists’ based resources is due to a generalized concept vector based measure (ZG07) introduced in this article. This measure is applicable to any knowledge source offering a textual representation of a concept. We showed how such textual representations can be inferred from semantic relations in wordnets without glosses. The performance gains that can be obtained with the generalized concept vector based measure strongly depend on the amount of additional information that the knowledge source offers in the textual representations. Solving word choice problems: As this task depends much on the coverage of a knowledge source, results are different for English and German. On the English dataset, we find (i) little differences between ‘wisdom of linguists’ or ‘wisdom of crowds’ knowledge sources. On the German dataset the ‘crowds’ outperform the ‘linguists’ by a wide margin due to the much higher coverage of the SemRel measures using the German Wikipedia. We find that (ii) concept vector based measures using Wikipedia as a knowledge source perform consistently well, and outperform all other measure types with respect to accuracy and coverage on the English as well as the German dataset. However, a more detailed analysis of the word choice datasets with respect to the expected difficulty for a SemRel measure is necessary before we can draw final conclusions.
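The general idea of a concept vector based measure, as described in the conclusion, can be illustrated with a minimal sketch. This is not the authors' ZG07 implementation: the function names, the toy concept texts, and the plain term-frequency weighting are all assumptions chosen for illustration. Each word is mapped to a vector with one weight per concept (here, one short text per concept, standing in for a Wikipedia first paragraph or a wordnet gloss), and relatedness is taken as the cosine of the two vectors.

```python
import math
from collections import Counter

# Hypothetical toy knowledge source: concept name -> textual representation.
TOY_CONCEPTS = {
    "Car": "a car is a motor vehicle with wheels used for transport",
    "Truck": "a truck is a motor vehicle designed to transport cargo",
    "Banana": "a banana is an edible fruit produced by a plant",
}

def concept_vector(word, concept_texts):
    """Return a sparse vector {concept: weight} for a word.

    Assumption: the weight is the relative term frequency of the word
    in the concept's text; a real system may weight differently.
    """
    vector = {}
    for concept, text in concept_texts.items():
        tokens = text.lower().split()
        count = Counter(tokens)[word.lower()]
        if count:
            vector[concept] = count / len(tokens)
    return vector

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[concept] for concept, weight in u.items() if concept in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def relatedness(word1, word2, concept_texts):
    """Relatedness of two words as the cosine of their concept vectors."""
    return cosine(concept_vector(word1, concept_texts),
                  concept_vector(word2, concept_texts))
```

In this toy setting, words that occur in the same concept texts (such as "vehicle" and "transport") receive a high score, while words that share no concept text (such as "vehicle" and "fruit") score zero. The same functions work with any mapping from concepts to texts, which mirrors the statement that the measure applies to any knowledge source offering a textual representation of a concept.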
Data source Experiment responses  + , Archival records  + , Wikipedia pages  +
Doi 10.1017/S1351324909990167 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Wisdom%2Bof%2Bcrowds%2Bversus%2Bwisdom%2Bof%2Blinguists%2B-%2Bmeasuring%2Bthe%2Bsemantic%2Brelatedness%2Bof%2Bwords%22  +
Has author Torsten Zesch + , Iryna Gurevych +
Has domain Computer science +
Has topic Semantic relatedness +
Issue 1  +
Pages 25  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Natural Language Engineering +
Research design Experiment  +
Research questions In this article, we present a comprehensive study aimed at computing semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness.
Revid 11,109  +
Theories Undetermined
Theory type Design and action  +
Title Wisdom of crowds versus wisdom of linguists - measuring the semantic relatedness of words
Unit of analysis Article  +
Url http://www.journals.cambridge.org/abstract_S1351324909990167  +
Volume 16  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:36:46  +
Categories Semantic relatedness  + , Computer science  + , Publications with missing comments  + , Publications  +
Modification date 30 January 2014 20:33:08  +