Automatising the learning of lexical patterns: an application to the enrichment of WordNet by extracting semantic relationships from Wikipedia
Abstract This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, being around 60-70% for the best combinations proposed.
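The abstract mentions an algorithm for automatically generalising lexical patterns. A toy illustration of the underlying idea (this is not the paper's actual algorithm; the `generalise` function and the `*` wildcard token are invented here for illustration) is to align two patterns token by token and replace mismatching tokens with a wildcard:

```python
def generalise(p1, p2):
    """Merge two token patterns of the same length, wildcarding mismatches.

    Illustrative sketch only; a real generalisation algorithm would also
    align patterns of unequal length (e.g. via edit-distance alignment).
    """
    if len(p1) != len(p2):
        return None
    return ["*" if a != b else a for a, b in zip(p1, p2)]

pattern = generalise("A dog is a mammal".split(),
                     "A cat is a mammal".split())
print(pattern)  # → ['A', '*', 'is', 'a', 'mammal']
```

The resulting generalised pattern could then match new sentences such as "A horse is a mammal", yielding candidate hyponymy relations not yet present in the semantic network.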
Added by wikilit team Added on initial load
Collected data time dimension Cross-sectional
Comments Wikipedia pages - Websites (WordNet)
Conclusion The algorithm has been evaluated with the whole set of Simple English Wikipedia entries, as available on September 27, 2005. Each of the entries was disambiguated using the procedure described in [63]. An evaluation of 360 entries, performed by two human judges, indicates that the precision of the disambiguation is 92% (87% for polysemous words). The high figure should not come as a surprise, given that, as can be expected, it is an easier problem to disambiguate the title of an encyclopedia entry (for which there exist much relevant data) than a word inside unrestricted text. The next step consisted in extracting, from each Wikipedia entry e, a list of sentences containing references to other entries f which are related to e inside WordNet. This resulted in 485 sentences for hyponymy, 213 for hyperonymy, 562 for holonymy and 509 for meronymy. When analysing these patterns, however, we found that, both for hyperonymy and meronymy, most of the sentences extracted contained only the name of the entry f (the target of the relationship) with no contextual information around it. The reason was unveiled by examining the web pages:
• In the case of hyponyms and holonyms, it is very common to express the relationship in natural language, with expressions such as "A dog is a mammal" or "A wheel is part of a car".
• On the other hand, when describing hyperonyms and meronyms, their hyponyms and holonyms are usually expressed with enumerations, which tend to be formatted as HTML bullet lists. Therefore, the sentence splitter chunks each hyponym and each holonym as belonging to a separate sentence.
All the results in these experiments have been evaluated by hand by two judges. The total inter-judge agreement reached 95%. In order to unify the criteria in doubtful cases, similar relations were looked up in WordNet, and the judges tried to apply the same criteria as shown by those examples. The cases in which the judges disagreed have not been taken into consideration when calculating the accuracy.
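The extraction step described above — collecting, from each entry e, the sentences that mention another entry f related to e in WordNet — can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the miniature entry texts, the relation list and the naive regex-based sentence splitter are all hypothetical stand-ins.

```python
import re

# Hypothetical miniature "encyclopedia": entry title -> entry text.
ENTRIES = {
    "dog": "A dog is a mammal. Dogs are kept as pets. A dog has a tail.",
    "wheel": "A wheel is part of a car. Wheels are round.",
}

# Hypothetical WordNet-style relations: (source entry, target, relation type).
RELATIONS = [
    ("dog", "mammal", "hyponymy"),
    ("wheel", "car", "meronymy"),
]

def split_sentences(text):
    """Naive sentence splitter (stand-in for a real NLP sentence chunker)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_candidate_sentences(entries, relations):
    """For each related pair (e, f), keep sentences of entry e that mention f."""
    candidates = []
    for source, target, rel in relations:
        for sentence in split_sentences(entries.get(source, "")):
            if re.search(r"\b%s\b" % re.escape(target), sentence, re.IGNORECASE):
                candidates.append((rel, sentence))
    return candidates

for rel, sentence in extract_candidate_sentences(ENTRIES, RELATIONS):
    print(rel, "->", sentence)
```

Note how this sketch also exhibits the problem the conclusion reports: a hyperonym or meronym expressed only as an item in an HTML bullet list would arrive as a bare name with no surrounding context, so no useful lexical pattern could be mined from it.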
Data source Experiment responses, Websites, Wikipedia pages
Doi 10.1016/j.datak.2006.06.011
Google scholar url
Has author Maria Ruiz-Casado, Enrique Alfonseca, Pablo Castells
Has domain Computer science
Has topic Other natural language processing topics
Issue 3
Pages 484-499
Peer reviewed Yes
Publication type Journal article
Published in Data and Knowledge Engineering
Research design Experiment
Research questions In this paper, we present a procedure for automatically enriching an existing lexical semantic network with new relationships extracted from on-line encyclopedic information. The approach followed is mainly based on the use of lexical patterns that model each type of relationship, and on natural language processing resources. The semantic network chosen is WordNet [10], given that it is currently used in many applications, although the procedure is general enough to be used with other ontologies. The encyclopedia used is the Wikipedia, a collaborative web-based resource which is being constantly updated by its users.
Revid 10,675
Theories Undetermined
Theory type Design and action
Title Automatising the learning of lexical patterns: an application to the enrichment of WordNet by extracting semantic relationships from Wikipedia
Unit of analysis Article
Url
Volume 61
Wikipedia coverage Sample data
Wikipedia data extraction Live Wikipedia
Wikipedia language English
Wikipedia page type Article
Year 2007
Creation date 15 March 2012 20:24:09
Categories Other natural language processing topics, Computer science, Publications
Modification date 30 January 2014 20:20:48