Automatising the learning of lexical patterns: an application to the enrichment of WordNet by extracting semantic relationships from Wikipedia

From WikiLit

Authors: Maria Ruiz-Casado, Enrique Alfonseca, Pablo Castells
Citation: Data and Knowledge Engineering 61 (3): 484-499. 2007.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1016/j.datak.2006.06.011
Automatising the learning of lexical patterns: an application to the enrichment of WordNet by extracting semantic relationships from Wikipedia is a publication by Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells.


Abstract

This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, being around 60-70% for the best combinations proposed.
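The core step named in the abstract, automatically generalising lexical patterns, can be pictured as aligning two extracted patterns and collapsing the positions where they differ into wildcards. The Python sketch below is a minimal illustration of that idea, not the authors' implementation; the function name, the wildcard token, and the use of difflib for the alignment are assumptions.

 # Minimal sketch: generalise two lexical patterns (token lists) by
 # keeping the tokens they share, in order, and replacing every
 # differing stretch with a single wildcard. Illustrative only; this
 # is not the paper's published formulation.
 from difflib import SequenceMatcher
 
 WILDCARD = "*"
 
 def generalise(p1, p2):
     """Merge two patterns; mismatching spans become one wildcard."""
     merged = []
     matcher = SequenceMatcher(a=p1, b=p2, autojunk=False)
     for op, i1, i2, j1, j2 in matcher.get_opcodes():
         if op == "equal":
             merged.extend(p1[i1:i2])
         elif not merged or merged[-1] != WILDCARD:
             merged.append(WILDCARD)   # collapse runs of differences
     return merged
 
 print(generalise("a dog is a mammal".split(),
                  "a whale is a large mammal".split()))
 # -> ['a', '*', 'is', 'a', '*', 'mammal']

Repeatedly merging similar patterns in this way yields progressively more general patterns, which is consistent with the trade-off the abstract reports between the degree of generality chosen and the resulting precision.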

Research questions

"n this paper, we present a procedure for automatically enriching an existing lexical semantic network with new relationships extracted from on-line encyclopedic information. The approach followed is mainly based in the use of lexical patterns that model each type of relationship and natural language processing resources. The semantic network chosen is WordNet [10], given that it is currently used in many applications, although the procedure is general enough to be used with other ontologies. The encyclopedia used is the Wikipedia, a collaborative web-based resource which is being constantly updated by its users"

Research details

Topics: Other natural language processing topics
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Websites, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"The algorithm has been evaluated with the whole Simple English Wikipedia entries, as available on September 27, 2005. Each of the entries was disambiguated using the procedure described in [63]. An evaluation of 360 entries, performed by two human judges, indicates that the precision of the disambiguation is 92% (87% for polysemous words). The high figure should not come as a surprise, given that, as can be expected, it is an easier problem to disambiguate the title of an encyclopedia entry (for which there exist much relevant data) than a word inside unrestricted text.

The next step consisted in extracting, from each Wikipedia entry e, a list of sentences containing references to other entries f which are related to e inside WordNet. This resulted in 485 sentences for hyponymy, 213 for hyperonymy, 562 for holonymy and 509 for meronymy. When analysing these patterns, however, we found that, both for hyperonymy and meronymy, most of the sentences extracted only contained the name of the entry f (the target of the relationship) with no contextual information around it. The reason was revealed by examining the web pages:

• In the case of hyponyms and holonyms, it is very common to express the relationship with natural language, with expressions such as A dog is a mammal, or A wheel is part of a car.

• On the other hand, when describing hyperonyms and meronyms, their hyponyms and holonyms are usually expressed with enumerations, which tend to be formatted as HTML bullet lists. Therefore, the sentence splitter treats each hyponym and each holonym as a separate sentence.

All the results in these experiments have been evaluated by hand by two judges. The total inter-judge agreement reached 95%. In order to unify the criteria, in the doubtful cases, similar relations were looked up inside WordNet, and the judges tried to apply the same criteria as shown by those examples. The cases in which the judges disagreed have not been taken into consideration for calculating the accuracy."
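The harvesting step quoted above, keeping a sentence from entry e only when it mentions an entry f that WordNet already relates to e, can be sketched as follows. The sketch uses NLTK's WordNet interface as a modern stand-in for WordNet 1.7 (it requires nltk.download('wordnet')); the helper names, the token-level mention test, and the particular relations checked are assumptions, not the paper's code.

 # Hedged sketch: keep sentences of an entry that mention a linked
 # entry f already related to the entry in WordNet (transitive
 # hypernyms or direct holonyms). Such sentences are the raw material
 # from which lexical patterns are learned.
 from nltk.corpus import wordnet as wn
 
 def related_in_wordnet(e, f):
     """True if f names a (transitive) hypernym or a direct holonym
     of some noun sense of e."""
     for s in wn.synsets(e, pos=wn.NOUN):
         neighbours = set(s.closure(lambda x: x.hypernyms()))
         neighbours |= set(s.member_holonyms() + s.part_holonyms()
                           + s.substance_holonyms())
         if any(f in n.lemma_names() for n in neighbours):
             return True
     return False
 
 def harvest(entry, sentences, links):
     """Pair each sentence with the related link it mentions, if any."""
     kept = []
     for sentence in sentences:
         tokens = sentence.lower().split()
         for f in links:
             if f.lower() in tokens and related_in_wordnet(entry, f):
                 kept.append((sentence, f))
     return kept
 
 print(harvest("dog", ["A dog is a mammal"], ["mammal"]))
 # -> [('A dog is a mammal', 'mammal')]

As the quoted passage notes, this works well for relations expressed in running text (hyponymy, holonymy) but yields little context when the related entries appear only in bullet-list enumerations.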

Comments

"Wikipedia pages - Websites (WordNet)"

