Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation

From WikiLit
Authors: Denis Turdakov, Pavel Velikhov
Citation: Spring Young Researcher's Colloquium On Database and Information Systems. 2008.
Publication type: Conference paper
Peer-reviewed: Yes
Added by Wikilit team: Added on initial load
Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation is a publication by Denis Turdakov and Pavel Velikhov.


Abstract

Wikipedia has grown into a high-quality, up-to-date knowledge base and can enable many knowledge-based applications that rely on semantic information. One of the most general and powerful semantic tools is a measure of semantic relatedness between concepts. Moreover, the ability to efficiently produce a ranked list of similar concepts for a given concept is very important for a wide range of applications. We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice’s measure, and provide very efficient heuristic methods to compute top k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word-sense disambiguation. Our approach to word sense disambiguation is based solely on the similarity measure and produces results with high accuracy.
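
As a rough illustration of the idea in the abstract, the sketch below computes Dice's measure, 2|A ∩ B| / (|A| + |B|), over the link sets of two concepts and ranks the k most related concepts by brute force. It is only a sketch under stated assumptions: this record does not say how the paper builds or weights its link sets, and the paper's top k heuristics are specifically meant to avoid the exhaustive scan used here. The names `dice`, `top_k_related`, and `LINKS`, and the toy link data, are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def dice(links_a: Set[str], links_b: Set[str]) -> float:
    """Dice's measure between two link sets: 2|A & B| / (|A| + |B|)."""
    total = len(links_a) + len(links_b)
    if total == 0:
        return 0.0  # two concepts without links share nothing
    return 2 * len(links_a & links_b) / total


def top_k_related(
    concept: str, links: Dict[str, Set[str]], k: int
) -> List[Tuple[str, float]]:
    """Brute-force baseline: score every other concept, keep the k best.

    Assumption: this exhaustive scan stands in for the paper's heuristics,
    which are not described in this record.
    """
    target = links[concept]
    scored = (
        (dice(target, neighbours), other)
        for other, neighbours in links.items()
        if other != concept
    )
    return [(other, score) for score, other in heapq.nlargest(k, scored)]


# Toy link sets (hypothetical data, not taken from Wikipedia).
LINKS: Dict[str, Set[str]] = {
    "Jaguar (animal)": {"Felidae", "Predator", "South America"},
    "Leopard": {"Felidae", "Predator", "Africa"},
    "Jaguar Cars": {"Automobile", "Coventry", "Luxury vehicle"},
}
```

Calling `top_k_related("Jaguar (animal)", LINKS, 1)` on this toy data ranks "Leopard" first, because the two concepts share two of their three links while "Jaguar Cars" shares none.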

Research questions

"We propose to use a simple measure of similarity between Wikipedia concepts, based on Dice’s measure, and provide very efficient heuristic methods to compute top k ranking results. Furthermore, since our heuristics are based on statistical properties of scale-free networks, we show that these heuristics are applicable to other complex ontologies. Finally, in order to evaluate the measure, we have used it to solve the problem of word-sense disambiguation"

Research details

Topics: Semantic relatedness
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Main topic
Theories: Undetermined
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"We have presented a simple measure of semantic relatedness, based on the link structure of Wikipedia. We addressed the problem of computing this measure efficiently and have provided heuristics for computing top k related articles. These heuristics achieve high accuracy, but limit the search space drastically and make the approach suitable for practical use in a variety of data intensive systems. We also presented a randomized algorithm to compute the relatedness measure between two articles efficiently and shown that its accuracy in ranking is very close to the true measure. In order to evaluate the quality of the measure, we have presented a simple method for word sense disambiguation, based on the relatedness measure. We evaluated our approach and found it to perform on par with the competing approaches and close to the performance of human experts."

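The conclusion describes a word sense disambiguation method built only on the relatedness measure. One plausible reading, sketched below, is to pick the candidate sense whose link set is most related to the concepts found in the surrounding context. This is an assumption for illustration: the record does not give the paper's actual procedure, and the function `disambiguate`, the summed-score rule, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, Set


def dice(links_a: Set[str], links_b: Set[str]) -> float:
    """Dice's measure between two link sets: 2|A & B| / (|A| + |B|)."""
    total = len(links_a) + len(links_b)
    if total == 0:
        return 0.0
    return 2 * len(links_a & links_b) / total


def disambiguate(
    candidates: Iterable[str],
    context: Iterable[str],
    links: Dict[str, Set[str]],
) -> str:
    """Return the candidate sense with the highest total relatedness
    to the context concepts (assumed scoring rule, not the paper's)."""
    context_list = list(context)

    def score(sense: str) -> float:
        sense_links = links.get(sense, set())
        return sum(dice(sense_links, links.get(c, set())) for c in context_list)

    return max(candidates, key=score)


# Toy link sets (hypothetical data, not taken from Wikipedia).
WSD_LINKS: Dict[str, Set[str]] = {
    "Jaguar (animal)": {"Felidae", "Predator", "Amazon rainforest"},
    "Jaguar Cars": {"Automobile", "Coventry", "Luxury vehicle"},
    "Leopard": {"Felidae", "Predator", "Africa"},
    "Rainforest": {"Amazon rainforest", "Biodiversity"},
}

best_sense = disambiguate(
    ["Jaguar (animal)", "Jaguar Cars"], ["Leopard", "Rainforest"], WSD_LINKS
)
```

With this toy data, a mention of "jaguar" next to "Leopard" and "Rainforest" resolves to "Jaguar (animal)", since "Jaguar Cars" shares no links with either context concept.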
Paper URL: http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-355/turdakov.pdf