YAWN: a semantically annotated Wikipedia XML corpus

Authors: Ralf Schenkel, Fabian M. Suchanek, Gjergji Kasneci
Citation: Lecture Notes in Informatics. 2007.
Publication type: Journal article
Peer-reviewed: Yes
Link(s): http://www.suchanek.name/work/publications/btw2007.pdf
Added by Wikilit team: Added on initial load
YAWN: a semantically annotated Wikipedia XML corpus is a publication by Ralf Schenkel, Fabian M. Suchanek, and Gjergji Kasneci.


Abstract

The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
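
The idea of annotating pages with concepts derived from their categories, and then querying by tag, can be illustrated with a small Python sketch. It is not the authors' algorithm: the `CONCEPTS` table is a hypothetical stand-in for a WordNet lookup, the right-to-left word scan is an assumed heuristic, and the element names and nesting are choices made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a WordNet lookup: category word -> concept tag.
CONCEPTS = {"physicists": "physicist", "scientists": "scientist", "cities": "city"}


def concept_for_category(category):
    """Return a concept tag for a category name, or None if no word matches."""
    # Scan the words right to left as a rough head-word heuristic (an assumption).
    for word in reversed(category.lower().split()):
        if word in CONCEPTS:
            return CONCEPTS[word]
    return None


def annotate_page(title, text, categories):
    """Wrap a page body in elements named after the concepts of its categories."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    parent = article
    for category in categories:
        tag = concept_for_category(category)
        if tag is not None:
            # Keep the originating category as an attribute for traceability.
            parent = ET.SubElement(parent, tag, {"source": category})
    ET.SubElement(parent, "body").text = text
    return article


def titles_with_concept(articles, concept):
    """Return the titles of all articles that contain the given concept tag."""
    return [a.findtext("title") for a in articles if a.find(f".//{concept}") is not None]


corpus = [
    annotate_page("Albert Einstein", "Einstein was ...", ["German physicists", "1879 births"]),
    annotate_page("Berlin", "Berlin is ...", ["Capitals in Europe", "Cities in Germany"]),
]
```

With this toy corpus, `titles_with_concept(corpus, "physicist")` selects only pages whose categories mapped to that concept, which is the kind of tag-based, high-precision query the abstract alludes to.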

Research questions

"The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters."

Research details

Topics: Semantic relatedness, Research platform
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Content analysis
Data source: Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"This paper presented YAWN, a project to create an XML version of Wikipedia with semantic information. We showed how to extract semantics from categories, lists, and template invocations, yielding a huge XML corpus annotated with semantically rich tags."
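
To make the template-based extraction mentioned above more concrete, here is a minimal Python sketch that turns a single template invocation with named parameters into XML elements. The function names `to_tag` and `template_to_xml`, the tag-naming rule, and the sample wikitext are assumptions for illustration; the sketch handles only flat invocations without nested templates or links and does not reproduce the paper's actual procedure.

```python
import re
import xml.etree.ElementTree as ET


def to_tag(label):
    """Normalise a template or parameter name into a lowercase tag name."""
    return re.sub(r"\W+", "_", label.strip().lower()).strip("_")


def template_to_xml(wikitext):
    """Convert one flat template invocation into an XML element."""
    match = re.fullmatch(r"\s*\{\{(.*)\}\}\s*", wikitext, re.DOTALL)
    if match is None:
        raise ValueError("not a template invocation")
    # The first pipe-separated field is the template name; the rest are parameters.
    name, *params = match.group(1).split("|")
    element = ET.Element(to_tag(name))
    for param in params:
        key, sep, value = param.partition("=")
        if sep:  # keep only named parameters of the form key=value
            ET.SubElement(element, to_tag(key)).text = value.strip()
    return element


infobox = template_to_xml("{{Infobox Country | name = France | capital = Paris }}")
xml_text = ET.tostring(infobox, encoding="unicode")
```

Each named parameter becomes a child element whose tag is the parameter name, so a value can afterwards be read with a call such as `infobox.findtext("capital")`.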


Facts about "YAWN: a semantically annotated Wikipedia XML corpus"
Added by WikiLit team: Added on initial load
Collected data time dimension: Cross-sectional
Data source: Wikipedia pages
Google Scholar URL: http://scholar.google.com/scholar?ie=UTF-8&q=%22YAWN%3A%2Ba%2Bsemantically%2Bannotated%2BWikipedia%2BXML%2Bcorpus%22
Has author: Ralf Schenkel, Fabian M. Suchanek and Gjergji Kasneci
Has domain: Computer science
Has topic: Semantic relatedness and Research platform
Peer reviewed: Yes
Publication type: Journal article
Published in: Lecture Notes in Informatics
Research design: Content analysis
Revid: 11,115
Theories: Undetermined
Theory type: Design and action
Title: YAWN: a semantically annotated Wikipedia XML corpus
Unit of analysis: Article
URL: http://www.suchanek.name/work/publications/btw2007.pdf
Wikipedia coverage: Main topic
Wikipedia data extraction: Dump
Wikipedia language: English
Wikipedia page type: Article
Year: 2007