YAWN: a semantically annotated Wikipedia XML corpus
Abstract The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Conclusion This paper presented YAWN, a project to create an XML version of Wikipedia with semantic information. We showed how to extract semantics from categories, lists, and template invocations, yielding a huge XML corpus annotated with semantically rich tags.
Data source Wikipedia pages  +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22YAWN%3A%2Ba%2Bsemantically%2Bannotated%2BWikipedia%2BXML%2Bcorpus%22  +
Has author Ralf Schenkel + , Fabian M. Suchanek + , Gjergji Kasneci +
Has domain Computer science +
Has topic Semantic relatedness + , Research platform +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Lecture Notes in Informatics +
Research design Content analysis  +
Research questions The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters.
Revid 11,115  +
Theories Undetermined
Theory type Design and action  +
Title YAWN: a semantically annotated Wikipedia XML corpus
Unit of analysis Article  +
Url http://www.suchanek.name/work/publications/btw2007.pdf  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  +
Year 2007  +
Creation date 15 March 2012 20:36:50  +
Categories Semantic relatedness  + , Research platform  + , Computer science  + , Publications with missing comments  + , Publications  +
Modification date 30 January 2014 20:34:24  +
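The abstract says YAWN derives annotations from invocations of templates with named parameters. The Python sketch below illustrates only that general idea and is not YAWN's implementation. It assumes a single, non-nested template invocation, uses the template name and each named parameter as XML tag names, and ignores positional parameters. The names `template_to_xml`, `to_tag`, `TEMPLATE_RE` and the sample infobox are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified pattern: one non-nested invocation {{Name | k = v | ...}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*\|(.*?)\}\}", re.DOTALL)


def to_tag(label: str) -> str:
    """Crudely turn a label into a tag name: non-word runs become '_', lowercased."""
    return re.sub(r"\W+", "_", label.strip()).strip("_").lower()


def template_to_xml(wikitext: str) -> ET.Element:
    """Convert the first template invocation in `wikitext` into an XML element.

    The template name becomes the element tag; each named parameter becomes a
    child element whose tag is the parameter name and whose text is the value.
    """
    match = TEMPLATE_RE.search(wikitext)
    if match is None:
        raise ValueError("no template invocation found")
    root = ET.Element(to_tag(match.group(1)))
    for part in match.group(2).split("|"):
        if "=" not in part:
            continue  # positional (unnamed) parameter: no name to use as a tag
        key, value = part.split("=", 1)
        ET.SubElement(root, to_tag(key)).text = value.strip()
    return root


if __name__ == "__main__":
    page = "{{Infobox scientist | name = Albert Einstein | birth_place = Ulm }}"
    element = template_to_xml(page)
    print(ET.tostring(element, encoding="unicode"))
    # Field-specific lookup instead of a free-text match:
    print(element.findtext("birth_place"))
```

A value that itself contains a pipe, such as a piped link `[[Ulm|city of Ulm]]`, would be split incorrectly here; a real converter needs a proper wikitext parser. Once pages carry such tags, a query can target a specific field (here `birth_place`) rather than matching free text, which is the kind of high-precision query the abstract refers to.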
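The record states that YAWN extracts semantics from Wikipedia categories and annotates pages with WordNet concepts. One heuristic for this, assumed here rather than taken from the YAWN paper, is to inspect the head noun of a category name: a plural head, as in "Physicists of Germany", suggests that member pages are instances of the concept "physicist", while a singular head, as in "Economy of Germany", suggests a thematic category. The sketch uses a crude head-noun guess, naive English singularization, and a tiny hypothetical `LEXICON` dictionary standing in for WordNet; all function and variable names are illustrative.

```python
from typing import Optional

# Hypothetical stand-in for WordNet: singular noun -> concept identifier.
LEXICON = {
    "physicist": "physicist",
    "city": "city",
    "university": "university",
}

PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "on", "with"}


def head_noun(category: str) -> str:
    """Crude head-noun guess: the word before the first preposition, else the last word."""
    words = category.split()
    for i, word in enumerate(words):
        if i > 0 and word.lower() in PREPOSITIONS:
            return words[i - 1]
    return words[-1]


def singularize(noun: str) -> str:
    """Naive English singularization; a real system would use a morphological lexicon."""
    noun = noun.lower()
    if noun.endswith("ies"):
        return noun[:-3] + "y"
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def concept_for_category(category: str) -> Optional[str]:
    """Return a concept for a category whose head noun looks plural, else None."""
    head = head_noun(category)
    singular = singularize(head)
    if singular == head.lower():
        return None  # singular head: treat as a thematic category, not a class
    return LEXICON.get(singular)  # None if the noun is not in the lexicon


if __name__ == "__main__":
    for name in ["Physicists of Germany", "Cities in Germany", "Economy of Germany"]:
        print(name, "->", concept_for_category(name))
```

Words such as "mechanics" or "news" would be misread as plurals, and multi-word heads are not handled; a more careful implementation would rely on a noun-group parser and WordNet's own morphology.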