Autonomously semantifying Wikipedia
Abstract Berners-Lee’s compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method — creating enough structured data to motivate the development of applications. This paper argues that autonomously “Semantifying Wikipedia” is the best way to solve the problem. We choose Wikipedia as an initial data source, because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structures which can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, infoboxes, etc.), and we describe a prototype implementation of a self-supervised, machine learning system which realizes our vision. Preliminary experiments demonstrate the high precision of our system’s extracted data — in one case equaling that of humans.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments A system was proposed to extract structured data from Wikipedia; the main issues result "from Wikipedia’s growth through decentralized authoring... [Also, the] list and category information [in Wikipedia] is rudimentary" p. 49
Conclusion We propose bootstrapping the Semantic Web by mining Wikipedia, and we identify some unique challenges (lack of redundancy) and opportunities (unique identifiers, user-supplied training data, lists, categories, etc.) of this approach. We also identify additional issues resulting from Wikipedia’s growth through decentralized authoring (e.g., inconsistency, schema drift, etc.). This high-level analysis should benefit future work on Wikipedia and similar collaborative knowledge repositories. • We describe a system for automatically generating attribute/value pairs summarizing an article’s properties. Based on self-supervised learning, KYLIN achieves performance which is roughly comparable with that of human editors; in one case, KYLIN does even better. • By automatically identifying missing internal links for proper nouns, more semantic tags are added. Because these links resolve noun phrases to unique identifiers, they are useful for many purposes such as information retrieval, structural analysis, and further semantic processing. Meaning lies in the graph structure of concepts defined in terms of each other, and KYLIN helps complete that graph. • Collaboratively authored data is rife with noise and incompleteness. We identify robust learning methods which can cope in this environment. Extensive experiments demonstrate the performance of our system and characterize some of the crucial architectural choices (e.g., the optimal ordering of heuristics, the utility of classifier-based training data refinement, a pipelined architecture for attribute extraction).
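The self-supervised idea summarized above can be sketched in a few lines: existing infobox attribute/value pairs are matched against an article's sentences, and matching sentences become (noisy) training examples for per-attribute extractors. This is a minimal illustration under assumed simplifications — the function name, the substring-matching heuristic, and the sample data are hypothetical; the actual KYLIN system trains classifier and CRF-based extractors on such labels.

```python
def label_sentences(article_sentences, infobox):
    """Self-supervised labeling sketch: a sentence that contains an
    infobox attribute's value is taken as a positive training example
    for that attribute. Substring matching is a deliberate
    simplification and can produce noisy labels."""
    labeled = []
    for attribute, value in infobox.items():
        for sentence in article_sentences:
            if value.lower() in sentence.lower():
                labeled.append((sentence, attribute))
    return labeled

# Hypothetical example article and infobox.
sentences = [
    "Kalamazoo County was established in 1830.",
    "Its county seat is Kalamazoo.",
]
infobox = {"established": "1830", "seat": "Kalamazoo"}
print(label_sentences(sentences, infobox))
```

Note that the naive matching also pairs "Kalamazoo" with the first sentence, which mentions the county rather than the seat — an instance of the label noise that motivates the paper's classifier-based training-data refinement.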
Conference location Lisboa, Portugal +
Data source Experiment responses  + , Wikipedia pages  +
Dates 6-9 +
Doi 10.1145/1321440.1321449 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Autonomously%2Bsemantifying%2BWikipedia%22  +
Has author Fei Wu + , Daniel S. Weld +
Has domain Computer science +
Has topic Information extraction +
Month November  +
Pages 41-50  +
Peer reviewed Yes  +
Publication type Conference paper  +
Published in CIKM '07 Proceedings of the sixteenth ACM conference on Conference on information and knowledge management +
Publisher Association for Computing Machinery +
Research design Experiment  +
Research questions Berners-Lee’s compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method — creating enough structured data to motivate the development of applications. This paper argues that autonomously “Semantifying Wikipedia” is the best way to solve the problem. We choose Wikipedia as an initial data source, because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structures which can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, infoboxes, etc.), and we describe a prototype implementation of a self-supervised, machine learning system which realizes our vision.
Revid 10,676  +
Theories Undetermined
Theory type Design and action  +
Title Autonomously semantifying Wikipedia
Unit of analysis Article  +
Url http://dl.acm.org/citation.cfm?id=1321449  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Dump  +
Wikipedia language Not specified  +
Wikipedia page type Article  +
Year 2007  +
Creation date 15 March 2012 20:24:09  +
Categories Information extraction  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:20:49  +