Using Wikipedia to bootstrap open information extraction
Abstract We often use ‘Data Management’ to refer to the manipulation of relational or semi-structured information, but much of the world’s data is unstructured, for example the vast amount of natural-language text on the Web. The ability to manage the information underlying this unstructured text is therefore increasingly important. While information retrieval techniques, as embodied in today’s sophisticated search engines, offer important capabilities, they lack the most important faculties found in relational databases: 1) queries comprising aggregation, sorting and joins, and 2) structured visualization such as faceted browsing [29].
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments We advocate an alternative approach: using Wikipedia to generate relation-specific training data for a broad set of thousands of relations.
Conclusion This paper describes Kylin, which uses self-supervised learning to train relationally-targeted extractors from Wikipedia infoboxes. We explained how shrinkage and retraining allow Kylin to improve extractor robustness, and we demonstrate that these extractors can successfully mine tuples from a broader set of Web pages. Finally, we argued that the best way to utilize human efforts is by inviting humans to quickly validate the correctness of machine-generated extractions.
Data source Experiment responses  + , Wikipedia pages  +
Doi 10.1145/1519103.1519113 +
Google scholar url  +
Has author Daniel S. Weld + , Raphael Hoffmann + , Fei Wu +
Has domain Computer science +
Has topic Information extraction +
Issue 4  +
Pages 62-68  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in ACM SIGMOD Record +
Research design Experiment  +
Research questions This paper presents Kylin as a case study of open IE. We start by describing Kylin’s use of Wikipedia to power the self-supervised training of information extractors. Then, in Section 3 we show how Wikipedia training can be seen as a bootstrapping method enabling extraction from the wider set of general Web pages. Not even the best machine-learning algorithms have production-level precision
Revid 11,183  +
Theories Undetermined
Theory type Design and action  +
Title Using Wikipedia to bootstrap open information extraction
Unit of analysis Article  +
Url  +
Volume 37  +
Wikipedia coverage Other  +
Wikipedia data extraction Dump  +
Wikipedia language Not specified  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:32:28  +
Categories Information extraction  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:32:09  +