Learning to tag and tagging to learn: a case study on Wikipedia
Abstract The problem of semantically annotating Wikipedia inspires a novel method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available.
Comments Wikipedia pages; secondary data: the 2003 Conference on Natural Language Learning (CoNLL) English NER data set.
Our investigation of enriching the DBpedia metadata collection using an NLP tagger and statistical analysis yielded substantial new information. The results should be useful for many Wikipedia-specific tasks, such as mapping templates, cleaning up infobox data, and improving search and browsing. Because Wikipedia covers a broad range of domains, we expect our data sets to serve as useful background knowledge in other applications as well. For example, we have shown how to apply the data to improving the baseline tagger we used for semantic annotation.
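The statistical analysis mentioned above relates tagger output to existing metadata. A minimal sketch of one way to do this, assuming each observation pairs the infobox template of a page with the NER tag the tagger assigned to that page's entity: estimate P(tag | template) by counting, then keep a template-to-tag mapping only when one tag clearly dominates. The template names, the CoNLL-style labels (PER, LOC, ORG, MISC), the `min_prob` threshold and both function names are hypothetical illustrations, not the authors' actual procedure.

```python
from collections import Counter, defaultdict


def tag_distribution(observations):
    """Estimate P(tag | template) from (template, tag) pairs by relative frequency."""
    counts = defaultdict(Counter)
    for template, tag in observations:
        counts[template][tag] += 1
    dist = {}
    for template, tag_counts in counts.items():
        total = sum(tag_counts.values())
        dist[template] = {tag: n / total for tag, n in tag_counts.items()}
    return dist


def dominant_tags(dist, min_prob=0.5):
    """Map each template to its most probable tag, if that probability is >= min_prob."""
    mapping = {}
    for template, probs in dist.items():
        tag, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= min_prob:
            mapping[template] = tag
    return mapping


# Hypothetical (template, tag) observations produced by running a tagger over pages.
observations = [
    ("Infobox_person", "PER"),
    ("Infobox_person", "PER"),
    ("Infobox_person", "ORG"),
    ("Infobox_city", "LOC"),
    ("Infobox_city", "LOC"),
    ("Infobox_album", "MISC"),
    ("Infobox_album", "ORG"),
]
dist = tag_distribution(observations)
print(dominant_tags(dist, min_prob=0.6))
# prints {'Infobox_person': 'PER', 'Infobox_city': 'LOC'}
```

Templates whose tags are split evenly (here the hypothetical album template) are left unmapped, which is one simple way to flag infobox data that may need cleaning up.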
Has author Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza, Jordi Atserias
Published in IEEE Intelligent Systems
In this article, we investigate how to use standard named-entity recognition (NER) technology to significantly enrich the metadata available in Wikipedia. By using this knowledge, we also examine how to generate additional training data to improve NER technology without additional human intervention.
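Generating training data without human intervention can be pictured as projecting known entity types back onto running text. A minimal sketch, assuming a hypothetical dictionary that maps entity names (as token tuples) to tags, for example derived from page titles and a template-to-tag mapping: each sentence is labeled in BIO format by greedy longest match. This is a generic label-projection technique chosen for illustration; the function name and the example entities are assumptions, not the method described in the article.

```python
def project_labels(tokens, entity_tags):
    """Assign BIO labels to tokens by greedy longest match against known entity names.

    entity_tags maps a tuple of tokens (an entity name) to a tag such as "PER".
    Tokens not covered by any known entity receive the label "O".
    """
    max_len = max((len(name) for name in entity_tags), default=0)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_tags:
                match = (n, entity_tags[span])
                break
        if match:
            n, tag = match
            labels[i] = "B-" + tag
            for j in range(i + 1, i + n):
                labels[j] = "I-" + tag
            i += n
        else:
            i += 1
    return labels


# Hypothetical entity dictionary, e.g. built from page titles plus a template-to-tag mapping.
entity_tags = {
    ("Peter", "Mika"): "PER",
    ("Yahoo!",): "ORG",
    ("Barcelona",): "LOC",
}
sentence = "Peter Mika works at Yahoo! in Barcelona .".split()
print(list(zip(sentence, project_labels(sentence, entity_tags))))
```

The resulting (token, label) pairs have the same shape as CoNLL-style NER training data, so they could in principle be added to an existing training set; whether that helps depends on how closely the projected data matches the original distribution.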
Theories We hypothesize that the lack of a performance increase occurs because the two distributions being combined are too different, a typical domain adaptation problem in NLP.
Title Learning to tag and tagging to learn: a case study on Wikipedia
Year 2008
