Learning to tag and tagging to learn: a case study on Wikipedia

From WikiLit
Authors: Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza, Jordi Atserias
Citation: IEEE Intelligent Systems 23 (5): 26-33. 2008 October.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1109/MIS.2008.85
Added by Wikilit team: Added on initial load
Learning to tag and tagging to learn: a case study on Wikipedia is a publication by Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza, Jordi Atserias.


Abstract

The problem of semantically annotating Wikipedia inspires a novel method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available.

Research questions

"In this article, we investigate how to use standard named-entity recognition (NER) technology to significantly enrich the metadata available in Wikipedia. By using this knowledge, we also examine how to generate additional training data to improve NER technology without additional human intervention."

Research details

Topics: Information extraction
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "We hypothesize that the lack of performance increase is because the two distributions being combined are too different, a typical domain adaptation problem in NLP"
Research design: Statistical analysis
Data source: Archival records, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"Our investigation of enriching the DBpedia metadata collection through the use of an NLP tagger and statistical analysis provided significant information. The results undoubtedly will be useful for many Wikipedia-specific tasks, such as mapping templates, cleaning up infobox data, and providing better searching and browsing experiences. Because Wikipedia’s domain is broad, we expect that our data sets will serve as useful background knowledge in other applications. For example, we’ve shown how to apply the data toward the problem of improving our baseline tagger used for semantic annotation."

Comments

"Wikipedia pages; secondary data (2003 Conference on Natural Language Learning (CoNLL) English NER data set)."

