Last modified on January 30, 2014, at 20:31

The Wikipedia XML corpus

Authors: Ludovic Denoyer, Patrick Gallinari
Citation: ACM SIGIR Forum 40 (1): 64-69, June 2006.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1145/1147197.1147210
The Wikipedia XML corpus is a publication by Ludovic Denoyer and Patrick Gallinari.

Abstract

Wikipedia is a well-known, free-content, multilingual encyclopedia written collaboratively by contributors around the world. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. This encyclopedia is composed of millions of articles in different languages.

Research questions

"In this article, we describe a set of XML collections based on Wikipedia."

Research details

Topics: Other corpus topics, Research platform
Domains: Computer science
Theory type: Analysis
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Other
Data source: Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Secondary dataset
Wikipedia page type: Article
Wikipedia language: Multiple

Conclusion

"In this article, we describe a set of XML collections based on Wikipedia."
