Wikipedia revision toolkit: efficiently accessing Wikipedia's edit history
Abstract We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia’s edit history.
Added by wikilit team Yes  +
Collected data time dimension Longitudinal  +
Conclusion In this paper, we presented an open-source toolkit which extends JWPL, an API for accessing Wikipedia, with the ability to reconstruct past states of Wikipedia, and to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia, and is also a requirement for the creation of time-based series of Wikipedia snapshots and for assessing the influence of Wikipedia growth on NLP algorithms. Furthermore, Wikipedia’s edit history has been shown to be a valuable knowledge source for NLP, which is hard to access because of the lack of efficient tools for managing the huge amount of revision data. By utilizing a dedicated storage format for the revisions, our toolkit massively decreases the amount of data to be stored. At the same time, it provides an easy-to-use interface to access the revision data. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia’s edit history. The toolkit will be made available as part of JWPL, and can be obtained from the project’s website at Google Code (http://jwpl.googlecode.com).
Conference location Portland, Oregon, USA +
Data source Experiment responses  + , Wikipedia pages  +
Dates 21 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Wikipedia%2BRevision%2BToolkit%3A%2BEfficiently%2BAccessing%2BWikipedia%27s%2BEdit%2BHistory%22  +
Has author Oliver Ferschke + , Torsten Zesch + , Iryna Gurevych +
Has domain Computer science +
Has topic Other natural language processing topics +
Month June  +
Pages 97-102  +
Peer reviewed Yes  +
Publication type Conference paper  +
Published in 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies +
Research design Experiment  +
Research questions We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia.
Revid 11,100  +
Theories Undetermined?
Theory type Design and action  +
Title Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History
Unit of analysis Edit  +
Url http://dl.acm.org/citation.cfm?id=2002440.2002457  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  + , History  +
Year 2011  +
Creation date 17 October 2012 15:44:49  +
Categories Other natural language processing topics  + , Computer science  + , Publications with missing comments  + , Publications  +
Modification date 30 January 2014 20:33:04  +