Information arbitrage across multi-lingual Wikipedia
Abstract The rapid globalization of Wikipedia is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many different languages emerge both as a result of manual translation and independent development. Unfortunately, these pages may appear at different times, vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), we present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. Our method uses self-supervised learning and our experiments demonstrate the method's feasibility, even in the absence of dictionaries.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments "The system provides a unique mechanism that allows the content in one language to benefit from parallel content in others." p. 103
Conclusion The system provides a unique mechanism that allows the content in one language to benefit from parallel content in others. By utilizing the notion that this differential is exploitable (an arbitrage opportunity), we develop an accurate system for filling in missing infobox data.
Conference location Barcelona, Spain +
Data source Experiment responses  + , Wikipedia pages  +
Dates 9-12 +
Doi 10.1145/1498759.1498813 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Information%2Barbitrage%2Bacross%2Bmulti-lingual%2BWikipedia%22  +
Has author Eytan Adar + , Michael Skinner + , Daniel S. Weld +
Has domain Computer science +
Has topic Other content topics + , Text classification +
Month February  +
Pages 94-103  +
Peer reviewed Yes  +
Publication type Conference paper  +
Published in WSDM '09 Proceedings of the Second ACM International Conference on Web Search and Data Mining +
Research design Experiment  +
Research questions The globalization of Wikipedia shows no apparent slowdown and there is a unique opportunity to utilize the parallel work of editors versed in different languages. As content is created at different rates in different languages, and the quality of that content is highly variable, there is a huge opportunity to resolve differences and inconsistencies. In this paper we introduce Ziggurat, a system to automatically resolve differentials in infobox completeness.
Revid 10,824  +
Theories Undetermined
Theory type Design and action  +
Title Information arbitrage across multi-lingual Wikipedia
Unit of analysis Article  +
Url http://dl.acm.org/citation.cfm?id=1498813  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Dump  +
Wikipedia language English  + , French  + , German  + , Spanish  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:29:00  +
Categories Other content topics  + , Text classification  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:28:54  +