Schema evolution in Wikipedia - toward a web information system benchmark

Authors: Carlo A. Curino, Hyun J. Moon, Letizia Tanca, Carlo Zaniolo
Citation: International Conference on Enterprise Information Systems. 2008.
Publication type: Conference paper
Peer-reviewed: Yes
Added by Wikilit team: Added on initial load
Schema evolution in Wikipedia - toward a web information system benchmark is a publication by Carlo A. Curino, Hyun J. Moon, Letizia Tanca, Carlo Zaniolo.


Abstract

Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source software MediaWiki. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for developing better methods and tools to support graceful schema evolution. Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.

Research questions

"The main contributions of this paper are the following: (i) we present the first schema evolution analysis of a real-life Web Information System DB, by studying the MediaWiki DB backend. This provides a deep insight on Wikipedia, one of the ten most popular websites to date and reveals the need for DB schema evolution and versioning techniques, and (ii) we provide and plant the seeds of the first public, real-life-based, benchmark for schema evolution, which will offer to researchers and practitioners a rich data-set to evaluate their approaches and solutions. As a part of the benchmark, we also release a simple but effective tool-suite for evolution analysis."

Research details

Topics: Other corpus topics, Technical infrastructure
Domains: Computer science
Theory type: Analysis
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Statistical analysis
Data source: Wikipedia pages
Collected data time dimension: Longitudinal
Unit of analysis: Website
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: Not specified

Conclusion

"Our study shows that MediaWiki has undergone a very intensive schema evolution, as a result of the cooperative, multi-party, open-source development and administration that is common in leading-edge WIS projects. Thus, the WIS environment (i) contrasts with the smaller, less-open and slow-turnover setting typical of traditional information systems, and (ii) creates a more urgent need for better automation and documentation tools for supporting graceful schema evolution in WIS. In this paper we analyze and quantify the schema evolution problem of WIS and introduce concepts and tools that represent an important first step toward realizing (ii). At the conceptual level, we have introduced the Schema Modification Operators (SMOs), which proved effective both in an operational mode to support schema evolution (Moon et al., 2008; Curino et al., 2008b), and in an “a posteriori” mode to support in-depth analysis. Moreover, we also developed a simple set of software tools to facilitate the analysis of schema evolution, and the derivation of the SMOs describing such an evolution. This tool-suite proved effective in the analysis of MediaWiki and is available online at (Curino et al., 2008a)."
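
As a rough illustration of what "derivation of the SMOs" between two schema versions could look like, the following is a minimal Python sketch. It is not the authors' tool-suite. The operation names and string syntax, the function name diff_schemas, and the two example schema versions are all assumptions made for illustration; this page does not list the paper's operators. A naive set difference like this one also cannot recognize renames, merges or decompositions, which would surface here only as a drop plus a create.

```python
from typing import Dict, List, Set

# A schema version is modeled as: table name -> set of column names.
Schema = Dict[str, Set[str]]


def diff_schemas(old: Schema, new: Schema) -> List[str]:
    """Describe the change from `old` to `new` as SMO-like operation strings.

    The operation syntax is hypothetical, chosen only for readability.
    """
    ops: List[str] = []
    # Tables present only in the old version were dropped.
    for table in sorted(old.keys() - new.keys()):
        ops.append(f"DROP TABLE {table}")
    # Tables present only in the new version were created.
    for table in sorted(new.keys() - old.keys()):
        columns = ", ".join(sorted(new[table]))
        ops.append(f"CREATE TABLE {table}({columns})")
    # Tables present in both versions are compared column by column.
    for table in sorted(old.keys() & new.keys()):
        for column in sorted(old[table] - new[table]):
            ops.append(f"DROP COLUMN {column} FROM {table}")
        for column in sorted(new[table] - old[table]):
            ops.append(f"ADD COLUMN {column} TO {table}")
    return ops


# Hypothetical schema versions, invented for this example
# (they are not actual MediaWiki releases).
v1: Schema = {"item": {"id", "title", "body"}, "account": {"id", "name"}}
v2: Schema = {"entry": {"id", "title"}, "account": {"id", "name", "email"}}

changes = diff_schemas(v1, v2)
for op in changes:
    print(op)
```

Counting such operations per release would give a simple measure of how intensively a schema changes over time, which is the kind of quantity the study reports.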

Comments

"MediaWiki Schema"


Further notes

Google Scholar URL: http://scholar.google.com/scholar?ie=UTF-8&q=%22Schema%2Bevolution%2Bin%2BWikipedia%2B-%2Btoward%2Ba%2Bweb%2Binformation%2Bsystem%2Bbenchmark%22
URL: http://www.cs.ucla.edu/~zaniolo/papers/ICEIS2008.pdf
Year: 2008