Temporal analysis of the Wikigraph
Authors: | Luciana S. Buriol, Carlos Castillo, Debora Donato, Stefano Leonardi, Stefano Millozzi |
Citation: | WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence : 45-51. 2006. |
Publication type: | Conference paper |
Peer-reviewed: | Yes |
DOI: | 10.1109/WI.2006.164 |
Added by Wikilit team: | Added on initial load |
Abstract
Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the Wikipedia evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs from which the temporal data is usually not available.
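The graph model described in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification, that each arc carries a single creation timestamp (the paper works with timestamped node events from the page history); the toy link list and the function names `snapshot` and `indegrees` are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: (source article, target article, year the link appeared).
LINKS = [
    ("A", "B", 2002),
    ("B", "A", 2003),
    ("A", "C", 2004),
    ("C", "B", 2005),
]

def snapshot(links, t):
    """Return the Wikigraph at time t as {article: set of linked articles}.

    Only arcs created at or before t are kept, so successive values of t
    give successive states of the evolving graph.
    """
    graph = defaultdict(set)
    for src, dst, created in links:
        if created <= t:
            graph[src].add(dst)
            graph.setdefault(dst, set())  # make sure link targets appear as nodes
    return dict(graph)

def indegrees(graph):
    """Count incoming arcs for every node of a snapshot."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for dst in targets:
            counts[dst] += 1
    return counts

early = snapshot(LINKS, 2003)
late = snapshot(LINKS, 2005)
print(indegrees(early))
print(indegrees(late))
```

Comparing the indegree counts of `early` and `late` is a small-scale version of tracking a topological property over time.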
Research questions
""Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the Wikipedia evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph."
Research details
Topics: | Vandalism, Participation trends |
Domains: | Computer science |
Theory type: | Analysis |
Wikipedia coverage: | Main topic |
Theories: | "Undetermined" |
Research design: | Content analysis, Statistical analysis |
Data source: | Wikipedia pages |
Collected data time dimension: | Longitudinal |
Unit of analysis: | Website |
Wikipedia data extraction: | Dump |
Wikipedia page type: | Article, History |
Wikipedia language: | English |
Conclusion
"The observation of the Wikipedia provides mixed signals of growth and maturity of this collection. Signs of transient regime (growth): • The number of articles, updates, visitors and editors is still growing exponentially. • The size of articles is still growing linearly. • The number of links per article is also growing linearly, slowly than the amount of text. • The number of reverts is growing slowly, which may signal more vandalism, but the number of double reverts (revert wars) has stabilized. Signs of permanent regime (maturity): • There is a clear power-law distribution of the indegree and outdegree. • The average edits per user has been mostly constant in the last two years. • There is a high correlation between PageRank and indegree, indicating that the microscopic connectivity of the encyclopedia resembles its mesoscopic properties. • The clustering coefﬁcient and edge reciprocity of links have remained basically constant during the last two years. • Over 2/3 of the articles belong now to the larger strongly connected component."
Comments
"The observation of the Wikipedia provides mixed signals of increasing growth and constant maturity of this collection."
Further notes
Year: | 2006 |