Analysis of community structure in Wikipedia

Dmitry Lizorkin, Olena Medelyan, Maria Grineva
Citation: 18th International Conference on World Wide Web (WWW): 1221-1222. 2009. Madrid, Spain.
Publication type: Conference paper
Peer-reviewed: Yes
Analysis of Community Structure in Wikipedia is a publication by Dmitry Lizorkin, Olena Medelyan, Maria Grineva.

Abstract

We present the results of a community detection analysis of the Wikipedia graph. Distinct communities in Wikipedia contain semantically closely related articles. The central topic of a community can be identified using PageRank. Extracted communities can be organized hierarchically, similar to the manually created Wikipedia category structure.

Research questions

Conclusion

References

  • F. Bellomi and R. Bonato. Network analysis for Wikipedia. In Wikimania, 2005.
  • A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70:066111, 2004.
  • D. Milne and I. Witten. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Wikipedia and AI workshop at the AAAI, 2008.

Further notes

The entire Wikipedia graph can be automatically organized into a hierarchy of communities comprising thematically related Wikipedia articles. Combined with PageRank analysis to identify the central topic of each community, this lets us automatically produce an ontological structure similar to the existing Wikipedia category tree. Evaluating the accuracy of such a structure is part of our future work; however, the initial experiments demonstrate the potential of our method.
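
The two steps described above (grouping articles into communities, then picking a central topic per community with PageRank) can be illustrated with a small Python sketch. This is not the authors' implementation: the link list is invented toy data, the merge loop is a naive agglomerative modularity procedure in the spirit of the Clauset, Newman, and Moore method listed in the references, links are treated as undirected for community detection (an assumption), and all function names are hypothetical.

```python
# Toy hyperlink list (source article, target article); invented data, not real Wikipedia links.
LINKS = [
    ("Saxophone", "Jazz"), ("Miles Davis", "Jazz"), ("Miles Davis", "Saxophone"),
    ("Checkmate", "Chess"), ("Garry Kasparov", "Chess"), ("Garry Kasparov", "Checkmate"),
    ("Garry Kasparov", "Miles Davis"),  # a stray cross-topic link
]

def greedy_modularity(edges):
    """Naive agglomerative modularity maximisation over an undirected edge list.

    Returns the final communities and the ordered list of merges, which can be
    read as a bottom-up hierarchy.
    """
    m = len(edges)
    comms = {n: {n} for e in edges for n in e}  # community id -> members
    merges = []
    while True:
        owner = {n: c for c, members in comms.items() for n in members}
        deg = dict.fromkeys(comms, 0)  # summed node degree per community
        between = {}                   # edge count between community pairs
        for u, v in edges:
            cu, cv = owner[u], owner[v]
            deg[cu] += 1
            deg[cv] += 1
            if cu != cv:
                key = (min(cu, cv), max(cu, cv))
                between[key] = between.get(key, 0) + 1
        best_gain, best_pair = 0.0, None
        for (ci, cj), l_ij in sorted(between.items()):
            # Modularity change if communities ci and cj were merged.
            gain = l_ij / m - 2 * (deg[ci] / (2 * m)) * (deg[cj] / (2 * m))
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (ci, cj)
        if best_pair is None:  # no merge improves modularity any further
            return list(comms.values()), merges
        ci, cj = best_pair
        merges.append((sorted(comms[ci]), sorted(comms[cj])))
        comms[ci] |= comms.pop(cj)

def pagerank(nodes, links, damping=0.85, iters=50):
    """Power-iteration PageRank restricted to links among `nodes`."""
    members = set(nodes)
    order = sorted(members)
    out = {n: [v for u, v in links if u == n and v in members] for n in order}
    rank = {n: 1 / len(order) for n in order}
    for _ in range(iters):
        new = {n: (1 - damping) / len(order) for n in order}
        for n in order:
            targets = out[n] or order  # dangling article: spread rank evenly
            for t in targets:
                new[t] += damping * rank[n] / len(targets)
        rank = new
    return rank

def central_topic(members, links):
    """Return the article with the highest PageRank inside one community."""
    rank = pagerank(members, links)
    return max(sorted(rank), key=rank.get)

# Treat each hyperlink as one undirected edge for community detection.
undirected = sorted({tuple(sorted(link)) for link in LINKS})
communities, merges = greedy_modularity(undirected)
topics = {central_topic(c, LINKS): sorted(c) for c in communities}
for topic, members in topics.items():
    print(topic, "->", members)
```

On this toy graph the procedure separates the music articles from the chess articles and labels each group with the article that most other members link to. A real Wikipedia graph would need the efficient data structures of the cited method rather than this quadratic loop.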

The community detection analysis is fully language-independent. It should therefore be particularly useful for Wikipedias in languages where the category structure is not as well developed as in the English Wikipedia. Furthermore, community detection analysis could be used to improve existing categories, which were created by humans without knowledge of the overall hyperlink organization, or to augment Wikipedia search results with same-community terms.
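
The last suggestion, augmenting search results with same-community terms, might look roughly like the following Python sketch. It assumes a community assignment has already been computed; the `community_of` mapping, the `expand_results` function, and the `limit` parameter are hypothetical names chosen for illustration, and the article data is invented.

```python
def expand_results(results, community_of, limit=3):
    """Append up to `limit` same-community articles per search hit, without duplicates."""
    members = {}
    for article, comm in community_of.items():
        members.setdefault(comm, []).append(article)
    expanded = list(results)
    seen = set(results)
    for hit in results:
        # Hits with no known community contribute no extra articles.
        related = members.get(community_of.get(hit), [])
        added = 0
        for article in sorted(related):
            if added == limit:
                break
            if article not in seen:
                expanded.append(article)
                seen.add(article)
                added += 1
    return expanded

# Invented community assignment: article title -> community id.
community_of = {
    "Jazz": 0, "Saxophone": 0, "Miles Davis": 0,
    "Chess": 1, "Checkmate": 1,
}
expanded = expand_results(["Jazz"], community_of)
print(expanded)
```

Here a search that returns only "Jazz" is extended with the other articles from the same community, while articles from the chess community are left out.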