Analysis of community structure in Wikipedia
|Analysis of Community Structure in Wikipedia|
|Authors:||Dmitry Lizorkin, Olena Medelyan, Maria Grineva|
|Citation:||18th International Conference on World Wide Web (WWW), pp. 1221-1222, 2009, Madrid, Spain.|
|Publication type:||Conference paper|
|Added by Wikilit team:||No|
We present the results of a community detection analysis of the Wikipedia graph. Distinct communities in Wikipedia contain semantically closely related articles, and the central topic of a community can be identified using PageRank. The extracted communities can be organized hierarchically, in a way similar to the manually created Wikipedia category structure.
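The detection step can be illustrated with a minimal sketch of greedy modularity agglomeration, the technique that Clauset, Newman and Moore made efficient for very large networks; here it is written in a naive, slow form that only suits toy graphs. It is an assumption that this matches the authors' exact procedure, the helper name `greedy_modularity` is hypothetical, and Wikipedia's directed links are treated as undirected edges for simplicity. The merge history it returns is one way to read the detected communities as a hierarchy.

```python
from itertools import combinations


def greedy_modularity(edges):
    """Naive greedy modularity agglomeration (hypothetical helper).

    Assumption: links are treated as undirected edges. Every node starts in
    its own community; the pair of communities with the largest modularity
    gain is merged until no merge increases modularity. Returns the final
    communities and the merge history, which can be read as a dendrogram.
    """
    adj = {}
    for u, v in edges:
        if u == v:  # ignore self-links
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2 * number of edges
    comms = {node: frozenset([node]) for node in adj}  # id -> members
    merges = []

    def gain(a, b):
        # Delta Q = 2 * (e_ab - a_a * a_b): e_ab is the share of edge ends
        # joining a and b, a_x the share of edge ends attached to members of x.
        between = sum(1 for u in comms[a] for v in comms[b] if v in adj[u])
        deg_a = sum(len(adj[u]) for u in comms[a])
        deg_b = sum(len(adj[u]) for u in comms[b])
        return 2 * (between / two_m - (deg_a / two_m) * (deg_b / two_m))

    while len(comms) > 1:
        pairs = combinations(sorted(comms, key=str), 2)
        a, b = max(pairs, key=lambda p: gain(*p))  # first best pair wins ties
        if gain(a, b) <= 0:
            break  # no merge improves modularity any further
        comms[a] = comms[a] | comms.pop(b)
        merges.append((a, b))
    return list(comms.values()), merges


# Two triangles joined by a single link split into two communities.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
communities, merges = greedy_modularity(edges)
print(sorted(sorted(c) for c in communities))  # [[1, 2, 3], [4, 5, 6]]
```

Recomputing every gain on each pass keeps the sketch short; a real run over the full Wikipedia graph would need the heap-based bookkeeping of the fast algorithm instead.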
The entire Wikipedia graph can be automatically organized into a hierarchy of communities, each comprising thematically related Wikipedia articles. By combining this with PageRank analysis to identify each community's central topic, we can automatically produce an ontological structure similar to the existing Wikipedia category tree. Evaluating the accuracy of such a structure is part of our future work; however, the initial experiments demonstrate the potential of our method.
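The labeling step can be sketched by running PageRank on a community's internal links and taking the top-ranked article as its central topic. This is a minimal sketch under stated assumptions: restricting PageRank to the community subgraph (rather than ranking on the whole Wikipedia graph), the damping factor of 0.85, the fixed iteration count and the helper names `pagerank` and `central_topic` are illustrative choices, not details confirmed by the abstract.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed graph {page: [targets]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank


def central_topic(links, community):
    """Hypothetical helper: label a community with its top-PageRank member,
    using only links whose source and target both lie in the community."""
    sub = {p: [t for t in links.get(p, []) if t in community] for p in community}
    rank = pagerank(sub)
    return max(sorted(community), key=lambda p: rank[p])


links = {
    "Violin": ["Music"], "Piano": ["Music"], "Guitar": ["Music", "Piano"],
    "Music": ["Violin"], "Paris": ["France"],
}
print(central_topic(links, {"Violin", "Piano", "Guitar", "Music"}))  # Music
```

Applied to every community in a hierarchy such as a merge dendrogram, these labels would yield a tree of topic names that can be compared with the category tree.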
The community detection analysis is fully language-independent, since it relies only on the hyperlink structure. It should therefore be particularly useful for Wikipedias in languages where the category structure is less developed than in the English Wikipedia. Furthermore, community detection could be used to improve existing categories, which are created by humans without knowledge of the overall hyperlink organization, or to augment Wikipedia search results with same-community terms.
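Augmenting search results with same-community articles might look like the following sketch. It assumes that community assignments and importance scores (for example PageRank values) were computed offline; `expand_results` and its `per_hit` parameter are hypothetical. Because it consumes only article identifiers and link-derived data, nothing in it depends on the language of the article text.

```python
from collections import defaultdict


def expand_results(results, community_of, score, per_hit=2):
    """Hypothetical helper: append up to `per_hit` same-community articles
    after the original hits, preferring higher-scored articles and
    skipping anything already in the list."""
    members = defaultdict(list)
    for article, comm in community_of.items():
        members[comm].append(article)
    expanded, seen = list(results), set(results)
    for hit in results:
        comm = community_of.get(hit)
        if comm is None:  # hit not covered by the community analysis
            continue
        peers = sorted((a for a in members[comm] if a not in seen),
                       key=lambda a: (-score.get(a, 0.0), a))
        for peer in peers[:per_hit]:
            expanded.append(peer)
            seen.add(peer)
    return expanded


community_of = {"Violin": 0, "Piano": 0, "Guitar": 0, "Music": 0,
                "Paris": 1, "France": 1}
score = {"Music": 0.4, "Violin": 0.2, "Piano": 0.15, "Guitar": 0.1,
         "France": 0.5, "Paris": 0.3}
print(expand_results(["Violin"], community_of, score))
# ['Violin', 'Music', 'Piano']
```

Keeping the original hits first and only appending community peers means the expansion can never push a directly matching article down the list.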