Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia

Authors: Andrea Capocci, Francesco Rao, Guido Caldarelli
Citation: Europhysics Letters 81 (2): 28006-1. 2008 January.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1209/0295-5075/81/28006
Link(s): http://dx.doi.org/10.1209/0295-5075/81/28006
Added by Wikilit team: Added on initial load
Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia is a publication by Andrea Capocci, Francesco Rao, and Guido Caldarelli.


Abstract

In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method.

Research questions

"In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia."

Research details

Topics: Ontology building
Domains: Information science
Theory type: Analysis
Wikipedia coverage: Case
Theories: "Undetermined"
Research design: Case study
Data source: Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Category
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: All languages

Conclusion

"We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method."
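
The central point of the conclusion, that two divisions of the same articles can have the same distribution of group sizes while grouping the articles quite differently, can be illustrated with a toy example. The sketch below is not the authors' method or data: the six article labels, the two partitions, and the helper names `size_distribution` and `pair_agreement` are all hypothetical, and the agreement score used here is the standard Rand index, chosen only because it is simple to compute by hand.

```python
from collections import Counter
from itertools import combinations


def size_distribution(partition):
    """Return the sorted group sizes of a {item: group_label} mapping."""
    return sorted(Counter(partition.values()).values())


def pair_agreement(a, b):
    """Rand index: share of item pairs that both partitions treat alike
    (placed together in both, or placed apart in both)."""
    pairs = list(combinations(sorted(a), 2))
    alike = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return alike / len(pairs)


# Hypothetical toy data: six articles labelled by an imposed category
# (top-down) and by a detected cluster (bottom-up).
categories = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 2}
clusters = {"A": 0, "D": 0, "F": 0, "B": 1, "C": 1, "E": 2}

# Both partitions have groups of sizes 1, 2 and 3 ...
print(size_distribution(categories), size_distribution(clusters))
# ... yet they disagree on a large share of article pairs.
print(pair_agreement(categories, clusters))
```

In this toy case the size distributions are identical, yet only one pair of articles (B and C) is grouped together by both partitions. Matching size statistics alone therefore say little about whether a clustering recovers the imposed categories.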

Comments

"The varying agreement between clustering and categorization across the studied versions of Wikipedia suggests that links in Wikipedia do not necessarily imply similarity or relatedness relations."

