Finn Årup Nielsen



Researching Wikipedia - current approaches and new directions, WikiLit - collecting the wiki and Wikipedia literature


  1. For some articles the Wikipedia language is specified as "Not specified" even though it is implicit that it is English:
    1. Codifying collaborative knowledge: using Wikipedia as a basis for automated ontology learning
    2. Consistency without concurrency control in large, dynamic systems
  1. Problem with research details:
    1. Numerous issues with Scientific citations in Wikipedia: [1]
    2. If a study uses a survey, then the research design is not a case study, but probably rather "statistical analysis". The unit of analysis is probably "user" and the collected data time dimension is probably cross-sectional or longitudinal.
    3. Is Research design = Grounded theory ok?
    4. Is Research design = Experiment ok? How many do experiment with Wikipedia?
    5. Are the case studies wrong?
    6. If research design is conceptual should any data be collected?
  2. The work of sustaining order in Wikipedia: the banning of a vandal: Should the "Unit of analysis" not be "edit" rather than "user"? Staeiou edits [2]. Correct this if there is a problem.
  3. Content analysis should be a quantitative research design. [3].
  4. Do we have 100 conference papers? Number of conference papers added by wikilit team on initial load: 72

  1. Same Andrea Forte papers: Should one of them be left out? The categories are wrong. Decentralization in Wikipedia governance and Scaling consensus: increasing decentralization in Wikipedia governance.
  2. Research design: Missing description. What is "content analysis"? What is a "Historical analysis" in "Research design"?
  3. Categorization: There is little collaboration among the students. Why is Category:Student_contribution under Category:Collaborative culture?
  4. Breaking the knowledge acquisition bottleneck through conversational knowledge management: Why "Technical infrastructure"? Is Wikipedia page type "multiple"?
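
The survey heuristic in the research details list above could be checked automatically. The following is only a minimal sketch: the dictionary keys (`data_source`, `research_design`, `unit_of_analysis`, `collected_data_time_dimension`) and the example record are hypothetical and are not the actual WikiLit property names or data.

```python
# Hypothetical record layout: the keys used here are illustrative
# assumptions, not the actual WikiLit property names.
def survey_warnings(record):
    """Return heuristic warnings for a publication record that uses a survey."""
    warnings = []
    data_source = record.get("data_source", "").lower()
    if "survey" not in data_source:
        return warnings  # the heuristics only apply to survey-based studies
    if record.get("research_design", "").lower() == "case study":
        warnings.append(
            'research design is "case study"; "statistical analysis" is more likely'
        )
    if record.get("unit_of_analysis", "").lower() != "user":
        warnings.append('unit of analysis is not "user"')
    time_dimension = record.get("collected_data_time_dimension", "").lower()
    if time_dimension not in ("cross-sectional", "longitudinal"):
        warnings.append("time dimension is neither cross-sectional nor longitudinal")
    return warnings


# Made-up example record, for illustration only.
example = {
    "title": "Hypothetical survey paper",
    "data_source": "Survey responses",
    "research_design": "Case study",
    "unit_of_analysis": "User",
    "collected_data_time_dimension": "Cross-sectional",
}
for warning in survey_warnings(example):
    print(warning)
```

The warnings are only hints for manual review, since the list itself says "probably" for each of these rules.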

Solved issues

  1. Missing A gene wiki for community annotation of gene function? We have got a related paper: The gene wiki: community intelligence applied to human gene annotation, but why not the PLoS Biology paper? "A gene wiki ..." is a "community page" article.
  2. After deadline: Generating quality open content: a functional group perspective based on the time, interaction, and performance theory and Membership turnover and collaboration success in online communities: explaining rises and falls from grace in Wikipedia. Solution: We have added these with research details.
  3. Problem for Wikipedia page type: Should it be checkboxes, or should "Multiple" be allowed? Solution: Wikipedia page type is now checkboxes, and "multiple" is no longer there.
  4. There are two different articles for Wikipedia, Scholarpedia, and references to journals in the brain and behavioral sciences: a comparison of cited sources and recommended readings in matching free online encyclopedia entries in volume 29: one is in issue 1-2, the other in issue 3. Solution: Both have now been added.
  5. Why is Changhu Wang on the site? The user has added several other researchers [4]. Solution: the research and the other contribution from the Cybersun user were deleted.
  6. A qualitative and quantitative analysis of how Wikipedia talk pages are used is missing. It is a conference paper.
  7. What is "log" compared to "history" for Wikipedia page type property? A history is a "history". Everything else is a log.
  8. Domains of Gender differences in information behavior concerning Wikipedia, an unorthodox information source?. Should we generally change the domain? What about "Topics"? The specific paper investigates students in journalism and mass-communication, and we are being inclusive in the domains.
  9. For "collected data time dimension" it is not clear if it refers to Wikipedia data only or data more generally. The property is both for Wikipedia data and non-Wikipedia data, i.e., a cross-sectional survey should be marked as "cross-sectional"
  10. What about "The democratization of information? Wikipedia as a reference resource". Is that a non-peer-reviewed work? It is an editorial that we leave out.
  11. For "unit of analysis" it is not clear if "user" is a reader or only a Wikipedian. It is also not clear if "subject" is a topic or a person under investigation (research subject). A "user" is a reader and a Wikipedian. "Subject" is topic. (see also property page)
  12. "We need higher-level synthesis" rather than citations. This issue has been discussed. Okoli has even put up a document with references about it.
  13. The gene wiki: community intelligence applied to human gene annotation seriously misclassified. Topics and domains now ok.
  14. Category question mark problem: Some articles have a question mark at the topic. This is not allowed if the Form:Publication is to work. Assigning trust to Wikipedia content, Collectivism vs. Individualism in a wiki world: librarians respond to Jaron Lanier's essay "Digital Maoism: the hazards of the new online collectivism", Investigation into trust for collaborative information repositories: a Wikipedia case study, Schema evolution in Wikipedia - toward a web information system benchmark, The Wikipedia XML corpus, Using Wikipedia at the TREC QA track, YAWN: a semantically annotated Wikipedia XML corpus
  15. "Unit of analysis": "subject", "subjects". Changed "subjects" to "subject"
  16. In Form:Publication under research_design changed "Literature review (qualitative)" to "Literature review"
  17. Several value problems for Data source (Should be fixed by Finn).
  18. Governance, organization, and democracy on the Internet: the iron law and the evolution of Wikipedia had a strange long abstract that did not correspond to the paper abstract. Why does it have a "Governance" topic?
  19. "Peer reviewed" has a case problem. Apparently not a problem
  20. Why is Category:Vandalism under Category:Participation? (Was discussed and settled)
  21. "Law" under "humanities"? Why not social sciences? (Discussed: moved to "Social sciences")
  22. Missing category description, e.g., [5] What is this?
  23. Why is "David Milne" "David N. Milne"? The "N" seems to be wrong. No, that is correct.
  24. Problem with "domain": Computer Science and "Information science" and "Health" (Chitu make an "Interdisciplinary" domain for Computer science, Information Science, and Health.). It has been moved to "Interdisciplinary".
  25. Domains seem to have been overwritten: [6] The problem with overwritten domains is so serious that we need to compare with the original data. We have fixed this.
  26. What is Category:Text classification compared to Category:Ranking and clustering systems? Has become clearer with the definition on the category pages.
  27. Not clear why Category:Translation should be under Category:Information retrieval. Renamed.
  28. A comparison of World Wide Web resources for identifying medical information should not be "sample data". Changed
  29. Should Governance and Policies be merged? Yes, decided 2012-05-16.
  30. "Named Entity Recognition" in Learning to tag and tagging to learn: a case study on Wikipedia. The topic was changed to "Information extraction".
  31. Forced transparency: corporate image on Wikipedia and what it means for public relations seems not to be in our database. This was an error. Probably it has been missed because the journal was an ejournal.
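
A check for the category question mark problem in the list above could look like the following minimal sketch. It assumes that the problem is a literal "?" inside a topic value, and the record layout (`title`, `topics`) and the example records are hypothetical, not actual WikiLit data.

```python
# Hypothetical record layout: "title" and "topics" are illustrative
# assumptions, not the actual fields of Form:Publication.
def topics_with_question_mark(records):
    """Yield (title, topic) pairs whose topic value contains a question mark."""
    for record in records:
        for topic in record.get("topics", []):
            if "?" in topic:
                yield record.get("title", ""), topic


# Made-up example records, for illustration only.
records = [
    {"title": "Hypothetical paper A", "topics": ["Information extraction"]},
    {"title": "Hypothetical paper B", "topics": ["Ranking and clustering systems?"]},
]
flagged = list(topics_with_question_mark(records))
for title, topic in flagged:
    print(f"{title}: {topic}")
```

Only the topic values are tested, because several paper titles legitimately contain a question mark.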


Missing papers?


Fixed papers

  1. Clustering of scientific citations in Wikipedia (Entered)
  2. Casting the net: a multimodal network perspective on user-system interactions (Not about Wikipedia)
  3. It's a network, not an encyclopedia: a social network perspective on Wikipedia collaboration (conference paper)
  4. The WikiPhil Portal: visualizing meaningful philosophical connections (journal paper from 2009, entered by Finn)
  5. Wikipedia: community or social movement? (Notified by Piotr Konieczny, Entered by Chitu)
  6. A multimethod study of information quality in wiki collaboration (from 2011 March in ACM Transactions on Management Information Systems). Research details and summary entered by Finn.

Data for automated analysis

Year per Topics

Content: CSV Corpus: CSV General: CSV Infrastructure: CSV Participation: CSV Readership: CSV


Property, fields, form, template

Further issues

  1. Social operational information, competence, and participation in online collective action. Paper not open. Multiple property issues.
  2. Dynamics of platform-based markets. Paper not open. Multiple property issues. Formatting problem.
  3. Mining meaning from Wikipedia. Data source is wrong. research_design is wrong.
  4. Expediency-based practice? Medical students' reliance on Google and Wikipedia for biomedical inquiries seems not to be a longitudinal study.

Categorization issues

  1. Automatically refining the Wikipedia infobox ontology. This is not quite using Wikipedia but modifying Wikipedia
  2. Extracting lexical semantic knowledge from Wikipedia and Wiktionary: Why "Natural language processing" topic and not "Information extraction" and "Technical Infrastructure"?
  3. Modeling events in time using cascades of Poisson processes. Why experiment?
  4. Explaining the sustainability of digital ecosystems based on the wiki model through critical mass theory. Why "Health"?
  5. There is one article that uses SEM

Overwritten properties problem

  1. A five-year study of on-campus Internet use by undergraduate biomedical students ('health' has been erased from domains, is that ok? theory_type also gets changed!?)
  2. The work of sustaining order in Wikipedia: the banning of a vandal
  3. Wikipedia and academic peer review
  4. Wikipedia and lesser-resourced languages
  5. Wikipedia and 'open source' mental health information
  6. Wikipedia and osteosarcoma: a trustworthy patients' information?
  7. Wikipedia and psychology: coverage of concepts and its use by undergraduate students
  8. Wikipedia and the epistemology of testimony
  9. Wikipedia and the future of legal education
  10. Wikipedia as a tool for forestry outreach
  11. Wikipedia as an encyclopaedia of life
  12. Wikipedia as an evidence source for nursing and healthcare students
  13. Wikipedia as participatory journalism: reliable sources? Metrics for evaluating collaborative media as a news resource
  14. Wikipedia as public scholarship: communicating our impact online
  15. Wikipedia leeches? The promotion of traffic through a collaborative web format
  16. Wikipedia, critical social theory, and the possibility of rational discourse
  17. Wikipedia model for collective intelligence: a review of information quality
  18. Wikipedia, Scholarpedia, and references to books in the brain and behavioral sciences: a comparison of cited sources and recommended readings in matching free online encyclopedia entries
  19. 'Wikipedia, the free encyclopedia' as a role model? Lessons for open innovation from an exploratory examination of the supposedly democratic-anarchic nature of Wikipedia
  20. Wikipedia(s) on the language map of the world
  21. Wikipedia workload analysis for decentralized hosting


  1. Learning to tag and tagging to learn: a case study on Wikipedia
  2. Wikipedia and the disappearing "author"
  3. Wikipedia as an encyclopaedia of life