Ranking of Wikipedia articles in search engines revisited: fair ranking for reasonable quality?
Abstract This paper aims to review the fiercely discussed question of whether the ranking of Wikipedia articles in search engines is justified by the quality of the articles. After an overview of current research on information quality in Wikipedia, a summary of the extended discussion on the quality of encyclopedic entries in general is given. On this basis, a heuristic method for evaluating Wikipedia entries is developed and applied to Wikipedia articles that scored highly in a search engine retrieval effectiveness test and compared with the relevance judgment of jurors. In all search engines tested, Wikipedia results are unanimously judged better by the jurors than other results on the corresponding results position. Relevance judgments often roughly correspond with the results from the heuristic evaluation. Cases in which high relevance judgments are not in accordance with the comparatively low score from the heuristic evaluation are interpreted as an indicator of a high degree of trust in Wikipedia. One of the systemic shortcomings of Wikipedia lies in its necessarily incoherent user model. A further tuning of the suggested criteria catalog, for instance, the different weighting of the supplied criteria, could serve as a starting point for a user model differentiated evaluation of Wikipedia articles. Approved methods of quality evaluation of reference works are applied to Wikipedia articles and integrated with the question of search engine evaluation.
Collected data time dimension Cross-sectional  +
Conclusion In general, our study confirms that the ranking of Wikipedia articles in search engines is justified by a satisfactory overall quality of the articles. For general informational queries, the negative assessment of Wikipedia articles could not be reinforced, with the exception of relatively poor quality concerning orthographical and grammatical correctness. Our study showed that despite the intense research on Wikipedia quality there is still a lack of commonly agreed-on authoritative heuristics as well as evaluation methods (research question 1). However, from the range of existing quality criteria we were able to derive a heuristic adequate for evaluating Wikipedia articles (research question 2). Jurors agreed on the provided criteria catalog (research question 2a). Our heuristic method is apt for the task of detecting quality distinctions, as the quality differences between articles in the sample were clearly noticeable (research question 2c). In answer to research questions 4b and 4c (“Is the ranking appropriate? Are good entries ranked high enough?”), we can say that the rankings in search engines are at least appropriate. According to the user judgment of relevancy, the search engine providers would be well advised to rank Wikipedia articles even higher than they do now. However, a definite assessment is difficult, as relevance judgment is too multifarious and not solely dependent on content quality of the result. Regarding the correspondence of relevance judgments and scores from the heuristic evaluation (research question 4a), we found some conformance, but as relevance is a multifaceted concept, the results can only give an indication with regard to the reliability of the ranking. In conclusion, the ranked articles were useful (research question 4b). We did not find articles that were useless (research question 4). However, usefulness varied considerably.
While we assume that users' trust in Wikipedia lets them judge most articles as relevant, based on the heuristic evaluation we cannot recommend always showing Wikipedia results on top of the results list.
Data source Experiment responses  + , Wikipedia pages  +
Doi 10.1002/asi.21423 +
Google scholar url  +
Has author Dirk Lewandowski + , Ulrike Spree +
Has domain Information science +
Has topic Ranking and popularity + , Computational estimation of trustworthiness +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Journal of the American Society for Information Science and Technology +
Research design Experiment  +
Research questions This paper aims to review the fiercely discussed question of whether the ranking of Wikipedia articles in search engines is justified by the quality of the articles. 1. Which applicable quality standards (heuristics) exist for evaluating Wikipedia articles? In what context were they developed and applied, and do they do justice to the generic markings of Wikipedia articles? 2. Based on the research on existing quality standards, we developed our own heuristics. With the help of these heuristics, human evaluators should be able to make sound and intersubjectively comprehensible quality judgments of individual Wikipedia articles. As we wanted to develop an easy-to-apply tool, our heuristic had to meet the following requirements: a. Human evaluators can evaluate individual Wikipedia articles on the basis of the provided criteria catalog and can agree whether a given article meets a certain criterion or not. b. On the basis of the criteria catalog, human evaluators attain similar evaluation scores for the same article. c. On the basis of the criteria catalog, noticeable differences in quality of Wikipedia articles can be determined. 3. The calibrated heuristic was applied to Wikipedia articles that scored highly in the retrieval test to find out: a. whether there exist noticeable differences in quality among the examples of our sample; b. whether there are really bad articles among the highly ranked articles. 4. On this basis, new insight into the user judgment of Wikipedia hits is possible, as it can now be analyzed: a. how user relevance judgments of the Wikipedia hits in the search engine results correspond with scores from the heuristic evaluation; b. how useful the ranked articles are; c. whether the ranking is appropriate, that is, whether good entries are ranked high enough.
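Requirements 2a and 2b above ask whether evaluators agree on individual criteria. One simple way to quantify this is mean pairwise percent agreement, sketched below. This is an illustration only: the record does not state which agreement measure the authors used, and the evaluator names and yes/no verdicts are invented for the example.

```python
from itertools import combinations


def pairwise_agreement(judgments):
    """Mean share of criteria on which two evaluators give the same verdict.

    judgments: dict mapping an evaluator name to a list of booleans,
    one per criterion (True = article meets the criterion).
    """
    pairs = list(combinations(judgments.values(), 2))
    if not pairs:
        raise ValueError("need at least two evaluators")
    shares = []
    for a, b in pairs:
        if len(a) != len(b) or not a:
            raise ValueError("evaluators must judge the same non-empty criteria list")
        # Fraction of criteria where this pair of evaluators agrees.
        shares.append(sum(x == y for x, y in zip(a, b)) / len(a))
    return sum(shares) / len(shares)


# Hypothetical verdicts of three evaluators on four criteria for one article.
example = {
    "evaluator_a": [True, True, False, True],
    "evaluator_b": [True, False, False, True],
    "evaluator_c": [True, True, False, False],
}
agreement = pairwise_agreement(example)
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic would be a stricter alternative.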
Theories In the theory of specialized lexicography, quality management is firmly grounded on the determination of a user structure consisting of the three aspects of user presupposition: degree of expertise such as layperson or expert, user situation referring to the actual usage such as text production or understanding, and user intention, which can widely vary from gathering factual information to background information or references (Geeb, 1998). So far, Wikipedia has no determined user structure and is trying to serve the needs of the general user as well as the expert. Based on this, it could be concluded that quality problems are to be expected, especially for articles in arcane academic areas like mathematics, as the knowledge gap between the general user and the specialist is large. In accordance with our theoretical assumption (see previous section) that the quality of an encyclopedia article should always be evaluated not only against the aims and objectives of the encyclopedia but also against its user structure and expectations, we strove to design a flexible and adaptable heuristic.
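The idea of an adaptable heuristic that weights criteria differently per user group can be sketched as a weighted average over criterion ratings. This is a minimal sketch under stated assumptions: the criterion names, ratings, and weights are hypothetical and are not the criteria catalog or weighting used in the paper.

```python
def weighted_score(ratings, weights):
    """Weighted average of criterion ratings, each rating in [0, 1].

    ratings: dict mapping a criterion name to a rating.
    weights: dict mapping the same criterion names to non-negative weights.
    """
    total_weight = sum(weights[c] for c in ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(ratings[c] * weights[c] for c in ratings) / total_weight


# Hypothetical ratings of one article on three made-up criteria.
ratings = {"correctness": 1.0, "comprehensibility": 0.5, "references": 0.0}

# Hypothetical user models: a layperson values comprehensibility more,
# an expert values correctness more.
layperson_weights = {"correctness": 1, "comprehensibility": 2, "references": 1}
expert_weights = {"correctness": 2, "comprehensibility": 1, "references": 1}

layperson_score = weighted_score(ratings, layperson_weights)
expert_score = weighted_score(ratings, expert_weights)
```

The same article receives different scores depending on the assumed user model, which is the point of keeping the weights separate from the ratings.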
Theory type Analysis  +
Title Ranking of Wikipedia articles in search engines revisited: fair ranking for reasonable quality?
Unit of analysis Article  +
Url  +
Wikipedia coverage Main topic  +
Wikipedia data extraction Live Wikipedia  +
Wikipedia language German  +
Wikipedia page type Article  +
Year 2011  +
Categories Ranking and popularity  + , Computational estimation of trustworthiness  + , Information science  + , Publications with missing comments  + , Publications  +
