In expert evaluations, nearly one–third of the featured articles assessed were found to fail Wikipedia’s own featured article criteria. As such, the featured article process can hardly be considered successful. It is not especially surprising that Wikipedia’s non–expert contributors are not able to adequately assess the quality of featured articles. Other research into quality on Wikipedia suggests that the participants in the featured article process apply rather unsophisticated criteria to their decisions.
Blumenstock (2008), for example, showed that article length (as measured by word count) is an excellent predictor of whether an article is featured or not (identifying featured articles with 96.3 percent accuracy); a method more accurate than many more complicated measures. While this can be taken as evidence that Wikipedia’s longest articles are its best, it seems more likely, in light of the evidence presented here, that the featured article process (due to the non–expert nature of its participants) focuses on easily measured attributes like length rather than on actual quality judgments. It is easier for a non–expert to judge length than true comprehensiveness.
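To make concrete how simple a length-based predictor is, the following is a minimal sketch of a word-count classifier of the general kind described above. It is an illustration only, not Blumenstock's implementation: the function names, the whitespace-based tokenization, and the cutoff value are all assumptions introduced here.

```python
# Hypothetical cutoff, chosen only for illustration; it is not taken from the source.
WORD_COUNT_THRESHOLD = 2000


def word_count(text: str) -> int:
    """Count whitespace-delimited tokens in an article's text."""
    return len(text.split())


def predict_featured(text: str, threshold: int = WORD_COUNT_THRESHOLD) -> bool:
    """Predict 'featured' when the article is at least `threshold` words long."""
    return word_count(text) >= threshold


def accuracy(articles, threshold: int = WORD_COUNT_THRESHOLD) -> float:
    """Share of (text, is_featured) pairs that the length rule labels correctly."""
    correct = sum(
        predict_featured(text, threshold) == is_featured
        for text, is_featured in articles
    )
    return correct / len(articles)
```

A rule of this kind needs no judgment about sourcing, accuracy, or comprehensiveness, which is why its reported success says as much about how featured articles are selected as about how good they are.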
For Wikipedia, then, it seems that if the featured article process is to serve as an effective means of quality control, it must be changed. The most obvious way to improve the process would be to include the input of experts. Wikipedia contributors have a tendency to reject the input of outsiders and to suggest that experts will not work for free, but involving a few outside reviewers in the featured article process would not be especially difficult. Over the first eight months of 2009, Wikipedia identified an average of 44 featured articles per month, which is hardly an overwhelming number to review. More importantly, given Wikipedia’s importance as a source of information for the general public, scholars are coming to recognize that they need to be concerned with its content. For example, an editorial in Nature called on scientific researchers to “read Wikipedia cautiously and amend it enthusiastically.”  Laurent and Vickers (2009) suggest that medical doctors should do the same, in order to ensure that good health information is available to patients. Given these views from experts, and the small number of featured articles, it seems quite possible that experts could be involved in the process, improving its effectiveness. Furthermore, involving expert reviewers in the featured article process could serve as a gateway to introduce more scholars to Wikipedia.
For future scholars, the data presented here should serve as a caution when undertaking research into quality on Wikipedia. Previous efforts to determine what leads to the production of high–quality content in Wikipedia have often (as with Poderi (2009) or Huberman and Wilkinson (2007)) assumed that featured articles are of high quality, or at least represent the best of Wikipedia. If featured articles are not in fact of high quality, then such research does nothing more than show what leads to the production of featured articles, which is hardly an interesting research program.
The fact that featured articles are not necessarily of high quality, however, does not necessarily suggest that they are no better than other articles on Wikipedia. It seems almost absurd, considering that more than one million of Wikipedia’s articles are “stubs” (short articles of only a few sentences), to suggest that featured articles are no better than average. On the other hand, there is reason to believe that Wikipedia’s featured articles are not much better than other reasonably developed articles on the site. Two other surveys, both done by newspapers, have asked experts to grade Wikipedia entries on a scale of one to 10 (one by the Guardian and one by the Mail and Guardian; see van Noort, 2005). Together, these two studies produced assessments of 15 non–featured articles. The scores of these articles averaged 6.2 (only slightly lower than the average of 7 for the featured articles evaluated for this paper) and two of the articles from these studies received a score of 10. Out of the 15 articles evaluated in these studies, seven scored a seven or better, indicating that they are comparable in quality to the average featured article. To put it simply, being a featured article may not mean much at all. Thus I suggest that rather than accepting Wikipedia’s assertion that its featured articles are the best, future scholars should use a more sophisticated approach.