Improving Wikipedia's accuracy: is edit age a solution?

Authors: Brendan Luyt, Tay Chee Hsien Aaron, Lim Hai Thian, Cheng Kian Hong
Citation: Journal of the American Society for Information Science and Technology 59 (2): 318-330. 2008.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1002/asi.20755
Link(s): http://dx.doi.org/10.1002/asi.20755
Added by Wikilit team: Added on initial load
Improving Wikipedia's accuracy: is edit age a solution? is a publication by Brendan Luyt, Tay Chee Hsien Aaron, Lim Hai Thian, Cheng Kian Hong.


Abstract

Wikipedia is fast becoming a key information source for many despite criticism that it is unreliable and inaccurate. A number of recommendations have been made to sort the chaff from the wheat in Wikipedia, among which is the idea of color-coding article segment edits according to age (Cross, 2006). Using data collected as part of a wider study published in Nature, this article examines the distribution of errors throughout the life of a select group of Wikipedia articles. The survival time of each "error edit" in terms of the edit counts and days was calculated and the hypothesis that surviving material added by older edits is more trustworthy was tested. Surprisingly, we find that roughly 20% of errors can be attributed to surviving text added by the first edit, which confirmed the existence of a "first-mover" effect (Viegas, Wattenberg, & Kushal, 2004) whereby material added by early edits is less likely to be removed. We suggest that the sizable number of errors added by early edits is simply a result of more material being added near the beginning of the life of the article. Overall, the results do not provide support for the idea of trusting surviving segments attributed to older edits, because such edits tend to add more material and hence contain more errors, which do not seem to be offset by greater opportunities for error correction by later edits.

Research questions

"Using data collected as part of a wider study published in Nature, this article examines the distribution of errors throughout the life of a select group of Wikipedia articles. The survival time of each “error edit” in terms of the edit counts and days was calculated and the hypothesis that surviving material added by older edits is more trustworthy was tested."
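As an illustration only, the survival-time measures mentioned above could be computed along the following lines. This is a minimal sketch, not the authors' procedure: the `ErrorEdit` record, the function names, the choice of a snapshot date as the end of the observation window, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ErrorEdit:
    """Hypothetical record of the edit that introduced an error."""
    edit_index: int   # 1-based position in the article's edit history
    edit_date: date   # calendar date of that edit


def survival(error, total_edits, created, snapshot):
    """Return (edits survived, days survived, days survived / article age)."""
    edits_survived = total_edits - error.edit_index
    days_survived = (snapshot - error.edit_date).days
    article_age = (snapshot - created).days
    # Guard against an article created on the snapshot date itself.
    scaled = days_survived / article_age if article_age > 0 else 0.0
    return edits_survived, days_survived, scaled


def third_by_edit_count(error, total_edits):
    """Return 0, 1, or 2 for the first, middle, or last third of the edit history."""
    return min(2, (error.edit_index - 1) * 3 // total_edits)


# Made-up example: an error introduced by the very first edit.
first = ErrorEdit(edit_index=1, edit_date=date(2002, 1, 1))
print(survival(first, total_edits=400, created=date(2002, 1, 1), snapshot=date(2005, 12, 1)))
print(third_by_edit_count(first, total_edits=400))
```

Scaling the days survived by the article's age makes articles of different ages comparable, which is the sense in which the conclusion below speaks of "survival time scaled by the age of the article".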

Research details

Topics: Reliability
Domains: Information science
Theory type: Analysis
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Statistical analysis
Data source: Wikipedia pages
Collected data time dimension: Longitudinal
Unit of analysis: Edit
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"In this article we analysed the distribution of error by survival time in terms of edits and time. The general findings are that a significant number of error edits occur in the very first few edits. Roughly 20% of error edits appeared on the first day and on the first edit.

The ANOVA tests of the means of the error rates of three sections (divided by edit counts) show that there is no significant difference between them and so classifying edit age by edit counts does not seem useful. Analysing the error edits in terms of survival time shows that things are not as bleak as they seem, and that in terms of survival time scaled by the age of the article, many more error edits appear in the last third mostly due to the fact that more edits were made in a shorter span of time as Wikipedia's popularity grew. However, this still doesn't give much support to the feasibility of trusting edit contributions based on survival time. The main problem is that a fifth of the errors are attributable to the first error edit so that any measure of trustworthiness based on chronological order of edits (whether by time, edits, or editors) will not be able to filter out this sizeable number of errors.

It should be noted that this conclusion is particularly significant in light of the fact that the method used has a built-in bias for locating error edits later rather than earlier. First, in the case of disagreements, we use the “later” model rather than “earlier” model. Second, our method of locating the error edit is based on finding errors from the most recent edits to the earlier edit. Although reasonable care was taken to ensure that we located the edit that added the substance of the error rather than the exact wording, it is likely that in some cases we settled for an edit too soon, missing an earlier edit from a rewrite that actually implied or made the same erroneous point in a different way.

The findings of this article have serious implications for the Wikipedia community. It suggests that authors and editors need to become more familiar with the possible existence of first-mover effects in entries and, as a result, pay more attention to critically editing the existing core material rather than blindly adding more. For users of Wikipedia, it suggests that age-based mechanisms for coding trustworthiness will not be capable of elevating Wikipedia's status as a reference tool. For the foreseeable future, users will have to continue to follow the maxim “caveat emptor” in their dealings with this online encyclopaedia."
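For readers unfamiliar with the test named in the conclusion, a one-way ANOVA across three groups can be sketched as follows. This is a generic textbook computation of the F statistic using only the Python standard library, not a reconstruction of the authors' analysis; the function name and the sample error rates are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                        # number of groups (here: three sections)
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # Raises ZeroDivisionError if there is no within-group variation at all.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented error rates for articles' first, middle, and last thirds of edits.
first_third = [0.020, 0.035, 0.010, 0.025]
middle_third = [0.015, 0.030, 0.020, 0.025]
last_third = [0.025, 0.020, 0.030, 0.015]
print(one_way_anova_f([first_third, middle_third, last_third]))
```

A small F value means the group means differ little relative to the spread within groups. Judging significance additionally requires comparing F against the F distribution with (k - 1, n - k) degrees of freedom, which a statistics library would normally handle.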

Comments

"A significant number of error edits occur in the very first few edits [and] roughly 20% of error edits appeared on the first day and on the first edit."


