Automatic vandalism detection in Wikipedia

Authors: Martin Potthast, Benno Stein, Robert Gerling
Citation: European Conference on Information Retrieval: 663-668. 2008. Berlin, Heidelberg. Springer-Verlag.
Publication type: Conference paper
Peer-reviewed: Yes
Added by Wikilit team: Added on initial load
Automatic vandalism detection in Wikipedia is a publication by Martin Potthast, Benno Stein, and Robert Gerling.


Abstract

We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature by now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-Measure performance by 49% while being faster at the same time.
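
To make the reported figures concrete, the F-measure can be computed directly from the precision and recall quoted in the abstract. This is a minimal sketch that assumes the balanced F1 (harmonic mean, beta = 1) is meant; the abstract does not state the weighting, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
f1 = f_measure(0.83, 0.77)
print(f"F1 = {f1:.3f}")
```

Under that assumption, 83% precision at 77% recall corresponds to an F1 of roughly 0.80.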

Research questions

"In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches."

Research details

Topics: Vandalism
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Statistical analysis
Data source: Wikipedia pages
Collected data time dimension: Longitudinal
Unit of analysis: Article
Wikipedia data extraction: Secondary dataset
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"Potthast et al. presented a new approach to detect vandalism in Wikipedia based on logistic regression, a machine learning classification algorithm. The classification task is accomplished based on various features extracted to quantify the characteristics of vandalism in Wikipedia articles. These features include term frequency, character distribution, and edit anonymity. This approach achieved 83% precision at 77% recall."
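
The general idea summarized above (numeric features per edit, fed to a logistic regression classifier) might look roughly like the following. This is a sketch under stated assumptions, not the authors' implementation: the three feature definitions are simplified guesses at what "term frequency", "character distribution", and "edit anonymity" could mean, the four toy edits are invented for illustration, and all names (`extract_features`, `train`, `predict_proba`, `TOY_EDITS`) are hypothetical. The classifier is plain batch gradient descent written with the standard library only.

```python
import math
import re
from collections import Counter

# Assumption: an anonymous edit is one whose editor name looks like an IPv4 address.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(inserted_text: str, editor: str) -> list[float]:
    """Map one edit to a small feature vector (illustrative definitions only)."""
    words = re.findall(r"[a-z']+", inserted_text.lower())
    counts = Counter(words)
    # Term frequency: share of the most repeated word among all inserted words.
    top_term_share = max(counts.values()) / len(words) if words else 0.0
    # Character distribution: share of upper-case letters among all letters.
    letters = [c for c in inserted_text if c.isalpha()]
    upper_share = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    # Edit anonymity: 1.0 if the editor is identified only by an IP address.
    anonymous = 1.0 if IPV4.match(editor) else 0.0
    return [top_term_share, upper_share, anonymous]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], x: list[float]) -> float:
    """Estimated probability that an edit is vandalism; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train(X: list[list[float]], y: list[int], epochs: int = 2000, lr: float = 0.5) -> list[float]:
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Invented toy edits: (inserted text, editor, label) with 1 = vandalism.
TOY_EDITS = [
    ("LOL LOL LOL THIS PAGE IS DUMB", "203.0.113.7", 1),
    ("haha haha haha haha", "198.51.100.23", 1),
    ("The bridge was completed in 1932 and restored later.", "HistoryBuff", 0),
    ("Added a citation for the population figure.", "192.0.2.44", 0),
]

X = [extract_features(text, editor) for text, editor, _ in TOY_EDITS]
y = [label for _, _, label in TOY_EDITS]
weights = train(X, y)

for (text, editor, label), x in zip(TOY_EDITS, X):
    print(f"label={label} p(vandalism)={predict_proba(weights, x):.2f} {text!r}")
```

Four hand-written examples only show the mechanics; they say nothing about real-world accuracy, which in the paper is measured on a compiled corpus of vandalism edits.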

Facts about "Automatic vandalism detection in Wikipedia"
Added by wikilit team: Added on initial load
Collected data time dimension: Longitudinal
Conference location: Berlin, Heidelberg
Data source: Wikipedia pages
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Automatic%2Bvandalism%2Bdetection%2Bin%2BWikipedia%22
Has author: Martin Potthast, Benno Stein and Robert Gerling
Has domain: Computer science
Has topic: Vandalism
Pages: 663-668
Peer reviewed: Yes
Publication type: Conference paper
Published in: European Conference on Information Retrieval
Publisher: Springer-Verlag
Research design: Statistical analysis
Revid: 10,671
Theories: Undetermined
Theory type: Design and action
Title: Automatic vandalism detection in Wikipedia
Unit of analysis: Article
Url: http://www.springerlink.com/content/a457383n01w44653/
Wikipedia coverage: Main topic
Wikipedia data extraction: Secondary dataset
Wikipedia language: English
Wikipedia page type: Article
Year: 2008