A practical approach to language complexity: a Wikipedia case study

From WikiLit
Authors: Taha Yasseri, András Kornai, János Kertész
Citation: PLoS ONE. 2012.
Publication type: Journal article
Added by Wikilit team: No
A practical approach to language complexity: a Wikipedia case study is a publication by Taha Yasseri, András Kornai, János Kertész.


Abstract

In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, e.g. that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
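
The abstract refers to the Gunning readability index without giving its form. As a rough illustration only, the commonly cited Gunning fog formula is 0.4 × (average sentence length in words + percentage of words with three or more syllables). The sketch below assumes that formula together with a crude vowel-group syllable counter; the function names `count_syllables` and `gunning_fog` are hypothetical, and nothing here is taken from the authors' implementation, which this record does not describe.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Simplified Gunning fog index: 0.4 * (words per sentence + percent of 'complex' words).

    Assumption: a word is 'complex' if the heuristic above finds 3+ syllables.
    The usual exclusions (proper nouns, compounds, common suffixes) are ignored.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)


short_text = "The cat sat. The dog ran."
long_text = "The extraordinarily complicated organization deliberated."
print(gunning_fog(short_text), gunning_fog(long_text))
```

Because sentence length enters the score directly, splitting the same words into shorter sentences lowers the index, which is in line with the abstract's point that Simple reads as less complex mainly through shorter sentences.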

