Harvesting wiki consensus: using Wikipedia entries as vocabulary for knowledge management

Authors: Martin Hepp, Katharina Siorpaes, Daniel Bachlechner
Citation: IEEE Internet Computing 11 (5): 54-65. 2007 October.
Publication type: Journal article
Peer-reviewed: Yes
Link(s): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4305569&tag=1
Added by Wikilit team: Yes
"Harvesting wiki consensus: using Wikipedia entries as vocabulary for knowledge management" is a publication by Martin Hepp, Katharina Siorpaes, and Daniel Bachlechner.


Abstract

Vocabularies that provide unique identifiers for conceptual elements of a domain can improve precision and recall in knowledge-management applications. Although creating and maintaining such vocabularies is generally hard, wiki users easily manage to develop comprehensive, informal definitions of terms, each one identified by a URI. Here, the authors show that the URIs of Wikipedia entries are reliable identifiers for conceptual entities. They also demonstrate how Wikipedia entries can be used for annotating Web resources and knowledge assets and give precise estimates of the amount of Wikipedia URIs in terms of the popular Proton ontology's top-level concepts.

Research questions

"Vocabularies that provide unique identifiers for conceptual elements of a domain can improve precision and recall in knowledge-management applications. Although creating and maintaining such vocabularies is generally hard, wiki users easily manage to develop comprehensive, informal definitions of terms, each one identified by a URI. Here, the authors show that the URIs of Wikipedia entries are reliable identifiers for conceptual entities. They also demonstrate how Wikipedia entries can be used for annotating Web resources and knowledge assets and give precise estimates of the amount of Wikipedia URIs in terms of the popular Proton ontology’s top-level concepts."

Research details

Topics: Ontology building
Domains: Computer science, Knowledge management
Theory type: Analysis
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Statistical analysis
Data source: N/A
Collected data time dimension: N/A
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"The analysis of Wikipedia entries reveals two interesting results. First, despite the unsupervised, community-driven editing process, the conceptual entity associated with Wikipedia URIs rarely changes. Second, among the 1.5 million entries are very substantial amounts of concepts that are relevant for annotating Web resources, such as popular actors, research fields, cities, or universities. The analysis of Wikipedia entries’ ontological nature shows that the majority of URIs in our sample (87 percent) denote instances or subconcepts to the Proton top-level category object. This is defined as “entities that could be claimed to exist” (see http://proton.semanticweb.org). Nine percent are some sort of abstract, and 4 percent are classified as a happening. Figure 3a illustrates the proportion of entries in each main Proton category. The breakdown of Wikipedia entries that fall into the protont.Object branch is very interesting. Figure 3b shows the proportions in the sample. (For statistical reasons, the point estimates for the population don’t necessarily add up to 100 percent, which is why we based Figure 3 on the sample proportions and not on the population estimates from Table 3.) We always assigned each Wikipedia entry to the most specific subclass of protont.Object. This means that agent here counts only those conceptual entities for which no more specific subclass of protont.Object.Agent exists. We can see that the majority of the URIs denote people (23 percent), locations (23 percent), organizations (13 percent), product types (13 percent), and groups (6 percent). Our analysis shows that for the vast majority of Wikipedia entries, a community consensus exists about the URIs’ meaning from the very first to the most recent version. In other words, open communities seem able to achieve consensus about named conceptual entities as very lightweight ontological agreements in an unsupervised fashion and relying only on the known mechanisms of standard wiki software to prevent destructive changes. We assume that the ease of access and using complementing multimedia elements for conceptualizing an entry are important factors in this process."

Comments

"We demonstrate how Wikipedia entries can be used for annotating Web resources and knowledge assets and give precise estimates of the amount of Wikipedia URIs in terms of the popular Proton ontology’s top-level concepts."
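
The annotation idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (wikipedia_uri, annotate, to_ntriples), the example resource URL, and the choice of the Dublin Core "subject" property as the linking predicate are all assumptions made for illustration. The sketch only shows how an English Wikipedia entry URI could serve as the identifier for the concept a Web resource is about.

```python
from typing import List, Tuple
from urllib.parse import quote

# Base URI of English Wikipedia entries (the record lists English as the language).
WIKIPEDIA_BASE = "http://en.wikipedia.org/wiki/"
# Assumption: Dublin Core "subject" is used as the annotation predicate.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"

Triple = Tuple[str, str, str]


def wikipedia_uri(title: str) -> str:
    """Build an entry URI from a page title (spaces become underscores)."""
    slug = title.strip().replace(" ", "_")
    # Keep characters that commonly appear unescaped in entry URIs.
    return WIKIPEDIA_BASE + quote(slug, safe="(),:")


def annotate(resource_url: str, titles: List[str]) -> List[Triple]:
    """Return one (resource, predicate, concept URI) triple per entry title."""
    return [(resource_url, DC_SUBJECT, wikipedia_uri(t)) for t in titles]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize the triples as N-Triples lines, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical resource being annotated with two Wikipedia concepts.
    triples = annotate("http://example.org/report.pdf",
                       ["Semantic Web", "Innsbruck"])
    print(to_ntriples(triples))
```

Because the identifier is just the entry URI, two independently annotated resources that point to the same entry can be matched by plain string comparison, which is the property the stability finding in the conclusion relies on.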


Further notes

Facts about "Harvesting wiki consensus: using Wikipedia entries as vocabulary for knowledge management"
Added by wikilit team: Yes
Collected data time dimension: N/A
Data source: N/A
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Harvesting%2Bwiki%2Bconsensus%3A%2Busing%2BWikipedia%2Bentries%2Bas%2Bvocabulary%2Bfor%2Bknowledge%2Bmanagement%22
Has author: Martin Hepp, Katharina Siorpaes and Daniel Bachlechner
Has domain: Computer science and Knowledge management
Has topic: Ontology building
Issue: 5
Month: October
Pages: 54-65
Peer reviewed: Yes
Publication type: Journal article
Published in: IEEE Internet Computing
Research design: Statistical analysis
Revid: 10,801
Theories: Undetermined
Theory type: Analysis
Title: Harvesting wiki consensus: using Wikipedia entries as vocabulary for knowledge management
Unit of analysis: Article
Url: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4305569&tag=1
Volume: 11
Wikipedia coverage: Sample data
Wikipedia data extraction: Live Wikipedia
Wikipedia language: English
Wikipedia page type: Article
Year: 2007