Mining domain-specific thesauri from Wikipedia: a case study

From WikiLit
Authors: David N. Milne, Olena Medelyan, Ian H. Witten
Citation: WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence: 442-448. 2006.
Publication type: Conference paper
Peer-reviewed: Yes
DOI: 10.1109/WI.2006.119
Link(s): http://dl.acm.org/citation.cfm?id=1249168
Added by Wikilit team: Added on initial load
"Mining domain-specific thesauri from Wikipedia: a case study" is a publication by David N. Milne, Olena Medelyan, and Ian H. Witten.


Abstract

Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts.

Research questions

"Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia."

Research details

Topics: Comprehensiveness, Data mining
Domains: Computer science, Library science
Theory type: Design and action
Wikipedia coverage: Main topic
Theories: "Undetermined"
Research design: Statistical analysis, Other
Data source: Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Website
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts."

Comments

"In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts."


Further notes

Facts about "Mining domain-specific thesauri from Wikipedia: a case study"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Data source: Wikipedia pages
Doi: 10.1109/WI.2006.119
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Mining%2Bdomain-specific%2Bthesauri%2Bfrom%2BWikipedia%3A%2Ba%2Bcase%2Bstudy%22
Has author: David N. Milne, Olena Medelyan and Ian H. Witten
Has domain: Computer science and Library science
Has topic: Comprehensiveness and Data mining
Pages: 442-448
Peer reviewed: Yes
Publication type: Conference paper
Published in: WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Research design: Statistical analysis and Other
Research questions: Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia.
Revid: 10,872
Theories: Undetermined
Theory type: Design and action
Title: Mining domain-specific thesauri from Wikipedia: a case study
Unit of analysis: Website
Url: http://dl.acm.org/citation.cfm?id=1249168
Wikipedia coverage: Main topic
Wikipedia data extraction: Dump
Wikipedia language: English
Wikipedia page type: Article
Year: 2006