Classifying tags using open content resources

Authors: Simon Overell, Börkur Sigurbjörnsson, Roelof Van Zwol
Citation: WSDM '09 Proceedings of the Second ACM International Conference on Web Search and Data Mining: 64-73. 2009 9-12. Barcelona, Spain. Association for Computing Machinery.
Publication type: Conference paper
Peer-reviewed: Yes
DOI: 10.1145/1498759.1498810
Added by Wikilit team: Added on initial load
Classifying tags using open content resources is a publication by Simon Overell, Börkur Sigurbjörnsson, Roelof Van Zwol.


Abstract

Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to understand better the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as, London Eye, Big Island, Ronaldinho, geo-caching and wii.
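
One way to read the coverage figure, assuming "increases the coverage ... by 115%" denotes a relative increase over the baseline. The abstract gives no absolute counts, and the symbols $C_{\text{WordNet}}$ (baseline coverage) and $C_{\text{method}}$ (coverage of the proposed method) are introduced here only for illustration:

```latex
\[
\frac{C_{\text{method}} - C_{\text{WordNet}}}{C_{\text{WordNet}}} = 1.15
\quad\Longrightarrow\quad
C_{\text{method}} = 2.15\, C_{\text{WordNet}}
\]
```

Under that reading, the method covers slightly more than twice as much of the Flickr vocabulary as the WordNet baseline.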

Research questions

"In this paper we present a generic method for classifying tags using third party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth."
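
The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not name the classifier, so simple vote counting is swapped in here, and every article, category, template, and label below is hypothetical toy data. Articles whose titles have a known WordNet-style label serve as ground truth, their categories and templates act as the structural patterns, and an unlabeled article receives the label its patterns vote for most often.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: article title -> structural patterns (categories, templates).
ARTICLES = {
    "London": {"Category:Capital cities", "Template:Infobox settlement"},
    "Barcelona": {"Category:Port cities", "Template:Infobox settlement"},
    "Diego Maradona": {"Category:Footballers", "Template:Infobox football biography"},
    "Ronaldinho": {"Category:Footballers", "Template:Infobox football biography"},
    "Big Island": {"Category:Islands", "Template:Infobox settlement"},
    "Geo-caching": {"Category:Outdoor games"},
}

# Hypothetical ground truth: titles assumed to have a WordNet-style category.
LABELS = {
    "London": "noun.location",
    "Barcelona": "noun.location",
    "Diego Maradona": "noun.person",
}


def train(articles, labels):
    """Count how often each structural pattern co-occurs with each label."""
    votes = defaultdict(Counter)
    for title, label in labels.items():
        for pattern in articles.get(title, ()):
            votes[pattern][label] += 1
    return votes


def classify(patterns, votes):
    """Return the label with the most votes, or None if no pattern was seen in training."""
    total = Counter()
    for pattern in patterns:
        total.update(votes.get(pattern, {}))
    if not total:
        return None
    return total.most_common(1)[0][0]


def classify_tag(tag, articles, votes):
    """Map a tag to an article by case-insensitive title match, then classify it."""
    by_title = {title.lower(): patterns for title, patterns in articles.items()}
    patterns = by_title.get(tag.lower())
    if patterns is None:
        return None
    return classify(patterns, votes)


if __name__ == "__main__":
    model = train(ARTICLES, LABELS)
    for tag in ("ronaldinho", "big island", "geo-caching"):
        print(tag, "->", classify_tag(tag, ARTICLES, model))
```

In this toy setup a tag such as "ronaldinho" is labeled through patterns it shares with a labeled article, while "geo-caching" stays unclassified because none of its patterns occurred in the training data. A real system would need a learned classifier and a more careful mapping from tags to articles.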

Research details

Topics: Text classification
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Websites, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: English


Facts about "Classifying tags using open content resources"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Conference location: Barcelona, Spain
Data source: Experiment responses, Websites, Wikipedia pages
Dates: 9-12
DOI: 10.1145/1498759.1498810
Google Scholar URL: http://scholar.google.com/scholar?ie=UTF-8&q=%22Classifying%2Btags%2Busing%2Bopen%2Bcontent%2Bresources%22
Has author: Simon Overell, Börkur Sigurbjörnsson, Roelof Van Zwol
Has domain: Computer science
Has topic: Text classification
Pages: 64-73
Peer reviewed: Yes
Publication type: Conference paper
Published in: WSDM '09 Proceedings of the Second ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery
Research design: Experiment
Revid: 10,694
Theories: Undetermined
Theory type: Design and action
Title: Classifying tags using open content resources
Unit of analysis: Article
URL: http://dl.acm.org/citation.cfm?id=1498810
Wikipedia coverage: Sample data
Wikipedia data extraction: Dump
Wikipedia language: English
Wikipedia page type: Article
Year: 2009