Categorising social tags to improve folksonomy-based recommendations

Authors: Iván Cantador, Ioannis Konstas, Joemon M. Jose
Citation: Journal of Web Semantics. 2011.
Publication type: Journal article
Peer-reviewed: Yes
Link(s): http://www.sciencedirect.com/science/article/pii/S1570826810000685
Added by Wikilit team: Added on initial load
Categorising social tags to improve folksonomy-based recommendations is a publication by Iván Cantador, Ioannis Konstas and Joemon M. Jose.


Abstract

In social tagging systems, users have different purposes when they annotate items. Tags not only depict the content of the annotated items, for example by listing the objects that appear in a photo, or express contextual information about the items, for example by providing the location or the time in which a photo was taken, but also describe subjective qualities and opinions about the items, or can be related to organisational aspects, such as self-references and personal tasks.

Current folksonomy-based search and recommendation models exploit the social tag space as a whole to retrieve those items relevant to a tag-based query or user profile, and do not take into consideration the purposes of tags. We hypothesise that a significant percentage of tags are noisy for content retrieval, and believe that the distinction of the personal intentions underlying the tags may be beneficial to improve the accuracy of search and recommendation processes.

We present a mechanism to automatically filter and classify raw tags in a set of purpose-oriented categories. Our approach finds the underlying meanings (concepts) of the tags, mapping them to semantic entities belonging to external knowledge bases, namely WordNet and Wikipedia, through the exploitation of ontologies created within the W3C Linking Open Data initiative. The obtained concepts are then transformed into semantic classes that can be uniquely assigned to content- and context-based categories. The identification of subjective and organisational tags is based on natural language processing heuristics.

We collected a representative dataset from the Flickr social tagging system and conducted an empirical study to categorise real tagging data and to evaluate whether the resultant tag categories really benefit a recommendation model using the Random Walk with Restarts method. The results show that content- and context-based tags are considered superior to subjective and organisational tags, achieving equivalent performance to using the whole tag space.
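As a rough illustration of the Random Walk with Restarts method named in the abstract, the following minimal sketch scores how related each node of a small graph is to a starting node. It is not the authors' implementation: the graph layout (users, tags and photos as nodes), the node names, the restart probability of 0.15 and the fixed iteration count are all assumptions made for the example.

```python
from collections import defaultdict


def random_walk_with_restarts(edges, start, restart=0.15, iterations=100):
    """Approximate relatedness of every node to `start` by power iteration.

    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    restart: probability of jumping back to `start` at each step (assumed value).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    if start not in neighbours:
        raise KeyError(f"start node {start!r} is not in the graph")

    nodes = list(neighbours)
    scores = {n: 0.0 for n in nodes}
    scores[start] = 1.0  # the walker begins at the start node

    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in scores.items():
            # Move to a uniformly chosen neighbour with probability 1 - restart.
            share = (1.0 - restart) * mass / len(neighbours[node])
            for nb in neighbours[node]:
                nxt[nb] += share
        # Jump back to the start node with probability `restart`.
        nxt[start] += restart
        scores = nxt
    return scores


# Hypothetical toy folksonomy: one user, three tags, two photos.
toy_edges = [
    ("user:ana", "tag:beach"),
    ("user:ana", "tag:sunset"),
    ("photo:1", "tag:beach"),
    ("photo:1", "tag:sunset"),
    ("photo:2", "tag:beach"),
    ("photo:2", "tag:mine"),
]
toy_scores = random_walk_with_restarts(toy_edges, "user:ana")
```

In this toy graph, photo:1 shares two tags with the user and photo:2 only one, so photo:1 ends up with the higher score. Restricting the edges to tags from selected categories is one way such a model could be compared across tag categories.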

Research questions

"We present a mechanism to automatically filter and classify raw tags in a set of purpose-oriented categories. Our approach finds the underlying meanings (concepts) of the tags, mapping them to semantic entities belonging to external knowledge bases, namely WordNet and Wikipedia, through the exploitation of ontologies created within the W3C Linking Open Data initiative. The obtained concepts are then transformed into semantic classes that can be uniquely assigned to content- and context-based categories. The identification of subjective and organisational tags is based on natural language processing heuristics."
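The two-stage idea in the passage above (knowledge-base lookup for content and context tags, then language heuristics for subjective and organisational tags) can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the small dictionary stands in for WordNet and Wikipedia lookups, and the semantic classes, the class-to-category mapping and the heuristic word lists are hypothetical, not taken from the paper.

```python
import re

# Hypothetical stand-in for WordNet / Wikipedia lookups: tag -> semantic class.
TOY_KNOWLEDGE_BASE = {
    "dog": "animal",
    "bridge": "artifact",
    "london": "location",
    "2009": "time_period",
}

# Assumed mapping from semantic classes to purpose-oriented categories.
CLASS_TO_CATEGORY = {
    "animal": "content",
    "artifact": "content",
    "location": "context",
    "time_period": "context",
}

# Illustrative heuristics only: self-references and personal tasks.
ORGANISATIONAL = re.compile(r"^(my|our|todo|to ?read)\b", re.IGNORECASE)
# Illustrative list of subjective words.
SUBJECTIVE_WORDS = {"beautiful", "awesome", "boring", "funny"}


def categorise(tag):
    """Return one of: content, context, organisational, subjective, unclassified."""
    key = tag.strip().lower()
    # Stage 1: tags that map to a known concept get a content or context category.
    semantic_class = TOY_KNOWLEDGE_BASE.get(key)
    if semantic_class is not None:
        return CLASS_TO_CATEGORY[semantic_class]
    # Stage 2: remaining tags are checked against simple language heuristics.
    if ORGANISATIONAL.search(key):
        return "organisational"
    if any(word in SUBJECTIVE_WORDS for word in key.split()):
        return "subjective"
    return "unclassified"


toy_tags = ["London", "dog", "my dog", "awesome", "xyzzy"]
toy_result = {tag: categorise(tag) for tag in toy_tags}
```

Tags left as "unclassified" correspond to raw tags that such a mechanism would filter out rather than assign to a category.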

Research details

Topics: Ontology building
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Measuring the relatedness of two nodes in the graph can be achieved using the Random Walks with Restarts (RWR) theory (L. Lovasz, 1996)"
Research design: Experiment
Data source: Experiment responses, Websites, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Secondary dataset
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"Analysing our categorisation results, we found that, in most of the cases, ambiguities occurred with social tags classified into both content and context categories, especially in those cases where the social tags corresponded to locations. Thus, although it would be convenient to correctly disambiguate and classify such tags, the results obtained with our recommendation model are still valid as its most accurate recommendations were obtained exploiting content- and context-based tags. Ambiguities in subjective and organisational tags may occur but their influence in the recommendations is relatively much lower. Nonetheless, for recommendation purposes, we find very interesting the possibility of exploring sentiment analysis approaches to enhance our subjective and organisational tag categorisation strategy based on regular expressions. As discussed in the paper, there may exist incorrect tag assignments to subjective subcategories. For example, the tag bad hotel is categorised by our approach as a “quality” tag as it satisfies the [*<adjective><noun>*] regular expression, whereas it should be categorised as an “opinion” tag."
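The "bad hotel" example can be reproduced with a small sketch of an adjective-noun pattern applied to part-of-speech labels. This is one assumed reading of the [*<adjective><noun>*] expression (an adjective immediately followed by a noun, anywhere in the tag); the toy lexicon and all function names are hypothetical, and a real system would rely on a proper part-of-speech tagger.

```python
import re

# Hypothetical toy part-of-speech lexicon, for illustration only.
TOY_POS = {
    "bad": "adjective",
    "red": "adjective",
    "hotel": "noun",
    "car": "noun",
    "very": "adverb",
}

# Assumed reading of [*<adjective><noun>*]: any labels, then adjective + noun, then any labels.
QUALITY_PATTERN = re.compile(r"(<\w+>)*<adjective><noun>(<\w+>)*")


def pos_signature(tag):
    """Turn a tag into a string of POS labels, e.g. 'bad hotel' -> '<adjective><noun>'."""
    words = tag.lower().split()
    return "".join(f"<{TOY_POS.get(word, 'unknown')}>" for word in words)


def matches_quality_pattern(tag):
    """True when the tag's POS signature satisfies the adjective-noun pattern."""
    return QUALITY_PATTERN.fullmatch(pos_signature(tag)) is not None


quality_checks = {tag: matches_quality_pattern(tag) for tag in ["bad hotel", "red car", "hotel bad"]}
```

Both "red car" and "bad hotel" satisfy the pattern, so a purely syntactic rule cannot tell a descriptive quality from an opinion. That is the gap the authors suggest sentiment analysis could help close.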

Comments

"Websites (Flickr, WordNet), Wikipedia articles"


Further notes

Facts about "Categorising social tags to improve folksonomy-based recommendations"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Comments: Websites (Flickr, WordNet), Wikipedia articles
Data source: Experiment responses, Websites and Wikipedia pages
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Categorising%2Bsocial%2Btags%2Bto%2Bimprove%2Bfolksonomy-based%2Brecommendations%22
Has author: Iván Cantador, Ioannis Konstas and Joemon M. Jose
Has domain: Computer science
Has topic: Ontology building
Peer reviewed: Yes
Publication type: Journal article
Published in: Journal of Web Semantics
Research design: Experiment
Revid: 10,691
Theories: Measuring the relatedness of two nodes in the graph can be achieved using the Random Walks with Restarts (RWR) theory (L. Lovasz, 1996)
Theory type: Design and action
Title: Categorising social tags to improve folksonomy-based recommendations
Unit of analysis: Article
Url: http://www.sciencedirect.com/science/article/pii/S1570826810000685
Wikipedia coverage: Sample data
Wikipedia data extraction: Secondary dataset
Wikipedia language: English
Wikipedia page type: Article
Year: 2011