|Categorising social tags to improve folksonomy-based recommendations|
|Authors:||Iván Cantador, Ioannis Konstas, Joemon M. Jose|
|Citation:||Journal of Web Semantics. 2011.|
|Publication type:||Journal article|
|Added by Wikilit team:||Added on initial load|
In social tagging systems, users have different purposes when they annotate items. Tags not only depict the content of the annotated items, for example by listing the objects that appear in a photo, or express contextual information about the items, for example by providing the location or the time in which a photo was taken, but also describe subjective qualities and opinions about the items, or can be related to organisational aspects, such as self-references and personal tasks.
Current folksonomy-based search and recommendation models exploit the social tag space as a whole to retrieve those items relevant to a tag-based query or user profile, and do not take into consideration the purposes of tags. We hypothesise that a significant percentage of tags are noisy for content retrieval, and believe that the distinction of the personal intentions underlying the tags may be beneficial to improve the accuracy of search and recommendation processes.
We present a mechanism to automatically filter and classify raw tags in a set of purpose-oriented categories. Our approach finds the underlying meanings (concepts) of the tags, mapping them to semantic entities belonging to external knowledge bases, namely WordNet and Wikipedia, through the exploitation of ontologies created within the W3C Linking Open Data initiative. The obtained concepts are then transformed into semantic classes that can be uniquely assigned to content- and context-based categories. The identification of subjective and organisational tags is based on natural language processing heuristics.
We collected a representative dataset from the Flickr social tagging system and conducted an empirical study to categorise real tagging data and to evaluate whether the resulting tag categories benefit a recommendation model based on the Random Walk with Restarts method. The results show that content- and context-based tags are considered superior to subjective and organisational tags, achieving performance equivalent to that of using the whole tag space.
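The Random Walk with Restarts method mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the graph layout (one user, a few tags and photos), the edge weights, the restart probability of 0.15, and the function names `random_walk_with_restarts` and `add_edge` are all assumptions chosen for illustration.

```python
def random_walk_with_restarts(graph, start, restart_prob=0.15,
                              tol=1e-12, max_iter=10_000):
    """Approximate RWR relevance scores of all nodes with respect to `start`.

    graph: dict mapping each node to a dict {neighbour: positive edge weight}.
    Every neighbour must itself be a key of `graph`.
    """
    scores = {node: 0.0 for node in graph}
    scores[start] = 1.0
    for _ in range(max_iter):
        updated = {node: 0.0 for node in graph}
        # The scores always sum to 1, so the restart step contributes
        # exactly `restart_prob` to the start node.
        updated[start] = restart_prob
        for node, mass in scores.items():
            total = sum(graph[node].values())
            if total == 0:
                # Isolated node: hand its mass back to the start node.
                updated[start] += (1.0 - restart_prob) * mass
                continue
            for neighbour, weight in graph[node].items():
                # Follow an edge with probability proportional to its weight.
                updated[neighbour] += (1.0 - restart_prob) * mass * weight / total
        change = sum(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if change < tol:
            break
    return scores


def add_edge(graph, a, b, weight=1.0):
    """Insert an undirected weighted edge into the adjacency dict."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


# Hypothetical toy folksonomy: the user used "beach" twice and "sunset" once.
graph = {}
add_edge(graph, "user:u1", "tag:beach", 2.0)
add_edge(graph, "user:u1", "tag:sunset", 1.0)
add_edge(graph, "photo:p1", "tag:beach")
add_edge(graph, "photo:p1", "tag:sunset")
add_edge(graph, "photo:p2", "tag:sunset")
add_edge(graph, "photo:p2", "tag:city")

scores = random_walk_with_restarts(graph, "user:u1")
ranked_photos = sorted((n for n in scores if n.startswith("photo:")),
                       key=scores.get, reverse=True)
print(ranked_photos)
```

Under these assumptions, restricting the graph to a subset of tag categories would simply mean leaving out the tag nodes that belong to the other categories before running the walk.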
"We present a mechanism to automatically filter and classify raw tags in a set of purpose-oriented categories. Our approach finds the underlying meanings (concepts) of the tags, mapping them to semantic entities belonging to external knowledge bases, namely WordNet and Wikipedia, through the exploitation of ontologies created within the W3C Linking Open Data initiative. The obtained concepts are then transformed into semantic classes that can be uniquely assigned to content- and context-based categories. The identification of subjective and organisational tags is based on natural language processing heuristics."
|Theory type:||Design and action|
|Wikipedia coverage:||Sample data|
|Theories:||"Measuring the relatedness of two nodes in the graph can be achieved using the Random Walks with Restarts (RWR) theory (L. Lovasz, 1996)"|
|Data source:||Experiment responses, Websites, Wikipedia pages|
|Collected data time dimension:||Cross-sectional|
|Unit of analysis:||Article|
|Wikipedia data extraction:||Secondary dataset|
|Wikipedia page type:||Article|
"Analysing our categorisation results, we found that, in most of the cases, ambiguities occurred with social tags classified into both content and context categories, especially in those cases where the social tags corresponded to locations. Thus, although it would be convenient to correctly disambiguate and classify such tags, the results obtained with our recommendation model are still valid as its most accurate recommendations were obtained exploiting content- and context-based tags. Ambiguities in subjective and organisational tags may occur but their influence in the recommendations is relatively much lower. Nonetheless, for recommendation purposes, we find very interesting the possibility of exploring sentiment analysis approaches to enhance our subjective and organisational tag categorisation strategy based on regular expressions. As discussed in the paper, theremayexist incorrect tag assignments to subjective subcategories. For example, the tag bad hotel is categorised by our approach as a “quality” tag as it satisfies the [*<adjective><noun>*] regular expression, whereas it should be categorised as an “opinion” tag."
"Websites (Flickr, WOrdnet) Wikipedia articles"