Clustering short texts using Wikipedia
Abstract Subscribers to popular news or blog feeds (RSS/Atom) often face the problem of information overload, as these feed sources usually deliver a large number of items periodically. One solution to this problem could be clustering similar items in the feed reader to make the information more manageable for a user. Clustering items at the feed reader end is a challenging task, as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag of words representation.
Added by wikilit team Added on initial load
Collected data time dimension Cross-sectional
Conclusion Giving greater importance to the title of a news article is beneficial. We obtained better results by doubling the weights of the concepts retrieved by the title query string. A new representation of the given article was generated by augmenting this vector to the term frequency vector constructed by the first method. We refer to this representation method as Wiki_Method... We have proposed a method of improving the accuracy of clustering short text items using Wikipedia as an additional knowledge source. Our experiment shows that this method can substantially improve clustering accuracy. The results obtained here also corroborate the recent findings that world knowledge can help in different information retrieval tasks.
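The representation described in the conclusion can be illustrated with a minimal sketch. It is not the authors' implementation: the retrieval step against Wikipedia is omitted, the concept lists are hypothetical inputs, and the base weight of 1.0, the summing of weights for a concept returned by both queries, and the `term:`/`wiki:` feature prefixes are all assumptions made for illustration. Only the doubling of title-derived concept weights and the augmentation of the term frequency vector come from the text.

```python
from collections import Counter


def term_frequencies(text):
    """Bag-of-words term counts for a short text (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def concept_weights(title_concepts, other_concepts, title_boost=2.0):
    """Weight Wikipedia concepts; title-derived concepts count `title_boost` times.

    Assumption: a concept from a non-title query has weight 1.0, and weights
    are summed when the same concept is returned by both queries.
    """
    weights = Counter()
    for concept in other_concepts:
        weights[concept] += 1.0
    for concept in title_concepts:
        weights[concept] += title_boost
    return weights


def enriched_representation(text, title_concepts, other_concepts):
    """Augment the term frequency vector with weighted Wikipedia concept features."""
    features = {f"term:{t}": float(c) for t, c in term_frequencies(text).items()}
    for concept, weight in concept_weights(title_concepts, other_concepts).items():
        features[f"wiki:{concept}"] = weight
    return features


# Hypothetical feed item and hypothetical retrieval results.
item = enriched_representation(
    "Apple unveils new iPhone",
    title_concepts=["Apple Inc.", "IPhone"],
    other_concepts=["IPhone", "Smartphone"],
)
```

The resulting dictionary can be passed to any clustering routine that accepts sparse feature vectors; keeping term and concept features under separate prefixes avoids collisions between a word and a Wikipedia title with the same spelling.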
Conference location Amsterdam, Netherlands
Data source Experiment responses, Websites, Wikipedia pages
Dates 23-27
Doi 10.1145/1277741.1277909
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Clustering%2Bshort%2Btexts%2Busing%2BWikipedia%22
Has author Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta
Has domain Computer science
Has topic Ranking and clustering systems
Month July
Pages 787-788
Peer reviewed Yes
Publication type Conference paper
Published in SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher Association for Computing Machinery
Research design Design science, Experiment
Research questions Clustering items at the feed reader end is a challenging task as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia.
Revid 10,696
Theories Undetermined
Theory type Design and action
Title Clustering short texts using Wikipedia
Unit of analysis Article
Url http://dl.acm.org/citation.cfm?id=1277741.1277909
Wikipedia coverage Sample data
Wikipedia data extraction Dump
Wikipedia language English
Wikipedia page type Article
Year 2007
Creation date 15 March 2012 20:24:57
Categories Ranking and clustering systems, Computer science, Publications with missing comments, Publications
Modification date 30 January 2014 20:21:43