Clustering short texts using Wikipedia

Authors: Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta
Citation: SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval: 787-788. 2007 July 23-27. Amsterdam, Netherlands. Association for Computing Machinery.
Publication type: Conference paper
Peer-reviewed: Yes
DOI: 10.1145/1277741.1277909
Added by Wikilit team: Added on initial load
Clustering short texts using Wikipedia is a publication by Somnath Banerjee, Krishnan Ramanathan, and Ajay Gupta.


Abstract

Subscribers to popular news or blog feeds (RSS/Atom) often face the problem of information overload, as these feed sources usually deliver a large number of items periodically. One solution to this problem could be clustering similar items in the feed reader to make the information more manageable for a user. Clustering items at the feed reader end is a challenging task, as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag-of-words representation.
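
To illustrate why short feed items are hard to cluster with a plain bag-of-words model, the following minimal sketch compares two snippets by cosine similarity, first on words alone and then with one extra concept feature appended to each. The snippets, the function names, and the hard-coded concept label are all hypothetical; in the paper the additional features come from Wikipedia, and this sketch does not reproduce that retrieval step.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercased whitespace tokens into a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def enrich(text, concepts):
    """Append concept features (assumed given) to the word vector.

    The "concept:" prefix keeps concept features apart from ordinary words.
    """
    vec = bag_of_words(text)
    for concept in concepts:
        vec["concept:" + concept] += 1
    return vec

# Two hypothetical feed items on the same topic that share no words.
item_a = "Federer wins Wimbledon final"
item_b = "Nadal reaches Roland Garros semifinal"

plain = cosine(bag_of_words(item_a), bag_of_words(item_b))
enriched = cosine(enrich(item_a, ["Tennis"]), enrich(item_b, ["Tennis"]))

print(f"bag of words: {plain:.3f}")
print(f"with concept: {enriched:.3f}")
```

With words alone the two items have no overlap, so their similarity is zero; a single shared concept feature is enough to make them comparable, which is the intuition behind enriching short texts with external knowledge.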

Research questions

"Clustering items at the feed reader end is a challenging task as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia."

Research details

Topics: Ranking and clustering systems
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Design science, Experiment
Data source: Experiment responses, Websites, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Dump
Wikipedia page type: Article
Wikipedia language: English

Conclusion

"Giving greater importance to the title of a news article is beneficial. We obtained better results by doubling the weights of the concepts retrieved by the title query string. A new representation of the given article was generated by augmenting this vector to the term frequency vector constructed by the first method. We refer to this representation method as Wiki_Method...We have proposed a method of improving the accuracy of clustering short text items using Wikipedia as an additional knowledge source. Our experiment shows that this method can substantially improve clustering accuracy. The results obtained here also corroborate the recent findings that world knowledge can help in the different information retrieval tasks."
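
The representation described in the quotation can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: the function names, the example concepts, and the idea that non-title concepts carry a base weight of 1 are hypothetical, and the step that retrieves concepts from Wikipedia is replaced by hard-coded lists. Only the doubling of title-derived concept weights and the augmentation of the term-frequency vector are taken from the text above.

```python
from collections import Counter

def concept_vector(title_concepts, other_concepts, title_boost=2.0):
    """Weight concepts, giving title-derived concepts double weight.

    Assumption: concepts from any non-title query get a base weight of 1.0.
    """
    vec = Counter()
    for concept in other_concepts:
        vec[concept] += 1.0
    for concept in title_concepts:
        vec[concept] += title_boost
    return vec

def augment(term_freq, concepts):
    """Join the term-frequency vector and the concept vector.

    Keys are tagged so a word and a concept with the same spelling
    remain separate features.
    """
    combined = Counter()
    for term, weight in term_freq.items():
        combined[("term", term)] = weight
    for concept, weight in concepts.items():
        combined[("concept", concept)] = weight
    return combined

# Hypothetical article: term frequencies plus hard-coded concept lists.
term_freq = Counter("apple unveils new iphone".split())
concepts = concept_vector(
    title_concepts=["IPhone", "Apple Inc."],
    other_concepts=["Smartphone"],
)
rep = augment(term_freq, concepts)

for feature, weight in sorted(rep.items()):
    print(feature, weight)
```

The resulting sparse vector can be passed to any clustering routine that accepts weighted features; the choice of clustering algorithm is outside what this page states.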


Facts about "Clustering short texts using Wikipedia"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Conference location: Amsterdam, Netherlands
Data source: Experiment responses, Websites, Wikipedia pages
Dates: 23-27
Doi: 10.1145/1277741.1277909
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Clustering%2Bshort%2Btexts%2Busing%2BWikipedia%22
Has author: Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta
Has domain: Computer science
Has topic: Ranking and clustering systems
Month: July
Pages: 787-788
Peer reviewed: Yes
Publication type: Conference paper
Published in: SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: Association for Computing Machinery
Research design: Design science, Experiment
Revid: 10,696
Theories: Undetermined
Theory type: Design and action
Title: Clustering short texts using Wikipedia
Unit of analysis: Article
Url: http://dl.acm.org/citation.cfm?id=1277741.1277909
Wikipedia coverage: Sample data
Wikipedia data extraction: Dump
Wikipedia language: English
Wikipedia page type: Article
Year: 2007