Retrieval and feedback models for blog feed search
|Authors:||Jonathan L. Elsas, Jaime Arguello, Jamie Callan, Jaime G. Carbonell|
|Citation:||SIGIR '08 Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval : 347-354. 2008 July 20-24. Singapore, Singapore. Association for Computing Machinery.|
|Publication type:||Conference paper|
|Added by Wikilit team:||Added on initial load|
Blog feed search poses different and interesting challenges from traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents, the blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing submissions in the TREC 2007 Blog Distillation task. We also show that typical query expansion techniques such as pseudo-relevance feedback using the blog corpus do not provide any significant performance improvement and in many cases dramatically hurt performance. We perform an in-depth analysis of the behavior of pseudo-relevance feedback for this task and develop a novel query expansion technique using the link structure in Wikipedia. This query expansion technique provides significant and consistent performance improvements for this task, yielding a 22% and 14% improvement in MAP over the unexpanded query for our baseline and federated algorithms respectively.
"In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing submissions in the TREC 2007 Blog Distillation task."
|Theory type:||Design and action|
|Wikipedia coverage:||Main topic|
|Data source:||Experiment responses, Wikipedia pages|
|Collected data time dimension:||Cross-sectional|
|Unit of analysis:||Article|
|Wikipedia data extraction:||Dump|
|Wikipedia page type:||Article|
"we presented an in-depth analysis of query expansion for blog feed retrieval. On this task, our novel Wikipedia link-based approach obtained a greater than 13% improvement over no expansion (across large and small document models) in terms of both MAP and P@10. Although this method did not generalize to the Terabyte Track ad hoc queries it does show promise for queries that represent more general information needs, similar to those typical of feed retrieval."
"our novel Wikipedia link-based approach obtained a greater than 13% improvement over no expansion (across large and small document models) in terms of both MAP and P@10" (p. 354)
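The results quoted above are reported as relative improvements in MAP (mean average precision) and P@10 (precision at rank 10). As a reading aid, the following is a minimal sketch of how these metrics and a relative improvement are commonly computed. It uses the standard textbook definitions and is not taken from the paper's evaluation code; the function names, feed identifiers, and toy rankings are all made up for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked feeds that are relevant (P@k)."""
    return sum(1 for feed in ranking[:k] if feed in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant feed appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, feed in enumerate(ranking, start=1):
        if feed in relevant:
            hits += 1
            total += hits / rank
    # Relevant feeds that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of average precision over all judged queries (MAP)."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


def relative_improvement(baseline: float, expanded: float) -> float:
    """Relative change of a metric, e.g. 0.22 for a 22% improvement."""
    return (expanded - baseline) / baseline


# Hypothetical toy data: relevance judgments and two ranked runs per query.
toy_qrels = {"q1": {"feedA", "feedC"}, "q2": {"feedB"}}
toy_unexpanded = {"q1": ["feedB", "feedA", "feedC"], "q2": ["feedA", "feedB"]}
toy_expanded = {"q1": ["feedA", "feedC", "feedB"], "q2": ["feedB", "feedA"]}

if __name__ == "__main__":
    base = mean_average_precision(toy_unexpanded, toy_qrels)
    new = mean_average_precision(toy_expanded, toy_qrels)
    print(f"MAP unexpanded: {base:.4f}")
    print(f"MAP expanded:   {new:.4f}")
    print(f"Relative improvement: {relative_improvement(base, new):.1%}")
```

The toy numbers bear no relation to the figures reported in the paper; they only show that a "22% improvement in MAP" refers to the ratio of the gain to the unexpanded baseline, not to an absolute difference in MAP.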