Facet-based opinion retrieval from blogs
Abstract The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback-Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments In this paper new methods of retrieving documents containing opinions expressed about an entity or entities specified by the user in the query were proposed. Research design: currently "Design science, Statistical analysis", but could also be "Experiment" and "Statistical modeling". Tables 1 and 2 display results of an experiment with mathematical analysis, and "Opinion-based document ranking methods" has some mathematical modeling. "Data source" could, apart from Wikipedia pages, also be "Experiment responses". "Wikipedia data extraction": where is it written that this is the live/dump? It seems just to state "Wikipedia".
Conclusion In this paper new methods of retrieving documents containing opinions expressed about an entity or entities specified by the user in the query were proposed. The main stages of the proposed methods are as follows: (1) Collection pre-processing. Our experiments demonstrate that this stage has a significant impact on the effectiveness of document retrieval from blogs in terms of both topic- and opinion-relevance. The major performance-improving steps at this stage include the removal of HTML tags, scripts and style definitions, and all lines where the hyperlinks account for 50% or more of the words. (2) Query processing. A method of building faceted queries by utilising Wikipedia was developed. The method consisted of the following steps: identifying concepts in the topic titles by matching them to Wikipedia article titles, grouping concepts into facets, and expanding each facet with new concepts by using Wikipedia article redirects and valid abbreviations. The evaluation of different query processing levels demonstrates the merit of all three steps in query processing. Expanding queries using only target pages, redirected to from the Wikipedia page titles found in the query (“limitedQE-Wiki-phrases”) is better than expanding the query with other pages redirecting to the same targets (“fullQE-Wiki-phrases”). (3) Document retrieval. Retrieval of the initial document set using a topic-based ranking method, such as BM25. (4) Opinion-based document re-ranking. Three methods were proposed: - KLD-based method (KLD), using the Kullback-Leibler divergence scores of the subjective words in the windows around query term occurrences; - Proximity-based method (dist), using distances between a query term occurrence and each of the co-occurring subjective words; - A method combining the previous two (KLD+dist).
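The link-density rule from the collection pre-processing stage (dropping lines where hyperlinks account for 50% or more of the words) can be sketched as follows. This is only an illustration: the regular expressions, the function names `link_word_fraction` and `drop_link_heavy_lines`, and the whitespace-based word counting are assumptions made here, not details taken from the paper.

```python
import re

# Assumption: anchors are simple, non-nested <a ...>...</a> elements on one line.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def link_word_fraction(line):
    """Fraction of a line's visible words that sit inside hyperlinks."""
    link_words = sum(
        len(TAG_RE.sub(" ", anchor_text).split())
        for anchor_text in ANCHOR_RE.findall(line)
    )
    total_words = len(TAG_RE.sub(" ", line).split())
    return link_words / total_words if total_words else 0.0


def drop_link_heavy_lines(html, threshold=0.5):
    """Keep only lines whose hyperlink word share is below the threshold."""
    kept = [ln for ln in html.splitlines() if link_word_fraction(ln) < threshold]
    return "\n".join(kept)
```

A navigation line such as `<a href="x">Home</a> | <a href="y">About</a>` has two of its three words inside links and would be dropped, while a sentence containing a single linked word would be kept.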
In addition, all of these methods contain a Facet Distance component, which factors in the distance between query terms/phrases from different facets, and a Facet Validation component, which down-ranks documents that do not contain at least one concept from each facet. Evaluation demonstrates that the proposed methods are highly effective, and are among the best-performing methods developed by the Blog track 2007 and 2008 participants. Specifically, the proposed methods achieved the highest improvements over the standard baseline run provided by Blog 2008 organisers “Baseline 4” compared to other opinion-finding runs submitted by the participants. A series of experiments was conducted to determine the effect of the major components (FV, FD, KLD and dist) on performance. The results indicate that all components, in general, have a positive effect on performance. However, the proximity of query terms to subjective words does not always improve the performance when used in conjunction with KL divergence of subjective words. Specifically, “KLD+dist-FD-FV-subj-bm25” yielded lower MAPop and P10op than the run “KLD-FD-FV-subj-bm25” on limitedQE-Wiki-phrase baseline (Blog 2007 topics). “KLD+dist-FD-FV-subj-bm25” and “KLD+dist-FD-FV-subj-b4” on the other hand, yielded higher MAPop and R-precisionop on the other two baselines: limitedQE-Wiki-phrase (Blog 2008 topics) and Baseline 4. An analysis of the methods’ performance by topic categories based on the type of entity expressed in the query was performed. It was found that the methods are most effective in finding opinions about events, products, geographical locations and people. They were least effective in finding opinions about entities in the category “media/art”, which included TV shows, films and books, and in the category “miscellaneous”, which mostly contained abstract concepts.
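The Facet Validation idea (down-ranking documents that lack at least one concept from every facet) can be illustrated with a minimal sketch. The multiplicative `penalty`, the case-insensitive substring matching, and the function names are hypothetical choices made for this example; the text above does not say how the paper implements the down-ranking.

```python
def covers_all_facets(doc_text, facets):
    """True if the document mentions at least one concept from each facet."""
    text = doc_text.lower()
    return all(
        any(concept.lower() in text for concept in facet)
        for facet in facets
    )


def facet_validated_rerank(scored_docs, facets, penalty=0.5):
    """Re-rank (doc_id, text, score) triples, penalising incomplete facet coverage.

    The penalty factor is an illustrative assumption, not a value from the paper.
    """
    adjusted = []
    for doc_id, text, score in scored_docs:
        if not covers_all_facets(text, facets):
            score *= penalty
        adjusted.append((doc_id, score))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

With facets `[["new york", "nyc"], ["pizza"]]`, a document that mentions NYC but never pizza has its score halved and may fall below a lower-scored document that covers both facets.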
Data source Experiment responses  + , Wikipedia pages  +
Doi 10.1016/j.ipm.2009.06.005 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Facet-based%2Bopinion%2Bretrieval%2Bfrom%2Bblogs%22  +
Has author Olga Vechtomova +
Has domain Computer science +
Has topic Textual information retrieval +
Issue 1  +
Pages 71-88  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Information Processing and Management +
Research design Design science  + , Experiment  + , Mathematical modeling  + , Statistical analysis  +
Research questions The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback-Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.
Revid 11,131  +
Theories The Kullback-Leibler divergence measures the relative entropy between two probability distributions. It was defined in information theory (Losee, 1990) and used in many information retrieval and natural language processing tasks, for example, in query expansion following pseudo-relevance feedback (Carpineto et al., 2001).
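For reference, the standard Kullback-Leibler divergence between two discrete distributions P and Q is the sum over terms w of P(w) * log(P(w) / Q(w)). The sketch below computes it for term distributions, along with the per-term contributions that a weighting scheme could use. It shows the textbook definition only: the exact term-weighting formula used in the paper is not given here, and the helper names and the requirement that Q be non-zero wherever P is non-zero are assumptions of this example.

```python
import math
from collections import Counter


def distribution(tokens):
    """Maximum-likelihood term distribution from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def term_contributions(p, q):
    """Per-term summands P(w) * log(P(w) / Q(w)).

    Assumes q[w] > 0 for every term w with p[w] > 0 (no smoothing is applied).
    """
    return {w: pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q), in nats."""
    return sum(term_contributions(p, q).values())
```

For instance, `p` might be estimated from text windows around query terms and `q` from the whole collection, so that terms unusually frequent in the windows receive large positive contributions.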
Theory type Design and action  +
Title Facet-based opinion retrieval from blogs
Unit of analysis Article  +
Url http://dx.doi.org/10.1016/j.ipm.2009.06.005  +
Volume 46  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Dump  +
Wikipedia language Not specified  +
Wikipedia page type Article  +
Year 2010  +
Creation date 15 March 2012 20:28:22  +
Categories Textual information retrieval  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:53:51  +