Why finding entities in Wikipedia is difficult, sometimes
Abstract Entity Retrieval (ER)--in comparison to classical search--aims at finding individual entities instead of relevant documents. Finding a list of entities therefore requires techniques different from those of classical search engines. In this paper, we present a model to describe entities more formally and show how an ER system can be built on top of it. We compare different approaches designed for finding entities in Wikipedia and report on results using standard test collections. An analysis of entity-centric queries reveals different aspects and problems related to ER and shows limitations of current systems performing ER with Wikipedia. It also indicates which approaches are suitable for which kinds of queries.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments "In this paper we presented a general model for ranking entities and we showed how the model can be applied to different real world scenarios." p. 564
Conclusion In this paper we presented a general model for ranking entities and we showed how the model can be applied to different real world scenarios. We described in detail a possible instantiation of the model and a set of algorithms designed for the Wikipedia dataset. We make use of the Wikipedia structure (page links and categories) and employ an accurate ontology to remove possible noise in Wikipedia category assignments. The results show that, in the used test collection, category assignments can be both very helpful for retrieval as well as misleading depending on the query syntax. We also employ several NLP techniques to transform the query and to fill the gaps between the query and the Wikipedia language models. We extract essential information (lexical expressions, key concepts, named entities) from the query, as well as expand the terms (by means of synonyms or related words) to find entities by specific spelling variants of their attributes. By combining several techniques we can achieve a relatively high effectiveness of the ER system; still, further improvement is possible by selectively applying the methods for different queries. The experimental evaluation of the ER algorithms has shown that by combining our approaches we achieve an average improvement of 24% in terms of xInfAP and of 30% in terms of P@10 on the XER task of the INEX-XER 2008 test collection. While the proposed techniques were designed for the ER task, experimental results for the list completion task are consistent. While more experimentation is needed to conclude that the proposed techniques perform well in general, we have shown how they improve effectiveness on the used test collection.
We also saw that it might be possible to apply and/or combine different approaches depending on the query in order to maximize effectiveness: e.g., by using our methods we achieve an xInfAP value of over 0.7 for 20% of the queries of the used test collection and the mean xInfAP can be further boosted by 27% only by selecting the appropriate approach for each given topic. We leave as future work the research question of automatically selecting appropriate approaches for each query (e.g., by estimating the expected number of relevant results). We also point out that initial steps toward this goal have been done in Vercoustre et al. (2009) by applying machine learning techniques to predict query difficulty.
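The difference between using one approach for all topics and selecting the best approach per topic can be illustrated with a small sketch. The approach names and per-topic scores below are made-up placeholders, not results from the paper, and the per-topic maximum is an oracle upper bound rather than the automatic selection the authors leave as future work.

```python
from statistics import mean

# Hypothetical per-topic effectiveness scores (illustrative numbers only,
# NOT taken from the paper). Each list holds one score per topic.
scores = {
    "category_based": [0.50, 0.25, 0.75, 0.25],
    "query_expansion": [0.25, 0.50, 0.25, 0.75],
}


def best_single_mean(per_approach):
    """Mean score of the single approach that is best on average."""
    return max(mean(topic_scores) for topic_scores in per_approach.values())


def oracle_mean(per_approach):
    """Mean score when the best approach is chosen separately for each topic."""
    per_topic = zip(*per_approach.values())
    return mean(max(topic_scores) for topic_scores in per_topic)


single = best_single_mean(scores)
oracle = oracle_mean(scores)
relative_gain = oracle / single - 1  # fraction by which the oracle beats the best single approach
```

With these placeholder numbers both approaches have the same mean, yet picking the better one per topic raises the mean noticeably, which is the kind of headroom the conclusion describes.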
Data source Experiment responses  + , Wikipedia pages  +
Doi 10.1007/s10791-010-9135-7 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Why%2Bfinding%2Bentities%2Bin%2BWikipedia%2Bis%2Bdifficult%2C%2Bsometimes%22  +
Has author Gianluca Demartini + , Claudiu S. Firan + , Tereza Iofciu + , Ralf Krestel + , Wolfgang Nejdl +
Has domain Computer science +
Has topic Other information retrieval topics +
Issue 5  +
Month October  +
Pages 534  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Information Retrieval +
Research design Experiment  +
Research questions We present a model to describe entities more formally and show how an ER system can be built on top of it. We compare different approaches designed for finding entities in Wikipedia and report on results using standard test collections.
Revid 11,059  +
Theories Undetermined
Theory type Design and action  +
Title Why finding entities in Wikipedia is difficult, sometimes
Unit of analysis Article  +
Url http://proquest.umi.com/pqdweb?did=2152237371&Fmt=7&clientId=10306&RQT=309&VName=PQD  +
Volume 13  +
Wikipedia coverage Other  +
Wikipedia data extraction Dump  +
Wikipedia language English  +
Wikipedia page type Article  + , Information categorization and navigation  +
Year 2010  +
Creation date 15 March 2012 20:33:40  +
Categories Other information retrieval topics  + , Computer science  + , Publications  +
Modification date 30 January 2014 20:32:31  +