An empirical study of the effects of NLP components on geographic IR performance
Abstract Natural language processing (NLP) techniques, such as toponym detection and resolution, are an integral part of most geographic information retrieval (GIR) architectures. Without these components, synonym detection, ambiguity resolution and accurate toponym expansion would not be possible. However, there are many important factors affecting the success of an NLP approach to GIR, including toponym detection errors, toponym resolution errors and query overloading. The aim of this paper is to determine how severe these errors are in state-of-the-art systems, and to what extent they affect GIR performance. We show that a careful choice of weighting schemes in the IR engine can minimize the negative impact of these errors on GIR accuracy. We provide empirical evidence from the GeoCLEF 2005 and 2006 datasets to support our observations.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Comments Wikipedia pages, websites (www.geonames.org)
Conclusion Table 7 shows the MAP scores for these runs on the annotated collections, using the title and description fields in the GeoCLEF 2005 and 2006 queries. The most impressive result from this table is the consistent effectiveness of the query normalisation strategy. On all annotated collections and GeoCLEF queries (2005/2006) the average MAP scores of the geo run exceed the corresponding geo_nonorm scores. In addition, the MAP score of the geo run varies only between 0.32 and 0.36 (with the exception of a disambiguation accuracy of 0%), which shows how effective the normalisation strategy is at reducing the negative effects of the NLP errors on retrieval performance. The MAP scores in Table 7 also provide us with a means of analysing the effect of NLP errors on GIR performance. However, since the geo and geo_nonorm runs both mitigate NLP errors by including location text terms in their queries, we will focus the rest of this discussion on the results of the geo_notxt run, as it is more sensitive to these errors. Hence, we can conclude that low NERC recall has a greater impact on retrieval effectiveness than low NERC precision does. However, the most significant finding of all our experiments is that a baseline IR system run on non-annotated data performs nearly as well as our top-performing geo run (OpenNLP) on the GeoCLEF 2005 and 2006 topics.
Data source Experiment responses  + , Websites  + , Wikipedia pages  +
Doi 10.1080/13658810701626210 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22An%2Bempirical%2Bstudy%2Bof%2Bthe%2Beffects%2Bof%2BNLP%2Bcomponents%2Bon%2Bgeographic%2BIR%2Bperformance%22  +
Has author Nicola Stokes + , Yi Li + , Alistair Moffat + , Jiawen Rong +
Has domain Computer science + , Geography +
Has topic Geographic information retrieval +
Issue 3  +
Pages 247-264  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in International Journal of Geographical Information Science +
Research design Experiment  +
Research questions However, there are many important factors affecting the success of an NLP approach to GIR, including toponym detection errors, toponym resolution errors, and query overloading. In this paper we examine these issues in detail and provide empirical evidence from the GeoCLEF 2005 and 2006 datasets to support our observations concerning them. We also propose a novel toponym resolution approach that leverages geospatial information from Wikipedia to improve performance, and show that it performs well on the type of data used in the GeoCLEF trials.
Revid 10,658  +
Theories Undetermined
Theory type Analysis  + , Design and action  +
Title An empirical study of the effects of NLP components on geographic IR performance
Unit of analysis Article  +
Url http://www.tandfonline.com/doi/abs/10.1080/13658810701626210  +
Volume 22  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Live Wikipedia  +
Wikipedia language Not specified  +
Wikipedia page type Article  +
Year 2008  +
Creation date 15 March 2012 20:02:46  +
Categories Geographic information retrieval  + , Computer science  + , Geography  + , Publications  +
Modification date 30 January 2014 20:20:29  +