Exploiting Wikipedia and EuroWordNet to solve cross-lingual question answering
Abstract This paper describes a new advance in solving Cross-Lingual Question Answering (CL-QA) tasks. It is built on three main pillars: (i) the use of several multilingual knowledge resources to reference words between languages (the Inter Lingual Index (ILI) module of EuroWordNet and the multilingual knowledge encoded in Wikipedia); (ii) the consideration of more than only one translation per word in order to search candidate answers; and (iii) the analysis of the question in the original language without any translation process. This novel approach overcomes the errors caused by the common use of Machine Translation (MT) services by CL-QA systems. We also expose some studies and experiments that justify the importance of analyzing whether a Named Entity should be translated or not. Experimental results in bilingual scenarios show that our approach performs better than an MT based CL-QA approach achieving an average improvement of 36.7%.
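Pillar (ii) of the abstract, keeping more than one translation per word when searching for candidate answers, can be illustrated with a minimal sketch. The `TOY_ILI` table, the function names, and the sample entries are hypothetical stand-ins invented for this example; they are not the paper's data structures or the EuroWordNet API.

```python
from itertools import product

# Hypothetical toy stand-in for the ILI: each English lemma maps to a list
# of synsets, and each synset lists the Spanish lemmas linked to it.
TOY_ILI = {
    "president": [["presidente"], ["presidente", "director"]],
    "bank": [["banco"], ["orilla", "ribera"]],
}


def candidate_translations(word):
    """Collect the translations from every synset of `word`, without duplicates.

    Words missing from the toy table are returned unchanged.
    """
    seen = []
    for synset in TOY_ILI.get(word, []):
        for lemma in synset:
            if lemma not in seen:
                seen.append(lemma)
    return seen or [word]


def expand_query(keywords):
    """Build one target-language query per combination of translations."""
    options = [candidate_translations(word) for word in keywords]
    return [" ".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for query in expand_query(["president", "bank"]):
        print(query)
```

A single-translation MT pipeline would keep only one of these queries; here every combination stays available to the retrieval step.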
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Conclusion To sum up, this paper illustrates a new advance to solve CL–QA tasks. Specifically, we present a robust CL methodology (implemented in a CL–QA system called BRILIW) and its evaluation in English–Spanish scenarios. The main contributions of our research are listed below:
• The use of several multilingual knowledge resources (Wikipedia and ILI) to reference words between languages (proposal (i)). Our hypothesis is that both resources contain complementary information and therefore a combination could achieve better CL–QA performance. Wikipedia multilingual knowledge is incorporated into our CL–QA system in the NET module. In contrast, the other multilingual resource (ILI) employed by the system is used for translating common nouns and verbs.
• The consideration of more than only one translation per word in order to search candidate answers (proposal (ii)). Unlike common MT-based CL–QA systems, our proposal considers more than one translation per word by using the different synsets of each word in the ILI module of EWN.
• The analysis of the question in its original language (proposal (iii)). Analyzing the question in its original language avoids lexical and syntactical noise that could be introduced to the system by wrong question translations.
• A study justifying the need for correct translations of NEs has been presented, in which we observed that the percentage of questions with NEs is quite high (87.7% on average). Nearly half of these entities should be translated (41.2% on average) since these NEs are named differently depending on the language. The remaining NEs must still be checked, since the system does not know in advance whether an NE should be translated or not. In our strategy, this control is carried out by the NET module using Wikipedia, where most of the NEs are present.
• Three different experiments are detailed in order to present the evolution of our CL–QA approach. Furthermore, these experiments show the improvements obtained with each new proposal (adding internal bilingual dictionaries +9.3% and adding the NET module +22.2%). Exploiting Wikipedia to support this procedure ensures the up-to-date status of the module. To our knowledge, we have been the first research group to apply multilingual knowledge from Wikipedia within the CL–QA environment.
• The evaluation of our CL–QA methodology is carried out using the CLEF 2004, 2005 and 2006 sets of English and Spanish questions (1200 in total). The results obtained are very promising. BRILIW obtains an improvement of 36.7% compared to the MT-based approach. Our technique for solving the CL–QA task increases the precision of the CL to the level of monolingual QA runs. Compared to other state-of-the-art CL–QA systems, our approach obtains better results. In fact, our average precision loss of CL with respect to the monolingual run is around 7.2%, whereas in the English–Spanish QA task at CLEF 2006 [39] the precision of the English–Spanish CL–QA task was approximately 50% lower than for the monolingual Spanish task.
• Our CL methodology is able to solve questions that cannot be solved by MT-based CL–QA approaches. There are a lot of questions that no MT-based CL–QA system would be able to solve. With the aim of proving this last affirmation and the up-to-date and real characteristics of our strategy, we have tested the NET module with questions that contain the NEs involved in the Top Searches Google News 2001–2006, and the precision of translating NEs climbed to 100%. This last test of the NET module demonstrates the correct process of NE translation carried out by our CL technique. Compared to on-line MT services (results down to 46.7% on average), our approach obtains much better results. Furthermore, we want to emphasize that in most of these questions, an MT-based CL–QA system would not be able to find the correct answer, because the NE of the question is usually wrongly translated by the MT services.
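The conclusion says the NET module uses Wikipedia to check whether a Named Entity should be translated, but this record does not describe how. The following is only a minimal sketch of one way such a check could work, assuming a lookup table of English-to-Spanish interlanguage links. `TOY_LANGLINKS`, `translate_named_entity`, and the sample entries are hypothetical and are not taken from BRILIW.

```python
# Hypothetical toy table standing in for Wikipedia interlanguage links:
# English article title -> Spanish article title.
TOY_LANGLINKS = {
    "London": "Londres",
    "United Nations": "Organización de las Naciones Unidas",
    "Bill Clinton": "Bill Clinton",
}


def translate_named_entity(entity, langlinks=TOY_LANGLINKS):
    """Return (target_form, was_translated) for a named entity.

    If the entity has an interlanguage link, use the linked title; the flag
    reports whether that title differs from the source form. Entities absent
    from the table are kept unchanged (an assumption of this sketch).
    """
    target = langlinks.get(entity)
    if target is None:
        return entity, False
    return target, target != entity


if __name__ == "__main__":
    for name in ["London", "Bill Clinton", "Some Unknown Band"]:
        print(name, "->", translate_named_entity(name))
```

Under this assumption, an entity such as "Bill Clinton" passes through untouched while "London" is replaced by its Spanish form, which mirrors the translate-or-not decision the conclusion attributes to the NET module.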
Data source Archival records  + , Experiment responses  + , Wikipedia pages  +
Doi 10.1016/j.ins.2009.06.031 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Exploiting%2BWikipedia%2Band%2BEuroWordNet%2Bto%2Bsolve%2Bcross-lingual%2Bquestion%2Banswering%22  +
Has author Sergio Ferrandez + , Antonio Toral + , Oscar Ferrandez + , Antonio Ferrandez + , Rafael Munoz +
Has domain Computer science +
Has topic Cross-language information retrieval +
Issue 20  +
Pages 3473-3488  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in Information Sciences +
Research design Experiment  +
Research questions This paper describes a new advance in solving Cross-Lingual Question Answering (CL–QA) tasks. It is built on three main pillars: (i) the use of several multilingual knowledge resources to reference words between languages (the Inter Lingual Index (ILI) module of EuroWordNet and the multilingual knowledge encoded in Wikipedia); (ii) the consideration of more than only one translation per word in order to search candidate answers; and (iii) the analysis of the question in the original language without any translation process. This novel approach overcomes the errors caused by the common use of Machine Translation (MT) services by CL–QA systems. We also expose some studies and experiments that justify the importance of analyzing whether a Named Entity should be translated or not.
Revid 10,759  +
Theories Undetermined
Theory type Design and action  +
Title Exploiting Wikipedia and EuroWordNet to solve cross-lingual question answering
Unit of analysis Article  +
Url http://dx.doi.org/10.1016/j.ins.2009.06.031  +
Volume 179  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Dump  +
Wikipedia language English  + , Spanish  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:27:55  +
Categories Cross-language information retrieval  + , Computer science  + , Publications with missing comments  + , Publications  +
Modification date 30 January 2014 20:26:12  +