|Learning weights for translation candidates in Japanese-Chinese information retrieval|
|Authors:||Chu-Cheng Lin, Yu-Chun Wang, Chih-Hao Yeh, Wei-Chi Tsai, Richard Tzong-Han Tsai|
|Citation:||Expert Systems with Applications 36 (4): 7695-7699. 2009.|
|Publication type:||Journal article|
|Added by Wikilit team:||Added on initial load|
This paper describes our Japanese-Chinese information retrieval system, which takes the "query-translation" approach. The system employs both a conventional bilingual Japanese-Chinese dictionary and Wikipedia for translating query terms. We propose that Wikipedia can be used as a good named-entity (NE) bilingual dictionary. By exploiting the nature of the Japanese writing system, we propose that query terms be processed differently based on the forms they are written in. We use an iterative method for weight-tuning and term disambiguation, which is based on the PageRank algorithm. When evaluated on the NTCIR-5 test set, our system achieves relax MAP (mean average precision) scores as high as 0.2217 for T-runs and 0.2276 for D-runs.
|Topics:||Cross-language information retrieval|
|Theory type:||Design and action|
|Wikipedia coverage:||Sample data|
|Research design:||Mathematical modeling, Statistical analysis|
|Data source:||Documents, Wikipedia pages|
|Collected data time dimension:||Cross-sectional|
|Unit of analysis:||Article|
|Wikipedia data extraction:||Live Wikipedia|
|Wikipedia page type:||Article|
|Wikipedia language:||Chinese, Japanese|
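The paper proposes processing Japanese query terms differently depending on the form they are written in. The record does not give the authors' rules, so the following is only a minimal sketch of the first step such a system might need: deciding whether a term is written in Kanji, Katakana, Hiragana, or a mixture. The function names `char_script` and `term_form` are hypothetical, and the Unicode ranges cover only the basic Hiragana, Katakana, and CJK Unified Ideographs blocks.

```python
def char_script(ch):
    """Classify one character by its Unicode block (basic blocks only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # includes the long-vowel mark U+30FC
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "kanji"
    return "other"


def term_form(term):
    """Return the written form of a term, or "mixed" if it is not uniform.

    Assumption: a term counts as one form only when every character
    belongs to that script. An empty string is reported as "mixed".
    """
    scripts = {char_script(ch) for ch in term}
    if len(scripts) == 1:
        only = next(iter(scripts))
        if only != "other":
            return only
    return "mixed"


# Toy query terms, chosen for illustration only.
forms = {term: term_form(term) for term in ["東京", "コンピュータ", "ひらがな", "食べる"]}
```

A downstream translator could then route each term by the value in `forms`, for example treating Katakana terms as likely foreign-word transcriptions, which the paper's evaluation reports to be common.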
"We exploited the nature of Japanese vocabulary and the Japanese writing system for better translations. Using Kanji for translation yields significant improvements in our evaluation. The results of the evaluation confirm that foreign terms are widely transcribed in Katakana.
To cope with ambiguity, we have adopted an iterative disambiguating scheme. The current implementation of this scheme, which uses the likelihood function as its weight function, proved to be effective in the evaluation. Our system has achieved MAP as high as 0.2276, and outperforms the previous NTCIR-5 CLIR Japanese–Chinese T-run’s best rigid MAP by 111%, and D-run’s by 19%."
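The iterative disambiguating scheme is described here only at a high level, so the following is a sketch of one PageRank-style way to tune weights over translation candidates, not the authors' implementation. It assumes each query term has a list of candidate translations and that a pairwise `affinity` function (a hypothetical stand-in for the paper's likelihood-based weight function) scores how well two candidates fit together. The names `tune_weights`, `affinity`, and `toy_affinity`, the damping value, and the toy data are all illustrative assumptions.

```python
def tune_weights(candidates, affinity, damping=0.85, iterations=50):
    """Iteratively re-weight translation candidates, PageRank style.

    candidates: dict mapping each source term to a list of candidate translations.
    affinity:   function (cand_a, cand_b) -> non-negative float (assumed given).
    Returns a dict term -> {candidate: weight}, with weights summing to 1 per term.
    """
    # Start from a uniform distribution over each term's candidates.
    weights = {t: {c: 1.0 / len(cs) for c in cs} for t, cs in candidates.items()}
    for _ in range(iterations):
        updated = {}
        for term, cs in candidates.items():
            scores = {}
            for c in cs:
                # Support from the currently weighted candidates of the other terms.
                support = sum(
                    weights[other][oc] * affinity(c, oc)
                    for other, ocs in candidates.items()
                    if other != term
                    for oc in ocs
                )
                # The damping term keeps every candidate's score strictly positive.
                scores[c] = (1.0 - damping) / len(cs) + damping * support
            total = sum(scores.values())
            updated[term] = {c: s / total for c, s in scores.items()}
        weights = updated
    return weights


# Toy data (illustrative only): Japanese 手紙 means "letter", while the
# look-alike Chinese 手纸 means "toilet paper"; 郵便 ("mail") gives context.
toy_candidates = {"手紙": ["信", "手纸"], "郵便": ["邮政"]}


def toy_affinity(a, b):
    # Hypothetical co-occurrence score: only "信" and "邮政" fit together.
    return 1.0 if {a, b} == {"信", "邮政"} else 0.0


toy_weights = tune_weights(toy_candidates, toy_affinity)
```

In this toy setting the context term pushes weight toward 信 and away from 手纸, which is the kind of disambiguation the scheme is meant to provide. A real system would replace `toy_affinity` with statistics estimated from the target-language collection.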