Improving the extraction of bilingual terminology from Wikipedia
Abstract Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs.
Added by wikilit team Added on initial load  +
Collected data time dimension Cross-sectional  +
Conclusion The experiment proved that it is effective to use an SVM classifier to determine the correctness of a term-translation pair. Furthermore, the experiment showed that our proposed method, which uses 13 different features, performs better than our previous method, which used only 2 features. We also showed that many of the extracted term-translation pairs are not covered in even comprehensive manually created dictionaries. Furthermore, since Wikipedia is growing continuously, both accuracy and coverage of our dictionary will become even better in the near future. We believe that we can easily combine our dictionary with manually constructed dictionaries such as the BEOLINGUS dictionary, in order to enhance the coverage of common terms, especially for word groups other than nouns.
Data source Experiment responses  + , Wikipedia pages  +
Doi 10.1145/1596990.1596995 +
Google scholar url http://scholar.google.com/scholar?ie=UTF-8&q=%22Improving%2Bthe%2Bextraction%2Bof%2Bbilingual%2Bterminology%2Bfrom%2BWikipedia%22  +
Has author Maike Erdmann + , Kotaro Nakayama + , Takahiro Hara + , Shojiro Nishio +
Has domain Computer science +
Has topic Cross-language information retrieval +
Issue 4  +
Peer reviewed Yes  +
Publication type Journal article  +
Published in ACM Transactions on Multimedia Computing, Communications and Applications +
Research design Experiment  +
Research questions In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs.
Revid 10,818  +
Theories Undetermined
Theory type Design and action  +
Title Improving the extraction of bilingual terminology from Wikipedia
Unit of analysis Article  +
Url http://dx.doi.org/10.1145/1596990.1596995  +
Volume 5  +
Wikipedia coverage Sample data  +
Wikipedia data extraction Live Wikipedia  +
Wikipedia language English  + , German  +
Wikipedia page type Article  +
Year 2009  +
Creation date 15 March 2012 20:28:57  +
Categories Cross-language information retrieval  + , Computer science  + , Publications with missing comments  + , Publications  +
Modification date 30 January 2014 20:28:51  +
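
The two-stage method summarized in the abstract (candidate extraction from Wikipedia link information, followed by classification of each candidate) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy link data, the names `Candidate`, `extract_candidates`, and `features`, and the three features themselves. This record does not list the 13 features used in the article, and the sketch stops before the classifier; in the article's setting, vectors like these would be fed to an SVM trained on manually labeled pairs.

```python
from dataclasses import dataclass

# Toy stand-ins for Wikipedia link data (illustrative values, not real dumps).
INTERLANGUAGE_LINKS = {"Computer science": "Informatik"}  # English title -> German title
REDIRECTS_EN = {"Computing science": "Computer science"}  # English redirect -> target title
REDIRECTS_DE = {"Computerwissenschaft": "Informatik"}     # German redirect -> target title


@dataclass(frozen=True)
class Candidate:
    term: str
    translation: str
    via_source_redirect: bool  # term came from a redirect rather than the article title
    via_target_redirect: bool  # translation came from a redirect rather than the article title


def extract_candidates(links, redirects_src, redirects_tgt):
    """Pair each title/redirect of a source article with each title/redirect of its linked target."""
    candidates = []
    for src_title, tgt_title in links.items():
        src_terms = [(src_title, False)]
        src_terms += [(r, True) for r, t in redirects_src.items() if t == src_title]
        tgt_terms = [(tgt_title, False)]
        tgt_terms += [(r, True) for r, t in redirects_tgt.items() if t == tgt_title]
        for term, src_redirect in src_terms:
            for translation, tgt_redirect in tgt_terms:
                candidates.append(Candidate(term, translation, src_redirect, tgt_redirect))
    return candidates


def features(candidate):
    """Hypothetical feature vector for one candidate pair (not the article's feature set)."""
    longest = max(len(candidate.term), len(candidate.translation))
    length_gap = abs(len(candidate.term) - len(candidate.translation)) / longest
    return [
        float(candidate.via_source_redirect),
        float(candidate.via_target_redirect),
        length_gap,
    ]


candidates = extract_candidates(INTERLANGUAGE_LINKS, REDIRECTS_EN, REDIRECTS_DE)
vectors = [features(c) for c in candidates]
```

Pairs built only from article titles connected by an interlanguage link are likely to be more reliable than pairs that pass through redirects, which is why the sketch records how each candidate was obtained; a classifier could then weigh that evidence instead of relying on a fixed rule.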