Improving the extraction of bilingual terminology from Wikipedia

Authors: Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
Citation: ACM Transactions on Multimedia Computing, Communications and Applications 5 (4). 2009.
Publication type: Journal article
Peer-reviewed: Yes
DOI: 10.1145/1596990.1596995.
Added by Wikilit team: Added on initial load
"Improving the extraction of bilingual terminology from Wikipedia" is a publication by Maike Erdmann, Kotaro Nakayama, Takahiro Hara, and Shojiro Nishio.


Abstract

Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs.
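
The two-stage idea in the abstract (collect candidate pairs from link information, then let a trained SVM judge each candidate) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three link types used as binary features, the toy English-German data, the hand-assigned labels, and the choice of a bias-free linear SVM trained by subgradient descent. The publication's own feature set and classifier settings are not given on this page and are not reproduced.

```python
from collections import defaultdict

LINK_TYPES = ("interlanguage", "redirect", "anchor")  # hypothetical feature set


def extract_candidates(interlanguage, redirects, anchors):
    """Map each (term, translation) candidate to the link types that support it."""
    evidence = defaultdict(set)
    for source_title, target_title in interlanguage.items():
        evidence[(source_title, target_title)].add("interlanguage")
        # Titles that redirect to the source article are alternative terms.
        for alt_title in redirects.get(source_title, []):
            evidence[(alt_title, target_title)].add("redirect")
        # Anchor texts of links to the source article are noisier candidates.
        for text in anchors.get(source_title, []):
            evidence[(text, target_title)].add("anchor")
    return dict(evidence)


def featurize(link_types):
    """One binary feature per link type (an illustrative stand-in for real features)."""
    return [1.0 if t in link_types else 0.0 for t in LINK_TYPES]


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss; no bias term, for brevity."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            margin = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
            w = [w_j * (1.0 - lr * lam) for w_j in w]  # regularization shrink
            if margin < 1.0:  # margin violated: move the weights toward this example
                w = [w_j + lr * y_i * x_j for w_j, x_j in zip(w, x_i)]
    return w


def predict(w, x):
    """Return +1 (pair judged correct) or -1 (pair judged incorrect)."""
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) > 0 else -1


# Toy training data with hand-assigned labels (all invented for this sketch).
train_pairs = extract_candidates(
    {"Dog": "Hund"}, {"Dog": ["Domestic dog"]}, {"Dog": ["puppy"]}
)
labels = {("Dog", "Hund"): 1, ("Domestic dog", "Hund"): 1, ("puppy", "Hund"): -1}
pairs = sorted(train_pairs)
w = train_linear_svm(
    [featurize(train_pairs[p]) for p in pairs], [labels[p] for p in pairs]
)

# Judge candidate pairs that were not part of the training data.
unseen = extract_candidates({"Cat": "Katze"}, {}, {"Cat": ["kitten"]})
verdicts = {pair: predict(w, featurize(kinds)) for pair, kinds in unseen.items()}
```

In this toy setup the classifier can only learn which link types tend to yield correct pairs; a realistic system would need richer features and a much larger labeled set.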

Research questions

"In this article, we want to further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. After that, an SVM classifier trained on the features of manually labeled training data determines the correctness of unseen term-translation pairs."

Research details

Topics: Cross-language information retrieval
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: English, German

Conclusion

"The experiment proved that it is effective to use an SVM classifier to determine the correctness of a term-translation pair. Furthermore, the experiment showed that our proposed method, which uses 13 different features, performs better than our previous method, which used only 2 features. We also showed that many of the extracted term-translation pairs are not covered in even comprehensive manually created dictionaries. Furthermore, since Wikipedia is growing continuously, both accuracy and coverage of our dictionary will become even better in the near future. We believe that we can easily combine our dictionary with manually constructed dictionaries such as the BEOLINGUS dictionary, in order to enhance the coverage of common terms, especially for word groups other than nouns."


Facts about "Improving the extraction of bilingual terminology from Wikipedia"
Added by wikilit team: Added on initial load
Collected data time dimension: Cross-sectional
Data source: Experiment responses, Wikipedia pages
Doi: 10.1145/1596990.1596995
Google scholar url: http://scholar.google.com/scholar?ie=UTF-8&q=%22Improving%2Bthe%2Bextraction%2Bof%2Bbilingual%2Bterminology%2Bfrom%2BWikipedia%22
Has author: Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
Has domain: Computer science
Has topic: Cross-language information retrieval
Issue: 4
Peer reviewed: Yes
Publication type: Journal article
Published in: ACM Transactions on Multimedia Computing, Communications and Applications
Research design: Experiment
Revid: 10,818
Theories: Undetermined
Theory type: Design and action
Title: Improving the extraction of bilingual terminology from Wikipedia
Unit of analysis: Article
Url: http://dx.doi.org/10.1145/1596990.1596995
Volume: 5
Wikipedia coverage: Sample data
Wikipedia data extraction: Live Wikipedia
Wikipedia language: English, German
Wikipedia page type: Article
Year: 2009