Extended pivot-based approach for bilingual lexicon extraction

  • Seo, Hyeong-Won (Dept. of Computer Engineering, Korea Maritime and Ocean University);
  • Kwon, Hong-Seok (Dept. of Computer Engineering, Korea Maritime and Ocean University);
  • Kim, Jae-Hoon (Dept. of Computer Engineering, Korea Maritime and Ocean University)
  • Received : 2014.01.15
  • Accepted : 2014.03.26
  • Published : 2014.06.30

Abstract

This paper describes an extended pivot-based approach for bilingual lexicon extraction. The approach has three main features. First, it builds context vectors for the source language and for the target language separately, each expressed over a pivot language such as English. This step is the same as in the standard pivot-based approach, which is useful for extracting bilingual lexicons between low-resource language pairs such as Korean-French. Second, unlike the standard pivot-based approach, it looks for similar context vectors within the source language. This helps to extract translation candidates for polysemous words and makes the extracted translations more reliable. Third, it extracts translation candidates from the target context vectors according to the similarity between source and target context vectors. Based on these features, this paper describes the extended pivot-based approach and reports various experiments on the Korean-French (KR-FR) language pair. We observed that the approach is useful both for selecting the most appropriate translation candidate and for handling a low-resource language pair.
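The three steps in the abstract can be pictured with a small sketch. It is not the authors' implementation: the context weighting (raw counts of pivot words in aligned sentence pairs), the use of cosine similarity, the similarity-weighted neighbour expansion in step 2, and every name below (`build_context_vectors`, `expand_query`, `rank_translations`, the toy corpora) are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy aligned sentence pairs (hypothetical data). "배" is a Korean homonym
# meaning both "ship" and "pear".
KO_EN = [
    (["배", "바다"], ["ship", "sea"]),
    (["배", "과일"], ["pear", "fruit"]),
    (["선박", "바다"], ["ship", "sea"]),
    (["사과", "과일"], ["apple", "fruit"]),
]
FR_EN = [
    (["navire", "mer"], ["ship", "sea"]),
    (["poire", "fruit"], ["pear", "fruit"]),
    (["pomme", "fruit"], ["apple", "fruit"]),
]


def build_context_vectors(parallel_pairs):
    """Step 1: map each word to a Counter over pivot-language (English) words.

    Assumption: a pivot word counts as context for a word whenever both
    appear in the same aligned sentence pair.
    """
    vectors = defaultdict(Counter)
    for sentence, pivot_sentence in parallel_pairs:
        pivot_counts = Counter(pivot_sentence)
        for word in set(sentence):
            vectors[word].update(pivot_counts)
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand_query(word, src_vectors, k=2):
    """Step 2 (hypothetical weighting): add the k most similar source-language
    context vectors to the query vector, each scaled by its similarity."""
    query = src_vectors[word]
    neighbours = sorted(
        ((cosine(query, vec), other) for other, vec in src_vectors.items() if other != word),
        key=lambda pair: (-pair[0], pair[1]),
    )
    expanded = Counter(query)
    for sim, other in neighbours[:k]:
        for pivot_word, weight in src_vectors[other].items():
            expanded[pivot_word] += sim * weight
    return expanded


def rank_translations(word, src_vectors, tgt_vectors, k=2, top_n=3):
    """Step 3: rank target words by similarity to the expanded source vector."""
    query = expand_query(word, src_vectors, k)
    scored = sorted(
        ((cosine(query, vec), tgt) for tgt, vec in tgt_vectors.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [tgt for _, tgt in scored[:top_n]]


if __name__ == "__main__":
    ko_vectors = build_context_vectors(KO_EN)
    fr_vectors = build_context_vectors(FR_EN)
    print(rank_translations("배", ko_vectors, fr_vectors, k=2, top_n=3))
```

On this toy data the expansion pulls the homonym 배 toward its maritime neighbours (바다 "sea", 선박 "vessel"), so navire and mer rank ahead of poire. The weighting is illustrative only: it shows how neighbouring source vectors can shift a query between the senses of a polysemous word, not how the paper actually combines them.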

Cited by

  1. Analyzing Errors in Bilingual Multi-word Lexicons Automatically Constructed through a Pivot Language, vol. 39, no. 2, 2015, https://doi.org/10.5916/jkosme.2015.39.2.172
  2. Generating a Korean Sentiment Lexicon through Sentiment Score Propagation (감정점수의 전파를 통한 한국어 감정사전 생성), vol. 9, no. 2, 2020, https://doi.org/10.3745/ktsde.2020.9.2.53