Classification-Based Approach for Hybridizing Statistical and Rule-Based Machine Translation

  • Received : 2014.08.26
  • Accepted : 2015.01.19
  • Published : 2015.05.01

Abstract

In this paper, we propose a classification-based approach for hybridizing statistical machine translation and rule-based machine translation. The hybridization quality depends on both the training dataset used to learn the proposed classifier and the feature extraction method. To create such a training dataset, a previous approach used automatic evaluation metrics to compare the outputs of a set of component machine translation (MT) systems and determine which gave the more accurate translation. The more accurate translation was then labeled with the MT system that produced it. In this previous approach, when the metric scores were low, there was a high level of uncertainty as to which component MT system actually produced the better translation. To reduce such uncertainty, or error in classification, we propose an alternative labeling approach: a cut-off method. In our experiments, using the cut-off method with our proposed classifier, we achieved a translation accuracy of 81.5%, a 5.0% improvement over existing methods.
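The labeling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact cut-off rule, so this assumes one plausible reading: a training sentence is labeled with the better-scoring system only when the better metric score reaches a threshold, and is otherwise left out of the training set. The names `ScoredPair`, `label_comparative`, `label_with_cutoff`, and `build_training_labels`, the scores, and the threshold of 0.30 are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, NamedTuple, Optional, Tuple


class ScoredPair(NamedTuple):
    """One source sentence with a sentence-level metric score per system."""
    source: str
    smt_score: float   # score of the statistical MT output (illustrative value)
    rbmt_score: float  # score of the rule-based MT output (illustrative value)


def label_comparative(pair: ScoredPair) -> str:
    """Comparative labeling: the higher-scoring system wins.

    Ties are assigned to SMT here; that tie-break is an assumption.
    """
    return "SMT" if pair.smt_score >= pair.rbmt_score else "RBMT"


def label_with_cutoff(pair: ScoredPair, cutoff: float) -> Optional[str]:
    """Assumed cut-off rule: label only when the better score reaches `cutoff`.

    Returns None for low-scoring pairs, where the comparison is unreliable.
    """
    if max(pair.smt_score, pair.rbmt_score) < cutoff:
        return None
    return label_comparative(pair)


def build_training_labels(
    pairs: List[ScoredPair], cutoff: float
) -> List[Tuple[str, str]]:
    """Keep only the (source, label) examples that pass the cut-off."""
    labelled = []
    for pair in pairs:
        label = label_with_cutoff(pair, cutoff)
        if label is not None:
            labelled.append((pair.source, label))
    return labelled


pairs = [
    ScoredPair("sentence A", 0.62, 0.41),
    ScoredPair("sentence B", 0.08, 0.11),  # both scores low: uncertain winner
    ScoredPair("sentence C", 0.35, 0.57),
]
labels = build_training_labels(pairs, cutoff=0.30)
print(labels)
```

With these made-up scores, sentence B is dropped because neither system reaches the threshold, while sentences A and C keep the labels that plain comparative labeling would give them. Whether the paper discards such examples, relabels them, or applies the threshold to a score difference instead is not stated in the abstract.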

Cited by

  1. Hybrid Translation with Classification: Revisiting Rule-Based and Neural Machine Translation, vol. 9, no. 2, 2020, https://doi.org/10.3390/electronics9020201