ENHANCING UZBEK-ENGLISH NEURAL MACHINE TRANSLATION WITH DOMAIN-SPECIFIC BERT PRETRAINING
Keywords:
Uzbek-English translation, neural machine translation, BERT pretraining, domain-specific language models, low-resource languages, transformer architecture, machine translation evaluation, domain adaptation, morphological complexity, natural language processing.
Abstract
This article investigates enhancing Uzbek-English neural machine translation (NMT) through domain-specific BERT pretraining. Because Uzbek is a low-resource, morphologically complex language, standard NMT models often struggle with domain-specific terminology and contextual nuance. The study pretrains BERT models on monolingual corpora tailored to the general, medical, and legal domains, integrates them into a transformer-based NMT framework, and obtains significant improvements in translation quality. Results show that domain-specific pretraining clearly outperforms both general-domain pretraining and baseline models, underscoring its effectiveness for specialized translation in low-resource language pairs.
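To make the pretraining step described above concrete, the following is a minimal illustrative sketch of domain-adaptive masked-language-model (MLM) pretraining of a BERT encoder on a monolingual Uzbek domain corpus, using the Hugging Face transformers and datasets libraries. The base checkpoint (bert-base-multilingual-cased, which includes Uzbek), the corpus file name, the output directory, and all hyperparameters are placeholder assumptions for illustration, not the configuration reported in the article.

```python
# Illustrative sketch: continue MLM pretraining of a multilingual BERT encoder
# on a monolingual Uzbek domain corpus (one sentence per line), before the
# encoder is integrated into a transformer-based NMT system.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_CHECKPOINT = "bert-base-multilingual-cased"  # mBERT; covers Uzbek
DOMAIN_CORPUS = "uz_medical_corpus.txt"           # hypothetical corpus file

tokenizer = AutoTokenizer.from_pretrained(BASE_CHECKPOINT)
model = AutoModelForMaskedLM.from_pretrained(BASE_CHECKPOINT)

# Load the raw monolingual domain text and tokenize it.
raw = load_dataset("text", data_files={"train": DOMAIN_CORPUS})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: 15% of tokens are masked on the fly, as in the original BERT.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="uz-bert-medical",   # hypothetical output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    save_steps=10_000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("uz-bert-medical")  # domain-adapted encoder for later NMT use
```

Under these assumptions, the saved domain-adapted encoder would then supply contextual representations to the transformer NMT model (for example, in a BERT-fused arrangement where the NMT encoder and decoder attend to the frozen or fine-tuned BERT outputs); the exact integration mechanism used in the study is described in the full article.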