ENHANCING UZBEK-ENGLISH NEURAL MACHINE TRANSLATION WITH DOMAIN-SPECIFIC BERT PRETRAINING

Authors

  • Safoev N.N., PhD student, Bukhara State Technical University; Fayziyev Sh.I., Doctor of Technical Sciences, Associate Professor, responsible employee of the Accounts Chamber of the Republic of Uzbekistan.

Keywords:

Uzbek-English translation, neural machine translation, BERT pretraining, domain-specific language models, low-resource languages, transformer architecture, machine translation evaluation, domain adaptation, morphological complexity, natural language processing.

Abstract

This article investigates the enhancement of Uzbek-English neural machine translation (NMT) by leveraging domain-specific BERT pretraining. Due to the low-resource nature and morphological complexity of Uzbek, standard NMT models often struggle with domain-specific terminology and contextual nuances. By pretraining BERT models on monolingual corpora tailored to general, medical, and legal domains, and integrating them into a transformer-based NMT framework, the study achieves significant improvements in translation quality. Results demonstrate that domain-specific pretraining notably outperforms general pretraining and baseline models, highlighting its effectiveness for specialized translations in low-resource language pairs.
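To make the described pipeline concrete, the sketch below illustrates one way the domain-specific pretraining step could be implemented: continued masked-language-model pretraining of a multilingual BERT checkpoint on a monolingual Uzbek corpus for a single domain (here, medical), using the Hugging Face transformers and datasets libraries. The base checkpoint, corpus file name, and hyperparameters are illustrative assumptions and are not taken from the article itself.

```python
# Minimal sketch: continued masked-language-model (MLM) pretraining of a
# multilingual BERT checkpoint on a domain-specific monolingual Uzbek corpus.
# The model choice, file path, and hyperparameters are illustrative
# assumptions, not the authors' exact configuration.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "bert-base-multilingual-cased"   # publicly available multilingual BERT
CORPUS_FILE = "uz_medical_corpus.txt"         # hypothetical domain-specific Uzbek corpus

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)

# Load the raw text corpus and tokenize it into fixed-length sequences.
raw = load_dataset("text", data_files={"train": CORPUS_FILE})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style MLM objective: mask 15% of tokens at random.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-uz-medical",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

The same procedure would presumably be repeated for the general and legal corpora, and the resulting domain-adapted encoders would then be integrated into the transformer-based NMT system, for example by initializing the translation encoder or fusing BERT's contextual representations into it.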

Published

2025-06-21

How to Cite

Safoev N.N., & Fayziyev Sh.I. (2025). ENHANCING UZBEK-ENGLISH NEURAL MACHINE TRANSLATION WITH DOMAIN-SPECIFIC BERT PRETRAINING. Journal of Applied Science and Social Science, 15(06), 620–625. Retrieved from https://www.internationaljournal.co.in/index.php/jasass/article/view/1326