Google Translate vs. DeepL: analysing neural machine translation performance under the challenge of phraseological variation
Abstract
The present research analyses the performance of two freely available neural machine translation (NMT) systems, Google Translate and DeepL, in the Spanish-to-English (ES>EN) translation of somatisms (idioms built on body-part terms) such as tomar el pelo and meter la pata, their nominal variants (tomadura/tomada de pelo and metedura/metida de pata), and other lower-frequency variants such as meter la pata hasta el corvejón, meter la gamba and metedura/metida de gamba. The machine translation outcomes are contrasted and classified according to whether these idioms appear in their continuous or discontinuous form (Anastasiou 2010), i.e., whether or not intervening n-grams split the idiomatic sequence, which may hinder their automatic detection and translation. Overall, the findings help determine in which of these scenarios Google Translate or DeepL performs better under the challenge of phraseological variation and discontinuity.
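The continuous/discontinuous distinction can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function name `classify_occurrence`, the token-matching strategy and the two example sentences are assumptions made here for illustration. Matching is on lowercased surface tokens only, so inflected forms (e.g. tomó el pelo) would need lemmatisation first.

```python
import re


def classify_occurrence(sentence, components):
    """Classify an idiom occurrence as 'continuous', 'discontinuous' or None.

    `components` is the ordered list of lowercase idiom tokens,
    e.g. ["meter", "la", "pata"]. Hypothetical helper for illustration.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    n = len(components)

    # Continuous: the components form one uninterrupted run of tokens.
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == components:
            return "continuous"

    # Discontinuous: the components occur in order, but other tokens
    # (one or more intervening n-grams) split the sequence.
    remaining = iter(tokens)
    if all(comp in remaining for comp in components):
        return "discontinuous"

    # The idiom's components were not all found in order.
    return None


idiom = ["meter", "la", "pata"]
# Invented example sentences, not taken from the study's data.
print(classify_occurrence("No quiero meter la pata", idiom))
print(classify_occurrence("No quiero meter otra vez la pata", idiom))
```

The first sentence keeps the idiom intact, while in the second the bigram otra vez splits it, which is the kind of case the abstract describes as harder to detect automatically.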
References
Anastasiou, Dimitra. (2010) Idiom Treatment Experiments in Machine Translation. Newcastle upon Tyne: Cambridge Scholars Publishing. ISBN-13: 978-1-4438-2515-3 ISBN-10: 1-4438-2515-8
Anastasopoulos, Antonios. (2019) “An analysis of source-side grammatical errors in NMT.” Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Florence, Italy: Association for Computational Linguistics, pp. 213–223.
Belinkov, Yonatan & Yonatan Bisk. (2018) “Synthetic and natural noise both break neural machine translation.” CoRR. abs/1711.02173.
Corpas Pastor, Gloria. (2015) “Translating English Verbal Collocations into Spanish: on Distribution and other Relevant Differences related to Diatopic Variation.” Lingvisticae Investigationes Special Issue ‘Spanish Phraseology. Varieties and variations’, 38:2, pp. 229–262. ISSN: 03784169 / E-ISSN: 15699927. DOI 10.1075/li.38.2.03cor.
Corpas Pastor, Gloria. (2018) “Laughing one’s head off in Spanish subtitles: a corpus-based study on diatopic variation and its consequences for translation.” In: Mogorrón Huerta, Pedro & Albaladejo-Martínez, Antonio. (eds.) 2018. Fraseología, Diatopía y Traducción, Colección “IVITRA Research in Linguistics and Literature”. Amsterdam: John Benjamins, pp. 54–106. ISBN: 9789027202253.
Costa-Jussà, Marta R.; Marcos Zampieri & Santanu Pal. (2018) “A neural approach to language variety translation.” Proceedings of the fifth workshop on NLP for similar languages, varieties and dialects (VarDial 2018), pp. 275–282.
Farhan, W.; Bashar Talafha; Analle Abuammar; Ruba Jaikat; Mahmoud Al-Ayyoub; Ahmad Bisher Tarakji & Anas Toma. (2020) “Unsupervised dialectal neural machine translation.” Information Processing & Management, 57:3. ISSN 0306-4573. DOI: https://doi.org/10.1016/j.ipm.2019.102181
Gülçehre, Çaglar; Sungjin Ahn; Ramesh Nallapati; Bowen Zhou & Yoshua Bengio. (2016) “Pointing the unknown words.” CoRR. abs/1603.08148.
Honnet, Pierre-Edouard; Andrei Popescu-Belis; Claudiu Musat & Michael Baeriswyl. (2017) “Machine translation of low-resource spoken dialects: Strategies for normalizing Swiss German.” CoRR. abs/1710.11035.
Huang, Po-Sen; Chong Wang; Sitao Huang; Denny Zhou & Li Deng. (2018) “Towards neural phrase-based machine translation.” ICLR.
Kilgarriff, Adam; Vít Baisa; Jan Bušta; Miloš Jakubíček; Vojtěch Kovář; Jan Michelfeit; Pavel Rychlý & Vít Suchomel. (2003) “The Sketch Engine.” https://www.sketchengine.eu
Koike, Kazumi. (2007) “Relaciones paradigmáticas y sintagmáticas de las locuciones verbales en español.” In: Cuartero Otal, Juan & Emsel, Martina (eds.), Vernetzungen: Bedeutung in Wort, Satz und Text. Festschrift für Gerd Wotjak zum 65. Geburtstag. Frankfurt am Main: Peter Lang, pp. 263–275.
Liu, Nelson F.; Jonathan May; Michael Pust & Kevin Knight. (2018) “Augmenting statistical machine translation with subword translation of out-of-vocabulary words.” arXiv preprint arXiv:1808.05700.
Lohar, Pintu; Maja Popović; Haithem Afli & Andy Way. (2019) “A systematic comparison between SMT and NMT on translating user-generated content.” 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019). La Rochelle, France.
Rikters, Matīss & Ondřej Bojar. (2017) “Paying attention to multi-word expressions in neural machine translation.” arXiv preprint arXiv:1710.06313.
Rohanian, Omid; Shiva Taslimipoor; Samaneh Kouchaki; Le An Ha & Ruslan Mitkov. (2019) “Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pp. 2692–2698.
Sag, Ivan A.; Timothy Baldwin; Francis Bond; Ann Copestake & Dan Flickinger. (2002) “Multiword Expressions: A Pain in the Neck for NLP.” In: Alexander Gelbukh (ed.) 2002. Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, pp. 1–15.
Sennrich, Rico; Barry Haddow & Alexandra Birch. (2015) “Neural machine translation of rare words with subword units.” arXiv preprint arXiv:1508.07909.
Sperber, Matthias; Jan Niehues & Alex Waibel. (2017) “Toward robust neural machine translation for noisy input sequences.” International Workshop on Spoken Language Translation (IWSLT). Tokyo, Japan.
Valencia Giraldo, M. Victoria & Gloria Corpas Pastor. (2019) “The Portrait of Dorian Gray: A Corpus-Based Analysis of Translated Verb + Noun (Object) Collocations in Peninsular and Colombian Spanish.” In: Gloria Corpas Pastor & Ruslan Mitkov (eds.) 2019. Computational and Corpus-Based Phraseology. Europhras 2019, LNAI 11755. Cham: Springer Nature Switzerland AG. DOI: https://doi.org/10.1007/978-3-030-30135-4_13
Wang, Xing; Zhaopeng Tu; Deyi Xiong & Min Zhang. (2017) “Translating phrases in neural machine translation.” EMNLP 2017, pp. 1421–1431.