no code implementations • ACL (IWSLT) 2021 • Toan Q. Nguyen, Kenton Murray, David Chiang
In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation.
no code implementations • 22 Mar 2021 • Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages.
6 code implementations • ACL 2020 • Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one.
5 code implementations • EMNLP (IWSLT) 2019 • Toan Q. Nguyen, Julian Salazar
We evaluate three simple, normalization-centric changes to improve Transformer training.
Ranked #4 on
Machine Translation
on IWSLT2015 English-Vietnamese
1 code implementation • WS 2019 • Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, David Chiang
Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation.
4 code implementations • NAACL 2018 • Toan Q. Nguyen, David Chiang
We explore two solutions to the problem of mistranslating rare words in neural machine translation.
no code implementations • IJCNLP 2017 • Toan Q. Nguyen, David Chiang
We present a simple method to improve neural translation of a low-resource language pair using parallel data from a related, also low-resource, language pair.