1 code implementation • ACL 2022 • Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko
To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches.
1 code implementation • ACL 2022 • Nikolay Babakov, David Dale, Varvara Logacheva, Alexander Panchenko
In both tasks, the system is supposed to generate a text which should be semantically similar to the input text.
no code implementations • 6 Feb 2025 • The Omnilingual MT Team, Pierre Andrews, Mikel Artetxe, Mariano Coria Meglioli, Marta R. Costa-jussà, Joe Chuang, David Dale, Cynthia Gao, Jean Maillard, Alex Mourachko, Christophe Ropers, Safiyyah Saleem, Eduardo Sánchez, Ioannis Tsiamas, Arina Turkatenko, Albert Ventayol-Boada, Shireen Yates
Therefore, BOUQuET is specially suitable for the open initiative and call for translation participation that we are launching to extend it to a multi-way parallel corpus to any written language.
no code implementations • 11 Dec 2024 • Marta R. Costa-jussà, Pierre Andrews, Mariano Coria Meglioli, Joy Chen, Joe Chuang, David Dale, Christophe Ropers, Alexandre Mourachko, Eduardo Sánchez, Holger Schwenk, Tuan Tran, Arina Turkatenko, Carleigh Wood
The primary motivation behind providing summaries of different lengths is to establish a controllable framework for generating long texts from shorter inputs, i. e. summary expansion.
1 code implementation • 11 Dec 2024 • LCM team, Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussà, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk
In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept.
no code implementations • 29 Jan 2024 • Christophe Ropers, David Dale, Prangthip Hansanti, Gabriel Mejia Gonzalez, Ivan Evtimov, Corinne Wong, Christophe Touret, Kristina Pereyra, Seohyun Sonia Kim, Cristian Canton Ferrer, Pierre Andrews, Marta R. Costa-jussà
Assessing performance in Natural Language Processing is becoming increasingly complex.
1 code implementation • 10 Jan 2024 • Marta R. Costa-jussà, Mariano Coria Meglioli, Pierre Andrews, David Dale, Prangthip Hansanti, Elahe Kalbassi, Alex Mourachko, Christophe Ropers, Carleigh Wood
Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English.
1 code implementation • 8 Dec 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia Gonzalez, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-jussà, Maha Elbayad, Hongyu Gong, Francisco Guzmán, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, Mary Williamson
In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.
automatic-speech-translation
Multimodal Machine Translation
+2
no code implementations • 23 Nov 2023 • Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko
Text detoxification is the task of transferring the style of text from toxic to neutral.
no code implementations • 11 Nov 2023 • Marta R. Costa-jussà, David Dale, Maha Elbayad, Bokai Yu
MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at scale.
no code implementations • 20 Sep 2023 • Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà
In doing so, we contribute to the ongoing research progress within the fields of Speech-to-Speech and Speech-to-Text translation.
4 code implementations • 22 Aug 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?
Ranked #1 on
Speech-to-Speech Translation
on CVSS
(using extra training data)
Automatic Speech Recognition
Speech-to-Speech Translation
+4
1 code implementation • 17 Aug 2023 • Nikolay Babakov, David Dale, Ilya Gusev, Irina Krotova, Alexander Panchenko
Text style transfer techniques are gaining popularity in natural language processing allowing paraphrasing text in the required form: from toxic to neural, from formal to informal, from old to the modern English language, etc.
1 code implementation • 19 May 2023 • David Dale, Elena Voita, Janice Lam, Prangthip Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Loïc Barrault, Marta R. Costa-jussà
Hallucinations in machine translation are translations that contain information completely unrelated to the input.
no code implementations • 16 Dec 2022 • David Dale, Elena Voita, Loïc Barrault, Marta R. Costa-jussà
We propose to use a method that evaluates the percentage of the source contribution to a generated translation.
1 code implementation • FieldMatters (COLING) 2022 • David Dale
We present the first neural machine translation system for translation between the endangered Erzya language and Russian and the dataset collected by us to train and evaluate it.
Ranked #1 on
Machine Translation
on slone/myv_ru_2022 myv-ru
2 code implementations • 20 Jun 2022 • Nikolay Babakov, David Dale, Varvara Logacheva, Irina Krotova, Alexander Panchenko
Text style transfer techniques are gaining popularity in Natural Language Processing, finding various applications such as text detoxification, sentiment, or formality transfer.
1 code implementation • EMNLP 2021 • David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We compare our models with a number of methods for style transfer.
no code implementations • SEMEVAL 2021 • David Dale, Igor Markov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We show that fine-tuning a RoBERTa model for this problem is a strong baseline.
3 code implementations • 19 May 2021 • Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko
We introduce the first study of automatic detoxification of Russian texts to combat offensive language.