Search Results for author: François Yvon

Found 43 papers, 20 papers with code

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations9 Nov 2022 BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Language Modelling Multilingual NLP

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

3 code implementations Findings of the Association for Computational Linguistics 2020 Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze

We find that alignments created from embeddings are superior for four and comparable for two language pairs compared to those produced by traditional statistical aligners, even with abundant parallel data; e. g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner, trained on 100k parallel sentences.

Machine Translation Multilingual Word Embeddings +3

GlotLID: Language Identification for Low-Resource Languages

3 code implementations24 Oct 2023 Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze

Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages.

Dialect Identification

Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM

1 code implementation3 Mar 2023 Rachel Bawden, François Yvon

The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022) covering 46 languages.

Cross-Lingual Transfer Language Modelling +2

GlotScript: A Resource and Tool for Low Resource Writing System Identification

1 code implementation23 Sep 2023 Amir Hossein Kargaran, François Yvon, Hinrich Schütze

We present GlotScript, an open resource and tool for low resource writing system identification.

Language Modelling

Graph Algorithms for Multiparallel Word Alignment

1 code implementation EMNLP 2021 Ayyoob Imani, Masoud Jalili Sabet, Lütfi Kerem Şenel, Philipp Dufter, François Yvon, Hinrich Schütze

With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; however, they have again become a focus of research more recently.

Link Prediction Machine Translation +3

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

1 code implementation18 Oct 2022 Ayyoob Imani, Silvia Severini, Masoud Jalili Sabet, François Yvon, Hinrich Schütze

An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages.

Part-Of-Speech Tagging POS +3

Weakly Supervised Word Segmentation for Computational Language Documentation

1 code implementation ACL 2022 Shu Okabe, Laurent Besacier, François Yvon

Word and morpheme segmentation are fundamental steps of language documentation as they allow to discover lexical units in a language for which the lexicon is unknown.

Incremental Learning Segmentation

Using Monolingual Data in Neural Machine Translation: a Systematic Study

1 code implementation WS 2018 Franck Burlot, François Yvon

Our findings confirm that back-translation is very effective and give new explanations as to why this is the case.

Machine Translation Translation

CroissantLLM: A Truly Bilingual French-English Language Model

1 code implementation1 Feb 2024 Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

We introduce CroissantLLM, a 1. 3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware.

Language Modelling Large Language Model

One Source, Two Targets: Challenges and Rewards of Dual Decoding

1 code implementation EMNLP 2021 Jitao Xu, François Yvon

Machine translation is generally understood as generating one target text from an input source document.

Machine Translation Translation +1

Joint Generation of Captions and Subtitles with Dual Decoding

1 code implementation IWSLT (ACL) 2022 Jitao Xu, François Buet, Josep Crego, Elise Bertin-Lemée, François Yvon

As the amount of audio-visual content increases, the need to develop automatic captioning and subtitling solutions to match the expectations of a growing international audience appears as the only viable way to boost throughput and lower the related post-production costs.

Unsupervised Word Segmentation from Speech with Attention

no code implementations18 Jun 2018 Pierre Godard, Marcely Zanon-Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier

We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL).

Acoustic Unit Discovery Machine Translation +2

Generative latent neural models for automatic word alignment

no code implementations AMTA 2020 Anh Khoa Ngo Ho, François Yvon

Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems or to perform quality estimation.

Machine Translation Sentence +3

Neural Baselines for Word Alignment

no code implementations EMNLP (IWSLT) 2019 Anh Khoa Ngo Ho, François Yvon

Word alignments identify translational correspondences between words in a parallel sentence pair and is used, for instance, to learn bilingual dictionaries, to train statistical machine translation systems , or to perform quality estimation.

Machine Translation Sentence +2

Can You Traducir This? Machine Translation for Code-Switched Input

no code implementations NAACL (CALCS) 2021 Jitao Xu, François Yvon

Code-Switching (CSW) is a common phenomenon that occurs in multilingual geographic or social contexts, which raises challenging problems for natural language processing tools.

Machine Translation Translation

Biais de genre dans un système de traduction automatiqueneuronale : une étude préliminaire (Gender Bias in Neural Translation : a preliminary study )

no code implementations JEP/TALN/RECITAL 2021 Guillaume Wisniewski, Lichao Zhou, Nicolas Ballier, François Yvon

Cet article présente les premiers résultats d’une étude en cours sur les biais de genre dans les corpus d’entraînements et dans les systèmes de traduction neuronale.

Vers la production automatique de sous-titres adaptés à l’affichage (Towards automatic adapted monolingual captioning)

no code implementations JEP/TALN/RECITAL 2021 François Buet, François Yvon

Une façon de réaliser un sous-titrage automatique monolingue est d’associer un système de reconnaissance de parole avec un modèle de traduction de la transcription vers les sous-titres.

Optimizing Word Alignments with Better Subword Tokenization

no code implementations MTSummit 2021 Anh Khoa Ngo Ho, François Yvon

Word alignment identify translational correspondences between words in a parallel sentence pair and are used and for example and to train statistical machine translation and learn bilingual dictionaries or to perform quality estimation.

Machine Translation Sentence +2

Priming Neural Machine Translation

no code implementations WMT (EMNLP) 2020 Minh Quang Pham, Jitao Xu, Josep Crego, François Yvon, Jean Senellart

Priming is a well known and studied psychology phenomenon based on the prior presentation of one stimulus (cue) to influence the processing of a response.

Machine Translation NMT +1

Screening Gender Transfer in Neural Machine Translation

no code implementations EMNLP (BlackboxNLP) 2021 Guillaume Wisniewski, Lichao Zhu, Nicolas Ballier, François Yvon

This paper aims at identifying the information flow in state-of-the-art machine translation systems, taking as example the transfer of gender when translating from French into English.

Machine Translation Translation

Two-Step MT: Predicting Target Morphology

no code implementations IWSLT 2016 Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon

This paper describes a two-step machine translation system that addresses the issue of translating into a morphologically rich language (English to Czech), by performing separately the translation and the generation of target morphology.

Machine Translation Translation +1

Multi-Domain Adaptation in Neural Machine Translation with Dynamic Sampling Strategies

no code implementations EAMT 2022 Minh-Quang Pham, Josep Crego, François Yvon

In this paper, we study dynamic data selection strategies that are able to automatically re-evaluate the usefulness of data samples and to evolve a data selection policy in the course of training.

Machine Translation Translation +1

Ré-ordonnancement via programmation dynamique pour l’adaptation cross-lingue d’un analyseur en dépendances (Sentence reordering via dynamic programming for cross-lingual dependency parsing )

no code implementations JEP/TALN/RECITAL 2022 Nicolas Devatine, Caio Corro, François Yvon

Cet article s’intéresse au transfert cross-lingue d’analyseurs en dépendances et étudie des méthodes pour limiter l’effet potentiellement néfaste pour le transfert de divergences entre l’ordre des mots dans les langues source et cible.

Dependency Parsing Sentence

Latent Group Dropout for Multilingual and Multidomain Machine Translation

no code implementations Findings (NAACL) 2022 Minh-Quang Pham, François Yvon, Josep Crego

Multidomain and multilingual machine translation often rely on parameter sharing strategies, where large portions of the network are meant to capture the commonalities of the tasks at hand, while smaller parts are reserved to model the peculiarities of a language or a domain.

Machine Translation Translation

Bilingual Synchronization: Restoring Translational Relationships with Editing Operations

1 code implementation24 Oct 2022 Jitao Xu, Josep Crego, François Yvon

Machine Translation (MT) is usually viewed as a one-shot process that generates the target language equivalent of some source text from scratch.

Machine Translation Translation +1

BiSync: A Bilingual Editor for Synchronized Monolingual Texts

1 code implementation1 Jun 2023 Josep Crego, Jitao Xu, François Yvon

In our globalized world, a growing number of situations arise where people are required to communicate in one or several foreign languages.

Sentence

Structural generalization in COGS: Supertagging is (almost) all you need

no code implementations21 Oct 2023 Alban Petit, Caio Corro, François Yvon

In many Natural Language Processing applications, neural networks have been found to fail to generalize on out-of-distribution examples.

Semantic Parsing

Cannot find the paper you are looking for? You can Submit a new open access paper.