2 code implementations • EMNLP 2020 • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander Rush
Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks.
9 code implementations • 9 Oct 2019 • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander M. Rush
Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks.
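The two entries above describe the Transformers library itself. As a minimal usage sketch, assuming the standard transformers and torch packages are installed, the snippet below loads a pretrained checkpoint and its tokenizer through the Auto classes and runs a single classification forward pass; the checkpoint name is only an illustrative choice.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative checkpoint; any sequence-classification checkpoint from the Hub works.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

# Tokenize one sentence and run a forward pass without gradients.
inputs = tokenizer("Pretrained Transformers are easy to reuse.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(dim=-1))])
```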
21 code implementations • 23 Jan 2019 • Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue
We introduce a new approach to generative data-driven dialogue systems (e.g., chatbots) called TransferTransfo, which combines a transfer-learning-based training scheme with a high-capacity Transformer model.
Ranked #3 on Dialogue Generation on Persona-Chat (using extra training data)
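A rough sketch of the TransferTransfo recipe described above: fine-tune a pretrained Transformer language model on a dialogue turn built by concatenating persona, history, and reply. The gpt2 checkpoint, the plain-string concatenation, and the single language-modeling loss are simplifying assumptions here; the paper additionally uses dialogue-state special tokens and a next-utterance classification objective.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative base model; the paper fine-tunes a pretrained Transformer LM.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=6.25e-5)

persona = "i like to ski. my wife does not like me anymore."
history = "hi, how are you doing today?"
reply = "i am doing great, just back from the slopes!"

# One training sequence: persona + history + gold reply. Labels equal the inputs,
# so the model is trained with a causal language-modeling loss on the whole turn.
text = persona + " " + history + " " + reply + tokenizer.eos_token
batch = tokenizer(text, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```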
no code implementations • ACL 2018 • Thomas Wolf, Julien Chaumond, Clement Delangue
We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework.
no code implementations • 28 Mar 2018 • Thomas Wolf, Julien Chaumond, Clement Delangue
We consider the task of word-level language modeling and study the possibility of combining hidden-states-based short-term representations with medium-term representations encoded in dynamical weights of a language model.
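To make the idea concrete, here is a hedged sketch rather than the paper's method: a small word-level GRU language model in which the hidden state carries short-term context, while an additional "fast" weight matrix on the output layer is decayed and updated online from recent activations, acting as a medium-term memory. The sizes, the Hebbian-style update rule, and the decay rate are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DynamicWeightLM(nn.Module):
    def __init__(self, vocab_size=10000, emb=128, hidden=256, decay=0.95, lr_fast=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRUCell(emb, hidden)
        self.out = nn.Linear(hidden, vocab_size)   # slow weights, learned by backprop
        self.decay, self.lr_fast = decay, lr_fast

    def forward(self, tokens):
        h = tokens.new_zeros(tokens.size(0), self.rnn.hidden_size, dtype=torch.float)
        fast = torch.zeros_like(self.out.weight)    # dynamical (fast) weights
        logits = []
        for t in range(tokens.size(1)):
            h = self.rnn(self.embed(tokens[:, t]), h)          # short-term state
            logits.append(h @ (self.out.weight + fast).t() + self.out.bias)
            # Medium-term memory: decay the fast weights, then add a Hebbian-style
            # outer-product update driven by the current prediction and hidden state.
            probs = torch.softmax(logits[-1], dim=-1)
            fast = self.decay * fast + self.lr_fast * (probs.t() @ h) / tokens.size(0)
        return torch.stack(logits, dim=1)

lm = DynamicWeightLM()
tokens = torch.randint(0, 10000, (2, 12))   # (batch, time) of word ids
print(lm(tokens).shape)                     # torch.Size([2, 12, 10000])
```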