DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

19 code implementations NeurIPS 2019 Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging.

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

17 code implementations 23 Jan 2019 Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue

We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo, which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model.

Continuous Learning in a Hierarchical Multiscale Neural Network

no code implementations ACL 2018 Thomas Wolf, Julien Chaumond, Clement Delangue

We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework.

Meta-Learning a Dynamical Language Model

no code implementations 28 Mar 2018 Thomas Wolf, Julien Chaumond, Clement Delangue

We consider the task of word-level language modeling and study the possibility of combining hidden-states-based short-term representations with medium-term representations encoded in dynamical weights of a language model.

