no code implementations • 11 Jul 2023 • Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato
As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks.
no code implementations • 25 Apr 2023 • Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu
The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks.
1 code implementation • 15 Nov 2022 • Jörg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de Las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato
A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks.
no code implementations • 10 Oct 2022 • Lucio M. Dery, Abram L. Friesen, Nando de Freitas, Marc'Aurelio Ranzato, Yutian Chen
As machine learning permeates more industries and models become more expensive and time-consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing.
1 code implementation • 26 May 2022 • Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'Aurelio Ranzato, Sagi Perel, Nando de Freitas
Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution.
1 code implementation • 17 Jun 2021 • Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer
Practitioners then have to decide how to allocate their computational budget in order to obtain the best performance at any point in time.
2 code implementations • 6 Jun 2021 • Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzmán, Angela Fan
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks.
2 code implementations • ICLR 2021 • Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato
Finally, we introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task.
no code implementations • 17 Dec 2020 • Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam
In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.
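Since this entry describes a concrete mechanism, a minimal sketch may help: only the embedding of a single appended task token is trained while the rest of the model stays frozen. The `model` interface and helper names below are assumptions for illustration, not the paper's code.

```python
import torch

def fit_task_embedding(model, examples, d_model, steps=100, lr=1e-2):
    """Optimize only the embedding of one appended task token.
    `model` is assumed to be a frozen network taking (embeddings, target)
    and returning a scalar loss."""
    task_emb = torch.zeros(1, 1, d_model, requires_grad=True)  # the sole trainable tensor
    opt = torch.optim.Adam([task_emb], lr=lr)
    for _ in range(steps):
        for input_embs, target in examples:  # input_embs: (1, seq_len, d_model)
            opt.zero_grad()
            x = torch.cat([input_embs, task_emb], dim=1)  # append the task token
            loss = model(x, target)  # model parameters receive no updates
            loss.backward()
            opt.step()
    return task_emb.detach()
```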
no code implementations • 1 May 2020 • Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.
1 code implementation • ICLR 2020 • Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato
In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.
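As a sketch of how a sequence-level energy can be used in practice, candidates from a base language model can be rescored by the energy; this is one common use of such models rather than the paper's exact procedure, and `base_logprob` and `energy` are assumed callables.

```python
def rescore(candidates, base_logprob, energy):
    """Sort candidate sequences by log p_LM(x) - E(x), best first.
    Both scoring functions are assumed callables over whole sequences."""
    return sorted(candidates, key=lambda x: energy(x) - base_logprob(x))
```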
no code implementations • 6 Apr 2020 • Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam
Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.
no code implementations • WS 2019 • Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc'Aurelio Ranzato
This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task.
1 code implementation • ICLR 2020 • Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato
In this work, we first empirically show that self-training is able to decently improve the supervised baseline on neural sequence generation tasks.
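A minimal self-training loop for sequence generation, sketched under assumed interfaces (`init_model`, `train`, and `generate` are hypothetical callables, not the paper's API):

```python
def self_train(init_model, train, generate, labeled, unlabeled, rounds=3):
    """init_model() -> fresh model; train(model, pairs) fits it;
    generate(model, x) -> predicted output sequence."""
    model = init_model()
    train(model, labeled)  # supervised baseline
    for _ in range(rounds):
        pseudo = [(x, generate(model, x)) for x in unlabeled]  # pseudo-label
        model = init_model()  # retrain from scratch on real + pseudo data
        train(model, labeled + pseudo)
    return model
```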
no code implementations • EACL 2021 • Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato
While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in.
1 code implementation • ACL 2020 • Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli
Back-translation is a widely used data augmentation technique which leverages target monolingual data.
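As a sketch of the technique (with `tgt_to_src` standing in for a hypothetical reverse translation model):

```python
def back_translate(tgt_to_src, mono_tgt, parallel):
    """Build synthetic (source, target) pairs by translating target-side
    monolingual sentences back into the source language, then mix them
    with the real parallel data."""
    synthetic = [(tgt_to_src(y), y) for y in mono_tgt]  # synthetic src, real tgt
    return parallel + synthetic
```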
8 code implementations • NeurIPS 2019 • Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer in a state-of-the-art transformer-based architecture.
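A simplified sketch of a trainable key-value memory layer; the paper's product-key factorization, which avoids scoring every key, is deliberately omitted here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class KeyValueMemory(nn.Module):
    """Query a large table of learned keys, take the top-k by dot product,
    and return a softmax-weighted sum of the corresponding values."""
    def __init__(self, n_slots, d_model, k=32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, d_model) / d_model ** 0.5)
        self.values = nn.Embedding(n_slots, d_model)
        self.k = k

    def forward(self, q):                        # q: (batch, d_model)
        scores = q @ self.keys.t()               # (batch, n_slots)
        top, idx = scores.topk(self.k, dim=-1)   # k best slots per query
        w = torch.softmax(top, dim=-1)           # (batch, k)
        return (w.unsqueeze(-1) * self.values(idx)).sum(dim=1)
```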
no code implementations • 7 Jun 2019 • Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam
Energy-based models (EBMs), a.k.a. un-normalized models.
1 code implementation • ICCV 2019 • Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc'Aurelio Ranzato
When extending the evaluation to the generalized setting which accounts also for pairs seen during training, we discover that naive baseline methods perform similarly or better than current approaches.
no code implementations • ICLR 2019 • Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau
The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".
5 code implementations • 27 Feb 2019 • Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, Marc'Aurelio Ranzato
But for a successful knowledge transfer, the learner needs to remember how to perform previous tasks.
Ranked #7 on Class Incremental Learning on CIFAR-100
1 code implementation • 20 Feb 2019 • Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
2 code implementations • 4 Feb 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato
For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.
2 code implementations • ICLR 2019 • Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task.
Ranked #6 on Continual Learning on ASC (19 tasks)
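A sketch in the spirit of averaged gradient episodic memory: if the proposed gradient conflicts with the average gradient computed on a memory of past tasks, project it so the memory loss is not increased. The exact rule below is our reading of that idea, not code from the paper.

```python
import torch

def project_gradient(grad, grad_ref):
    """grad: flattened gradient on the current batch;
    grad_ref: flattened reference gradient on the episodic memory."""
    dot = torch.dot(grad, grad_ref)
    if dot < 0:  # update would increase the loss on past tasks
        grad = grad - (dot / torch.dot(grad_ref, grad_ref)) * grad_ref
    return grad
```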
3 code implementations • 1 Nov 2018 • Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau
The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".
no code implementations • 27 Sep 2018 • Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato
In this work, we aim to address this problem by introducing a new benchmark evaluation suite, dubbed GenEval.
no code implementations • 20 Apr 2018 • Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave
It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.
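The standard way to combine the two is linear interpolation of their next-token distributions; a minimal sketch, with the mixing weight `lam` assumed to be tuned on validation data:

```python
def interpolate(p_neural, p_ngram, lam=0.5):
    """p_neural, p_ngram: dicts mapping next token -> probability."""
    vocab = set(p_neural) | set(p_ngram)
    return {w: lam * p_neural.get(w, 0.0) + (1 - lam) * p_ngram.get(w, 0.0)
            for w in vocab}
```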
15 code implementations • EMNLP 2018 • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.
Ranked #2 on Machine Translation on WMT2016 English-Russian
1 code implementation • ICML 2018 • Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato
We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations.
no code implementations • NeurIPS 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.
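A sketch of the objective structure such a model can use, assuming hypothetical `enc`, `dec`, and `disc` modules and binary attribute labels: reconstruct the input from the latent code plus the attributes, while an adversarial discriminator pushes attribute information out of the latent code. This is our illustration of the idea, not the paper's exact losses.

```python
import torch.nn.functional as F

def disentangling_losses(enc, dec, disc, x, attr, lam=1.0):
    """x: images; attr: LongTensor of binary attribute labels in {0, 1}."""
    z = enc(x)
    recon = F.mse_loss(dec(z, attr), x)               # reconstruct from (latent, attr)
    d_loss = F.cross_entropy(disc(z.detach()), attr)  # discriminator recovers attr
    adv = F.cross_entropy(disc(z), 1 - attr)          # encoder fools the discriminator
    return recon + lam * adv, d_loss                  # (model loss, discriminator loss)
```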
1 code implementation • NAACL 2018 • Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato
There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.
Ranked #4 on Machine Translation on IWSLT2015 German-English
15 code implementations • ICLR 2018 • Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data.
Ranked #6 on Machine Translation on WMT2016 German-English
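A sketch of one training step built around that shared feature space, combining denoising autoencoding with on-the-fly back-translation; every interface on the `models` bundle (`encode`, `decode`, `nll`) and the `noise` function are assumptions for illustration.

```python
def unsupervised_mt_step(models, batch_a, batch_b, noise):
    """batch_a, batch_b: monolingual batches in languages 'a' and 'b'."""
    enc, dec = models.encode, models.decode
    losses = []
    for lang, batch in (("a", batch_a), ("b", batch_b)):
        other = "b" if lang == "a" else "a"
        # denoising autoencoding: reconstruct the batch from a corrupted copy
        losses.append(models.nll(dec(enc(noise(batch)), lang), batch))
        # back-translation: translate to the other language, then translate back
        pseudo = dec(enc(batch), other)
        losses.append(models.nll(dec(enc(pseudo), lang), batch))
    return sum(losses)
```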
18 code implementations • ICLR 2018 • Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation.
Ranked #2 on Word Alignment on en-es
5 code implementations • NeurIPS 2017 • David Lopez-Paz, Marc'Aurelio Ranzato
One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge.
3 code implementations • 1 Jun 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.
no code implementations • CVPR 2017 • Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam
In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks.
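A minimal sketch of a hard mixture of experts, where a gater routes each example to exactly one expert; how the assignment itself is trained (the crux of the paper) is left out here.

```python
import torch
import torch.nn as nn

class HardMoE(nn.Module):
    def __init__(self, experts, d_in):
        super().__init__()
        self.experts = nn.ModuleList(experts)  # each maps (d_in,) -> output
        self.gate = nn.Linear(d_in, len(experts))

    def forward(self, x):                        # x: (batch, d_in)
        assign = self.gate(x).argmax(dim=-1)     # hard assignment: one expert each
        outs = [self.experts[int(assign[i])](x[i]) for i in range(len(x))]
        return torch.stack(outs)
```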
1 code implementation • 15 Feb 2017 • Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache
While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.
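For reference, a plain truncated-BPTT loop; the `detach` across chunk boundaries is exactly the gradient truncation the sentence above refers to, and the chunk length `k` is an arbitrary choice.

```python
import torch

def truncated_bptt(rnn, loss_fn, opt, inputs, targets, k=35):
    """inputs, targets: (seq_len, batch, ...) tensors; rnn: e.g. nn.LSTM."""
    h = None
    for t in range(0, inputs.size(0), k):
        x, y = inputs[t:t + k], targets[t:t + k]
        opt.zero_grad()
        out, h = rnn(x, h)
        loss_fn(out, y).backward()  # gradients flow only within this chunk
        opt.step()
        # cut the graph so the next chunk does not backprop into this one
        h = h.detach() if torch.is_tensor(h) else tuple(s.detach() for s in h)
    return rnn
```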
no code implementations • 29 Jan 2017 • Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala
In this work we propose a simple unsupervised approach for next frame prediction in video.
2 code implementations • 15 Dec 2016 • Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
A good dialogue agent should have the ability to interact with users by both responding to questions and by asking questions, and importantly to learn from both types of interaction.
2 code implementations • 29 Nov 2016 • Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes.
5 code implementations • 20 Nov 2015 • Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
Many natural language processing applications use language models to generate text.
Ranked #14 on Machine Translation on IWSLT2015 German-English
no code implementations • 26 Jun 2015 • Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba
The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation.
5 code implementations • 24 Dec 2014 • Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato
In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent.
4 code implementations • 17 Dec 2014 • Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio
Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).
no code implementations • CVPR 2015 • Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf
Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web.
no code implementations • 24 Apr 2014 • Marc'Aurelio Ranzato
Current automatic vision systems face two major challenges: scalability and extreme variability of appearance.
no code implementations • 20 Dec 2013 • Omry Yadan, Keith Adams, Yaniv Taigman, Marc'Aurelio Ranzato
In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs.
no code implementations • 16 Dec 2013 • David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever
In addition, we see that the different combinations are in use when the model is applied to a dataset of speech monophones.
no code implementations • NeurIPS 2013 • Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov
Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories.
Ranked #10 on Zero-Shot Action Recognition on Kinetics
1 code implementation • CVPR 2014 • Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev
We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion.
no code implementations • NeurIPS 2013 • Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
We demonstrate that there is significant redundancy in the parameterization of several deep learning models.
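As a generic illustration of such redundancy (not the paper's parameter-prediction scheme), a weight matrix can often be replaced by a low-rank factorization with far fewer parameters:

```python
import numpy as np

def low_rank_compress(W, rank):
    """Truncated SVD: W (m x n) is approximated by U_r @ V_r,
    storing rank * (m + n) numbers instead of m * n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]  # (m, rank), singular values folded in
    V_r = Vt[:rank]               # (rank, n)
    return U_r, V_r
```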
no code implementations • NeurIPS 2012 • Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, Andrew Y. Ng
Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance.
1 code implementation • 29 Dec 2011 • Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng
For example, is it possible to learn a face detector using only unlabeled images?
no code implementations • NeurIPS 2010 • Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton
Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting.
no code implementations • NeurIPS 2010 • George Dahl, Marc'Aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton
Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task.
no code implementations • NeurIPS 2007 • Marc'Aurelio Ranzato, Y-Lan Boureau, Yann LeCun
Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input.