Search Results for author: Marc'Aurelio Ranzato

Found 59 papers, 31 papers with code

Asynchronous Local-SGD Training for Language Modeling

1 code implementation 17 Jan 2024 Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication round.

Distributed Optimization · Language Modelling
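The abstract snippet above states the core mechanism: each worker takes several local SGD steps and only then communicates. A minimal sketch of one communication round (the quadratic workers in the usage below and all names here are illustrative, not from the paper):

```python
import numpy as np

def local_sgd_round(params, worker_grad_fn, num_workers=4, local_steps=8, lr=0.1):
    """One communication round of Local-SGD: every worker runs several SGD
    steps from the shared parameters with no communication, then the
    resulting parameter vectors are averaged (as in federated averaging)."""
    local_params = []
    for w in range(num_workers):
        p = params.copy()
        for _ in range(local_steps):
            p -= lr * worker_grad_fn(w, p)   # H local updates, no communication
        local_params.append(p)
    # single communication step: average the workers' parameters
    return np.mean(local_params, axis=0)
```

With more local steps per round, the same quality of solution is reached with far fewer communication events, which is the trade-off this line of work studies.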

DiLoCo: Distributed Low-Communication Training of Language Models

no code implementations 14 Nov 2023 Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected.

Distributed Optimization

Towards Robust and Efficient Continual Language Learning

no code implementations 11 Jul 2023 Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato

As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks.

Continual Learning

Towards Compute-Optimal Transfer Learning

no code implementations 25 Apr 2023 Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu

The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks.

Computational Efficiency · Continual Learning +1

Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

no code implementations 10 Oct 2022 Lucio M. Dery, Abram L. Friesen, Nando de Freitas, Marc'Aurelio Ranzato, Yutian Chen

As machine learning permeates more industries and models become more expensive and time-consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing.

Hyperparameter Optimization

Towards Learning Universal Hyperparameter Optimizers with Transformers

1 code implementation 26 May 2022 Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'Aurelio Ranzato, Sagi Perel, Nando de Freitas

Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution.

Hyperparameter Optimization · Meta-Learning

On Anytime Learning at Macroscale

1 code implementation 17 Jun 2021 Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer

Practitioners have then to decide how to allocate their computational budget in order to obtain the best performance at any point in time.

Language Modelling · Learning Theory

Efficient Continual Learning with Modular Networks and Task-Driven Priors

2 code implementations ICLR 2021 Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato

Finally, we introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task.

Continual Learning

Few-shot Sequence Learning with Transformers

no code implementations 17 Dec 2020 Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.

Few-Shot Learning
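The snippet above describes a concrete mechanism: append a task token to the input and optimize only that token's embedding on a few labeled examples. A toy sketch with a frozen linear model (the linear setup and every name here are illustrative assumptions, not the paper's transformer):

```python
import numpy as np

def fit_task_token(model_w, xs, ys, steps=200, lr=0.1):
    """Keep the model weights frozen and optimize only the embedding of an
    appended task token on a few (x, y) examples. Toy model: a linear
    predictor whose input is the concatenation [x ; task_embedding]."""
    d = xs.shape[1]
    w_x, w_e = model_w[:d], model_w[d:]          # frozen weights
    emb = np.zeros_like(w_e)                     # the only learned parameters
    for _ in range(steps):
        preds = xs @ w_x + emb @ w_e
        grad = 2.0 * np.mean(preds - ys) * w_e   # d(MSE)/d(emb)
        emb -= lr * grad
    return emb
```

Because only a handful of parameters are updated, adaptation is cheap and the pretrained model is untouched, which is the appeal of this few-shot setup.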

Multi-scale Transformer Language Models

no code implementations 1 May 2020 Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau

We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.

Inductive Bias · Language Modelling

Residual Energy-Based Models for Text Generation

1 code implementation ICLR 2020 Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.

Language Modelling · Machine Translation +2
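A sequence-level (rather than token-level) energy can be used to rescore whole candidate sequences produced by a base language model. A minimal sketch of that residual scoring idea (the function names and the simple argmax form are illustrative assumptions, not the paper's sampling scheme):

```python
def rerank(candidates, base_logprob, energy):
    """Residual EBM-style selection: the joint model prefers sequences with
    high base-LM log-probability and low residual energy, since
    log p(x) is proportional to base_logprob(x) - energy(x) up to a
    normalization constant that is shared by all candidates."""
    return max(candidates, key=lambda x: base_logprob(x) - energy(x))
```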

Residual Energy-Based Models for Text

no code implementations 6 Apr 2020 Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.

Revisiting Self-Training for Neural Sequence Generation

1 code implementation ICLR 2020 Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

In this work, we first empirically show that self-training is able to decently improve the supervised baseline on neural sequence generation tasks.

Machine Translation · Text Summarization +1
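The self-training loop the snippet refers to can be sketched generically: fit on labeled data, pseudo-label an unlabeled pool, and refit on the union. This skeleton is an assumption-level illustration of the standard recipe, with `fit` and `predict` standing in for any learner:

```python
def self_train(fit, predict, labeled, unlabeled, rounds=3):
    """Generic self-training: train on labeled data, pseudo-label the
    unlabeled pool with the current model, then retrain on the union.
    `labeled` is a list of (x, y) pairs; `unlabeled` is a list of x."""
    model = fit(labeled)
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        model = fit(labeled + pseudo)
    return model
```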

The Source-Target Domain Mismatch Problem in Machine Translation

no code implementations EACL 2021 Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in.

Machine Translation · Translation

Large Memory Layers with Product Keys

8 code implementations NeurIPS 2019 Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer in a state-of-the-art transformer-based architecture.

Language Modelling
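The "product keys" in the title refer to searching a huge key set by splitting the query in two and scoring each half against a small sub-key set, so N*N full keys are covered with only 2*N comparisons. A simplified sketch of that lookup (the real layer retrieves top-k memory values with softmax weights; this single-best version and all names are illustrative):

```python
import numpy as np

def product_key_lookup(query, subkeys1, subkeys2, topk=2):
    """Product-key search sketch: score each half of the query against its
    sub-key set, keep the top-k indices per half, and take the best full key
    from the Cartesian product of candidates. A full key's score is the sum
    of its two half-key scores."""
    q1, q2 = np.split(query, 2)
    s1, s2 = subkeys1 @ q1, subkeys2 @ q2
    i1 = np.argsort(-s1)[:topk]
    i2 = np.argsort(-s2)[:topk]
    cand = [(s1[a] + s2[b], a * len(subkeys2) + b) for a in i1 for b in i2]
    best_score, best_idx = max(cand)
    return best_idx, best_score
```

This is what lets a memory layer scale to very large stores (the abstract mentions corpora of up to 30 billion words) without the lookup becoming the bottleneck.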

Task-Driven Modular Networks for Zero-Shot Compositional Learning

1 code implementation ICCV 2019 Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc'Aurelio Ranzato

When extending the evaluation to the generalized setting which accounts also for pairs seen during training, we discover that naive baseline methods perform similarly or better than current approaches.

Attribute · Novel Concepts +1

Multiple-Attribute Text Rewriting

no code implementations ICLR 2019 Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".

Attribute · Disentanglement +2

Efficient Lifelong Learning with A-GEM

2 code implementations ICLR 2019 Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task.

Class Incremental Learning
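A-GEM's mechanism is a single gradient correction: if the current task's gradient conflicts with the average gradient computed on an episodic memory of past tasks, project it so the memory loss cannot increase. A sketch of that projection, to the best of my reading of the method (variable names are mine):

```python
import numpy as np

def agem_project(grad, grad_ref):
    """A-GEM-style correction: when the current gradient `grad` has negative
    dot product with the episodic-memory gradient `grad_ref`, remove its
    component along `grad_ref` so the update no longer conflicts with the
    constraint on past tasks; otherwise leave it unchanged."""
    dot = grad @ grad_ref
    if dot >= 0:
        return grad
    return grad - (dot / (grad_ref @ grad_ref)) * grad_ref
```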

Multiple-Attribute Text Style Transfer

3 code implementations 1 Nov 2018 Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".

Attribute · Disentanglement +3

GenEval: A Benchmark Suite for Evaluating Generative Models

no code implementations 27 Sep 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we aim to address this problem by introducing a new benchmark evaluation suite, dubbed GenEval.

Lightweight Adaptive Mixture of Neural and N-gram Language Models

no code implementations 20 Apr 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.

Language Modelling
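The standard way to form the neural/n-gram ensemble the snippet mentions is linear interpolation of the two next-token distributions; the paper's contribution is making the mixture weight adaptive, but the basic mixture (sketched here with an illustrative fixed weight `lam`) looks like:

```python
def interpolate_lm(p_neural, p_ngram, lam=0.5):
    """Mix a neural LM and an n-gram LM by linearly interpolating their
    next-token distributions (dicts mapping token -> probability).
    `lam` would normally be tuned, possibly per context, on held-out data."""
    vocab = set(p_neural) | set(p_ngram)
    return {tok: lam * p_neural.get(tok, 0.0) + (1 - lam) * p_ngram.get(tok, 0.0)
            for tok in vocab}
```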

Phrase-Based & Neural Unsupervised Machine Translation

15 code implementations EMNLP 2018 Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.

NMT · Sentence +2

Analyzing Uncertainty in Neural Machine Translation

1 code implementation ICML 2018 Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations.

Machine Translation · Sentence +2

Fader Networks: Manipulating Images by Sliding Attributes

no code implementations NeurIPS 2017 Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute

Classical Structured Prediction Losses for Sequence to Sequence Learning

1 code implementation NAACL 2018 Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.

Abstractive Text Summarization · Machine Translation +3

Unsupervised Machine Translation Using Monolingual Corpora Only

15 code implementations ICLR 2018 Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data.

Sentence · Translation +1
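One ingredient of such unsupervised translation systems is back-translation: monolingual target-side text is translated into the source language by the current reverse model, yielding synthetic parallel pairs to train on. A schematic sketch of that data-generation step (the helper names are mine, not the paper's):

```python
def backtranslation_pairs(translate_t2s, mono_target):
    """Turn monolingual target-language sentences into synthetic
    (source, target) training pairs using the current target->source model,
    so the source->target model can be trained without any parallel data."""
    return [(translate_t2s(y), y) for y in mono_target]
```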

Word Translation Without Parallel Data

19 code implementations ICLR 2018 Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation.

Cross-Lingual Word Embeddings · Translation +4

Gradient Episodic Memory for Continual Learning

5 code implementations NeurIPS 2017 David Lopez-Paz, Marc'Aurelio Ranzato

One major obstacle towards AI is the poor ability of models to solve new problems quickly and without forgetting previously acquired knowledge.

Continual Learning · Incremental Learning

Fader Networks: Manipulating Images by Sliding Attributes

3 code implementations 1 Jun 2017 Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

no code implementations CVPR 2017 Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks.

Training Language Models Using Target-Propagation

1 code implementation 15 Feb 2017 Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.

Transformation-Based Models of Video Sequences

no code implementations 29 Jan 2017 Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

In this work we propose a simple unsupervised approach for next frame prediction in video.

Learning through Dialogue Interactions by Asking Questions

2 code implementations 15 Dec 2016 Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

A good dialogue agent should have the ability to interact with users by both responding to questions and by asking questions, and importantly to learn from both types of interaction.

reinforcement-learning · Reinforcement Learning (RL)

Dialogue Learning With Human-In-The-Loop

2 code implementations 29 Nov 2016 Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes.

Question Answering · reinforcement-learning +1

Convolutional networks and learning invariant to homogeneous multiplicative scalings

no code implementations 26 Jun 2015 Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation.

Classification · General Classification +1

Learning Longer Memory in Recurrent Neural Networks

5 code implementations 24 Dec 2014 Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent.

Language Modelling

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

4 code implementations 17 Dec 2014 Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).

Binary Classification · General Classification +1

Web-Scale Training for Face Identification

no code implementations CVPR 2015 Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf

Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web.

Face Identification · Face Recognition +1

On Learning Where To Look

no code implementations 24 Apr 2014 Marc'Aurelio Ranzato

Current automatic vision systems face two major challenges: scalability and extreme variability of appearance.

Multi-GPU Training of ConvNets

no code implementations 20 Dec 2013 Omry Yadan, Keith Adams, Yaniv Taigman, Marc'Aurelio Ranzato

In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs.

Learning Factored Representations in a Deep Mixture of Experts

no code implementations 16 Dec 2013 David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever

In addition, we see that the different combinations are in use when the model is applied to a dataset of speech monophones.

PANDA: Pose Aligned Networks for Deep Attribute Modeling

1 code implementation CVPR 2014 Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev

We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion.

Attribute · Facial Attribute Classification +2

Predicting Parameters in Deep Learning

no code implementations NeurIPS 2013 Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas

We demonstrate that there is significant redundancy in the parameterization of several deep learning models.

Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

no code implementations NeurIPS 2010 George Dahl, Marc'Aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton

Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task.

Generating more realistic images using gated MRF's

no code implementations NeurIPS 2010 Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton

Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting.

Denoising

Sparse Feature Learning for Deep Belief Networks

no code implementations NeurIPS 2007 Marc'Aurelio Ranzato, Y-Lan Boureau, Yann LeCun

Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input.
