Search Results for author: Marc'Aurelio Ranzato

Found 59 papers, 31 papers with code

Asynchronous Local-SGD Training for Language Modeling

1 code implementation 17 Jan 2024 Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication round.

Distributed Optimization · Language Modelling
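The abstract snippet above states the core mechanism: each worker takes several local SGD steps and only then communicates. A minimal sketch of one communication round (the quadratic workers in the usage below and all names here are illustrative, not from the paper):

```python
import numpy as np

def local_sgd_round(params, worker_grad_fn, num_workers=4, local_steps=8, lr=0.1):
    """One communication round of Local-SGD: every worker runs several SGD
    steps from the shared parameters with no communication, then the
    resulting parameter vectors are averaged (as in federated averaging)."""
    local_params = []
    for w in range(num_workers):
        p = params.copy()
        for _ in range(local_steps):
            p -= lr * worker_grad_fn(w, p)   # H local updates, no communication
        local_params.append(p)
    # single communication step: average the workers' parameters
    return np.mean(local_params, axis=0)
```

With more local steps per round, the same quality of solution is reached with far fewer communication events, which is the trade-off this line of work studies.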

DiLoCo: Distributed Low-Communication Training of Language Models

no code implementations 14 Nov 2023 Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected.

Distributed Optimization

Towards Robust and Efficient Continual Language Learning

no code implementations 11 Jul 2023 Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato

As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks.

Continual Learning

Towards Compute-Optimal Transfer Learning

no code implementations 25 Apr 2023 Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu

The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks.

Computational Efficiency · Continual Learning +1

Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

no code implementations 10 Oct 2022 Lucio M. Dery, Abram L. Friesen, Nando de Freitas, Marc'Aurelio Ranzato, Yutian Chen

As machine learning permeates more industries and models become more expensive and time-consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing.

Hyperparameter Optimization

Towards Learning Universal Hyperparameter Optimizers with Transformers

1 code implementation 26 May 2022 Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'Aurelio Ranzato, Sagi Perel, Nando de Freitas

Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution.

Hyperparameter Optimization · Meta-Learning

On Anytime Learning at Macroscale

1 code implementation 17 Jun 2021 Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer

Practitioners have then to decide how to allocate their computational budget in order to obtain the best performance at any point in time.

Language Modelling · Learning Theory

Efficient Continual Learning with Modular Networks and Task-Driven Priors

2 code implementations ICLR 2021 Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato

Finally, we introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task.

Continual Learning

Few-shot Sequence Learning with Transformers

no code implementations 17 Dec 2020 Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.

Few-Shot Learning
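The snippet above describes a concrete mechanism: append a task token to the input and optimize only that token's embedding on a few labeled examples. A toy sketch with a frozen linear model (the linear setup and every name here are illustrative assumptions, not the paper's transformer):

```python
import numpy as np

def fit_task_token(model_w, xs, ys, steps=200, lr=0.1):
    """Keep the model weights frozen and optimize only the embedding of an
    appended task token on a few (x, y) examples. Toy model: a linear
    predictor whose input is the concatenation [x ; task_embedding]."""
    d = xs.shape[1]
    w_x, w_e = model_w[:d], model_w[d:]          # frozen weights
    emb = np.zeros_like(w_e)                     # the only learned parameters
    for _ in range(steps):
        preds = xs @ w_x + emb @ w_e
        grad = 2.0 * np.mean(preds - ys) * w_e   # d(MSE)/d(emb)
        emb -= lr * grad
    return emb
```

Because only a handful of parameters are updated, adaptation is cheap and the pretrained model is untouched, which is the appeal of this few-shot setup.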

Multi-scale Transformer Language Models

no code implementations 1 May 2020 Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau

We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.

Inductive Bias · Language Modelling

Residual Energy-Based Models for Text Generation

1 code implementation ICLR 2020 Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.

Language Modelling · Machine Translation +2
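A sequence-level (rather than token-level) energy can be used to rescore whole candidate sequences produced by a base language model. A minimal sketch of that residual scoring idea (the function names and the simple argmax form are illustrative assumptions, not the paper's sampling scheme):

```python
def rerank(candidates, base_logprob, energy):
    """Residual EBM-style selection: the joint model prefers sequences with
    high base-LM log-probability and low residual energy, since
    log p(x) is proportional to base_logprob(x) - energy(x) up to a
    normalization constant that is shared by all candidates."""
    return max(candidates, key=lambda x: base_logprob(x) - energy(x))
```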

Residual Energy-Based Models for Text

no code implementations 6 Apr 2020 Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.

Revisiting Self-Training for Neural Sequence Generation

1 code implementation ICLR 2020 Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

In this work, we first empirically show that self-training is able to decently improve the supervised baseline on neural sequence generation tasks.

Machine Translation · Text Summarization +1
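The self-training loop the snippet refers to can be sketched generically: fit on labeled data, pseudo-label an unlabeled pool, and refit on the union. This skeleton is an assumption-level illustration of the standard recipe, with `fit` and `predict` standing in for any learner:

```python
def self_train(fit, predict, labeled, unlabeled, rounds=3):
    """Generic self-training: train on labeled data, pseudo-label the
    unlabeled pool with the current model, then retrain on the union.
    `labeled` is a list of (x, y) pairs; `unlabeled` is a list of x."""
    model = fit(labeled)
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        model = fit(labeled + pseudo)
    return model
```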

The Source-Target Domain Mismatch Problem in Machine Translation

no code implementations EACL 2021 Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in.

Machine Translation · Translation

Large Memory Layers with Product Keys

8 code implementations NeurIPS 2019 Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer in a state-of-the-art transformer-based architecture.

Language Modelling
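The "product keys" in the title refer to searching a huge key set by splitting the query in two and scoring each half against a small sub-key set, so N*N full keys are covered with only 2*N comparisons. A simplified sketch of that lookup (the real layer retrieves top-k memory values with softmax weights; this single-best version and all names are illustrative):

```python
import numpy as np

def product_key_lookup(query, subkeys1, subkeys2, topk=2):
    """Product-key search sketch: score each half of the query against its
    sub-key set, keep the top-k indices per half, and take the best full key
    from the Cartesian product of candidates. A full key's score is the sum
    of its two half-key scores."""
    q1, q2 = np.split(query, 2)
    s1, s2 = subkeys1 @ q1, subkeys2 @ q2
    i1 = np.argsort(-s1)[:topk]
    i2 = np.argsort(-s2)[:topk]
    cand = [(s1[a] + s2[b], a * len(subkeys2) + b) for a in i1 for b in i2]
    best_score, best_idx = max(cand)
    return best_idx, best_score
```

This is what lets a memory layer scale to very large stores (the abstract mentions corpora of up to 30 billion words) without the lookup becoming the bottleneck.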

Task-Driven Modular Networks for Zero-Shot Compositional Learning

1 code implementation ICCV 2019 Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc'Aurelio Ranzato

When extending the evaluation to the generalized setting which accounts also for pairs seen during training, we discover that naive baseline methods perform similarly or better than current approaches.

Attribute · Novel Concepts +1

Multiple-Attribute Text Rewriting

no code implementations ICLR 2019 Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".

Attribute · Disentanglement +2

Efficient Lifelong Learning with A-GEM

2 code implementations ICLR 2019 Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task.

Class Incremental Learning
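A-GEM's mechanism is a single gradient correction: if the current task's gradient conflicts with the average gradient computed on an episodic memory of past tasks, project it so the memory loss cannot increase. A sketch of that projection, to the best of my reading of the method (variable names are mine):

```python
import numpy as np

def agem_project(grad, grad_ref):
    """A-GEM-style correction: when the current gradient `grad` has negative
    dot product with the episodic-memory gradient `grad_ref`, remove its
    component along `grad_ref` so the update no longer conflicts with the
    constraint on past tasks; otherwise leave it unchanged."""
    dot = grad @ grad_ref
    if dot >= 0:
        return grad
    return grad - (dot / (grad_ref @ grad_ref)) * grad_ref
```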

Multiple-Attribute Text Style Transfer

3 code implementations 1 Nov 2018 Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".

Attribute · Disentanglement +3

GenEval: A Benchmark Suite for Evaluating Generative Models

no code implementations 27 Sep 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we aim to address this problem by introducing a new benchmark evaluation suite, dubbed GenEval.

Lightweight Adaptive Mixture of Neural and N-gram Language Models

no code implementations 20 Apr 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.

Language Modelling
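The standard way to form the neural/n-gram ensemble the snippet mentions is linear interpolation of the two next-token distributions; the paper's contribution is making the mixture weight adaptive, but the basic mixture (sketched here with an illustrative fixed weight `lam`) looks like:

```python
def interpolate_lm(p_neural, p_ngram, lam=0.5):
    """Mix a neural LM and an n-gram LM by linearly interpolating their
    next-token distributions (dicts mapping token -> probability).
    `lam` would normally be tuned, possibly per context, on held-out data."""
    vocab = set(p_neural) | set(p_ngram)
    return {tok: lam * p_neural.get(tok, 0.0) + (1 - lam) * p_ngram.get(tok, 0.0)
            for tok in vocab}
```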

Phrase-Based & Neural Unsupervised Machine Translation

15 code implementations EMNLP 2018 Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.

NMT · Sentence +2

Analyzing Uncertainty in Neural Machine Translation

1 code implementation ICML 2018 Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations.

Machine Translation · Sentence +2

Fader Networks: Manipulating Images by Sliding Attributes

no code implementations NeurIPS 2017 Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute

Classical Structured Prediction Losses for Sequence to Sequence Learning

1 code implementation NAACL 2018 Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.

Abstractive Text Summarization · Machine Translation +3

Unsupervised Machine Translation Using Monolingual Corpora Only

15 code implementations ICLR 2018 Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data.

Sentence · Translation +1
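One ingredient of such unsupervised translation systems is back-translation: monolingual target-side text is translated into the source language by the current reverse model, yielding synthetic parallel pairs to train on. A schematic sketch of that data-generation step (the helper names are mine, not the paper's):

```python
def backtranslation_pairs(translate_t2s, mono_target):
    """Turn monolingual target-language sentences into synthetic
    (source, target) training pairs using the current target->source model,
    so the source->target model can be trained without any parallel data."""
    return [(translate_t2s(y), y) for y in mono_target]
```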

Word Translation Without Parallel Data

19 code implementations ICLR 2018 Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation.

Cross-Lingual Word Embeddings · Translation +4

Gradient Episodic Memory for Continual Learning

5 code implementations NeurIPS 2017 David Lopez-Paz, Marc'Aurelio Ranzato

One major obstacle towards AI is the poor ability of models to solve new problems quickly and without forgetting previously acquired knowledge.

Continual Learning · Incremental Learning

Fader Networks: Manipulating Images by Sliding Attributes

3 code implementations 1 Jun 2017 Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

no code implementations CVPR 2017 Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks.

Training Language Models Using Target-Propagation

1 code implementation 15 Feb 2017 Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.

Transformation-Based Models of Video Sequences

no code implementations 29 Jan 2017 Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

In this work we propose a simple unsupervised approach for next frame prediction in video.

Learning through Dialogue Interactions by Asking Questions

2 code implementations 15 Dec 2016 Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

A good dialogue agent should have the ability to interact with users by both responding to questions and by asking questions, and importantly to learn from both types of interaction.

reinforcement-learning · Reinforcement Learning (RL)

Dialogue Learning With Human-In-The-Loop

2 code implementations 29 Nov 2016 Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes.

Question Answering · reinforcement-learning +1

Convolutional networks and learning invariant to homogeneous multiplicative scalings

no code implementations 26 Jun 2015 Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation.

Classification · General Classification +1

Learning Longer Memory in Recurrent Neural Networks

5 code implementations 24 Dec 2014 Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent.

Language Modelling

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

4 code implementations 17 Dec 2014 Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review).

Binary Classification · General Classification +1

Web-Scale Training for Face Identification

no code implementations CVPR 2015 Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf

Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web.

Face Identification · Face Recognition +1

On Learning Where To Look

no code implementations 24 Apr 2014 Marc'Aurelio Ranzato

Current automatic vision systems face two major challenges: scalability and extreme variability of appearance.

Multi-GPU Training of ConvNets

no code implementations 20 Dec 2013 Omry Yadan, Keith Adams, Yaniv Taigman, Marc'Aurelio Ranzato

In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs.

Learning Factored Representations in a Deep Mixture of Experts

no code implementations 16 Dec 2013 David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever

In addition, we see that the different combinations are in use when the model is applied to a dataset of speech monophones.

PANDA: Pose Aligned Networks for Deep Attribute Modeling

1 code implementation CVPR 2014 Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev

We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion.

Attribute · Facial Attribute Classification +2

Predicting Parameters in Deep Learning

no code implementations NeurIPS 2013 Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas

We demonstrate that there is significant redundancy in the parameterization of several deep learning models.

Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

no code implementations NeurIPS 2010 George Dahl, Marc'Aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton

Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task.

Generating more realistic images using gated MRF's

no code implementations NeurIPS 2010 Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton

Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting.

Denoising

Sparse Feature Learning for Deep Belief Networks

no code implementations NeurIPS 2007 Marc'Aurelio Ranzato, Y-Lan Boureau, Yann LeCun

Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input.
