Search Results for author: Mahmoud Assran

Found 13 papers, 8 papers with code

Predicting masked tokens in stochastic locations improves masked image modeling

no code implementations • 31 Jul 2023 • Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

Specifically, we condition the model on stochastic masked token positions to guide the model toward learning features that are more robust to location uncertainties.

Language Modelling • Masked Language Modeling • +3
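
As a rough illustration of the idea in the abstract above, one way to condition a predictor on stochastic masked-token positions is to jitter the positions of mask tokens with noise before embedding them, so the model cannot rely on exact target locations. The sketch below is an assumed NumPy setup with sinusoidal embeddings and Gaussian position noise, not the paper's implementation.

```python
import numpy as np

def sinusoidal_pos_embed(positions, dim):
    """Standard sinusoidal embeddings for (possibly fractional) positions."""
    positions = np.asarray(positions, dtype=np.float64)[:, None]      # (N, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)     # (dim/2,)
    angles = positions * freqs                                        # (N, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, dim)

def noisy_mask_queries(masked_positions, dim, sigma=1.0, rng=None):
    """Build predictor queries for masked tokens at *stochastic* locations:
    each masked position is jittered by Gaussian noise before being embedded,
    so the predictor must be robust to uncertainty about the target location."""
    rng = rng or np.random.default_rng()
    jittered = np.asarray(masked_positions, dtype=np.float64) \
        + sigma * rng.normal(size=len(masked_positions))
    return sinusoidal_pos_embed(jittered, dim)

# Example: queries for three masked patch indices in a 196-patch image.
queries = noisy_mask_queries([12, 87, 150], dim=64, sigma=2.0)
print(queries.shape)  # (3, 64)
```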

The Hidden Uniform Cluster Prior in Self-Supervised Learning

no code implementations • 13 Oct 2022 • Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN).

Clustering • Representation Learning • +1
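
The methods named in this abstract regularize representations using mini-batch statistics; one common instance of the implicit uniform cluster prior is a mean-entropy-maximization term that pushes the batch-averaged cluster-assignment distribution toward uniform. The NumPy sketch below illustrates such a regularizer; it is not code from the paper.

```python
import numpy as np

def mean_entropy_regularizer(logits):
    """Mean-entropy-maximization term on soft cluster assignments.

    logits: (batch, num_clusters) scores for each sample.
    Returns the negative entropy of the batch-averaged assignment
    distribution; minimizing it pushes average cluster usage toward
    uniform, i.e. it encodes a uniform prior over clusters.
    """
    z = logits - logits.max(axis=1, keepdims=True)       # softmax over clusters
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    mean_probs = probs.mean(axis=0)                       # batch-averaged assignments
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum()
    return -entropy                                       # add this to the training loss

batch_logits = np.random.default_rng(0).normal(size=(256, 16))
print(mean_entropy_regularizer(batch_logits))
```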

Memory Augmented Optimizers for Deep Learning

2 code implementations • ICLR 2022 • Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar

Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates.
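
To make the idea of retaining gradient history concrete, here is a minimal, hypothetical optimizer that keeps a short FIFO buffer of past gradients and steps along their average; it only illustrates the general notion of a gradient memory, not the optimizer proposed in the paper.

```python
from collections import deque
import numpy as np

class MemorySGD:
    """Toy SGD variant that averages over a small memory of recent gradients."""

    def __init__(self, lr=0.1, memory_size=5):
        self.lr = lr
        self.memory = deque(maxlen=memory_size)  # FIFO buffer of past gradients

    def step(self, params, grad):
        self.memory.append(np.asarray(grad, dtype=float))
        avg_grad = np.mean(self.memory, axis=0)  # aggregate the retained history
        return params - self.lr * avg_grad

# Example: minimize f(x) = ||x||^2, whose gradient is 2x.
x = np.array([3.0, -2.0])
opt = MemorySGD(lr=0.1, memory_size=5)
for _ in range(100):
    x = opt.step(x, 2.0 * x)
print(x)  # close to the minimizer at the origin
```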

A Closer Look at Codistillation for Distributed Training

no code implementations • 6 Oct 2020 • Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

Surprisingly, we find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods, despite using a much weaker synchronization mechanism.

Distributed Computing
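
Codistillation trains several model replicas mostly independently and couples them only loosely by adding a distillation term toward periodically refreshed copies of each other's predictions. The sketch below shows that loss structure on a toy linear-regression problem; names, weights, and refresh interval are assumptions for illustration, not the authors' training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=512)

# Two replicas of a linear model, each seeing its own half of the data.
w = [np.zeros(10), np.zeros(10)]
shards = [np.arange(0, 256), np.arange(256, 512)]
stale = [np.zeros(10), np.zeros(10)]   # infrequently exchanged copies of the peer
lr, alpha = 0.01, 0.1                  # alpha weights the codistillation term

for step in range(500):
    if step % 50 == 0:                 # weak synchronization: refresh stale peer copies
        stale = [w[1].copy(), w[0].copy()]
    for i in range(2):
        idx = rng.choice(shards[i], size=32, replace=False)
        Xb, yb = X[idx], y[idx]
        pred, peer_pred = Xb @ w[i], Xb @ stale[i]
        # Gradient of: task loss + alpha * distillation toward the stale peer.
        grad = Xb.T @ ((pred - yb) + alpha * (pred - peer_pred)) / len(idx)
        w[i] -= lr * grad

print(np.linalg.norm(w[0] - w[1]))     # replicas end up close without tight syncing
```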

Advances in Asynchronous Parallel and Distributed Optimization

no code implementations • 24 Jun 2020 • Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade.

Distributed Optimization

Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

2 code implementations • 18 Jun 2020 • Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training.

Contrastive Learning
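
The strategy described above combines a self-supervised contrastive objective on all images with a supervised term on the few labeled ones. Below is a minimal NumPy sketch of one such combination (an InfoNCE-style loss plus cross-entropy on the labeled subset); the function names and weighting are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Contrastive loss between two batches of L2-normalized embeddings,
    where z1[i] and z2[i] are two views of the same image."""
    logits = z1 @ z2.T / tau                       # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives lie on the diagonal

def cross_entropy(logits, labels):
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def semi_supervised_loss(z1, z2, class_logits, labels, labeled_mask, lam=1.0):
    """Contrastive term on every sample + supervised term on the labeled few."""
    loss = info_nce(z1, z2)
    if labeled_mask.any():
        loss += lam * cross_entropy(class_logits[labeled_mask], labels[labeled_mask])
    return loss

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16)); z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = rng.normal(size=(8, 16)); z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
mask = np.array([True, True] + [False] * 6)        # only 2 of 8 images are labeled
print(semi_supervised_loss(z1, z2, rng.normal(size=(8, 10)), rng.integers(0, 10, 8), mask))
```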

On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings

no code implementations • ICML 2020 • Mahmoud Assran, Michael Rabbat

We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches).
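
For reference, the constant-step-size, constant-momentum form of Nesterov's method in this stochastic setting can be written as follows, where g(y_k) is an unbiased stochastic gradient with bounded variance (generic notation, not necessarily the paper's):

```latex
% Nesterov's accelerated gradient with constant step size \alpha and momentum \beta,
% driven by an unbiased stochastic gradient g(y_k) with bounded variance.
\begin{aligned}
x_{k+1} &= y_k - \alpha\, g(y_k), \\
y_{k+1} &= x_{k+1} + \beta\,(x_{k+1} - x_k), \\
\mathbb{E}[\,g(y_k)\,] &= \nabla f(y_k), \qquad
\mathbb{E}\!\left[\lVert g(y_k) - \nabla f(y_k)\rVert^2\right] \le \sigma^2 .
\end{aligned}
```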

Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

1 code implementation • NeurIPS 2019 • Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws.

Reinforcement Learning (RL)
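
GALA couples multiple actor-critic learners only loosely: each agent takes its own local policy-gradient steps and periodically mixes parameters with one neighbor on a sparse gossip graph rather than through a global all-reduce. The NumPy sketch below shows just a gossip-averaging round over a ring of agents, with all names assumed for illustration.

```python
import numpy as np

def gossip_round(params, step):
    """One gossip round on a ring of agents.

    params: list of per-agent parameter vectors.
    Each agent averages its parameters with one rotating peer, so
    information spreads without a global synchronization barrier.
    """
    n = len(params)
    mixed = []
    for i in range(n):
        j = (i + 1 + step % (n - 1)) % n        # rotate which peer we listen to
        mixed.append(0.5 * (params[i] + params[j]))
    return mixed

# Example: agents start with different parameters and drift toward consensus,
# while (in the real algorithm) also taking their own local A2C updates.
agents = [np.random.default_rng(i).normal(size=4) for i in range(4)]
for t in range(20):
    agents = gossip_round(agents, t)
print(np.std(np.stack(agents), axis=0))  # disagreement shrinks over rounds
```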

Stochastic Gradient Push for Distributed Deep Learning

2 code implementations • ICLR 2019 • Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat

Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes.

General Classification • Image Classification • +2
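
Stochastic Gradient Push combines local SGD steps with the PushSum gossip protocol, so that nodes on a directed communication graph approximately average their models without an all-reduce. The sketch below is a simplified single-process simulation of that update structure, with the local objectives and mixing matrix assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, lr = 4, 3, 0.05
targets = rng.normal(size=(n, dim))   # node i's local objective: ||z - targets[i]||^2
x = rng.normal(size=(n, dim))         # push-sum numerators (biased parameters)
w = np.ones(n)                        # push-sum weights

# Column-stochastic mixing matrix for a directed ring: each node keeps half
# of its mass and pushes half to its out-neighbor (i + 1).
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[(i + 1) % n, i] = 0.5

for step in range(200):
    z = x / w[:, None]                # de-biased model estimates
    grads = 2.0 * (z - targets)       # local gradients (sampling noise omitted)
    x = x - lr * grads                # local SGD step on the numerator
    x = P @ x                         # push-sum mixing of parameters
    w = P @ w                         # push-sum mixing of weights

print(x / w[:, None])                 # all rows approach the average of `targets`
```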
