Search Results for author: Michael Rabbat

Found 37 papers, 18 papers with code

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

no code implementations13 Oct 2024 DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng

In all cases, Dualformer outperforms the corresponding baseline models in both performance and computational efficiency: (1) in slow mode, Dualformer optimally solves unseen 30 x 30 maze navigation tasks 97.6% of the time, surpassing the Searchformer (trained on data with complete reasoning traces) baseline performance of 93.3%, while using 45.5% fewer reasoning steps; (2) in fast mode, Dualformer completes those tasks with an 80% optimal rate, significantly outperforming the Solution-Only model (trained on solution-only data), which has an optimal rate of only 30%.

Computational Efficiency Math
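A minimal sketch of the training-data idea suggested by the title, learning with randomized reasoning traces: some examples keep their full trace (slow thinking), some drop the trace entirely (fast thinking), and others drop individual steps. All token names and probabilities below are illustrative assumptions, not the paper's exact recipe.

```python
import random

def randomize_trace(trace_steps, solution, p_drop_all=0.2, p_drop_step=0.3):
    """Build one training sequence with a randomly shortened reasoning trace.

    Hypothetical augmentation sketch: with probability p_drop_all the whole
    trace is dropped (a solution-only, fast-mode example); otherwise each
    step is dropped independently with probability p_drop_step.
    """
    if random.random() < p_drop_all:
        kept = []                      # fast-mode example: no trace at all
    else:
        kept = [s for s in trace_steps if random.random() > p_drop_step]
    return ["<bos>"] + kept + ["<plan>"] + solution + ["<eos>"]

example = randomize_trace(["expand A", "expand B", "expand C"],
                          ["move right", "move up"])
print(example)
```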

Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

no code implementations25 Apr 2024 Mazda Moayeri, Michael Rabbat, Mark Ibrahim, Diane Bouchacourt

We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining.

Diversity Zero-Shot Learning
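A sketch of the "beyond one vector per class" idea: score an image against several attribute-conditioned text embeddings per class instead of a single class vector, then aggregate, all without retraining. The aggregation rule and prompt format here are assumptions for illustration.

```python
import numpy as np

def multi_vector_zero_shot(image_emb, class_attr_embs, reduce="max"):
    """Zero-shot classification with several vectors per class.

    image_emb:       (d,) L2-normalized image embedding
    class_attr_embs: dict class name -> (k, d) array of normalized text
                     embeddings, one per inferred attribute (e.g. prompts
                     like "a photo of a {attribute} {class}")
    """
    scores = {}
    for cls, embs in class_attr_embs.items():
        sims = embs @ image_emb               # cosine similarity per attribute
        scores[cls] = sims.max() if reduce == "max" else sims.mean()
    return max(scores, key=scores.get), scores

# toy example with random unit vectors
rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=d); img /= np.linalg.norm(img)
classes = {c: rng.normal(size=(3, d)) for c in ["cat", "dog"]}
for c in classes:
    classes[c] /= np.linalg.norm(classes[c], axis=1, keepdims=True)
print(multi_vector_zero_shot(img, classes))
```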

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

1 code implementation21 Feb 2024 Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian

We fine-tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially.

Decision Making Decoder +1

Revisiting Feature Prediction for Learning Visual Representations from Video

1 code implementation arXiv preprint 2024 Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.
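A minimal sketch of a feature-prediction objective of the kind the abstract describes: a predictor regresses the (stop-gradient) features of masked video regions from context features, with no pixel reconstruction or text supervision. The predictor architecture and L1 loss here are simplifying assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def feature_prediction_loss(context_feats, target_feats, predictor):
    """Regress features of masked regions from context features.

    `predictor` is any module mapping context features to predicted target
    features; the target branch is detached so it provides a fixed
    regression target rather than collapsing the representation.
    """
    pred = predictor(context_feats)
    return F.l1_loss(pred, target_feats.detach())

# toy usage: 16 tokens of 64-d features
predictor = torch.nn.Linear(64, 64)
ctx = torch.randn(16, 64)
tgt = torch.randn(16, 64)
loss = feature_prediction_loss(ctx, tgt, predictor)
loss.backward()
```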

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

2 code implementations12 Sep 2023 Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat

It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network.

Stochastic Optimization
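The abstract's "coarse Kronecker product approximation to full-matrix AdaGrad" can be sketched for one matrix-shaped parameter block: accumulate the left and right gradient statistics and precondition with their inverse fourth roots, the standard Shampoo update. Hyperparameters, grafting, and the distributed machinery are omitted.

```python
import numpy as np

def inv_root(mat, p, eps=1e-6):
    """Inverse p-th root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
    return v @ np.diag(w ** (-1.0 / p)) @ v.T

class ShampooBlock:
    """Kronecker-factored AdaGrad preconditioner for one m-by-n block."""
    def __init__(self, m, n):
        self.L = np.zeros((m, m))   # left Kronecker factor statistics
        self.R = np.zeros((n, n))   # right Kronecker factor statistics

    def precondition(self, grad):
        self.L += grad @ grad.T
        self.R += grad.T @ grad
        # full-matrix AdaGrad approximation: L^{-1/4} G R^{-1/4}
        return inv_root(self.L, 4) @ grad @ inv_root(self.R, 4)

block = ShampooBlock(4, 3)
g = np.random.randn(4, 3)
step = block.precondition(g)   # preconditioned gradient for this block
```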

Green Federated Learning

no code implementations26 Mar 2023 Ashkan Yousefpour, Shen Guo, Ashish Shenoy, Sayan Ghosh, Pierre Stock, Kiwan Maeng, Schalk-Willem Krüger, Michael Rabbat, Carole-Jean Wu, Ilya Mironov

The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets.

Federated Learning

lo-fi: distributed fine-tuning without communication

no code implementations19 Oct 2022 Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.
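The procedure the abstract compares against gradient communication can be sketched directly: each node fine-tunes its own replica with no communication at all, and the final weights are averaged once at the end. The toy "fine-tuning" below is a stand-in for real local training.

```python
import copy
import torch

def average_state_dicts(state_dicts):
    """lo-fi-style merge: average the final weights of independently
    fine-tuned replicas, the only communication in the whole procedure."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg

# toy usage: "fine-tune" three replicas independently, then average once
models = [torch.nn.Linear(4, 2) for _ in range(3)]
for m in models:
    for p in m.parameters():
        p.data += 0.01 * torch.randn_like(p)   # stand-in for local fine-tuning
merged = torch.nn.Linear(4, 2)
merged.load_state_dict(average_state_dicts([m.state_dict() for m in models]))
```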

Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning

1 code implementation14 Oct 2022 John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat

Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity.

Federated Learning

The Hidden Uniform Cluster Prior in Self-Supervised Learning

1 code implementation13 Oct 2022 Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN).

Clustering Representation Learning +1

Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning

2 code implementations30 Jun 2022 John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat

Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity.

Federated Learning

Positive Unlabeled Contrastive Learning

no code implementations1 Jun 2022 Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon

We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive or negative).

Contrastive Learning Pseudo Label

FedShuffle: Recipes for Better Use of Local Work in Federated Learning

no code implementations27 Apr 2022 Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat

The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL).

Federated Learning
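The pattern the abstract refers to, several local updates per client before a single aggregation, is the FedAvg-style round sketched below; FedShuffle's specific shuffling and reweighting recipes are not reproduced. Each toy client fits least squares on its own data.

```python
import numpy as np

def local_sgd_round(w, client_data, local_steps=5, lr=0.1):
    """One communication round: every client runs several local SGD steps,
    then the server averages the resulting models once."""
    updates = []
    for X, y in client_data:
        w_local = w.copy()
        for _ in range(local_steps):           # local work, no communication
            grad = X.T @ (X @ w_local - y) / len(y)
            w_local -= lr * grad
        updates.append(w_local)
    return np.mean(updates, axis=0)            # aggregate once per round

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = local_sgd_round(w, clients)
```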

Federated Learning with Partial Model Personalization

2 code implementations8 Apr 2022 Krishna Pillutla, Kshitiz Malik, Abdelrahman Mohamed, Michael Rabbat, Maziar Sanjabi, Lin Xiao

We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices.

Federated Learning
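A sketch of the partially personalized setup: shared parameters are synchronized across devices while personal parameters stay local, and the two groups are updated alternately on-device (one of the two schemes the abstract mentions). The model split and optimizer details below are illustrative assumptions.

```python
import torch

class PartiallyPersonalModel(torch.nn.Module):
    """Shared feature extractor plus a per-device personal head."""
    def __init__(self):
        super().__init__()
        self.shared = torch.nn.Linear(8, 8)    # synchronized across devices
        self.personal = torch.nn.Linear(8, 2)  # never leaves the device

    def forward(self, x):
        return self.personal(torch.relu(self.shared(x)))

def alternating_local_update(model, x, y, lr=0.05):
    """Alternating variant: update personal parameters first, then shared.
    Only the shared update would be sent back for server aggregation."""
    loss_fn = torch.nn.CrossEntropyLoss()
    opt_p = torch.optim.SGD(model.personal.parameters(), lr=lr)
    opt_s = torch.optim.SGD(model.shared.parameters(), lr=lr)
    for opt in (opt_p, opt_s):                 # personal step, then shared step
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

model = PartiallyPersonalModel()
alternating_local_update(model, torch.randn(16, 8), torch.randint(0, 2, (16,)))
```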

Stochastic Polyak Stepsize with a Moving Target

no code implementations22 Jun 2021 Robert M. Gower, Aaron Defazio, Michael Rabbat

MOTAPS can be seen as a variant of the Stochastic Polyak (SP) stepsize, which also uses loss values to adjust the stepsize.

Image Classification Translation
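For reference, the classical Stochastic Polyak stepsize that MOTAPS builds on sets gamma_t = (f_i(w) - f_i^*) / ||grad f_i(w)||^2 for the sampled loss f_i. The sketch below implements that baseline; MOTAPS's replacement of the fixed target f_i^* with a moving target is not reproduced.

```python
import numpy as np

def sp_step(w, grad, loss, loss_star=0.0, lr_max=1.0):
    """One SGD step with the (capped) Stochastic Polyak stepsize:
        gamma = (f_i(w) - f_i^*) / ||grad f_i(w)||^2
    loss_star is the optimal value of the sampled loss (often 0 for
    interpolating models)."""
    gamma = (loss - loss_star) / (np.dot(grad, grad) + 1e-12)
    return w - min(gamma, lr_max) * grad

# toy usage on f(w) = ||w||^2 / 2, whose minimum value is 0
w = np.array([2.0, -1.0])
w = sp_step(w, grad=w.copy(), loss=0.5 * np.dot(w, w))
print(w)
```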

Federated Learning with Buffered Asynchronous Aggregation

no code implementations11 Jun 2021 John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Malek, Dzmitry Huba

On the other hand, asynchronous aggregation of client updates in FL (i.e., asynchronous FL) alleviates the scalability issue.

Federated Learning Privacy Preserving
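A minimal sketch of buffered asynchronous aggregation: client updates arrive at any time and sit in a buffer, and the server model only advances once enough updates have accumulated. Staleness weighting and the privacy mechanisms from the paper are omitted; the buffer size and server step are illustrative.

```python
import numpy as np

class BufferedAggregator:
    """Server side of buffered asynchronous FL (FedBuff-style sketch)."""
    def __init__(self, w, buffer_size=3, server_lr=1.0):
        self.w = w
        self.buffer = []
        self.buffer_size = buffer_size
        self.server_lr = server_lr

    def receive(self, client_delta):
        self.buffer.append(client_delta)       # updates arrive asynchronously
        if len(self.buffer) >= self.buffer_size:
            # apply the averaged buffered updates in one server step
            self.w += self.server_lr * np.mean(self.buffer, axis=0)
            self.buffer.clear()
        return self.w

agg = BufferedAggregator(np.zeros(4))
for _ in range(7):                              # 7 arrivals -> 2 server steps
    agg.receive(np.random.randn(4) * 0.1)
```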

A Closer Look at Codistillation for Distributed Training

no code implementations6 Oct 2020 Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

Surprisingly, we find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods, despite using a much weaker synchronization mechanism.

Distributed Computing
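The "much weaker synchronization mechanism" is prediction matching rather than gradient exchange: each model adds a distillation term pulling its outputs toward a periodically refreshed copy of its peer's outputs. A sketch of the per-model loss, with the mixing weight as an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F

def codistillation_loss(logits_a, logits_b, targets, alpha=0.5):
    """Task loss plus a KL term matching the peer model's (stop-gradient)
    predictions; models only exchange predictions, not gradients."""
    task = F.cross_entropy(logits_a, targets)
    match = F.kl_div(F.log_softmax(logits_a, dim=-1),
                     F.softmax(logits_b.detach(), dim=-1),
                     reduction="batchmean")
    return task + alpha * match

logits_a = torch.randn(8, 10, requires_grad=True)
logits_b = torch.randn(8, 10)
loss = codistillation_loss(logits_a, logits_b, torch.randint(0, 10, (8,)))
loss.backward()
```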

Advances in Asynchronous Parallel and Distributed Optimization

no code implementations24 Jun 2020 Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade.

Distributed Optimization

Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

2 code implementations18 Jun 2020 Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training.

Contrastive Learning

On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings

no code implementations ICML 2020 Mahmoud Assran, Michael Rabbat

We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches).
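The recursion studied, written in its standard constant-parameter form; in the stochastic settings of the paper, the gradient call would return an unbiased estimate rather than the exact gradient.

```python
import numpy as np

def nesterov(grad, x0, lr=0.1, beta=0.9, steps=100):
    """Nesterov's accelerated gradient with constant step-size and momentum:
        y_t     = x_t + beta * (x_t - x_{t-1})
        x_{t+1} = y_t - lr * grad(y_t)
    """
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        y = x + beta * (x - x_prev)
        x_prev, x = x, y - lr * grad(y)
    return x

# minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself
print(nesterov(lambda y: y, np.ones(3)))
```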

Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge

1 code implementation6 Jan 2020 Florian Knoll, Tullie Murrell, Anuroop Sriram, Nafissa Yakubova, Jure Zbontar, Michael Rabbat, Aaron Defazio, Matthew J. Muckley, Daniel K. Sodickson, C. Lawrence Zitnick, Michael P. Recht

Conclusion: The challenge led to new developments in machine learning for image reconstruction, provided insight into the current state of the art in the field, and highlighted remaining hurdles for clinical adoption.

BIG-bench Machine Learning Image Reconstruction

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

1 code implementation ICLR 2020 Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat

We provide theoretical convergence guarantees showing that SlowMo converges to a stationary point of smooth non-convex losses.

Blocking Distributed Optimization +3
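A sketch of the slow-momentum idea: run several fast base-optimizer steps (plain SGD here, standing in for the workers' averaged inner loop), then take a momentum step on the slow weights using the total displacement. The exact scaling conventions differ from the paper; this is illustrative.

```python
import numpy as np

def slowmo(grad, x0, inner_lr=0.05, slow_lr=1.0, beta=0.7, tau=10, rounds=20):
    """SlowMo-style outer loop:
        u <- beta * u + (z - x_tau) / inner_lr     (slow momentum buffer)
        z <- z - slow_lr * inner_lr * u            (slow weight update)
    where x_tau is the result of tau fast steps started from z."""
    z = x0.copy()
    u = np.zeros_like(z)
    for _ in range(rounds):
        x = z.copy()
        for _ in range(tau):                   # fast inner loop
            x -= inner_lr * grad(x)
        u = beta * u + (z - x) / inner_lr      # momentum on the displacement
        z = z - slow_lr * inner_lr * u
    return z

print(slowmo(lambda x: x, np.ones(3)))         # converges toward 0
```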

Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

1 code implementation NeurIPS 2019 Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws.

Deep Reinforcement Learning reinforcement-learning +1

A Graph-CNN for 3D Point Cloud Classification

1 code implementation28 Nov 2018 Yingxue Zhang, Michael Rabbat

Graph convolutional neural networks (Graph-CNNs) extend traditional CNNs to handle data that is supported on a graph.

3D Object Classification Classification +2
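A single graph-convolution layer in its common normalized-adjacency form, relu(D^-1/2 (A+I) D^-1/2 X W), to make "CNNs on data supported on a graph" concrete; the paper's exact polynomial filters may differ from this sketch.

```python
import numpy as np

def graph_conv(X, A, W):
    """Propagate node features over the self-loop-augmented, degree-
    normalized adjacency, then apply a shared linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# toy point-cloud graph: 5 nodes, 3-d features -> 4-d features
rng = np.random.default_rng(0)
A = (rng.random((5, 5)) > 0.6).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops
out = graph_conv(rng.normal(size=(5, 3)), A, rng.normal(size=(3, 4)))
```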

Stochastic Gradient Push for Distributed Deep Learning

3 code implementations ICLR 2019 Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat

Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes.

Deep Learning General Classification +3
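The "push" in Stochastic Gradient Push is the push-sum gossip protocol: each node keeps a value and a weight, pushes uniform fractions of both to its out-neighbors, and de-biases by the ratio, which converges to the average even on directed graphs. The sketch below shows only the gossip rounds; the SGD steps interleaved between them are omitted.

```python
import numpy as np

def push_sum_round(xs, ws, out_neighbors):
    """One push-sum round: node i splits (x_i, w_i) uniformly among its
    out-neighbors plus itself; x_i / w_i estimates the global average."""
    n = len(xs)
    new_x = [np.zeros_like(xs[0]) for _ in range(n)]
    new_w = [0.0] * n
    for i in range(n):
        dests = out_neighbors[i] + [i]
        for j in dests:
            new_x[j] += xs[i] / len(dests)
            new_w[j] += ws[i] / len(dests)
    return new_x, new_w

xs = [np.array([0.0]), np.array([2.0]), np.array([7.0])]
ws = [1.0, 1.0, 1.0]
nbrs = {0: [1], 1: [2], 2: [0]}                # directed ring
for _ in range(50):
    xs, ws = push_sum_round(xs, ws, nbrs)
print([float(x / w) for x, w in zip(xs, ws)])  # all close to the average, 3.0
```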

Provably Accelerated Randomized Gossip Algorithms

no code implementations31 Oct 2018 Nicolas Loizou, Michael Rabbat, Peter Richtárik

In this work we present novel provably accelerated gossip algorithms for solving the average consensus problem.
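For context, the classical randomized gossip baseline that the paper accelerates: at each step a random edge is activated and its endpoints average their values, preserving the sum and driving all nodes to the consensus average. The paper's provably accelerated (momentum-augmented) variants are not included in this sketch.

```python
import numpy as np

def randomized_gossip(x, edges, iters=2000, seed=0):
    """Classical randomized pairwise gossip for average consensus."""
    rng = np.random.default_rng(seed)
    x = x.astype(float).copy()
    for _ in range(iters):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = (x[i] + x[j]) / 2.0      # activated pair averages
    return x

values = np.array([1.0, 5.0, 9.0, 13.0])
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(randomized_gossip(values, ring))         # all entries approach 7.0
```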

TarMAC: Targeted Multi-Agent Communication

no code implementations ICLR 2019 Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau

We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments.

Multi-agent Reinforcement Learning Reinforcement Learning

Learning graphs from data: A signal representation perspective

no code implementations3 Jun 2018 Xiaowen Dong, Dorina Thanou, Michael Rabbat, Pascal Frossard

The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data.

Graph Learning

Efficient Large-Scale Similarity Search Using Matrix Factorization

no code implementations CVPR 2016 Ahmet Iscen, Michael Rabbat, Teddy Furon

Experiments with standard image search benchmarks, including the Yahoo100M dataset comprising 100 million images, show that our method gives comparable (and sometimes superior) accuracy compared to exhaustive search while requiring only 10% of the vector operations and memory.

Dictionary Learning Dimensionality Reduction +2

Memory vectors for similarity search in high-dimensional spaces

no code implementations10 Dec 2014 Ahmet Iscen, Teddy Furon, Vincent Gripon, Michael Rabbat, Hervé Jégou

We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory.

Image Retrieval Vocal Bursts Intensity Prediction
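A sketch of the basic memory-vector construction: pack the database into groups, represent each group by the sum of its normalized vectors, screen the query against these group representatives with inner products, and search exhaustively only inside the most promising groups. The paper also analyzes optimized (non-sum) representatives, which this sketch omits; group size and probe count below are arbitrary.

```python
import numpy as np

def search_with_memory_vectors(query, database, group_size=8, n_probe=2):
    """Two-stage similarity search: inner-product tests against group
    sums first, then exhaustive search within the top-scoring groups."""
    n, d = database.shape
    groups = [database[i:i + group_size] for i in range(0, n, group_size)]
    memories = np.stack([g.sum(axis=0) for g in groups])
    probe = np.argsort(memories @ query)[::-1][:n_probe]   # promising groups
    best, best_sim = None, -np.inf
    for gi in probe:
        sims = groups[gi] @ query
        k = int(np.argmax(sims))
        if sims[k] > best_sim:
            best, best_sim = gi * group_size + k, sims[k]
    return best, best_sim

rng = np.random.default_rng(0)
db = rng.normal(size=(64, 16))
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[37] + 0.1 * rng.normal(size=16)         # query near database item 37
print(search_with_memory_vectors(q, db))
```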

Combating Corrupt Messages in Sparse Clustered Associative Memories

no code implementations27 Sep 2014 Zhe Yao, Vincent Gripon, Michael Rabbat

In this paper we analyze and extend the neural network based associative memory proposed by Gripon and Berrou.

Retrieval

Storing sequences in binary tournament-based neural networks

no code implementations1 Sep 2014 Xiaoran Jiang, Vincent Gripon, Claude Berrou, Michael Rabbat

An extension to a recently introduced architecture of clique-based neural networks is presented.

Retrieval
