Search Results for author: Ari S. Morcos

Found 34 papers, 16 papers with code

Brevity is the soul of wit: Pruning long files for code generation

no code implementations 29 Jun 2024 Aaditya K. Singh, Yu Yang, Kushal Tirumala, Mostafa Elhoushi, Ari S. Morcos

Specifically, many have shown that de-duplicating data, or sub-selecting higher quality data, can lead to efficiency or performance improvements.

Code Generation

Effective pruning of web-scale datasets based on complexity of concept clusters

1 code implementation 9 Jan 2024 Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training.

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

no code implementations 5 Dec 2023 Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani

Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset.

Code Generation

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

1 code implementation NeurIPS 2023 Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos

Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation.

Representation Learning

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

1 code implementation 16 Mar 2023 Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.

Emergence of Maps in the Memories of Blind Navigation Agents

no code implementations 30 Jan 2023 Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra

A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial.

Inductive Bias, PointGoal Navigation

lo-fi: distributed fine-tuning without communication

no code implementations 19 Oct 2022 Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.

Beyond neural scaling laws: beating power law scaling via data pruning

3 code implementations 29 Jun 2022 Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning.

Benchmarking

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

5 code implementations 10 Mar 2022 Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder.

Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

Domain Generalization, Image Classification +2

Learning Background Invariance Improves Generalization and Robustness in Self-Supervised Learning on ImageNet and Beyond

no code implementations NeurIPS Workshop ImageNet_PPF 2021 Chaitanya Ryali, David J. Schwab, Ari S. Morcos

Through a systematic, comprehensive investigation, we show that background augmentations lead to improved generalization with substantial improvements ($\sim$1-2% on ImageNet) in performance across a spectrum of state-of-the-art self-supervised methods (MoCo-v2, BYOL, SwAV) on a variety of tasks, even enabling performance on par with the supervised baseline.

Data Augmentation, Self-Supervised Learning +1

Grounding inductive biases in natural images: invariance stems from variations in data

1 code implementation NeurIPS 2021 Diane Bouchacourt, Mark Ibrahim, Ari S. Morcos

While prior work has focused on synthetic data, we attempt here to characterize the factors of variation in a real dataset, ImageNet, and study the invariance of both standard residual networks and the recently proposed vision transformer with respect to changes in these factors.

Data Augmentation, Translation

Width Transfer: On the (In)variance of Width Optimization

no code implementations 24 Apr 2021 Ting-Wu Chin, Diana Marculescu, Ari S. Morcos

In this work, we propose width transfer, a technique that harnesses the assumptions that the optimized widths (or channel counts) are regular across sizes and depths.

Uncovering the impact of learning rate for global magnitude pruning

no code implementations 1 Jan 2021 Janice Lan, Rudy Chin, Alexei Baevski, Ari S. Morcos

However, prior work has implicitly assumed that the best training configuration for model performance was also the best configuration for mask discovery.

Reservoir Transformers

no code implementations ACL 2021 Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.

BIG-bench Machine Learning, Language Modelling +2

Are all negatives created equal in contrastive instance discrimination?

no code implementations 13 Oct 2020 Tiffany Tianhui Cai, Jonathan Frankle, David J. Schwab, Ari S. Morcos

Using methodology from MoCo v2 (Chen et al., 2020), we divided negatives by their difficulty for a given query and studied which difficulty ranges were most important for learning useful representations.

Image Classification, Self-Supervised Learning

PareCO: Pareto-aware Channel Optimization for Slimmable Neural Networks

no code implementations 28 Sep 2020 Rudy Chin, Ari S. Morcos, Diana Marculescu

Slimmable neural networks provide a flexible trade-off front between prediction error and computational cost (such as the number of floating-point operations or FLOPs) with the same storage cost as a single model.

Linking average- and worst-case perturbation robustness via class selectivity and dimensionality

no code implementations 28 Sep 2020 Matthew L. Leavitt, Ari S. Morcos

We also found that the input-unit gradient was more variable across samples and units in high-selectivity networks compared to low-selectivity networks.

Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks

2 code implementations 23 Jul 2020 Ting-Wu Chin, Ari S. Morcos, Diana Marculescu

In this work, we propose a general framework to enable joint optimization for both width configurations and weights of slimmable networks.

On the relationship between class selectivity, dimensionality, and robustness

no code implementations 8 Jul 2020 Matthew L. Leavitt, Ari S. Morcos

While the relative trade-offs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these trade-offs apply to representations of semantically-meaningful information.

Plan2Vec: Unsupervised Representation Learning by Latent Plans

1 code implementation 7 May 2020 Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra

In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning.

Motion Planning, reinforcement-learning +2

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

4 code implementations ICLR 2021 Jonathan Frankle, David J. Schwab, Ari S. Morcos

A wide variety of deep learning techniques from style transfer to multitask learning rely on training affine transformations of features.

Style Transfer

The Early Phase of Neural Network Training

1 code implementation ICLR 2020 Jonathan Frankle, David J. Schwab, Ari S. Morcos

We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset.

The Generalization-Stability Tradeoff In Neural Network Pruning

no code implementations NeurIPS 2020 Brian R. Bartoldson, Ari S. Morcos, Adrian Barbu, Gordon Erlebacher

Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting.

Network Pruning

One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

2 code implementations NeurIPS 2019 Ari S. Morcos, Haonan Yu, Michela Paganini, Yuandong Tian

Perhaps surprisingly, we found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset.

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

no code implementations ICLR 2020 Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos

The lottery ticket hypothesis proposes that over-parameterization of deep neural networks (DNNs) aids training by increasing the probability of a "lucky" sub-network initialization being present rather than by helping the optimization process (Frankle & Carbin, 2019).

Image Classification, Reinforcement Learning (RL)

Learning to Make Analogies by Contrasting Abstract Relational Structure

2 code implementations ICLR 2019 Felix Hill, Adam Santoro, David G. T. Barrett, Ari S. Morcos, Timothy Lillicrap

Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data.

Analyzing biological and artificial neural networks: challenges with opportunities for synergy?

no code implementations 31 Oct 2018 David G. T. Barrett, Ari S. Morcos, Jakob H. Macke

We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks.

Object Recognition

Measuring abstract reasoning in neural networks

2 code implementations ICML 2018 David G. T. Barrett, Felix Hill, Adam Santoro, Ari S. Morcos, Timothy Lillicrap

To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training and test data differ in clearly-defined ways.

Insights on representational similarity in neural networks with canonical correlation

2 code implementations NeurIPS 2018 Ari S. Morcos, Maithra Raghu, Samy Bengio

Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training.

Pooling is neither necessary nor sufficient for appropriate deformation stability in CNNs

no code implementations ICLR 2019 Avraham Ruderman, Neil C. Rabinowitz, Ari S. Morcos, Daniel Zoran

In this work, we rigorously test these questions, and find that deformation stability in convolutional networks is more nuanced than it first appears: (1) Deformation invariance is not a binary property, but rather that different tasks require different degrees of deformation stability at different layers.

General Classification, Image Classification +1

On the importance of single directions for generalization

1 code implementation ICLR 2018 Ari S. Morcos, David G. T. Barrett, Neil C. Rabinowitz, Matthew Botvinick

Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
