Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets.
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models.
Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations.
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning.
Our approach applies the formalism of Lie groups to capture continuous transformations and thereby improve models' robustness to distributional shifts.
We study not only how robust recent state-of-the-art models are, but also the extent to which models can generalize to variation in factors that are present during training.
As datasets and models become increasingly large, distributed training has become a necessary component to allow deep neural networks to train in reasonable amounts of time.
Finally, we experiment with initializing the T-CNN from a partially trained CNN, and find that it reaches better performance than the corresponding hybrid model trained from scratch, while reducing training time.
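A minimal sketch of what initializing one model from a partially trained CNN can look like in PyTorch: copy whichever parameters share names and shapes, and leave the rest at their fresh initialization. The helper name `init_from_partial` and the models `hybrid_model` / `pretrained_cnn` are illustrative stand-ins, not the paper's actual architectures.

```python
import torch

def init_from_partial(hybrid_model, pretrained_cnn):
    """Copy compatible weights from a partially trained CNN into a new model."""
    source = pretrained_cnn.state_dict()
    target = hybrid_model.state_dict()
    # Keep only tensors that exist in both models with matching shapes.
    transferable = {k: v for k, v in source.items()
                    if k in target and v.shape == target[k].shape}
    target.update(transferable)
    hybrid_model.load_state_dict(target)
    return hybrid_model
```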
We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information.
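A hedged sketch (not the authors' code) of a gated positional self-attention head: a single learned gate blends content-based attention with a learned positional attention map, so a head can start local and "escape locality" as the gate moves toward content. The class name `GatedPSAHead`, the parameter `lambda_gate`, and the random positional initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPSAHead(nn.Module):
    def __init__(self, dim, num_patches):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim, bias=False)   # content query/key projections
        self.v = nn.Linear(dim, dim, bias=False)
        # Learned scores over patch positions; the paper initialises these to
        # mimic a local convolutional pattern, random here for brevity.
        self.pos_scores = nn.Parameter(torch.randn(num_patches, num_patches))
        # Gating parameter: sigmoid(lambda_gate) is the weight on positional
        # attention, so a positive start keeps the head local at first.
        self.lambda_gate = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):                               # x: (batch, patches, dim)
        q, k = self.qk(x).chunk(2, dim=-1)
        content_attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        pos_attn = F.softmax(self.pos_scores, dim=-1)
        gate = torch.sigmoid(self.lambda_gate)
        attn = (1.0 - gate) * content_attn + gate * pos_attn
        return attn @ self.v(x)

# Example: head = GatedPSAHead(dim=64, num_patches=196); out = head(torch.randn(2, 196, 64))
```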
Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples.
Furthermore, the input-unit gradient is more variable across samples and units in high-selectivity networks compared to low-selectivity networks.
Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head").
Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often over-specialized to the target task.
For ResNet20 trained on CIFAR10 we could reduce class selectivity by a factor of 2.5 with no impact on test accuracy, and reduce it nearly to zero with only a small ($\sim$2%) drop in test accuracy.
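For concreteness, one common way to quantify a unit's class selectivity is the index (mu_max - mu_rest) / (mu_max + mu_rest), computed from class-conditional mean activations; the sketch below assumes this is the notion of selectivity being reduced, and the tensor shapes and epsilon are illustrative.

```python
import torch

def class_selectivity(activations, labels, num_classes, eps=1e-7):
    """activations: (num_samples, num_units); labels: (num_samples,) integer class ids."""
    # Mean activation of every unit for each class.
    class_means = torch.stack([
        activations[labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                    # (num_classes, num_units)
    mu_max, _ = class_means.max(dim=0)                    # most-preferred class per unit
    # Mean over the remaining (non-preferred) classes for each unit.
    mu_rest = (class_means.sum(dim=0) - mu_max) / (num_classes - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)  # one index per unit
```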
We seek to learn a representation on a large annotated data source that generalizes to a target domain using limited new supervision.
In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e., on self-supervised tasks).
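As an illustration of the kind of standard pruning method meant here, the snippet below applies magnitude-based unstructured pruning from `torch.nn.utils.prune` to a label-free encoder; the `encoder` and the 30% pruning ratio are assumed stand-ins, not the paper's setup.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

encoder = nn.Sequential(                 # stand-in for a self-supervised backbone
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
)

# Remove the 30% smallest-magnitude weights in every conv layer.
for module in encoder.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent
```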
Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.
We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
The lottery ticket hypothesis argues that neural networks contain sparse subnetworks, which, if appropriately initialized (the winning tickets), are capable of matching the accuracy of the full network when trained in isolation.
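A minimal sketch of the winning-ticket procedure the hypothesis refers to: train, prune the smallest-magnitude weights, rewind the survivors to their initial values, and retrain. This is not the paper's implementation; `model_fn`, `train`, and the pruning schedule are assumed stand-ins for a real model constructor and training loop.

```python
import copy
import torch

def find_winning_ticket(model_fn, train, prune_fraction=0.2, rounds=3):
    model = model_fn()
    init_state = copy.deepcopy(model.state_dict())         # weights at initialisation
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train(model, masks)                                 # train with the current mask applied
        with torch.no_grad():
            for name, param in model.named_parameters():
                # Prune the smallest-magnitude weights that are still alive.
                alive = param[masks[name].bool()].abs()
                if alive.numel() == 0:
                    continue
                threshold = torch.quantile(alive, prune_fraction)
                masks[name] *= (param.abs() > threshold).float()
            # Rewind the surviving weights to their original initialisation.
            model.load_state_dict(init_state)
            for name, param in model.named_parameters():
                param.mul_(masks[name])
    return model, masks
```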