no code implementations • 8 Feb 2022 • Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
no code implementations • 29 Sep 2021 • Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler
We verify that the prioritized groups found via intervention are challenging for the object detector and show that retraining with data collected from these groups helps inordinately compared to adding more IID data.
1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse
Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.
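As a rough illustration of the interpolation probe described above (not the paper's experimental setup), the NumPy sketch below trains a small logistic-regression model with gradient descent and then evaluates the training loss along the straight line between the initial and converged weights; the toy data and hyperparameters are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification problem (illustrative only, not the paper's setup).
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) + 0.5 * rng.normal(size=200) > 0).astype(float)

def loss(w):
    # Mean logistic (cross-entropy) loss of a linear model with weights w.
    s = 2 * y - 1                       # labels in {-1, +1}
    return np.mean(np.log1p(np.exp(-s * (X @ w))))

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    return X.T @ (p - y) / len(y)

# Train with plain gradient descent from a random initialization.
w_init = rng.normal(size=5)
w_final = w_init.copy()
for _ in range(500):
    w_final -= 0.5 * grad(w_final)

# Evaluate the training loss along the straight line
# theta(a) = (1 - a) * w_init + a * w_final between the two parameter vectors.
alphas = np.linspace(0.0, 1.0, 21)
path = [loss((1 - a) * w_init + a * w_final) for a in alphas]

# On many problems this curve decreases (near-)monotonically in a.
monotone = all(l1 >= l2 - 1e-6 for l1, l2 in zip(path, path[1:]))
print("loss along the interpolation path:", np.round(path, 4))
print("monotonic decrease (within tolerance):", monotone)
```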
no code implementations • 1 Jan 2021 • Mengye Ren, Eleni Triantafillou, Kuan-Chieh Wang, James Lucas, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel
In this work, we consider a realistic setting where the relationship between examples can change from episode to episode depending on the task context, which is not given to the learner.
no code implementations • 10 Dec 2020 • Mengye Ren, Eleni Triantafillou, Kuan-Chieh Wang, James Lucas, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel
Compared to standard few-shot learning of semantic classes, in which novel classes may be defined by attributes that were relevant at training time, learning new attributes poses a stiffer challenge.
no code implementations • ICLR 2021 • James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly.
1 code implementation • NeurIPS 2020 • Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse
Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs).
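For background on what the "representations" of a linear autoencoder are, the following NumPy sketch trains a small linear autoencoder (LAE) with gradient descent and checks the classical fact that its decoder approximately spans the top principal subspace of the data; the synthetic data and training details are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a dominant 2-D subspace (illustrative only).
n, d, k = 500, 10, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Linear autoencoder: reconstruct X as (X W1) W2 under squared error.
W1 = 0.1 * rng.normal(size=(d, k))   # encoder
W2 = 0.1 * rng.normal(size=(k, d))   # decoder
lr = 5e-3
for _ in range(3000):
    H = X @ W1                # latent codes
    R = H @ W2 - X            # reconstruction residual
    gW2 = (H.T @ R) / n
    gW1 = (X.T @ (R @ W2.T)) / n
    W1 -= lr * gW1
    W2 -= lr * gW2

# Compare the decoder's row space with the top-k principal subspace of the data.
pca_basis = np.linalg.svd(X, full_matrices=False)[2][:k]
lae_basis = np.linalg.svd(W2, full_matrices=False)[2]

# Singular values of the overlap are cosines of principal angles (near 1 => aligned).
overlap = np.linalg.svd(pca_basis @ lae_basis.T, compute_uv=False)
print("subspace alignment (cosines of principal angles):", np.round(overlap, 3))
```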
no code implementations • NeurIPS 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi
Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.
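To make "the variational posterior closely matches the prior for a subset of latent variables" concrete, here is a small NumPy sketch (not the paper's analysis) that computes the per-dimension KL divergence between a diagonal-Gaussian posterior and a standard-normal prior and flags near-zero (collapsed) dimensions; the encoder outputs and the threshold are made-up numbers for illustration.

```python
import numpy as np

def kl_per_dim(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), one value per latent dimension,
    averaged over the batch."""
    kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return kl.mean(axis=0)

# Made-up encoder outputs for a batch of 4 examples and 5 latent dimensions.
# Dimensions 3 and 4 have mu ~ 0 and variance ~ 1, i.e. they match the prior.
mu = np.array([
    [ 1.2, -0.8, 0.5,  0.01, -0.02],
    [-0.9,  1.1, 0.3, -0.02,  0.01],
    [ 0.7, -1.3, 0.6,  0.00,  0.03],
    [-1.1,  0.9, 0.4,  0.02, -0.01],
])
logvar = np.array([
    [-1.0, -0.8, -0.5, 0.0, 0.0],
    [-0.9, -1.1, -0.6, 0.0, 0.0],
    [-1.2, -0.7, -0.4, 0.0, 0.0],
    [-0.8, -1.0, -0.5, 0.0, 0.0],
])

kl = kl_per_dim(mu, logvar)
collapsed = kl < 1e-2   # threshold is an arbitrary choice for illustration
print("per-dimension KL:", np.round(kl, 4))
print("collapsed dimensions:", np.where(collapsed)[0])
```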
1 code implementation • NeurIPS 2019 • Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen
Our block convolution orthogonal parameterization (BCOP) allows us to train large convolutional networks with provable Lipschitz bounds.
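The sketch below is not the BCOP construction itself; it only illustrates the underlying principle behind a provable Lipschitz bound: per-layer spectral norms multiply, and 1-Lipschitz activations (e.g. ReLU) do not increase the bound. The layer shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight matrices of a small fully-connected network (illustrative shapes).
weights = [rng.normal(size=s) / np.sqrt(s[1]) for s in [(64, 32), (64, 64), (10, 64)]]

# The spectral norm (largest singular value) of a linear layer is its Lipschitz
# constant, so the whole network's Lipschitz constant is bounded by the product.
layer_bounds = [np.linalg.norm(W, ord=2) for W in weights]
network_bound = np.prod(layer_bounds)

print("per-layer spectral norms:", np.round(layer_bounds, 3))
print("provable upper bound on the network's Lipschitz constant:", round(network_bound, 3))
```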
19 code implementations • NeurIPS 2019 • Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.
no code implementations • ICLR Workshop DeepGenStruct 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi
Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.
1 code implementation • 13 Nov 2018 • Cem Anil, James Lucas, Roger Grosse
We identify a necessary property for such an architecture: each layer must preserve the gradient norm during backpropagation.
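A minimal NumPy illustration of the gradient-norm-preservation property named above: backpropagation through a linear layer y = W x maps an upstream gradient g to W^T g, so an orthogonal weight matrix preserves the norm exactly, while a generic matrix does not. This is a background example, not the paper's proposed architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Upstream gradient arriving at a linear layer during backpropagation.
g = rng.normal(size=d)

# An orthogonal weight matrix preserves the gradient norm exactly.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
print("upstream ||g||:            ", round(np.linalg.norm(g), 4))
print("orthogonal layer ||Q^T g||:", round(np.linalg.norm(Q.T @ g), 4))

# A generic weight matrix can shrink or amplify the gradient norm.
W = rng.normal(size=(d, d)) / np.sqrt(d)
print("generic layer ||W^T g||:   ", round(np.linalg.norm(W.T @ g), 4))
```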
no code implementations • ICML 2018 • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel
We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).
1 code implementation • 27 Jun 2018 • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel
We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).
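For context on what "SGLD samples" are (this sketch is not the proposed distillation framework), here is a minimal stochastic gradient Langevin dynamics sampler for a toy one-parameter Gaussian posterior; the model, step size, and batch size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 100 observations from N(2, 1) and a N(0, 10^2) prior on theta.
data = rng.normal(loc=2.0, scale=1.0, size=100)
n = len(data)

def sgld_samples(n_steps=5000, batch_size=10, eps=1e-4, burn_in=1000):
    """SGLD: at each step take half a gradient step on a minibatch estimate of
    the log posterior and add N(0, eps) injected noise."""
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.choice(data, size=batch_size, replace=False)
        # Minibatch estimate of the gradient of log p(theta | data).
        grad_est = -theta / 100.0 + (n / batch_size) * np.sum(batch - theta)
        theta += 0.5 * eps * grad_est + np.sqrt(eps) * rng.normal()
        samples.append(theta)
    return np.array(samples[burn_in:])

samples = sgld_samples()
# The exact posterior here is roughly N(mean(data), 1/n), i.e. std ~ 0.1.
print("SGLD posterior mean ~", round(samples.mean(), 3), ", std ~", round(samples.std(), 3))
```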
2 code implementations • ICLR 2019 • James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse
Momentum is a simple and widely used trick that allows gradient-based optimizers to pick up speed along low-curvature directions.
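The sketch below illustrates classical (heavy-ball) momentum on an ill-conditioned quadratic, where it makes far more progress along the low-curvature axis than plain gradient descent; it is a background example, not the variant proposed in this paper, and the step sizes are arbitrary choices.

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * (100 * x0^2 + x1^2):
# the x1 axis is the low-curvature direction.
curv = np.array([100.0, 1.0])

def grad(x):
    return curv * x

def run(lr, beta, steps=100):
    x = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = beta * v - lr * grad(x)   # classical (heavy-ball) momentum update
        x = x + v
    return x

# beta = 0 recovers plain gradient descent; note how much slower it is along x1.
print("plain GD (beta=0.0):", run(lr=0.015, beta=0.0))
print("momentum (beta=0.9):", run(lr=0.015, beta=0.9))
```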