1 code implementation • ICLR 2019 • James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse
Momentum is a simple and widely used trick that allows gradient-based optimizers to pick up speed along low-curvature directions.
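A minimal sketch of the classical (heavy-ball) momentum update on a toy quadratic; the step size, decay value, and objective here are illustrative choices, not taken from the paper.

```python
# Classical momentum: a velocity term accumulates past gradients,
# which speeds progress along shallow, low-curvature directions.
def momentum_step(theta, velocity, grad, lr=0.1, beta=0.9):
    velocity = beta * velocity - lr * grad
    return theta + velocity, velocity

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.0.
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, v, 2 * theta)
# theta has spiraled in toward the minimum at 0.
```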
1 code implementation • ICML 2018 (arXiv 27 Jun 2018) • Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel
We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN).
1 code implementation • 13 Nov 2018 • Cem Anil, James Lucas, Roger Grosse
We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation.
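One way to see the gradient-norm-preservation property: an orthogonal linear layer multiplies the backpropagated gradient by its transpose, which leaves the gradient norm unchanged. The 2-D rotation below is an illustrative sketch, not the paper's architecture.

```python
import math

# A 2-D rotation matrix is orthogonal. During backprop, a linear layer
# maps the incoming gradient g to W^T g; when W is orthogonal this
# preserves ||g|| exactly (illustrative sketch).
def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matvec(W, x):
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def transpose(W):
    return [list(row) for row in zip(*W)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

W = rotation(0.7)
grad = [3.0, 4.0]                  # incoming gradient, norm 5
back = matvec(transpose(W), grad)  # gradient after the layer
print(round(norm(back), 6))        # → 5.0
```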
no code implementations • ICLR Workshop DeepGenStruct 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi
Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.
19 code implementations • NeurIPS 2019 • Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.
1 code implementation • NeurIPS 2019 • Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen
Our BCOP parameterization allows us to train large convolutional networks with provable Lipschitz bounds.
no code implementations • NeurIPS 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi
Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.
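Collapsed dimensions can be spotted by computing the per-dimension KL divergence between a diagonal Gaussian posterior and the standard normal prior; dimensions with KL near zero carry no information about the input. The means and standard deviations below are hypothetical values for illustration.

```python
import math

# KL( N(mu, sigma^2) || N(0, 1) ) per latent dimension:
#   0.5 * (mu^2 + sigma^2 - 1 - log sigma^2)
# A dimension with KL ~ 0 has "collapsed" to the prior.
def kl_per_dim(mu, sigma):
    return [0.5 * (m * m + s * s - 1.0 - math.log(s * s))
            for m, s in zip(mu, sigma)]

mu    = [1.2, 0.0, -0.8]   # hypothetical posterior means for one input
sigma = [0.5, 1.0,  0.6]   # hypothetical posterior std-devs
kls = kl_per_dim(mu, sigma)
collapsed = [i for i, kl in enumerate(kls) if kl < 1e-6]
print(collapsed)           # dimension 1 matches the prior exactly
```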
1 code implementation • NeurIPS 2020 • Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse
Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs).
no code implementations • ICLR 2021 • James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly.
no code implementations • 10 Dec 2020 • Mengye Ren, Eleni Triantafillou, Kuan-Chieh Wang, James Lucas, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel
Despite impressive progress in deep learning, generalizing far beyond the training distribution is an important open challenge.
no code implementations • 1 Jan 2021 • Mengye Ren, Eleni Triantafillou, Kuan-Chieh Wang, James Lucas, Jake Snell, Xaq Pitkow, Andreas S. Tolias, Richard Zemel
In this work, we consider a realistic setting where the relationship between examples can change from episode to episode depending on the task context, which is not given to the learner.
1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse
Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.
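The interpolation itself is simple to sketch: evaluate the loss along theta(alpha) = (1 - alpha) * theta0 + alpha * theta1. The toy quadratic loss and parameter values below are illustrative only; they exhibit the monotone decrease described above by construction.

```python
# Loss along the straight line from initial parameters theta0 to
# trained parameters theta1. For this toy quadratic the path is
# monotonically decreasing (illustrative sketch).
def loss(theta):
    return sum((t - 3.0) ** 2 for t in theta)   # minimum at all-3s

theta0 = [10.0, -4.0]    # hypothetical initialization
theta1 = [3.0, 3.0]      # hypothetical converged parameters
alphas = [i / 10 for i in range(11)]
losses = [loss([(1 - a) * t0 + a * t1 for t0, t1 in zip(theta0, theta1)])
          for a in alphas]
assert all(l1 <= l0 for l0, l1 in zip(losses, losses[1:]))  # monotone
```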
no code implementations • 29 Sep 2021 • Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler
We verify that the prioritized groups found via intervention are challenging for the object detector and show that retraining with data collected from these groups helps disproportionately compared to adding more IID data.
no code implementations • 8 Feb 2022 • Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
no code implementations • CVPR 2022 • Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
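One common way to attack this question (a sketch of the general approach, not necessarily the paper's estimator) is to fit a power law error = a * n^b to the observed scaling curve in log-log space, then invert it for the dataset size that hits a target error. All values below are synthetic.

```python
import math

# Fit err = a * n^b by ordinary least squares on (log n, log err),
# then solve for the n that reaches a target error.
def fit_power_law(ns, errs):
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - b * mx), b

def data_needed(a, b, target_err):
    return (target_err / a) ** (1.0 / b)

# Synthetic scaling curve: err = 2 * n^-0.5.
ns = [100, 200, 400, 800]
errs = [2 * n ** -0.5 for n in ns]
a, b = fit_power_law(ns, errs)
print(round(data_needed(a, b, 0.05)))   # → 1600 examples for err 0.05
```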
no code implementations • 3 Oct 2022 • Rafid Mahmood, James Lucas, Jose M. Alvarez, Sanja Fidler, Marc T. Law
Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect.
1 code implementation • 5 Oct 2022 • A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran
We propose a systematic way to study the calibration error: by decomposing it into (1) calibration error on the train set, and (2) the calibration generalization gap.
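The quantity being decomposed can be made concrete with a binned expected calibration error (ECE); computing it on the train set versus the test set gives the two terms above. This is a standard ECE sketch, not the paper's code, and the toy data is hypothetical.

```python
# Expected calibration error: bin predictions by confidence and compare
# each bin's average confidence to its empirical accuracy.
def ece(confidences, corrects, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, corrects):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        err += len(b) / total * abs(avg_conf - acc)
    return err

# Perfectly calibrated toy data: confidence 0.5 with 50% accuracy.
print(ece([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # → 0.0
```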
no code implementations • 9 Feb 2023 • Viraj Prabhu, David Acuna, Andrew Liao, Rafid Mahmood, Marc T. Law, Judy Hoffman, Sanja Fidler, James Lucas
Sim2Real domain adaptation (DA) research focuses on the constrained setting of adapting from a labeled synthetic source domain to an unlabeled or sparsely labeled real target domain.
no code implementations • ICCV 2023 • Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas
Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields.
no code implementations • 7 Dec 2023 • Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas
However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging.
no code implementations • 22 Mar 2024 • Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng
Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt.