no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
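To make the parity task mentioned above concrete, here is a minimal illustrative data generator (a hypothetical sketch, not the paper's exact setup): each example is a bit string and the label is whether it contains an odd number of ones.

```python
import random

def parity_example(length=20, rng=random.Random(0)):
    """Generate one parity example: a bit string and its parity label.

    Label is 1 if the number of ones is odd, else 0. Purely illustrative
    of the task, not the authors' training setup.
    """
    bits = [rng.randint(0, 1) for _ in range(length)]
    label = sum(bits) % 2
    return bits, label

bits, label = parity_example()
print(bits, "->", label)
```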
no code implementations • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.
no code implementations • 13 Oct 2023 • Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).
no code implementations • 11 Oct 2023 • Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Josh Susskind, Shuangfei Zhai
In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity.
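As a small illustration of the "augmented space" idea (a generic sketch, not the paper's exact construction), each data point x can be paired with an auxiliary velocity variable v, and the model then operates on the joint phase-space point (x, v). Drawing v from a standard Gaussian below is an assumption made for illustration.

```python
import numpy as np

def to_phase_space(x, rng=np.random.default_rng(0)):
    """Augment data with a velocity variable: x -> (x, v).

    Sampling v from a standard Gaussian is an illustrative assumption;
    the actual coupling of position and velocity follows the paper.
    """
    v = rng.standard_normal(x.shape)
    return np.concatenate([x, v], axis=-1)

x = np.random.default_rng(1).standard_normal((4, 8))  # 4 samples, 8-dim
z = to_phase_space(x)
print(z.shape)  # (4, 16): position and velocity stacked
```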
1 code implementation • 21 Sep 2023 • Stéphane d'Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé
In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions.
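To make "symbolic regression of Boolean functions" concrete, here is a minimal sketch of the kind of input/output pair involved: a truth table as the observable input, and a Boolean formula reproducing it as the target. The formula and variable names below are illustrative, not taken from the paper.

```python
from itertools import product

# Illustrative target formula: f(a, b, c) = (a AND b) OR (NOT c)
def f(a, b, c):
    return (a and b) or (not c)

# The truth table is what a symbolic-regression model observes; the
# formula above is what it would aim to recover end to end.
truth_table = [((a, b, c), int(bool(f(a, b, c)))) for a, b, c in product([0, 1], repeat=3)]
for inputs, output in truth_table:
    print(inputs, "->", output)
```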
no code implementations • 20 Sep 2023 • Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly
Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation.
no code implementations • 9 Jun 2023 • Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind
A fairly reliable trend in deep reinforcement learning is that performance scales with the number of parameters, provided there is a complementary scaling in the amount of training data.
no code implementations • 8 Jun 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
Diffusion models have demonstrated excellent potential for generating diverse images.
no code implementations • 13 Apr 2023 • Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind
To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis on single-view datasets.
1 code implementation • 11 Mar 2023 • Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind
We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as to enable training (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization, or adaptive optimizers; (b) deep architectures in machine translation; and (c) speech recognition models to competitive performance without warmup and adaptive optimizers.
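A minimal sketch of the idea behind $\sigma$Reparam as described in the paper: each weight matrix is reparameterized by its spectral norm with a learnable scale, roughly W_hat = (gamma / sigma(W)) * W, where sigma(W) is estimated by power iteration. The standalone NumPy version below is only illustrative; where the reparameterization is applied and how gamma is learned follow the paper.

```python
import numpy as np

def spectral_norm(W, n_iter=20, rng=np.random.default_rng(0)):
    """Estimate the largest singular value of W by power iteration."""
    v = rng.standard_normal(W.shape[1])
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

def sigma_reparam(W, gamma=1.0):
    """Reparameterized weight: W_hat = (gamma / sigma(W)) * W.

    gamma is a learnable scalar in the actual method; a plain float here.
    """
    return (gamma / spectral_norm(W)) * W

W = np.random.default_rng(1).standard_normal((64, 32))
W_hat = sigma_reparam(W)
print(round(spectral_norm(W_hat), 3))  # ~1.0 when gamma = 1
```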
no code implementations • 7 Mar 2023 • Chen Huang, Hanlin Goh, Jiatao Gu, Josh Susskind
We do so via Masked Augmentation Subspace Training (MAST), which encodes the priors from different data augmentations into a single feature space in a factorized way.
no code implementations • 20 Feb 2023 • Jiatao Gu, Alex Trevithick, Kai-En Lin, Josh Susskind, Christian Theobalt, Lingjie Liu, Ravi Ramamoorthi
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
no code implementations • 10 Oct 2022 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh Susskind
In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation.
1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
no code implementations • 5 Jul 2022 • Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind
Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer.
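A rough sketch of the "pool neighboring features before attention" idea described above; the fixed window size and simple average pooling are assumptions for illustration, whereas the paper learns the pooling.

```python
import numpy as np

def local_avg_pool(x, window=3):
    """Average each token's features with its neighbors (same-length output).

    x: (seq_len, dim). A fixed average pool stands in for learned pooling.
    """
    seq_len, _ = x.shape
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[i:i + window].mean(axis=0) for i in range(seq_len)])

def attention(q, k, v):
    """Standard scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.default_rng(0).standard_normal((10, 16))
pooled = local_avg_pool(x)               # pool neighboring features per token
out = attention(pooled, pooled, pooled)  # then compute attention
print(out.shape)  # (10, 16)
```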
no code implementations • ICLR 2022 • Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind
We show that the low-rank approximation of NFKs derived from unsupervised generative models and supervised learning models gives rise to high-quality compact representations of data, achieving competitive results on a variety of machine learning tasks.
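As a generic illustration of a low-rank kernel approximation (not the paper's NFK construction): given one feature or gradient vector per example, a truncated SVD yields a compact per-example representation whose inner products approximate the full kernel.

```python
import numpy as np

# Illustrative only: G holds one feature/gradient vector per example.
rng = np.random.default_rng(0)
G = rng.standard_normal((100, 512))     # 100 examples, 512-dim features

# Truncated SVD gives a rank-k approximation of the kernel G @ G.T.
U, S, _ = np.linalg.svd(G, full_matrices=False)
k = 16
Z = U[:, :k] * S[:k]                    # compact k-dim representation per example

K_full = G @ G.T
K_lowrank = Z @ Z.T
print(np.linalg.norm(K_full - K_lowrank) / np.linalg.norm(K_full))  # relative error
```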
1 code implementation • 2 Dec 2021 • Nitish Srivastava, Walter Talbott, Martin Bertran Lopez, Shuangfei Zhai, Josh Susskind
Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space.
no code implementations • NAACL (ACL) 2022 • Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind
In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.
5 code implementations • 28 May 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Josh Susskind
We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot product self attention.
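As a sketch of what "attention free" means here, the AFT-full operation replaces the dot-product attention matrix with learned pairwise position biases combined with the keys and values element-wise, roughly Y_t = sigmoid(Q_t) * sum_{t'} exp(K_{t'} + w_{t,t'}) * V_{t'} / sum_{t'} exp(K_{t'} + w_{t,t'}). The NumPy version below is an illustrative reading of that formula, not the reference implementation.

```python
import numpy as np

def aft_full(Q, K, V, w):
    """Attention-free layer in the AFT-full style (illustrative sketch).

    Q, K, V: (T, d) projections of the input; w: (T, T) learned pairwise
    position biases. No dot-product attention matrix is formed.
    """
    num = np.einsum("ts,sd->td", np.exp(w), np.exp(K) * V)  # sum_s exp(K_s + w_{t,s}) * V_s
    den = np.einsum("ts,sd->td", np.exp(w), np.exp(K))      # sum_s exp(K_s + w_{t,s})
    return 1.0 / (1.0 + np.exp(-Q)) * (num / den)           # gated by sigmoid(Q)

T, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
w = rng.standard_normal((T, T)) * 0.1
print(aft_full(Q, K, V, w).shape)  # (8, 16)
```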
no code implementations • CVPR 2021 • Chen Huang, Shuangfei Zhai, Pengsheng Guo, Josh Susskind
This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision.
1 code implementation • ICML 2020 • Emilien Dupont, Miguel Angel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Josh Susskind, Qi Shan
We propose a framework for learning neural scene representations directly from images, without 3D supervision.
no code implementations • 15 May 2019 • Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind
In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric.
9 code implementations • CVPR 2017 • Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb
With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations.
Ranked #3 on Image-to-Image Translation on Cityscapes Labels-to-Photo (Per-class Accuracy metric)