Search Results for author: Josh Susskind

Found 26 papers, 10 papers with code

How Far Are We from Intelligent Visual Deductive Reasoning?

1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks.

In-Context Learning, Visual Reasoning

What Algorithms can Transformers Learn? A Study in Length Generalization

no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.

Matryoshka Diffusion Models

no code implementations • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.

Image Generation, Zero-shot Generalization

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

no code implementations • 13 Oct 2023 • Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).

Retrieval

Generative Modeling with Phase Stochastic Bridges

no code implementations • 11 Oct 2023 • Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Josh Susskind, Shuangfei Zhai

In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity.

Image Generation, Position

Boolformer: Symbolic Regression of Logic Functions with Transformers

1 code implementation • 21 Sep 2023 • Stéphane d'Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé

In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions.

Binary Classification, regression +1

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

no code implementations • 20 Sep 2023 • Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly

Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation.

Hallucination, Knowledge Graphs

Value function estimation using conditional diffusion models for control

no code implementations • 9 Jun 2023 • Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind

A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complementary scaling in the amount of training data.

Continuous Control

PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.

Denoising

Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images

no code implementations • 13 Apr 2023 • Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind

To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets.

3D-Aware Image Synthesis

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

1 code implementation • 11 Mar 2023 • Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as enabling training (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization or adaptive optimizers; (b) deep architectures in machine translation and (c) speech recognition to competitive performance without warmup and adaptive optimizers.

Automatic Speech Recognition, Image Classification +6

MAST: Masked Augmentation Subspace Training for Generalizable Self-Supervised Priors

no code implementations • 7 Mar 2023 • Chen Huang, Hanlin Goh, Jiatao Gu, Josh Susskind

We do so by Masked Augmentation Subspace Training (or MAST) to encode in the single feature space the priors from different data augmentations in a factorized way.

Instance Segmentation, Self-Supervised Learning +1

NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion

no code implementations • 20 Feb 2023 • Jiatao Gu, Alex Trevithick, Kai-En Lin, Josh Susskind, Christian Theobalt, Lingjie Liu, Ravi Ramamoorthi

Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.

Novel View Synthesis

f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation

no code implementations • 10 Oct 2022 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh Susskind

In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation.

Image Generation

GAUDI: A Neural Architect for Immersive 3D Scene Generation

1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation, Scene Generation

Efficient Representation Learning via Adaptive Context Pooling

no code implementations • 5 Jul 2022 • Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind

Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer.

Representation Learning

Learning Representation from Neural Fisher Kernel with Low-rank Approximation

no code implementations • ICLR 2022 • Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind

We show that the low-rank approximation of NFKs derived from unsupervised generative models and supervised learning models gives rise to high-quality compact representations of data, achieving competitive results on a variety of machine learning tasks.

Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

1 code implementation • 2 Dec 2021 • Nitish Srivastava, Walter Talbott, Martin Bertran Lopez, Shuangfei Zhai, Josh Susskind

Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space.

Regularized Training of Nearest Neighbor Language Models

no code implementations • NAACL (ACL) 2022 • Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low frequency ones.

L2 Regularization, Language Modelling

An Attention Free Transformer

6 code implementations • 28 May 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Josh Susskind

We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot product self attention.

Position

MetricOpt: Learning to Optimize Black-Box Evaluation Metrics

no code implementations • CVPR 2021 • Chen Huang, Shuangfei Zhai, Pengsheng Guo, Josh Susskind

This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision.

Image Classification, Image Retrieval +3

Equivariant Neural Rendering

1 code implementation • ICML 2020 • Emilien Dupont, Miguel Angel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Josh Susskind, Qi Shan

We propose a framework for learning neural scene representations directly from images, without 3D supervision.

Neural Rendering

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

no code implementations • 15 May 2019 • Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind

In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric.

General Classification, Meta-Learning +2

Learning from Simulated and Unsupervised Images through Adversarial Training

9 code implementations • CVPR 2017 • Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb

With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations.

Ranked #3 on Image-to-Image Translation on Cityscapes Labels-to-Photo (Per-class Accuracy metric)

Domain Adaptation, Gaze Estimation +2
