1 code implementation • 9 Dec 2024 • Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Navdeep Jaitly, Josh Susskind
Normalizing Flows (NFs) are likelihood-based models for continuous inputs.
Ranked #12 on Image Generation on ImageNet 128x128
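For context, normalizing flows are trained by exact maximum likelihood via the change-of-variables identity (a standard formula, not specific to the paper above): with an invertible map $f$ taking data $x$ to a base variable $z = f(x)$ with density $p_Z$,

```latex
\log p_X(x) \;=\; \log p_Z\big(f(x)\big) \;+\; \log \left|\det \frac{\partial f(x)}{\partial x}\right|
```

so maximizing $\log p_X(x)$ requires only a tractable Jacobian determinant for $f$.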
no code implementations • 2 Dec 2024 • Qihang Zhang, Shuangfei Zhai, Miguel Angel Bautista, Kevin Miao, Alexander Toshev, Joshua Susskind, Jiatao Gu
Recent advancements in diffusion models have set new benchmarks in image and video generation, enabling realistic visual synthesis across single- and multi-frame contexts.
no code implementations • 2 Nov 2024 • Georgia Gabriela Sampaio, Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Josh Susskind, Navdeep Jaitly, Yizhe Zhang
In this work, we focus on the text rendering aspect of these models, which provides a lens for evaluating a generative model's fine-grained instruction-following capabilities.
no code implementations • 10 Oct 2024 • Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, Shuangfei Zhai
Diffusion models have become the dominant approach for visual generation.
1 code implementation • 2 Jun 2024 • Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai
Diffusion models, which are trained to match the distribution of the training dataset, have become the de facto approach for generating visual data.
no code implementations • 31 May 2024 • Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind
Furthermore, we show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its capability to effectively control and direct the image generation process.
no code implementations • 3 Apr 2024 • Ying Shen, Yizhe Zhang, Shuangfei Zhai, Lifu Huang, Joshua M. Susskind, Jiatao Gu
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images, offering a scalable approach that obviates the need for task-specific solutions across different multi-image scenarios.
1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly
Vision-Language Models (VLMs) have recently made impressive strides on diverse vision-language tasks.
2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin
Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks.
Ranked #361 on Image Classification on ImageNet
1 code implementation • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.
1 code implementation • 11 Oct 2023 • Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai
In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where phase space is defined as an augmented space encompassing both position and velocity.
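As a minimal illustration of that augmentation (notation ours, not necessarily the paper's): each data point $x$ is paired with a velocity $v$, and the generative dynamics evolve the joint state

```latex
z = (x, v), \qquad dx = v\,dt, \qquad dv = a_\theta(x, v, t)\,dt + \text{noise},
```

so position is driven by velocity, giving second-order dynamics in the augmented space rather than first-order dynamics in data space alone.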
no code implementations • 8 Jun 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
Diffusion models have demonstrated excellent potential for generating diverse images.
1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly
Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.
1 code implementation • CVPR 2023 • Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin
To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task.
Ranked #3 on Instance Segmentation on Cityscapes val
no code implementations • 13 Apr 2023 • Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind
To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets.
1 code implementation • 11 Mar 2023 • Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind
We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as enabling training of (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization, or adaptive optimizers; (b) deep architectures in machine translation; and (c) speech recognition models to competitive performance without warmup and adaptive optimizers.
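A minimal sketch of the reparametrization named above, assuming the spectral norm is estimated by power iteration as in standard spectral-normalization code; the initialization and buffer handling here are our guesses, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

class SigmaReparamLinear(nn.Module):
    """Linear layer with sigma-reparametrized weights: W_hat = (gamma / sigma(W)) * W,
    where sigma(W) is the largest singular value of W (estimated online by one
    power-iteration step per forward pass) and gamma is a learnable scalar."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * d_in ** -0.5)
        self.gamma = nn.Parameter(torch.ones(1))
        self.register_buffer("u", torch.randn(d_out))  # power-iteration state

    def forward(self, x):
        with torch.no_grad():  # refresh the spectral-norm estimate
            v = self.weight.t() @ self.u
            v = v / v.norm()
            u = self.weight @ v
            self.u.copy_(u / u.norm())
        sigma = torch.einsum("i,ij,j->", self.u, self.weight, v)  # ~ sigma(W)
        w_hat = (self.gamma / sigma) * self.weight
        return x @ w_hat.t()
```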
no code implementations • 7 Mar 2023 • David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbot, Eric Gu
Denoising diffusion models have demonstrated their proficiency at generative sampling.
no code implementations • 10 Oct 2022 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh Susskind
In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation.
1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind
This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in speech, and recently in MAE models in vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.
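As a generic illustration of the masked-autoencoding objective this family of methods shares (a sketch of the common pattern, not this paper's specific recipe):

```python
import torch

def masked_reconstruction_loss(model, x, mask_ratio=0.75):
    """Hide a random subset of input tokens/patches and train the model to
    reconstruct them from the visible context (BERT/MAE-style sketch).
    `model` is assumed to map a masked sequence back to the full sequence."""
    b, n, d = x.shape                                   # batch, tokens, features
    mask = torch.rand(b, n, device=x.device) < mask_ratio
    x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)   # zero out hidden tokens
    pred = model(x_masked)                              # (b, n, d) reconstruction
    return ((pred - x) ** 2)[mask].mean()               # loss on masked positions only
```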
no code implementations • 10 Jun 2022 • Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind
While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any optimization theory we are aware of, and can be easily overlooked without an in-depth examination.
no code implementations • ICLR 2022 • Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind
We show that the low-rank approximation of NFKs derived from unsupervised generative models and supervised learning models gives rise to high-quality compact representations of data, achieving competitive results on a variety of machine learning tasks.
1 code implementation • 2 Dec 2021 • Nitish Srivastava, Walter Talbott, Martin Bertran Lopez, Shuangfei Zhai, Josh Susskind
Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space.
no code implementations • 29 Sep 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Joshua M. Susskind
We introduce Dot Product Attention Free Transformer (DAFT), an efficient variant of Transformers that eliminates the query-key dot product in self-attention.
Ranked #678 on Image Classification on ImageNet
no code implementations • NAACL (ACL) 2022 • Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind
In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low frequency ones.
no code implementations • 1 Jul 2021 • Etai Littwin, Omid Saremi, Shuangfei Zhai, Vimal Thilak, Hanlin Goh, Joshua M. Susskind, Greg Yang
We analyze the learning dynamics of infinitely wide neural networks with a finite-sized bottleneck.
11 code implementations • 28 May 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Josh Susskind
We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot-product self-attention.
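For reference, a minimal single-head, unbatched sketch of the AFT-full operation from the paper (numerical stabilization by max-subtraction is omitted for brevity):

```python
import torch

def aft_full(q, k, v, w):
    """AFT-full: for each target position t, values are averaged with weights
    exp(K_{t'} + w_{t,t'}), then gated elementwise by sigmoid(Q_t).
    q, k, v: (T, d) query/key/value; w: (T, T) learned position biases."""
    num = torch.exp(w) @ (torch.exp(k) * v)  # sum_t' exp(K_t' + w_{t,t'}) * V_t'
    den = torch.exp(w) @ torch.exp(k)        # sum_t' exp(K_t' + w_{t,t'})
    return torch.sigmoid(q) * (num / den)    # elementwise gating by the query
```

The cost is linear in sequence length for the value combination, since no $T \times T$ attention matrix over feature dimensions is ever materialized.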
2 code implementations • 17 May 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
no code implementations • CVPR 2021 • Chen Huang, Shuangfei Zhai, Pengsheng Guo, Josh Susskind
This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision.
no code implementations • 1 Jan 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
no code implementations • 27 Jun 2020 • Miguel Angel Bautista, Walter Talbott, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind
State-of-the-art learning-based monocular 3D reconstruction methods learn priors over object categories on the training set, and as a result struggle to achieve reasonable generalization to object categories unseen during training.
no code implementations • 18 Jun 2020 • Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Carlos Guestrin, Josh M. Susskind
We introduce Set Distribution Networks (SDNs), a novel framework that learns to autoencode and freely generate sets.
no code implementations • NeurIPS 2020 • Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan
Modern neural network performance typically improves as model size increases.
1 code implementation • NeurIPS 2019 • Shuangfei Zhai, Walter Talbott, Carlos Guestrin, Joshua M. Susskind
In contrast to a traditional view where the discriminator learns a constant function when reaching convergence, here we show that it can provide useful information for downstream tasks, e.g., feature extraction for classification.
no code implementations • 28 Oct 2019 • Alaaeldin El-Nouby, Shuangfei Zhai, Graham W. Taylor, Joshua M. Susskind
Deep neural networks require collecting and annotating large amounts of data to train successfully.
Ranked #44 on Self-Supervised Action Recognition on UCF101
no code implementations • 25 Sep 2019 • Shuangfei Zhai, Carlos Guestrin, Joshua M. Susskind
At inference time, the HBAE consists of two sampling steps: first a latent code for the input is sampled, and then this code is passed to the conditional generator to output a stochastic reconstruction.
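A sketch of that two-step inference procedure (the function names and the Gaussian latent sampling are illustrative stand-ins; the paper's actual latent distribution may differ):

```python
import torch

def two_step_reconstruct(encoder, generator, x):
    """Step 1: sample a latent code for the input from the encoder's
    distribution (a diagonal Gaussian here, purely for illustration).
    Step 2: pass the code to the conditional generator to obtain a
    stochastic reconstruction of x."""
    mu, log_var = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
    return generator(z)
```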
no code implementations • 15 May 2019 • Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind
In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric.
no code implementations • 14 Nov 2017 • Chen Zheng, Shuangfei Zhai, Zhongfei Zhang
This approach uses a convolutional neural network and combines user feature representations with question feature representations to compute scores, such that the user with the highest score is identified as the expert on the question.
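A toy sketch of that scoring scheme (layer sizes and the MLP combiner are our assumptions; the paper derives the features with a CNN):

```python
import torch
import torch.nn as nn

class ExpertScorer(nn.Module):
    """Illustrative stand-in: score a (user, question) pair by combining the
    two feature representations; the user with the highest score for a given
    question is predicted to be the expert."""
    def __init__(self, d_user, d_question, d_hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_user + d_question, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, user_feat, question_feat):
        pair = torch.cat([user_feat, question_feat], dim=-1)
        return self.mlp(pair).squeeze(-1)  # one scalar score per pair
```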
no code implementations • 6 Sep 2017 • Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, Yan Liu
We use this generative model together with a convolutional neural network (CNN) based prediction model to improve the onset prediction performance.
no code implementations • 26 Nov 2016 • Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu
For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language.
Cross-Lingual Sentiment Classification • General Classification • +4
1 code implementation • CVPR 2017 • Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, Rogerio Feris
Multi-task learning aims to improve generalization performance of multiple prediction tasks by appropriately sharing relevant information across them.
4 code implementations • CVPR 2017 • Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris
We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one, which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner.
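To see that this two-step view is consistent with ordinary pooling, here is a small sketch (ours): stride-one max pooling followed by top-left subsampling of non-overlapping windows reproduces standard stride-$k$ max pooling.

```python
import torch
import torch.nn.functional as F

def two_step_max_pool(x, k=2):
    """Step 1: a k-by-k max-pooling window slides with stride one, preserving
    the spatial grid. Step 2: keep one pixel per non-overlapping window, here
    the top-left one (the uniform, deterministic choice in the text).
    x: (B, C, H, W) with H and W divisible by k."""
    dense = F.max_pool2d(x, kernel_size=k, stride=1)  # step 1: (H-k+1, W-k+1)
    return dense[:, :, ::k, ::k]                      # step 2: downsample

# sanity check: identical to standard stride-k max pooling
x = torch.randn(1, 3, 8, 8)
assert torch.equal(two_step_max_pool(x, 2), F.max_pool2d(x, 2))
```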
1 code implementation • 6 Nov 2016 • Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang
We propose VGAN, which works by minimizing a variational lower bound of the negative log-likelihood (NLL) of an energy-based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from.
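Written out (a standard derivation consistent with the abstract; notation ours): with energy $E(\mathbf{x})$ and $p(\mathbf{x}) = e^{-E(\mathbf{x})}/Z$, Jensen's inequality applied to the partition function gives

```latex
\log Z
  = \log \mathbb{E}_{q}\!\left[\frac{e^{-E(\mathbf{x})}}{q(\mathbf{x})}\right]
  \;\ge\; -\,\mathbb{E}_{q}[E(\mathbf{x})] + H(q),
```

so the NLL $\mathbb{E}_{p_\text{data}}[E(\mathbf{x})] + \log Z$ is bounded below by $\mathbb{E}_{p_\text{data}}[E(\mathbf{x})] - \mathbb{E}_{q}[E(\mathbf{x})] + H(q)$. Minimizing this bound over $E$ while $q$ tightens it yields an adversarial game between the energy model and the sampler.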
no code implementations • NeurIPS 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang
Building large models with parameter sharing accounts for most of the success of deep convolutional neural networks (CNNs).
2 code implementations • 25 May 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang
In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures.
no code implementations • 14 Dec 2015 • Shuangfei Zhai, Zhongfei Zhang
In this paper, we investigate the usage of autoencoders in modeling textual data.
no code implementations • 14 Dec 2015 • Shuangfei Zhai, Zhongfei Zhang
Matrix factorization (MF) and autoencoders (AEs) are among the most successful approaches to unsupervised learning.
no code implementations • 19 Nov 2015 • Shuangfei Zhai, Zhongfei Zhang
The first one, named Label-Aware Manifold Regularization, assumes the availability of labels and penalizes large norms of the loss function w.r.t.