Search Results for author: Shuangfei Zhai

Found 42 papers, 15 papers with code

Many-to-many Image Generation with Auto-regressive Diffusion Models

no code implementations • 3 Apr 2024 • Ying Shen, Yizhe Zhang, Shuangfei Zhai, Lifu Huang, Joshua M. Susskind, Jiatao Gu

This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images and offering a scalable alternative to task-specific solutions across different multi-image scenarios.

Image Generation · Novel View Synthesis

How Far Are We from Intelligent Visual Deductive Reasoning?

1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks.

In-Context Learning · Visual Reasoning

Scalable Pre-training of Large Autoregressive Image Models

2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #333 on Image Classification on ImageNet (using extra training data)

Image Classification

Matryoshka Diffusion Models

no code implementations • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.

Image Generation · Zero-shot Generalization

Generative Modeling with Phase Stochastic Bridges

no code implementations • 11 Oct 2023 • Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Josh Susskind, Shuangfei Zhai

In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity.

Image Generation · Position
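The augmented space mentioned in the abstract can be sketched schematically as follows; this is my reading of the setup, with $f$ and $g$ as generic placeholders rather than the paper's exact drift and diffusion terms:

```latex
% Phase space: augment data x (position) with a velocity variable v.
\mathbf{z} = (\mathbf{x}, \mathbf{v}) \in \mathbb{R}^{2d},
\qquad
\mathrm{d}\mathbf{x}_t = \mathbf{v}_t \,\mathrm{d}t,
\qquad
\mathrm{d}\mathbf{v}_t = f(\mathbf{x}_t, \mathbf{v}_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}_t .
```

Diffusing only the velocity while positions evolve deterministically is what distinguishes such second-order (phase space) dynamics from standard first-order diffusion over $\mathbf{x}$ alone.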

PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.

Denoising

AutoFocusFormer: Image Segmentation off the Grid

1 code implementation • CVPR 2023 • Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task.

Image Segmentation · Instance Segmentation +2

Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images

no code implementations • 13 Apr 2023 • Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind

To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets.

3D-Aware Image Synthesis

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

1 code implementation • 11 Mar 2023 • Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as enabling training of (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization or adaptive optimizers; (b) deep architectures in machine translation; and (c) speech recognition to competitive performance without warmup and adaptive optimizers.

Automatic Speech Recognition · Image Classification +6
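As I read the $\sigma$Reparam paper, each weight matrix is rescaled by its spectral norm, with a learnable scalar controlling the resulting norm. A minimal NumPy sketch under that reading (function names and the power-iteration helper are mine, not the authors' code):

```python
import numpy as np

def spectral_norm(w, n_iters=20, seed=0):
    """Estimate the largest singular value of w via power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    return float(u @ w @ v)

def sigma_reparam(w, gamma=1.0):
    """Rescale the weights so their spectral norm equals the scalar gamma."""
    return (gamma / spectral_norm(w)) * w
```

Bounding the spectral norm of the query/key projections caps the magnitude of attention logits, which is the mechanism the title refers to as preventing attention entropy collapse.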

f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation

no code implementations • 10 Oct 2022 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh Susskind

In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation.

Image Generation

GAUDI: A Neural Architect for Immersive 3D Scene Generation

1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation · Scene Generation

Position Prediction as an Effective Pretraining Strategy

1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

This pretraining strategy, which has been used in BERT models in NLP, in Wav2Vec models in Speech and, recently, in MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.

Position · speech-recognition +1
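Going by the title, the pretraining objective asks the model to predict each patch's position from content alone. A toy NumPy sketch of such a loss (my illustration of the idea, not the paper's implementation): embeddings carry no position information, and a linear head classifies each of the N patches into one of N positions.

```python
import numpy as np

def position_prediction_loss(patch_embeddings, w_head):
    """Cross-entropy for classifying each patch into its true position.

    patch_embeddings: (N, d) content-only embeddings, patch i from position i.
    w_head:           (d, N) linear classification head over the N positions.
    """
    logits = patch_embeddings @ w_head                      # (N, N)
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = patch_embeddings.shape[0]
    # The correct label for patch i is position i (the diagonal).
    return -log_probs[np.arange(n), np.arange(n)].mean()
```

The loss is driven to zero only when the embeddings encode enough about relationships between patches to recover where each one came from.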

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

no code implementations • 10 Jun 2022 • Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind

While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any known optimization theories that we are aware of, and can be easily overlooked without an in-depth examination.

Inductive Bias

Learning Representation from Neural Fisher Kernel with Low-rank Approximation

no code implementations • ICLR 2022 • Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind

We show that the low-rank approximation of NFKs derived from unsupervised generative models and supervised learning models gives rise to high-quality compact representations of data, achieving competitive results on a variety of machine learning tasks.

Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

1 code implementation • 2 Dec 2021 • Nitish Srivastava, Walter Talbott, Martin Bertran Lopez, Shuangfei Zhai, Josh Susskind

Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space.

A Dot Product Attention Free Transformer

no code implementations • 29 Sep 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Joshua M. Susskind

We introduce Dot Product Attention Free Transformer (DAFT), an efficient variant of Transformers that eliminates the query-key dot product in self attention.

Image Classification · Language Modelling

Regularized Training of Nearest Neighbor Language Models

no code implementations • NAACL (ACL) 2022 • Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.

L2 Regularization · Language Modelling

An Attention Free Transformer

6 code implementations • 28 May 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Josh Susskind

We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot product self attention.

Position
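As described in the AFT paper (to the best of my recollection of its AFT-full variant), each output position gates an exponentially weighted average of values with sigmoid(query), replacing the query-key dot product with learned pairwise position biases. A small NumPy sketch; the function name and shapes are my own conventions:

```python
import numpy as np

def aft_full(q, k, v, w):
    """AFT-full layer sketch: no query-key dot product.

    q, k, v: (T, d) query/key/value projections for a length-T sequence.
    w:       (T, T) learned pairwise position biases.
    Returns a (T, d) output.
    """
    # exp(K + w) acts element-wise; values are averaged with these weights.
    weights = np.exp(w)[:, :, None] * np.exp(k)[None, :, :]   # (T, T, d)
    num = (weights * v[None, :, :]).sum(axis=1)               # weighted value sum
    den = weights.sum(axis=1)
    y = num / den                                             # normalized average
    return 1.0 / (1.0 + np.exp(-q)) * y                       # gate with sigmoid(q)
```

With zero biases and identical keys this reduces to a sigmoid-gated mean of the values, which makes the contrast with softmax attention easy to see.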

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

2 code implementations • 17 May 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.

Offline RL · Q-Learning +2

MetricOpt: Learning to Optimize Black-Box Evaluation Metrics

no code implementations • CVPR 2021 • Chen Huang, Shuangfei Zhai, Pengsheng Guo, Josh Susskind

This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision.

Image Classification · Image Retrieval +3

Uncertainty Weighted Offline Reinforcement Learning

no code implementations • 1 Jan 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.

Offline RL · Q-Learning +2

On the generalization of learning-based 3D reconstruction

no code implementations • 27 Jun 2020 • Miguel Angel Bautista, Walter Talbott, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind

State-of-the-art learning-based monocular 3D reconstruction methods learn priors over object categories on the training set, and as a result struggle to achieve reasonable generalization to object categories unseen during training.

3D Reconstruction · Position

Collegial Ensembles

no code implementations • NeurIPS 2020 • Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan

Modern neural network performance typically improves as model size increases.

Adversarial Fisher Vectors for Unsupervised Representation Learning

1 code implementation • NeurIPS 2019 • Shuangfei Zhai, Walter Talbott, Carlos Guestrin, Joshua M. Susskind

In contrast to a traditional view where the discriminator learns a constant function when reaching convergence, here we show that it can provide useful information for downstream tasks, e.g., feature extraction for classification.

General Classification · Representation Learning

Hierarchical Bayes Autoencoders

no code implementations • 25 Sep 2019 • Shuangfei Zhai, Carlos Guestrin, Joshua M. Susskind

At inference time, the HBAE performs two sampling steps: first a latent code for the input is sampled, and then this code is passed to the conditional generator to output a stochastic reconstruction.

Variational Inference

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

no code implementations • 15 May 2019 • Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind

In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric.

General Classification · Meta-Learning +2

A Deep Learning Approach for Expert Identification in Question Answering Communities

no code implementations • 14 Nov 2017 • Chen Zheng, Shuangfei Zhai, Zhongfei Zhang

This approach uses a convolutional neural network to combine user feature representations with question feature representations, computing scores so that the user with the highest score is identified as the expert for the question.

Question Answering · Sentence

Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records

no code implementations • 6 Sep 2017 • Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, Yan Liu

We use this generative model together with a convolutional neural network (CNN) based prediction model to improve the onset prediction performance.

Generative Adversarial Network

Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings

no code implementations • 26 Nov 2016 • Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu

For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language.

Cross-Lingual Sentiment Classification · General Classification +4

S3Pool: Pooling with Stochastic Spatial Sampling

4 code implementations • CVPR 2017 • Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris

We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one, which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner.

Data Augmentation · Image Classification
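The two-step view in the abstract can be sketched directly: stride-1 max pooling followed by stochastic spatial downsampling. This is an illustrative single-channel NumPy version under my reading of the abstract (the per-strip row/column sampling is my interpretation, not the authors' code):

```python
import numpy as np

def s3pool(x, grid=2, rng=None):
    """S3Pool sketch: stride-1 max pooling, then stochastic downsampling.

    x: (H, W) feature map, with H and W divisible by `grid`.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = x.shape
    # Step 1: 2x2 max pooling with stride one (resolution preserved; edge padding).
    p = np.pad(x, ((0, 1), (0, 1)), mode="edge")
    pooled = np.maximum.reduce(
        [p[:H, :W], p[1:H + 1, :W], p[:H, 1:W + 1], p[1:H + 1, 1:W + 1]]
    )
    # Step 2: keep one randomly chosen row/column per non-overlapping grid strip,
    # instead of the deterministic top-left choice of standard pooling.
    rows = np.concatenate([g * grid + rng.choice(grid, 1) for g in range(H // grid)])
    cols = np.concatenate([g * grid + rng.choice(grid, 1) for g in range(W // grid)])
    return pooled[np.ix_(rows, cols)]
```

Because the selection is re-sampled every forward pass, the downsampling itself acts as implicit data augmentation, which is presumably why the entry is tagged with that task.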

Generative Adversarial Networks as Variational Training of Energy Based Models

1 code implementation • 6 Nov 2016 • Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang

We propose VGAN, which works by minimizing a variational lower bound of the negative log likelihood (NLL) of an energy based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from.
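The bound referred to here follows from standard EBM algebra; a sketch, assuming $p(\mathbf{x}) = e^{-E(\mathbf{x})}/Z$ with variational distribution $q$ (my derivation of the stated bound, not the paper's exact notation):

```latex
% Jensen's inequality on the partition function:
\log Z = \log \mathbb{E}_{q}\!\left[\frac{e^{-E(\mathbf{x})}}{q(\mathbf{x})}\right]
\;\ge\; -\,\mathbb{E}_{q}[E(\mathbf{x})] + H(q),
% which yields a variational lower bound on the NLL:
\mathbb{E}_{p_{\mathrm{data}}}[E(\mathbf{x})] + \log Z
\;\ge\;
\mathbb{E}_{p_{\mathrm{data}}}[E(\mathbf{x})] - \mathbb{E}_{q}[E(\mathbf{x})] + H(q).
```

The right-hand side needs only samples from $q$, which is what makes a GAN-style generator a natural choice for the variational distribution.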

Doubly Convolutional Neural Networks

no code implementations • NeurIPS 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

Building large models with parameter sharing accounts for most of the success of deep convolutional neural networks (CNNs).

Image Classification

Deep Structured Energy Based Models for Anomaly Detection

2 code implementations • 25 May 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures.

Anomaly Detection

Semisupervised Autoencoder for Sentiment Analysis

no code implementations • 14 Dec 2015 • Shuangfei Zhai, Zhongfei Zhang

In this paper, we investigate the usage of autoencoders in modeling textual data.

Sentiment Analysis

Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs

no code implementations • 14 Dec 2015 • Shuangfei Zhai, Zhongfei Zhang

Matrix factorization (MF) and Autoencoder (AE) are among the most successful approaches of unsupervised learning.

Link Prediction · Multiview Learning

Manifold Regularized Discriminative Neural Networks

no code implementations • 19 Nov 2015 • Shuangfei Zhai, Zhongfei Zhang

The first one, named Label-Aware Manifold Regularization, assumes the availability of labels and penalizes large norms of the loss function w.r.t.
