Search Results for author: Shuangfei Zhai

Found 42 papers, 15 papers with code

Many-to-many Image Generation with Auto-regressive Diffusion Models

no code implementations • 3 Apr 2024 • Ying Shen, Yizhe Zhang, Shuangfei Zhai, Lifu Huang, Joshua M. Susskind, Jiatao Gu

This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images and offering a scalable alternative to task-specific solutions across different multi-image scenarios.

Image Generation · Novel View Synthesis

How Far Are We from Intelligent Visual Deductive Reasoning?

1 code implementation • 7 Mar 2024 • Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks.

In-Context Learning · Visual Reasoning

Scalable Pre-training of Large Autoregressive Image Models

2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #333 on Image Classification on ImageNet (using extra training data)

Image Classification

Matryoshka Diffusion Models

no code implementations • 23 Oct 2023 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges.

Image Generation · Zero-shot Generalization

Generative Modeling with Phase Stochastic Bridges

no code implementations • 11 Oct 2023 • Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Josh Susskind, Shuangfei Zhai

In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity.

Image Generation · Position
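The augmented space mentioned in the abstract can be sketched schematically as follows; this is my reading of the setup, with $f$ and $g$ as generic placeholders rather than the paper's exact drift and diffusion terms:

```latex
% Phase space: augment data x (position) with a velocity variable v.
\mathbf{z} = (\mathbf{x}, \mathbf{v}) \in \mathbb{R}^{2d},
\qquad
\mathrm{d}\mathbf{x}_t = \mathbf{v}_t \,\mathrm{d}t,
\qquad
\mathrm{d}\mathbf{v}_t = f(\mathbf{x}_t, \mathbf{v}_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}_t .
```

Diffusing only the velocity while positions evolve deterministically is what distinguishes such second-order (phase space) dynamics from standard first-order diffusion over $\mathbf{x}$ alone.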

PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

1 code implementation • NeurIPS 2023 • Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation.

Denoising

AutoFocusFormer: Image Segmentation off the Grid

1 code implementation • CVPR 2023 • Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task.

Image Segmentation · Instance Segmentation +2

Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images

no code implementations • 13 Apr 2023 • Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind

To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets.

3D-Aware Image Synthesis

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

1 code implementation • 11 Mar 2023 • Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as enabling training of (a) a Vision Transformer to competitive performance without warmup, weight decay, layer normalization or adaptive optimizers; (b) deep architectures in machine translation; and (c) speech recognition to competitive performance without warmup and adaptive optimizers.

Automatic Speech Recognition · Image Classification +6
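As I read the $\sigma$Reparam paper, each weight matrix is rescaled by its spectral norm, with a learnable scalar controlling the resulting norm. A minimal NumPy sketch under that reading (function names and the power-iteration helper are mine, not the authors' code):

```python
import numpy as np

def spectral_norm(w, n_iters=20, seed=0):
    """Estimate the largest singular value of w via power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    return float(u @ w @ v)

def sigma_reparam(w, gamma=1.0):
    """Rescale the weights so their spectral norm equals the scalar gamma."""
    return (gamma / spectral_norm(w)) * w
```

Bounding the spectral norm of the query/key projections caps the magnitude of attention logits, which is the mechanism the title refers to as preventing attention entropy collapse.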

f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation

no code implementations • 10 Oct 2022 • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Miguel Angel Bautista, Josh Susskind

In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation.

Image Generation

GAUDI: A Neural Architect for Immersive 3D Scene Generation

1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation · Scene Generation

Position Prediction as an Effective Pretraining Strategy

1 code implementation • 15 Jul 2022 • Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

This pretraining strategy, which has been used in BERT models in NLP, in Wav2Vec models in Speech and, recently, in MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives.

Position · speech-recognition +1
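Going by the title, the pretraining objective asks the model to predict each patch's position from content alone. A toy NumPy sketch of such a loss (my illustration of the idea, not the paper's implementation): embeddings carry no position information, and a linear head classifies each of the N patches into one of N positions.

```python
import numpy as np

def position_prediction_loss(patch_embeddings, w_head):
    """Cross-entropy for classifying each patch into its true position.

    patch_embeddings: (N, d) content-only embeddings, patch i from position i.
    w_head:           (d, N) linear classification head over the N positions.
    """
    logits = patch_embeddings @ w_head                      # (N, N)
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = patch_embeddings.shape[0]
    # The correct label for patch i is position i (the diagonal).
    return -log_probs[np.arange(n), np.arange(n)].mean()
```

The loss is driven to zero only when the embeddings encode enough about relationships between patches to recover where each one came from.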

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

no code implementations • 10 Jun 2022 • Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind

While common and easily reproduced in more general settings, the Slingshot Mechanism does not follow from any known optimization theories that we are aware of, and can be easily overlooked without an in-depth examination.

Inductive Bias

Learning Representation from Neural Fisher Kernel with Low-rank Approximation

no code implementations • ICLR 2022 • Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind

We show that the low-rank approximation of NFKs derived from unsupervised generative models and supervised learning models gives rise to high-quality compact representations of data, achieving competitive results on a variety of machine learning tasks.

Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

1 code implementation • 2 Dec 2021 • Nitish Srivastava, Walter Talbott, Martin Bertran Lopez, Shuangfei Zhai, Josh Susskind

Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space.

A Dot Product Attention Free Transformer

no code implementations • 29 Sep 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Joshua M. Susskind

We introduce Dot Product Attention Free Transformer (DAFT), an efficient variant of Transformers that eliminates the query-key dot product in self attention.

Image Classification · Language Modelling

Regularized Training of Nearest Neighbor Language Models

no code implementations • NAACL (ACL) 2022 • Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.

L2 Regularization · Language Modelling

An Attention Free Transformer

6 code implementations • 28 May 2021 • Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Josh Susskind

We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot product self attention.

Position
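As described in the AFT paper (to the best of my recollection of its AFT-full variant), each output position gates an exponentially weighted average of values with sigmoid(query), replacing the query-key dot product with learned pairwise position biases. A small NumPy sketch; the function name and shapes are my own conventions:

```python
import numpy as np

def aft_full(q, k, v, w):
    """AFT-full layer sketch: no query-key dot product.

    q, k, v: (T, d) query/key/value projections for a length-T sequence.
    w:       (T, T) learned pairwise position biases.
    Returns a (T, d) output.
    """
    # exp(K + w) acts element-wise; values are averaged with these weights.
    weights = np.exp(w)[:, :, None] * np.exp(k)[None, :, :]   # (T, T, d)
    num = (weights * v[None, :, :]).sum(axis=1)               # weighted value sum
    den = weights.sum(axis=1)
    y = num / den                                             # normalized average
    return 1.0 / (1.0 + np.exp(-q)) * y                       # gate with sigmoid(q)
```

With zero biases and identical keys this reduces to a sigmoid-gated mean of the values, which makes the contrast with softmax attention easy to see.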

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

2 code implementations • 17 May 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.

Offline RL · Q-Learning +2

MetricOpt: Learning to Optimize Black-Box Evaluation Metrics

no code implementations • CVPR 2021 • Chen Huang, Shuangfei Zhai, Pengsheng Guo, Josh Susskind

This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision.

Image Classification · Image Retrieval +3

Uncertainty Weighted Offline Reinforcement Learning

no code implementations • 1 Jan 2021 • Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.

Offline RL · Q-Learning +2

On the generalization of learning-based 3D reconstruction

no code implementations • 27 Jun 2020 • Miguel Angel Bautista, Walter Talbott, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind

State-of-the-art learning-based monocular 3D reconstruction methods learn priors over object categories on the training set, and as a result struggle to achieve reasonable generalization to object categories unseen during training.

3D Reconstruction · Position

Collegial Ensembles

no code implementations • NeurIPS 2020 • Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan

Modern neural network performance typically improves as model size increases.

Adversarial Fisher Vectors for Unsupervised Representation Learning

1 code implementation • NeurIPS 2019 • Shuangfei Zhai, Walter Talbott, Carlos Guestrin, Joshua M. Susskind

In contrast to a traditional view where the discriminator learns a constant function when reaching convergence, here we show that it can provide useful information for downstream tasks, e.g., feature extraction for classification.

General Classification · Representation Learning

Hierarchical Bayes Autoencoders

no code implementations • 25 Sep 2019 • Shuangfei Zhai, Carlos Guestrin, Joshua M. Susskind

At inference time, the HBAE performs two sampling steps: first a latent code for the input is sampled, and then this code is passed to the conditional generator to output a stochastic reconstruction.

Variational Inference

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

no code implementations • 15 May 2019 • Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind

In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric.

General Classification · Meta-Learning +2

A Deep Learning Approach for Expert Identification in Question Answering Communities

no code implementations • 14 Nov 2017 • Chen Zheng, Shuangfei Zhai, Zhongfei Zhang

This approach uses a convolutional neural network to combine user feature representations with question feature representations, computing scores so that the user with the highest score is identified as the expert for the question.

Question Answering · Sentence

Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records

no code implementations • 6 Sep 2017 • Zhengping Che, Yu Cheng, Shuangfei Zhai, Zhaonan Sun, Yan Liu

We use this generative model together with a convolutional neural network (CNN) based prediction model to improve the onset prediction performance.

Generative Adversarial Network

Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings

no code implementations • 26 Nov 2016 • Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu

For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language.

Cross-Lingual Sentiment Classification · General Classification +4

S3Pool: Pooling with Stochastic Spatial Sampling

4 code implementations • CVPR 2017 • Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris

We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one, which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner.

Data Augmentation · Image Classification
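The two-step view in the abstract can be sketched directly: stride-1 max pooling followed by stochastic spatial downsampling. This is an illustrative single-channel NumPy version under my reading of the abstract (the per-strip row/column sampling is my interpretation, not the authors' code):

```python
import numpy as np

def s3pool(x, grid=2, rng=None):
    """S3Pool sketch: stride-1 max pooling, then stochastic downsampling.

    x: (H, W) feature map, with H and W divisible by `grid`.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = x.shape
    # Step 1: 2x2 max pooling with stride one (resolution preserved; edge padding).
    p = np.pad(x, ((0, 1), (0, 1)), mode="edge")
    pooled = np.maximum.reduce(
        [p[:H, :W], p[1:H + 1, :W], p[:H, 1:W + 1], p[1:H + 1, 1:W + 1]]
    )
    # Step 2: keep one randomly chosen row/column per non-overlapping grid strip,
    # instead of the deterministic top-left choice of standard pooling.
    rows = np.concatenate([g * grid + rng.choice(grid, 1) for g in range(H // grid)])
    cols = np.concatenate([g * grid + rng.choice(grid, 1) for g in range(W // grid)])
    return pooled[np.ix_(rows, cols)]
```

Because the selection is re-sampled every forward pass, the downsampling itself acts as implicit data augmentation, which is presumably why the entry is tagged with that task.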

Generative Adversarial Networks as Variational Training of Energy Based Models

1 code implementation • 6 Nov 2016 • Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang

We propose VGAN, which works by minimizing a variational lower bound of the negative log likelihood (NLL) of an energy based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from.
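The bound referred to here follows from standard EBM algebra; a sketch, assuming $p(\mathbf{x}) = e^{-E(\mathbf{x})}/Z$ with variational distribution $q$ (my derivation of the stated bound, not the paper's exact notation):

```latex
% Jensen's inequality on the partition function:
\log Z = \log \mathbb{E}_{q}\!\left[\frac{e^{-E(\mathbf{x})}}{q(\mathbf{x})}\right]
\;\ge\; -\,\mathbb{E}_{q}[E(\mathbf{x})] + H(q),
% which yields a variational lower bound on the NLL:
\mathbb{E}_{p_{\mathrm{data}}}[E(\mathbf{x})] + \log Z
\;\ge\;
\mathbb{E}_{p_{\mathrm{data}}}[E(\mathbf{x})] - \mathbb{E}_{q}[E(\mathbf{x})] + H(q).
```

The right-hand side needs only samples from $q$, which is what makes a GAN-style generator a natural choice for the variational distribution.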

Doubly Convolutional Neural Networks

no code implementations • NeurIPS 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

Building large models with parameter sharing accounts for most of the success of deep convolutional neural networks (CNNs).

Image Classification

Deep Structured Energy Based Models for Anomaly Detection

2 code implementations • 25 May 2016 • Shuangfei Zhai, Yu Cheng, Weining Lu, Zhongfei Zhang

In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures.

Anomaly Detection

Semisupervised Autoencoder for Sentiment Analysis

no code implementations • 14 Dec 2015 • Shuangfei Zhai, Zhongfei Zhang

In this paper, we investigate the usage of autoencoders in modeling textual data.

Sentiment Analysis

Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs

no code implementations • 14 Dec 2015 • Shuangfei Zhai, Zhongfei Zhang

Matrix factorization (MF) and Autoencoder (AE) are among the most successful approaches of unsupervised learning.

Link Prediction · Multiview Learning

Manifold Regularized Discriminative Neural Networks

no code implementations • 19 Nov 2015 • Shuangfei Zhai, Zhongfei Zhang

The first one, named Label-Aware Manifold Regularization, assumes the availability of labels and penalizes large norms of the loss function w.r.t.
