Search Results for author: Zhe Gan

Found 105 papers, 53 papers with code

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation14 Jun 2022 Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Language Modelling Masked Language Modeling +5

GIT: A Generative Image-to-text Transformer for Vision and Language

no code implementations27 May 2022 JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Image Classification Language Modelling +4

K-LITE: Learning Transferable Visual Models with External Knowledge

no code implementations20 Apr 2022 Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Anna Rohrbach, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Jianfeng Gao

In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge to build transferable visual systems: In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that can understand both visual concepts and their knowledge; In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models.

Image Classification object-detection +2

Injecting Semantic Concepts into End-to-End Image Captioning

no code implementations CVPR 2022 Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed as ViTCAP, in which grid representations are used without extracting the regional features.

Image Captioning

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation8 Dec 2021 Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling Visual Question Answering +1

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation CVPR 2022 Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e. g., video question answering).

Question Answering Video Captioning +2

Scaling Up Vision-Language Pre-training for Image Captioning

no code implementations CVPR 2022 Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.

Image Captioning

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation24 Nov 2021 Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e. g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Question Answering Text to Video Retrieval +3

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

no code implementations23 Nov 2021 Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose UNICORN, a vision-language (VL) model that unifies text generation and bounding box prediction into a single architecture.

Image Captioning Language Modelling +5

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

no code implementations19 Nov 2021 JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e. g., image or language) or multimodal inputs (e. g., the concatenation of the image and the question), for vision-language (VL) representation learning.

Image Captioning Language Modelling +6

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

no code implementations4 Nov 2021 Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li

In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.

Adversarial Attack Adversarial Robustness +1

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

no code implementations10 Sep 2021 Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.

Image Captioning Question Answering +2

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

1 code implementation NeurIPS 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0. 28% top-1 accuracy, and meanwhile enjoys 49. 32% FLOPs and 4. 40% running time savings.

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

no code implementations ICCV 2021 Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu

We hope our Adversarial VQA dataset can shed new light on robustness study in the community and serve as a valuable benchmark for future work.

Data Augmentation Question Answering +2

The Elastic Lottery Ticket Hypothesis

1 code implementation NeurIPS 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang

Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly the same competitive as the latter's winning ticket directly found by IMP.

Adversarial Feature Augmentation and Normalization for Visual Recognition

1 code implementation22 Mar 2021 Tianlong Chen, Yu Cheng, Zhe Gan, JianFeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.

Classification Data Augmentation +1

Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning

1 code implementation ICLR 2021 Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin

Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker.

Representation Learning Style Transfer +1

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

1 code implementation NeurIPS 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang

Training generative adversarial networks (GANs) with limited real image data generally results in deteriorated performance and collapsed models.

Data Augmentation

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

1 code implementation CVPR 2021 Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu

Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.

Ranked #4 on Visual Question Answering on MSRVTT-QA (using extra training data)

Question Answering Text to Video Retrieval +3

ALFA: Adversarial Feature Augmentation for Enhanced Image Recognition

no code implementations1 Jan 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Yu Hu, Zhangyang Wang, Jingjing Liu

Adversarial training is an effective method to combat adversarial attacks in order to create robust neural networks.

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

no code implementations1 Jan 2021 Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu

By incorporating different feature maps after the masking, we can distill better features to help model generalization.

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

1 code implementation ACL 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.

Model Compression

Wasserstein Contrastive Representation Distillation

no code implementations CVPR 2021 Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin

The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former.

Contrastive Learning Knowledge Distillation +2

A Closer Look at the Robustness of Vision-and-Language Pre-trained Models

no code implementations15 Dec 2020 Linjie Li, Zhe Gan, Jingjing Liu

Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level.

Cross-Thought for Sentence Encoder Pre-training

1 code implementation EMNLP 2020 Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.

Information Retrieval Language Modelling +3

Multi-Fact Correction in Abstractive Text Summarization

no code implementations EMNLP 2020 Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.

Abstractive Text Summarization News Summarization +1

Efficient Robust Training via Backward Smoothing

1 code implementation3 Oct 2020 Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu

In this work, we develop a new understanding towards Fast Adversarial Training, by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem.

MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network

no code implementations3 Oct 2020 Yi Wei, Zhe Gan, Wenbo Li, Siwei Lyu, Ming-Ching Chang, Lei Zhang, Jianfeng Gao, Pengchuan Zhang

We present Mask-guided Generative Adversarial Network (MagGAN) for high-resolution face attribute editing, in which semantic facial masks from a pre-trained face parser are used to guide the fine-grained image editing process.

Contrastive Distillation on Intermediate Representations for Language Model Compression

1 code implementation EMNLP 2020 Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu

Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.

Knowledge Distillation Language Modelling +1

Accelerating Real-Time Question Answering via Question Generation

no code implementations10 Sep 2020 Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu, Chenguang Zhu

Although deep neural networks have achieved tremendous success for question answering (QA), they are still suffering from heavy computational and energy cost for real product deployment.

Data Augmentation Multi-Task Learning +2

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

1 code implementation10 Sep 2020 Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

During inference, the model makes predictions based on the text input in the target language and its translation in the source language.


Graph Optimal Transport for Cross-Domain Alignment

1 code implementation ICML 2020 Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities into a dynamically-constructed graph.

Graph Matching Image Captioning +6

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

1 code implementation21 Jun 2020 Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein

Adaptive gradient methods such as RMSProp and Adam use exponential moving estimate of the squared gradient to compute adaptive step sizes, achieving better convergence than SGD in face of noisy objectives.

Image Classification Machine Translation +3

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

no code implementations ECCV 2020 Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu

To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e. g., Visual Coreference Resolution, Visual Relation Detection, Linguistic Probing Tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e. g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings).

Coreference Resolution

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training

1 code implementation EMNLP 2020 Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan

Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation.

Language Modelling Representation Learning +1

APo-VAE: Text Generation in Hyperbolic Space

no code implementations NAACL 2021 Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin, Jingjing Liu

In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.

Language Modelling Response Generation +1

Contextual Text Style Transfer

no code implementations Findings of the Association for Computational Linguistics 2020 Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu

To realize high-quality style transfer with natural context preservation, we propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders for each input sentence and its surrounding context.

Style Transfer Text Style Transfer +1

BachGAN: High-Resolution Image Synthesis from Salient Object Layout

1 code implementation CVPR 2020 Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu

We propose a new task towards more practical application for image generation - high-quality image synthesis from salient object layout.

Image Generation

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

1 code implementation CVPR 2020 Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.

Graph-Driven Generative Models for Heterogeneous Multi-Task Learning

no code implementations20 Nov 2019 Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang, Lawrence Carin

We propose a novel graph-driven generative model, that unifies multiple heterogeneous learning tasks into the same framework.

Multi-Task Learning Type prediction

Distilling Knowledge Learned in BERT for Text Generation

1 code implementation ACL 2020 Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization.

Language Modelling Machine Translation +4

Discourse-Aware Neural Extractive Text Summarization

1 code implementation ACL 2020 Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu

Recently BERT has been adopted for document encoding in state-of-the-art text summarization models.

Extractive Text Summarization

Meta Module Network for Compositional Visual Reasoning

1 code implementation8 Oct 2019 Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu

To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically.

Visual Reasoning

UNITER: UNiversal Image-TExt Representation Learning

5 code implementations ECCV 2020 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i. e., masked language/region modeling is conditioned on full observation of image/text).

Language Modelling Masked Language Modeling +9

UNITER: Learning UNiversal Image-TExt Representations

no code implementations25 Sep 2019 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Language Modelling Masked Language Modeling +7

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

2 code implementations ICLR 2020 Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models.

Natural Language Understanding Overall - Test +1

What Makes A Good Story? Designing Composite Rewards for Visual Storytelling

1 code implementation11 Sep 2019 Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig

Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr.

reinforcement-learning Visual Storytelling

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation

no code implementations11 Sep 2019 Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin

Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains.

Unsupervised Domain Adaptation

Patient Knowledge Distillation for BERT Model Compression

2 code implementations IJCNLP 2019 Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks.

Knowledge Distillation Model Compression +1

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations IJCNLP 2019 Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension Representation Learning +1

Relation-Aware Graph Attention Network for Visual Question Answering

1 code implementation ICCV 2019 Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu

In order to answer semantically-complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects.

Graph Attention Question Answering +2

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

1 code implementation CVPR 2019 Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding, that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language navigation challenge of Anderson et.

Vision and Language Navigation Vision-Language Navigation

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

no code implementations ACL 2019 Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao

This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image.

Question Answering Visual Dialog

Sequential Attention GAN for Interactive Image Editing

no code implementations20 Dec 2018 Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao

The main challenges in this sequential and interactive image generation task are two-fold: 1) contextual consistency between a generated image and the provided textual description; 2) step-by-step region-level modification to maintain visual consistency across the generated image sequence in each session.

Text-to-Image Generation

Sequence Generation with Guider Network

no code implementations2 Nov 2018 Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Liqun Chen, Dinghan Shen, Guoyin Wang, Lawrence Carin

Sequence generation with reinforcement learning (RL) has received significant attention recently.


JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets

2 code implementations ICML 2018 Yunchen Pu, Shuyang Dai, Zhe Gan, Wei-Yao Wang, Guoyin Wang, Yizhe Zhang, Ricardo Henao, Lawrence Carin

Distinct from most existing approaches, that only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains).

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

no code implementations21 May 2018 Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Wu, Jian-Feng Wang, Xiaodong He

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task.

reinforcement-learning Story Generation +1

Multi-Label Learning from Medical Plain Text with Convolutional Residual Models

no code implementations15 Jan 2018 Xinyuan Zhang, Ricardo Henao, Zhe Gan, Yitong Li, Lawrence Carin

Since diagnoses are typically correlated, a deep residual network is employed on top of the CNN encoder, to capture label (diagnosis) dependencies and incorporate information directly from the encoded sentence vector.

General Classification Multi-Label Classification +3

Topic Compositional Neural Language Model

no code implementations28 Dec 2017 Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence.

Language Modelling

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

17 code implementations CVPR 2018 Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.

Text Matching Text to image generation +1

Adversarial Symmetric Variational Autoencoder

no code implementations NeurIPS 2017 Yunchen Pu, Wei-Yao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, Lawrence Carin

A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: ($i$) from observed data fed through the encoder to yield codes, and ($ii$) from latent codes drawn from a simple prior and propagated through the decoder to manifest data.

Triangle Generative Adversarial Networks

1 code implementation NeurIPS 2017 Zhe Gan, Liqun Chen, Wei-Yao Wang, Yunchen Pu, Yizhe Zhang, Hao liu, Chunyuan Li, Lawrence Carin

The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs.

Image-to-Image Translation Semi-Supervised Image Classification +1

Deconvolutional Paragraph Representation Learning

4 code implementations NeurIPS 2017 Yizhe Zhang, Dinghan Shen, Guoyin Wang, Zhe Gan, Ricardo Henao, Lawrence Carin

Learning latent representations from long text sequences is an important first step in many natural language processing applications.

General Classification Natural Language Processing +3

StyleNet: Generating Attractive Visual Captions With Styles

no code implementations CVPR 2017 Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng

We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles.

Stochastic Gradient Monomial Gamma Sampler

no code implementations ICML 2017 Yizhe Zhang, Changyou Chen, Zhe Gan, Ricardo Henao, Lawrence Carin

A framework is proposed to improve the sampling efficiency of stochastic gradient MCMC, based on Hamiltonian Monte Carlo.

VAE Learning via Stein Variational Gradient Descent

no code implementations NeurIPS 2017 Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent.

Character-level Deep Conflation for Business Data Analytics

2 code implementations8 Feb 2017 Zhe Gan, P. D. Singh, Ameet Joshi, Xiaodong He, Jianshu Chen, Jianfeng Gao, Li Deng

Connecting different text attributes associated with the same entity (conflation) is important in business data analytics since it could help merge two different tables in a database to provide a more comprehensive profile of an entity.

Adaptive DCTNet for Audio Signal Classification

1 code implementation13 Dec 2016 Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu, Andrew Thompson

Its output feature is related to Cohen's class of time-frequency distributions.


Semantic Compositional Networks for Visual Captioning

1 code implementation CVPR 2017 Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng

The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag.

Image Captioning Semantic Composition +1

Learning Generic Sentence Representations Using Convolutional Neural Networks

no code implementations EMNLP 2017 Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes.

Adaptive Feature Abstraction for Translating Video to Text

no code implementations23 Nov 2016 Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin

Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features.

Video Captioning

Unsupervised Learning with Truncated Gaussian Graphical Models

no code implementations15 Nov 2016 Qinliang Su, Xuejun Liao, Chunyuan Li, Zhe Gan, Lawrence Carin

Gaussian graphical models (GGMs) are widely used for statistical modeling, because of ease of inference and the ubiquitous use of the normal distribution in practical approximations.

Unsupervised Pre-training

Factored Temporal Sigmoid Belief Networks for Sequence Learning

no code implementations22 May 2016 Jiaming Song, Zhe Gan, Lawrence Carin

Deep conditional generative models are developed to simultaneously learn the temporal dependencies of multiple sequences.

General Classification

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

1 code implementation25 Dec 2015 Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied.

Stochastic Optimization

Deep Poisson Factor Modeling

no code implementations NeurIPS 2015 Ricardo Henao, Zhe Gan, James Lu, Lawrence Carin

We propose a new deep architecture for topic modeling, based on Poisson Factor Analysis (PFA) modules.

Topic Models

Cannot find the paper you are looking for? You can Submit a new open access paper.