Search Results for author: Jingjing Liu

Found 103 papers, 64 papers with code

FuseGen: PLM Fusion for Data-generation based Zero-shot Learning

1 code implementation 18 Jun 2024 Tianyuan Zou, Yang Liu, Peng Li, Jianqing Zhang, Jingjing Liu, Ya-Qin Zhang

To mitigate such bias, we propose FuseGen, a novel data generation-based zero-shot learning framework that introduces a new criterion for subset selection from synthetic datasets by utilizing multiple PLMs and trained STMs.

Zero-Shot Learning

Instruction-Guided Visual Masking

1 code implementation 30 May 2024 Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan

To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models.

Instruction Following Visual Grounding +1

Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks

1 code implementation 17 Mar 2024 Yuxuan Song, Jingjing Gong, Yanru Qu, Hao Zhou, Mingyue Zheng, Jingjing Liu, Wei-Ying Ma

Advanced generative models (e.g., diffusion models) derived from simplified continuity assumptions of data distribution, though showing promising progress, have been difficult to apply directly to geometry generation applications due to the multi-modality and noise-sensitive nature of molecule geometry.

3D Molecule Generation

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

1 code implementation 28 Feb 2024 Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan

Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.

Contrastive Learning Decision Making +1

EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

2 code implementations 23 Feb 2024 Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint.

3D Object Detection Autonomous Driving +2

Contextual Molecule Representation Learning from Chemical Reaction Knowledge

no code implementations 21 Feb 2024 Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, Jingjing Liu, Wei-Ying Ma, Yanyan Lan

REMO offers a novel solution to MRL by exploiting the underlying shared patterns in chemical reactions as context for pre-training, which effectively infers meaningful representations of common chemistry knowledge.

molecular representation Representation Learning +1

Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers

no code implementations 9 Feb 2024 Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_{\theta}(X_0|y)$, with a predefined diffusion model $p_{\theta}(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$.

Image Captioning Semantic Segmentation

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

1 code implementation 19 Jan 2024 Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset.

Offline RL reinforcement-learning

Idempotence and Perceptual Image Compression

1 code implementation 17 Jan 2024 Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec.

Image Compression

Generative Multimodal Models are In-Context Learners

1 code implementation CVPR 2024 Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate.

In-Context Learning Personalized Image Generation +3

Machine learning's own Industrial Revolution

no code implementations 4 Nov 2023 Yuan Luo, Song Han, Jingjing Liu

Machine learning is expected to enable the next Industrial Revolution.


CapsFusion: Rethinking Image-Text Data at Scale

1 code implementation CVPR 2024 Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu

To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.

World Knowledge

DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening

1 code implementation 10 Oct 2023 Bowen Gao, Bo Qiang, Haichuan Tan, Minsi Ren, Yinjun Jia, Minsi Lu, Jingjing Liu, Wei-Ying Ma, Yanyan Lan

Virtual screening, which identifies potential drugs from vast compound databases to bind with a particular protein pocket, is a critical step in AI-assisted drug discovery.

Contrastive Learning Data Augmentation +3

Bandwidth-efficient Inference for Neural Image Compression

no code implementations 6 Sep 2023 Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices.

Data Compression Image Compression +1

Conditional Perceptual Quality Preserving Image Compression

no code implementations 16 Aug 2023 Tongda Xu, Qian Zhang, Yanghao Li, Dailan He, Zhe Wang, Yuanyuan Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

We propose conditional perceptual quality, an extension of the perceptual quality defined in Blau and Michaeli (2018), by conditioning it on user-defined information.

Image Compression

Eliminating Label Leakage in Tree-Based Vertical Federated Learning

no code implementations 19 Jul 2023 Hideaki Takahashi, Jingjing Liu, Yang Liu

To counteract label leakage from the instance space, we propose two effective defense mechanisms: Grafting-LDP, which improves the utility of label differential privacy with post-processing, and ID-LMID, which focuses on mutual information regularization.

Inference Attack Vertical Federated Learning

Multimodal Molecular Pretraining via Modality Blending

no code implementations 12 Jul 2023 Qiying Yu, Yudi Zhang, Yuyan Ni, Shikun Feng, Yanyan Lan, Hao Zhou, Jingjing Liu

Self-supervised learning has recently gained growing interest in molecular modeling for scientific tasks such as AI-assisted drug discovery.

Drug Discovery molecular representation +3

Emu: Generative Pretraining in Multimodality

2 code implementations 11 Jul 2023 Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context.

Image Captioning Temporal/Casual QA +4

PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

1 code implementation 25 May 2023 Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance.

Computational Efficiency reinforcement-learning +1

Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack

1 code implementation CVPR 2023 Hideaki Takahashi, Jingjing Liu, Yang Liu

Federated Learning with Model Distillation (FedMD) is a nascent collaborative learning paradigm, where only output logits of public datasets are transmitted as distilled knowledge, instead of passing on private model parameters that are susceptible to gradient inversion attacks, a known privacy risk in federated learning.

Federated Learning

Feasible Policy Iteration

no code implementations 18 Apr 2023 Yujie Yang, Zhilong Zheng, Shengbo Eben Li, Jingliang Duan, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

To address this challenge, we propose an indirect safe RL framework called feasible policy iteration, which guarantees that the feasible region monotonically expands and converges to the maximum one, and the state-value function monotonically improves and converges to the optimal one.

Reinforcement Learning (RL) Safe Reinforcement Learning

K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation

no code implementations 17 Apr 2023 Shuyu Miao, Lin Zheng, Jingjing Liu, Hong Jin

The label-free model evaluation aims to predict the model performance on various test sets without relying on ground truths.


VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection

2 code implementations 20 Mar 2023 Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

In autonomous driving, Vehicle-Infrastructure Cooperative 3D Object Detection (VIC3D) makes use of multi-view cameras from both vehicles and traffic infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint.

3D Object Detection Autonomous Driving +2

SpiderMesh: Spatial-aware Demand-guided Recursive Meshing for RGB-T Semantic Segmentation

1 code implementation 15 Mar 2023 Siqi Fan, Zhe Wang, Yan Wang, Jingjing Liu

For semantic segmentation in urban scene understanding, RGB cameras alone often fail to capture a clear holistic topology in challenging lighting conditions.

Data Augmentation Segmentation +2

Calibration-free BEV Representation for Infrastructure Perception

1 code implementation 7 Mar 2023 Siqi Fan, Zhe Wang, Xiaoliang Huo, Yan Wang, Jingjing Liu

Effective BEV object detection on infrastructure can greatly improve traffic scene understanding and vehicle-to-infrastructure (V2I) cooperative perception.

3D Object Detection object-detection

Multimodal Federated Learning via Contrastive Representation Ensemble

1 code implementation 17 Feb 2023 Qiying Yu, Yang Liu, Yimu Wang, Ke Xu, Jingjing Liu

In this work, we propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL), a multimodal federated learning framework that enables training larger server models from clients with heterogeneous model architectures and data modalities, while only communicating knowledge on a public dataset.

Federated Learning Image-text Retrieval +3

Mind the Gap: Offline Policy Optimization for Imperfect Rewards

1 code implementation 3 Feb 2023 Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

RGM is formulated as a bi-level optimization problem: the upper layer optimizes a reward correction term that performs visitation distribution matching w.r.t.

Reinforcement Learning (RL)

ADAPT: Action-aware Driving Caption Transformer

1 code implementation 1 Feb 2023 Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu

To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.

Autonomous Driving Decision Making

OrthoGAN: High-Precision Image Generation for Teeth Orthodontic Visualization

no code implementations 29 Dec 2022 Feihong Shen, Jingjing Liu, Haizhen Li, Bing Fang, Chenglong Ma, Jin Hao, Yang Feng, Youyi Zheng

We design a multi-modal encoder-decoder based generative model to synthesize identity-preserving frontal facial images with aligned teeth.

Decoder Image Generation

When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning

2 code implementations 23 May 2022 Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang

In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep Q function in out-of-distribution (OOD) areas.

D4RL Offline RL +2

TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation

1 code implementation 12 Aug 2021 Jinyu Yang, Jingjing Liu, Ning Xu, Junzhou Huang

With the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge, however, remains unexplored in the literature.

Transfer Learning Unsupervised Domain Adaptation

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

no code implementations ICCV 2021 Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu

We hope our Adversarial VQA dataset can shed new light on robustness study in the community and serve as a valuable benchmark for future work.

Data Augmentation Question Answering +1

The Elastic Lottery Ticket Hypothesis

1 code implementation NeurIPS 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang

Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket directly found by IMP.

Adversarial Feature Augmentation and Normalization for Visual Recognition

1 code implementation 22 Mar 2021 Tianlong Chen, Yu Cheng, Zhe Gan, JianFeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.

Classification Data Augmentation +2

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

1 code implementation NeurIPS 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang

Training generative adversarial networks (GANs) with limited real image data generally results in deteriorated performance and collapsed models.

Data Augmentation

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

1 code implementation CVPR 2021 Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu

Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.

Ranked #27 on Visual Question Answering (VQA) on MSRVTT-QA (using extra training data)

Question Answering Retrieval +4

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

no code implementations 1 Jan 2021 Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu

By incorporating different feature maps after the masking, we can distill better features to help model generalization.

ALFA: Adversarial Feature Augmentation for Enhanced Image Recognition

no code implementations 1 Jan 2021 Tianlong Chen, Yu Cheng, Zhe Gan, Yu Hu, Zhangyang Wang, Jingjing Liu

Adversarial training is an effective method to combat adversarial attacks in order to create robust neural networks.

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

1 code implementation ACL 2021 Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.

Model Compression

Wasserstein Contrastive Representation Distillation

no code implementations CVPR 2021 Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin

The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former.

Contrastive Learning Knowledge Distillation +2

A Closer Look at the Robustness of Vision-and-Language Pre-trained Models

no code implementations 15 Dec 2020 Linjie Li, Zhe Gan, Jingjing Liu

Large-scale pre-trained multimodal transformers, such as ViLBERT and UNITER, have propelled the state of the art in vision-and-language (V+L) research to a new level.

Logical Reasoning

Cross-Thought for Sentence Encoder Pre-training

1 code implementation EMNLP 2020 Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

In this paper, we propose Cross-Thought, a novel approach to pre-training a sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.

Information Retrieval Language Modelling +5

Multi-Fact Correction in Abstractive Text Summarization

no code implementations EMNLP 2020 Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu

Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.

Abstractive Text Summarization News Summarization +1

Efficient Robust Training via Backward Smoothing

1 code implementation 3 Oct 2020 Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu

In this work, we develop a new understanding towards Fast Adversarial Training, by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem.

Contrastive Distillation on Intermediate Representations for Language Model Compression

1 code implementation EMNLP 2020 Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu

Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.

Knowledge Distillation Language Modelling +1

Accelerating Real-Time Question Answering via Question Generation

no code implementations 10 Sep 2020 Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu, Chenguang Zhu

Although deep neural networks have achieved tremendous success for question answering (QA), they are still suffering from heavy computational and energy cost for real product deployment.

Data Augmentation Multi-Task Learning +3

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

1 code implementation 10 Sep 2020 Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

During inference, the model makes predictions based on the text input in the target language and its translation in the source language.


Graph Optimal Transport for Cross-Domain Alignment

1 code implementation ICML 2020 Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities into a dynamically-constructed graph.

Graph Matching Image Captioning +8

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

1 code implementation 21 Jun 2020 Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein

Adaptive gradient methods such as RMSProp and Adam use an exponential moving estimate of the squared gradient to compute adaptive step sizes, achieving better convergence than SGD in the face of noisy objectives.

Image Classification Machine Translation +3

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

2 code implementations NeurIPS 2020 Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.

Ranked #7 on Visual Entailment on SNLI-VE val (using extra training data)

Image-text Retrieval Question Answering +7

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

no code implementations ECCV 2020 Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu

To reveal the secrets behind the scene of these powerful models, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection, Linguistic Probing Tasks) generalizable to standard pre-trained V+L models, aiming to decipher the inner workings of multimodal pre-training (e.g., the implicit knowledge garnered in individual attention heads, the inherent cross-modal alignment learned through contextualized multimodal embeddings).

coreference-resolution cross-modal alignment

APo-VAE: Text Generation in Hyperbolic Space

no code implementations NAACL 2021 Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin, Jingjing Liu

In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.

Language Modelling Response Generation +1

Contextual Text Style Transfer

no code implementations Findings of the Association for Computational Linguistics 2020 Yu Cheng, Zhe Gan, Yizhe Zhang, Oussama Elachqar, Dianqi Li, Jingjing Liu

To realize high-quality style transfer with natural context preservation, we propose a Context-Aware Style Transfer (CAST) model, which uses two separate encoders for each input sentence and its surrounding context.

Sentence Style Transfer +2

BachGAN: High-Resolution Image Synthesis from Salient Object Layout

1 code implementation CVPR 2020 Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu

We propose a new task towards more practical application for image generation - high-quality image synthesis from salient object layout.

Generative Adversarial Network Hallucination +4

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

1 code implementation CVPR 2020 Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.

Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection

no code implementations 19 Mar 2020 Zongxian Li, Qixiang Ye, Chong Zhang, Jingjing Liu, Shijian Lu, Yonghong Tian

In this work, we propose a Self-Guided Adaptation (SGA) model, which aims at aligning feature representations and transferring object detection models across domains while considering the instantaneous alignment difficulty.

object-detection Object Detection +1

Distilling Knowledge Learned in BERT for Text Generation

1 code implementation ACL 2020 Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization.

Language Modelling Machine Translation +5

Meta Module Network for Compositional Visual Reasoning

1 code implementation 8 Oct 2019 Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu

To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically.

MORPH Visual Reasoning

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

2 code implementations ICLR 2020 Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models.

Natural Language Understanding Overall - Test +1

UNITER: UNiversal Image-TExt Representation Learning

7 code implementations ECCV 2020 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).

Image-text matching Image-text Retrieval +12

UNITER: Learning UNiversal Image-TExt Representations

no code implementations 25 Sep 2019 Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu

Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding.

Image-text matching Image-text Retrieval +10

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation

no code implementations 11 Sep 2019 Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin

Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains.

domain classification Unsupervised Domain Adaptation

What Makes A Good Story? Designing Composite Rewards for Visual Storytelling

1 code implementation 11 Sep 2019 Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig

Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr.

Visual Storytelling

Patient Knowledge Distillation for BERT Model Compression

3 code implementations IJCNLP 2019 Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks.

Knowledge Distillation Model Compression

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations IJCNLP 2019 Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension Representation Learning +1

A Hybrid Retrieval-Generation Neural Conversation Model

1 code implementation 19 Apr 2019 Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, Jingjing Liu

In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods.

Diversity Text Generation +1

Relation-Aware Graph Attention Network for Visual Question Answering

1 code implementation ICCV 2019 Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu

In order to answer semantically-complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects.

Graph Attention Implicit Relations +3

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

1 code implementation CVPR 2019 Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language navigation challenge of Anderson et al.

Vision and Language Navigation Vision-Language Navigation

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

no code implementations ACL 2019 Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao

This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image.

Question Answering Visual Dialog

Sequential Attention GAN for Interactive Image Editing

no code implementations 20 Dec 2018 Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao

The main challenges in this sequential and interactive image generation task are two-fold: 1) contextual consistency between a generated image and the provided textual description; 2) step-by-step region-level modification to maintain visual consistency across the generated image sequence in each session.

Text-to-Image Generation

Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

1 code implementation 19 Nov 2018 Yuexin Wu, Xiujun Li, Jingjing Liu, Jianfeng Gao, Yiming Yang

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences.

Active Learning Q-Learning +1

Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension

5 code implementations NAACL 2019 Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao

We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains.

Machine Reading Comprehension Machine Translation +3

Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

3 code implementations EMNLP 2018 Shang-Yu Su, Xiujun Li, Jianfeng Gao, Jingjing Liu, Yun-Nung Chen

This paper presents a Discriminative Deep Dyna-Q (D3Q) approach to improving the effectiveness and robustness of Deep Dyna-Q (DDQ), a recently proposed framework that extends the Dyna-Q algorithm to integrate planning for task-completion dialogue policy learning.

Task-Completion Dialogue Policy Learning

Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems

2 code implementations 29 Jul 2018 Xiujun Li, Yu Wang, Siqi Sun, Sarah Panda, Jingjing Liu, Jianfeng Gao

This proposal introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems, with the goal of encouraging the dialogue research community to collaborate and benchmark on standard datasets and a unified experimental environment.

Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning

3 code implementations ACL 2018 Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Kam-Fai Wong, Shang-Yu Su

During dialogue policy learning, the world model is constantly updated with real user experience to approach real user behavior, and in turn, the dialogue agent is optimized using both real experience and simulated experience.

Reinforcement Learning (RL) Task-Completion Dialogue Policy Learning

Dynamic Fusion Networks for Machine Reading Comprehension

no code implementations 14 Nov 2017 Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu

This paper presents a novel neural model, Dynamic Fusion Network (DFN), for machine reading comprehension (MRC).

Machine Reading Comprehension

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

no code implementations 31 Oct 2017 Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Yun-Nung Chen, Kam-Fai Wong

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems.

Task-Completion Dialogue Policy Learning

Image Disguise based on Generative Model

no code implementations 21 Oct 2017 Xintao Duan, Haoxian Song, En Zhang, Jingjing Liu

To protect image contents, most existing encryption algorithms are designed to transform an original image into a texture-like or noise-like image, which is, however, an obvious visual sign indicating the presence of an encrypted image and results in a significantly large number of attacks.

Multispectral Deep Neural Networks for Pedestrian Detection

2 code implementations 8 Nov 2016 Jingjing Liu, Shaoting Zhang, Shu Wang, Dimitris N. Metaxas

Multispectral pedestrian detection is essential for around-the-clock applications, e.g., surveillance and autonomous driving.

Pedestrian Detection
