Search Results for author: Xiaodan Liang

Found 215 papers, 91 papers with code

Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation NAACL 2022 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e. g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +2

NLIP: Noise-robust Language-Image Pre-training

no code implementations14 Dec 2022 Runhui Huang, Yanxin Long, Jianhua Han, Hang Xu, Xiwen Liang, Chunjing Xu, Xiaodan Liang

Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e. g., zero-shot classification, retrieval and image captioning.

Image Captioning Memorization +3

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

2 code implementations6 Dec 2022 Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang

Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which finally shows the reasoning ability can be improved on both two tasks by unifying formulation.

Logical Reasoning

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

no code implementations4 Dec 2022 ZiCheng Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Wei Ke

Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target.

Image Segmentation Semantic Segmentation +2

3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

no code implementations2 Dec 2022 Zutao Jiang, Guangsong Lu, Xiaodan Liang, Jihua Zhu, Wei zhang, Xiaojun Chang, Hang Xu

Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module.

Contrastive Learning SSIM

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

no code implementations25 Nov 2022 Zaiyu Huang, Hanhui Li, Zhenyu Xie, Michael Kampffmeyer, Qingling Cai, Xiaodan Liang

Existing methods are restricted in this setting as they estimate garment warping flows mainly based on 2D poses and appearance, which omits the geometric prior of the 3D human body shape.

Virtual Try-on

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

no code implementations12 Nov 2022 Xipeng Chen, Guangrun Wang, Dizhong Zhu, Xiaodan Liang, Philip H. S. Torr, Liang Lin

In this paper, we propose a novel Neural Sewing Machine (NSM), a learning-based framework for structure-preserving 3D garment modeling, which is capable of learning representations for garments with diverse shapes and topologies and is successfully applied to 3D garment reconstruction and controllable manipulation.

Representation Learning

P$^3$OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

no code implementations2 Nov 2022 Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang

Inspired by the success of visual-language methods (VLMs) in zero-shot classification, recent works attempt to extend this line of work into object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner.

object-detection Open Vocabulary Object Detection +3

Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers

1 code implementation16 Oct 2022 Tao Tang, Changlin Li, Guangrun Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang

Despite the success, its development and application on self-supervised vision transformers have been hindered by several barriers, including the high search cost, the lack of supervision, and the unsuitable search space.

Data Augmentation Image Retrieval +3

MARLlib: Extending RLlib for Multi-agent Reinforcement Learning

1 code implementation11 Oct 2022 Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Zhihui Li, Xiaodan Liang, Xiaojun Chang, Yaodong Yang

Despite the fast development of multi-agent reinforcement learning (MARL) methods, there is a lack of commonly-acknowledged baseline implementation and evaluation platforms.

Multi-agent Reinforcement Learning reinforcement-learning +1

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

1 code implementation9 Oct 2022 Yi Cheng, Wenge Liu, Wenjie Li, Jiashuo Wang, Ruihui Zhao, Bang Liu, Xiaodan Liang, Yefeng Zheng

Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions.

Dialogue Generation

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

no code implementations20 Sep 2022 Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Chunjing Xu, Hang Xu

We further design a concept dictionary~(with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.

object-detection Open World Object Detection

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

no code implementations19 Sep 2022 Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Aiming towards a holistic understanding of multiple downstream tasks simultaneously, there is a need for extracting features with better transferability.

Autonomous Driving Multi-Task Learning +4

ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design

no code implementations11 Aug 2022 Xujie Zhang, Yu Sha, Michael C. Kampffmeyer, Zhenyu Xie, Zequn Jie, Chengwen Huang, Jianqing Peng, Xiaodan Liang

ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook in its first stage and uses a Transformer to model the distribution of image tokens for a real image given the tokens of the control signals in its second stage.

Image Generation

Composable Text Controls in Latent Space with ODEs

1 code implementation1 Aug 2022 Guangyi Liu, Zeyu Feng, Yuan Gao, Zichao Yang, Xiaodan Liang, Junwei Bao, Xiaodong He, Shuguang Cui, Zhen Li, Zhiting Hu

This paper proposes a new efficient approach for composable text operations in the compact latent space of text.

Language Modelling

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

1 code implementation27 Jul 2022 Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei

Particularly, SiRi conveys a significant principle to the research of visual grounding, i. e., a better initialized vision-language encoder would help the model converge to a better local minimum, advancing the performance accordingly.

Visual Grounding

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

1 code implementation27 Jul 2022 Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xin Dong, Feida Zhu, Xiaodan Liang

In this work, we take a step forwards to explore versatile virtual try-on solutions, which we argue should possess three main properties, namely, they should support unsupervised training, arbitrary garment categories, and controllable garment editing.

Disentanglement Image Generation +1

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

no code implementations18 Jul 2022 Quande Liu, Youpeng Wen, Jianhua Han, Chunjing Xu, Hang Xu, Xiaodan Liang

To bridge the gap between supervised semantic segmentation and real-world applications that acquires one model to recognize arbitrary new concepts, recent zero-shot segmentation attracts a lot of attention by exploring the relationships between unseen and seen object categories, yet requiring large amounts of densely-annotated data with diverse base classes.

Online Clustering Semantic Segmentation +1

Discourse-Aware Graph Networks for Textual Logical Reasoning

no code implementations4 Jul 2022 Yinya Huang, Lemao Liu, Kun Xu, Meng Fang, Liang Lin, Xiaodan Liang

In this work, we propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).

graph construction Logical Reasoning +2

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations17 Jun 2022 Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.


Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation

no code implementations CVPR 2022 Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan, Xiaodan Liang, Xiaojun Chang

To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure.

Clinical Knowledge Medical Report Generation

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

no code implementations1 Jun 2022 Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang

In this study, we quantify the agent's behavior difference and build its relationship with the policy performance via {\bf Role Diversity}, a metric to measure the characteristics of MARL tasks.

SMAC+ Starcraft

ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts

no code implementations CVPR 2022 Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang

Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i. e., make instruction-asked actions sequentially in complex visual environments.

Vision-Language Navigation

ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning

no code implementations25 May 2022 Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

To address this problem, we propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.

text-classification Text Classification +1

Unbiased Math Word Problems Benchmark for Mitigating Solving Bias

1 code implementation Findings (NAACL) 2022 Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Xiaodan Liang

However, current solvers exist solving bias which consists of data bias and learning bias due to biased dataset and improper training strategy.

LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

1 code implementation17 May 2022 Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Liang Lin, Xiaodan Liang

To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP which consists of 11, 495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation.

Math Word Problem Solving

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

1 code implementation CVPR 2022 BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller(GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.

Continual Learning object-detection +1

Dressing in the Wild by Watching Dance Videos

no code implementations CVPR 2022 Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang

While significant progress has been made in garment transfer, one of the most applicable directions of human-centric image generation, existing works overlook the in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details.

Image Generation Virtual Try-on

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation CVPR 2022 Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Beyond Fixation: Dynamic Window Visual Transformer

1 code implementation CVPR 2022 Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang

Recently, a surge of interest in visual transformers is to reduce the computational cost by limiting the calculation of self-attention to a local window.

Laneformer: Object-aware Row-Column Transformers for Lane Detection

no code implementations18 Mar 2022 Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, Xiaodan Liang

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection that is a long-standing research topic for visual perception in autonomous driving.

Autonomous Driving Lane Detection

elBERto: Self-supervised Commonsense Learning for Question Answering

no code implementations17 Mar 2022 Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin

Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context.

Question Answering Representation Learning +1

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

no code implementations15 Mar 2022 Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu

One main reason that impedes the development of truly reliably self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.

Autonomous Driving object-detection +1

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

1 code implementation ACL 2022 Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang

To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore the environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP).

Domain Adaptation Vision-Language Navigation

Modern Augmented Reality: Applications, Trends, and Future Directions

no code implementations18 Feb 2022 Shervin Minaee, Xiaodan Liang, Shuicheng Yan

Augmented reality (AR) is one of the relatively old, yet trending areas in the intersection of computer vision and computer graphics with numerous applications in several areas, from gaming and entertainment, to education and healthcare.

Exploring Inter-Channel Correlation for Diversity-preserved KnowledgeDistillation

1 code implementation8 Feb 2022 Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang

Extensive experiments on two vision tasks, includ-ing ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consis-tently outperforms many existing methods, advancing thestate-of-the-art in the fields of Knowledge Distillation.

Knowledge Distillation

BodyGAN: General-Purpose Controllable Neural Human Body Generation

no code implementations CVPR 2022 Chaojie Yang, Hanhui Li, Shengjie Wu, Shengkai Zhang, Haonan Yan, Nianhong Jiao, Jie Tang, Runnan Zhou, Xiaodan Liang, Tianxiang Zheng

This is because current methods mainly rely on a single pose/appearance model, which is limited in disentangling various poses and appearance in human images.

Disentanglement Image Generation +1

Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

1 code implementation8 Dec 2021 Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang

The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.

Contrastive Learning Navigate +1

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN

1 code implementation NeurIPS 2021 Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xiaodan Liang

Image-based virtual try-on is one of the most promising applications of human-centric image generation due to its tremendous real-world potential.

Disentanglement Image Generation +1

FILIP: Fine-grained Interactive Language-Image Pre-Training

no code implementations ICLR 2022 Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu

In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.

Image Classification Retrieval +2

UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model

1 code implementation ICCV 2021 Haonan Yan, Jiaqi Chen, Xujie Zhang, Shengkai Zhang, Nianhong Jiao, Xiaodan Liang, Tianxiang Zheng

However, the popular DensePose-COCO dataset relies on a sophisticated manual annotation system, leading to severe limitations in acquiring the denser and more accurate annotated pose resources.

3D Reconstruction

Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis

no code implementations27 Oct 2021 Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin

The integration of human parsing and appearance flow effectively guides the generation of video frames with realistic appearance.

Human Parsing Video Generation

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

1 code implementation25 Oct 2021 Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu

Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.

Arithmetic Reasoning Mathematical Reasoning +3

Role Diversity Matters: A Study of Cooperative Training Strategies for Multi-Agent RL

no code implementations29 Sep 2021 Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang

In addition, role diversity can help to find a better training strategy and increase performance in cooperative MARL.

SMAC+ Starcraft +1

DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers

1 code implementation21 Sep 2021 Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang

Dynamic networks have shown their promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference.

Fairness Model Compression

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

1 code implementation Findings (EMNLP) 2021 Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang

In this paper, we have a critical insight that improving the feed-forward network (FFN) in BERT has a higher gain than improving the multi-head attention (MHA) since the computational cost of FFN is 2$\sim$3 times larger than MHA.

Data Augmentation Knowledge Distillation

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations CVPR 2022 Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

1 code implementation ICCV 2021 Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, Chunjing Xu

To resolve the problems, we propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.

3D Object Detection object-detection

Voxel Transformer for 3D Object Detection

1 code implementation ICCV 2021 Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu

We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.

Ranked #3 on 3D Object Detection on waymo vehicle (L1 mAP metric)

3D Object Detection object-detection +1

M3D-VTON: A Monocular-to-3D Virtual Try-On Network

1 code implementation ICCV 2021 Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, Xiaodan Liang

Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value.

Virtual Try-on

NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models

no code implementations ICCV 2021 Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li

Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.

Neural Architecture Search

WAS-VTON: Warping Architecture Search for Virtual Try-on Network

no code implementations1 Aug 2021 Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang

Despite recent progress on image-based virtual try-on, current methods are constraint by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations.

Neural Architecture Search Virtual Try-on

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

1 code implementation ICCV 2021 Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang

In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.


Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations15 Jul 2021 Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.

Inductive Bias Language Modelling +2

Deep Learning for Embodied Vision Navigation: A Survey

no code implementations7 Jul 2021 Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang

A navigation agent is supposed to have various intelligent skills, such as visual perceiving, mapping, planning, exploring and reasoning, etc.

Autonomous Driving Navigate +1

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

1 code implementation ACL 2021 Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation29 Jun 2021 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e. g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +3

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

no code implementations21 Jun 2021 Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning with different downstream tasks (i. e., detection, semantic/instance segmentation) in autonomous driving domain.

Autonomous Driving Instance Segmentation +5

One Million Scenes for Autonomous Driving: ONCE Dataset

1 code implementation21 Jun 2021 Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu

To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.

3D Object Detection Autonomous Driving +1

Prototypical Graph Contrastive Learning

1 code implementation17 Jun 2021 Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since for a query, its negatives are uniformly sampled from all graphs, existing methods suffer from the critical sampling bias issue, i. e., the negatives likely having the same semantic structure with the query, leading to performance degradation.

Contrastive Learning Representation Learning

Towards Quantifiable Dialogue Coherence Evaluation

1 code implementation ACL 2021 Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards.

Coherence Evaluation Dialogue Evaluation +1

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

1 code implementation Findings (ACL) 2021 Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 4, 998 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.

Question Answering

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations CVPR 2021 Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

no code implementations CVPR 2021 Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.

Navigate Visual Navigation

DAGN: Discourse-Aware Graph Network for Logical Reasoning

2 code implementations NAACL 2021 Yinya Huang, Meng Fang, Yu Cao, LiWei Wang, Xiaodan Liang

The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.

Logical Reasoning

Dynamic Slimmable Network

1 code implementation CVPR 2021 Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang

Here, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims to achieve good hardware-efficiency via dynamically adjusting filter numbers of networks at test time with respect to different inputs, while keeping filters stored statically and contiguously in hardware to prevent the extra burden.

Fairness Model Compression

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

1 code implementation ICCV 2021 Changlin Li, Tao Tang, Guangrun Wang, Jiefeng Peng, Bing Wang, Xiaodan Liang, Xiaojun Chang

In this work, we present Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of inaccurate architecture rating caused by large weight-sharing space and biased supervision in previous methods.

 Ranked #1 on Neural Architecture Search on ImageNet (Top 5 Accuracy metric)

Image Classification Neural Architecture Search

A Data-Centric Framework for Composable NLP Workflows

1 code implementation EMNLP 2020 Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Haoying Zhang, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, Zhiting Hu

Empirical natural language processing (NLP) systems in application domains (e. g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization.

Retrieval Text Retrieval

SparseBERT: Rethinking the Importance Analysis in Self-attention

1 code implementation25 Feb 2021 Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

1 code implementation ICLR 2021 Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li

For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges.

Model Optimization object-detection +1

Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

2 code implementations26 Jan 2021 Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang

Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e. g., sharing discrepant label granularity) without extensive re-training.

Graph Representation Learning Human Parsing +2

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

1 code implementation20 Jan 2021 Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang

Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task.

reinforcement-learning reinforcement Learning +1

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

no code implementations9 Jan 2021 Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.


Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations1 Jan 2021 Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

CAT-SAC: Soft Actor-Critic with Curiosity-Aware Entropy Temperature

no code implementations1 Jan 2021 Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin

The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.

NASOA: Towards Faster Task-oriented Online Fine-tuning

no code implementations1 Jan 2021 Hang Xu, Ning Kang, Gengwei Zhang, Xiaodan Liang, Zhenguo Li

The resulting model zoo is more training efficient than SOTA NAS models, e. g. 6x faster than RegNetY-16GF, and 1. 7x faster than EfficientNetB3.

Neural Architecture Search

TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations1 Jan 2021 Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on one single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers

no code implementations ICLR 2021 Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang

Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task.

reinforcement-learning reinforcement Learning +1

Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation

1 code implementation ICCV 2021 Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang

Extensive experiments on two vision tasks, including ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consistently outperforms many existing methods, advancing the state-of-the-art in the fields of Knowledge Distillation.

Knowledge Distillation

Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering

no code implementations ICCV 2021 Qingxing Cao, Wentao Wan, Keze Wang, Xiaodan Liang, Liang Lin

The experimental results show that our proposed method can improve current VQA models on OOD split without losing performance on the in-domain test data.

Novel Concepts Question Answering +1

Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation

no code implementations ICCV 2021 Yi Zhu, Yue Weng, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao

Vision-Dialog Navigation (VDN) requires an agent to ask questions and navigate following the human responses to find target objects.

Imitation Learning Navigate

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

no code implementations ICCV 2021 Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei zhang, Zhenguo Li, Luc van Gool

Here we present a novel self-supervised 3D Object detection framework that seamlessly integrates the geometry-aware contrast and clustering harmonization to lift the unsupervised 3D representation learning, named GCC-3D.

3D Object Detection object-detection +2

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

no code implementations24 Dec 2020 Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin

It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA systems performance.

Question Answering

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

1 code implementation22 Dec 2020 Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the needs of a large number of dialogues.

Dialogue Generation Meta-Learning

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

1 code implementation14 Dec 2020 Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Specifically, we generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs to disentangle the knowledge from other biases.

Question Answering Visual Question Answering

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

1 code implementation30 Nov 2020 Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it's still challenging to apply it to real-world tasks, due to the poor sample efficiency.

Continuous Control

Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data

no code implementations28 Nov 2020 Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing

To bridge the methodological gaps in partially supervised learning (PSL) under data scarcity, we propose Vicinal Labels Under Uncertainty (VLUU), a simple yet efficient framework utilizing the human structure similarity for partially supervised medical image segmentation.

Data Augmentation Image Segmentation +3

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

2 code implementations NeurIPS 2020 Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.

Instance Segmentation Panoptic Segmentation +1

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Jinyi Hu, HANLIN ZHANG, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.

Explanation Generation Natural Language Understanding

Iterative Graph Self-Distillation

no code implementations23 Oct 2020 HANLIN ZHANG, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.

Contrastive Learning Graph Learning +1

MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation

1 code implementation15 Oct 2020 Wenge Liu, Jianheng Tang, Yi Cheng, Wenjie Li, Yefeng Zheng, Xiaodan Liang

To push forward the future research on building expert-sensitive medical dialogue system, we proposes two kinds of medical dialogue tasks based on MedDG dataset.

Dialogue Generation Response Generation +1

Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems

1 code implementation EMNLP 2020 Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, Liang Lin

A practical automatic textual math word problems (MWPs) solver should be able to solve various textual MWPs while most existing works only focused on one-unknown linear MWPs.

Math Word Problem Solving

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

1 code implementation EMNLP 2020 Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang

Capitalized on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.

Dialogue Evaluation

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

1 code implementation ECCV 2020 Hang Xu, Shaoju Wang, Xinyue Cai, Wei zhang, Xiaodan Liang, Zhenguo Li

In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-ranged coherent and accurate short-range curve information while unifying both architecture search and post-processing on curve lane predictions via point blending.

Autonomous Driving Lane Detection

Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation

2 code implementations6 Jun 2020 Mingjie Li, Fuyu Wang, Xiaojun Chang, Xiaodan Liang

Firstly, the regions of primary interest to radiologists are usually located in a small area of the global image, meaning that the remainder parts of the image could be considered as irrelevant noise in the training procedure.

Image Captioning Medical Report Generation +1

Bidirectional Graph Reasoning Network for Panoptic Segmentation

no code implementations CVPR 2020 Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and intermodular relations within and between foreground things and background stuff classes.

Instance Segmentation Panoptic Segmentation

Linguistically Driven Graph Capsule Network for Visual Question Reasoning

no code implementations23 Mar 2020 Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin

Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.

Question Answering Visual Question Answering

Vision-Dialog Navigation by Exploring Cross-modal Memory

1 code implementation CVPR 2020 Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang

Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to explore the memory about the decision making of historical navigation actions which is for the current step.

Decision Making

Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis

1 code implementation14 Mar 2020 Junfan Lin, Keze Wang, Ziliang Chen, Xiaodan Liang, Liang Lin

To eliminate this bias and inspired by the propensity score matching technique with causal diagram, we propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records; Bias (ii) inherently comes along with the passively collected data, and is one of the key obstacles for training the agent towards "learning how" rather than "remembering what".

Medical Diagnosis

ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection

no code implementations3 Mar 2020 Chenhan Jiang, Shaoju Wang, Hang Xu, Xiaodan Liang, Nong Xiao

Is a hand-crafted detection network tailored for natural image undoubtedly good enough over a discrepant medical lesion domain?

Lesion Detection medical image detection

Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN

no code implementations18 Feb 2020 Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li

Finally, an InterDomain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally.

object-detection Object Detection +1

Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation

1 code implementation4 Feb 2020 Jinghui Qin, Zheng Ye, Jianheng Tang, Xiaodan Liang

Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations.


SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection

no code implementations22 Nov 2019 Lewei Yao, Hang Xu, Wei zhang, Xiaodan Liang, Zhenguo Li

In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architecture for object detection.

Neural Architecture Search object-detection +1

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

no code implementations CVPR 2020 Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.

Navigate Vision-Language Navigation

Heterogeneous Graph Learning for Visual Commonsense Reasoning

1 code implementation NeurIPS 2019 Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao

Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.

Graph Learning Visual Commonsense Reasoning

Layout-Graph Reasoning for Fashion Landmark Detection

no code implementations CVPR 2019 Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin

Each Layout-Graph Reasoning(LGR) layer aims to map feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over structural graph nodes to achieve global layout coherency via a layout-graph reasoning module, and then maps graph nodes back to enhance feature representations via a Node-to-Map module.

Graph Clustering

Meta R-CNN : Towards General Solver for Instance-level Few-shot Learning

no code implementations28 Sep 2019 Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin

Resembling the rapid learning capability of human, few-shot learning empowers vision systems to understand new concepts by training with few samples.

Few-Shot Learning Few-Shot Object Detection +2

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

no code implementations23 Sep 2019 Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity (e. g., what is the dog that is near the girl playing with?)

Question Answering Visual Question Answering

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

1 code implementation8 Jul 2019 Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin

A broad range of cross-$m$-domain generation researches boil down to matching a joint distribution by deep generative models (DGMs).

Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks

1 code implementation CVPR 2019 Ziliang Chen, Jingyu Zhuang, Xiaodan Liang, Liang Lin

(Unsupervised) Domain Adaptation (DA) seeks for classifying target instances when solely provided with source labeled and target unlabeled examples for training.

Multi-target Domain Adaptation Unsupervised Domain Adaptation

Fashion Editing with Adversarial Parsing Learning

no code implementations CVPR 2020 Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value.

Human Parsing Image Manipulation

Learning Personalized Modular Network Guided by Structured Knowledge

no code implementations CVPR 2019 Xiaodan Liang

Learning semantic configurations and activation of modules to align well with structured knowledge can be regarded as a decision-making procedure, which is solved by a new graph-based reinforcement learning algorithm.

Decision Making Semantic Segmentation

Graph Transformer

no code implementations ICLR 2019 Yuan Li, Xiaodan Liang, Zhiting Hu, Yinbo Chen, Eric P. Xing

Graph neural networks (GNN) have gained increasing research interests as a mean to the challenging goal of robust and universal graph learning.

Few-Shot Learning General Classification +3

Graphonomy: Universal Human Parsing via Graph Transfer Learning

1 code implementation CVPR 2019 Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin

By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity.

Human Parsing Transfer Learning

Knowledge-driven Encode, Retrieve, Paraphrase for Medical Image Report Generation

no code implementations25 Mar 2019 Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing

Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions.

Graph Learning Knowledge Graphs +3

Towards Multi-pose Guided Virtual Try-on Network

no code implementations ICCV 2019 Haoye Dong, Xiaodan Liang, Bochao Wang, Hanjiang Lai, Jia Zhu, Jian Yin

Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-on Network (MG-VTON) can generate a new person image after fitting the desired clothes into the input image and manipulating human poses.

Fashion Synthesis Human Parsing +2

End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

1 code implementation30 Jan 2019 Lin Xu, Qixian Zhou, Ke Gong, Xiaodan Liang, Jianheng Tang, Liang Lin

Besides the challenges for conversational dialogue systems (e. g. topic transition coherency and question understanding), automatic medical diagnosis further poses more critical requirements for the dialogue rationality in the context of medical knowledge and symptom-disease relations.

Decision Making Dialogue Management +5

Data-to-Text Generation with Style Imitation

1 code implementation Findings of the Association for Computational Linguistics 2020 Shuai Lin, Wentao Wang, Zichao Yang, Xiaodan Liang, Frank F. Xu, Eric Xing, Zhiting Hu

That is, the model learns to imitate the writing style of any given exemplar sentence, with automatic adaptions to faithfully describe the content record.

Data-to-Text Generation Style Transfer

Symbolic Graph Reasoning Meets Convolutions

1 code implementation NeurIPS 2018 Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P. Xing

To cooperate with local convolutions, each SGR is constituted by three modules: a) a primal local-to-semantic voting module where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module propagates information over knowledge graph to achieve global semantic coherency; c) a dual semantic-to-local mapping module learns new associations of the evolved symbolic nodes with local representations, and accordingly enhances local features.

Image Classification Semantic Segmentation

Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis

no code implementations NeurIPS 2018 Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin

Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations.

Image Generation

AutoLoss: Learning Discrete Schedules for Alternate Optimization

1 code implementation4 Oct 2018 Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.

Image Generation Machine Translation +4

AutoLoss: Learning Discrete Schedule for Alternate Optimization

no code implementations ICLR 2019 Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.

Image Generation Machine Translation +3

Interpretable Visual Question Answering by Reasoning on Dependency Trees

no code implementations6 Sep 2018 Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems.

Question Answering Visual Question Answering

RCAA: Relational Context-Aware Agents for Person Search

no code implementations ECCV 2018 Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann

In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.

Person Search

Adversarial Geometry-Aware Human Motion Prediction

no code implementations ECCV 2018 Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura

We explore an approach to forecasting human motion in a few milliseconds given an input 3D skeleton sequence based on a recurrent encoder-decoder framework.

Human motion prediction motion prediction

Generative Semantic Manipulation with Mask-Contrasting GAN

no code implementations ECCV 2018 Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing

Despite the promising results on paired/unpaired image-to-image translation achieved by Generative Adversarial Networks (GANs), prior works often only transfer the low-level information (e. g. color or texture changes), but fail to manipulate high-level semantic meanings (e. g., geometric structure or content) of different object regions.

Image-to-Image Translation

Jointly Deep Multi-View Learning for Clustering Analysis

no code implementations19 Aug 2018 Bingqian Lin, Yuan Xie, Yanyun Qu, Cuihua Li, Xiaodan Liang

To our best knowledge, this is the first work to model the multi-view clustering in a deep joint framework, which will provide a meaningful thinking in unsupervised multi-view learning.

Multiview Clustering MULTI-VIEW LEARNING

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

1 code implementation2 Aug 2018 Qixian Zhou, Xiaodan Liang, Ke Gong, Liang Lin

Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e. g., head, leg, dress).

Human Parsing Semantic Segmentation +3

Instance-level Human Parsing via Part Grouping Network

1 code implementation ECCV 2018 Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin

Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass.

Edge Detection Human Parsing +2

Reinforced Auto-Zoom Net: Towards Accurate and Fast Breast Cancer Segmentation in Whole-slide Images

no code implementations29 Jul 2018 Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing

Motivated by the zoom-in operation of a pathologist using a digital microscope, RAZN learns a policy network to decide whether zooming is required in a given region of interest.

whole slide images

Toward Characteristic-Preserving Image-based Virtual Try-On Network

5 code implementations ECCV 2018 Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, Meng Yang

Second, to alleviate boundary artifacts of warped clothes and make the results more realistic, we employ a Try-On Module that learns a composition mask to integrate the warped clothes and the rendered image to ensure smoothness.

Geometric Matching Virtual Try-on