Search Results for author: Xiaodan Liang

Found 156 papers, 52 papers with code

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

1 code implementation 15 Sep 2021 Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang

In this paper, we have a critical insight that improving the feed-forward network (FFN) in BERT has a higher gain than improving the multi-head attention (MHA) since the computational cost of FFN is 2$\sim$3 times larger than MHA.

Data Augmentation Knowledge Distillation

M5Product: A Multi-modal Pretraining Benchmark for E-commercial Product Downstream Tasks

no code implementations 9 Sep 2021 Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, XiaoYong Wei, Minlong Lu, Xiaodan Liang

In this paper, we aim to advance the research of multi-modal pre-training on E-commerce and subsequently contribute a large-scale dataset, named M5Product, which consists of over 6 million multimodal pairs, covering more than 6,000 categories and 5,000 attributes.

Voxel Transformer for 3D Object Detection

1 code implementation 6 Sep 2021 Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu

We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.

Ranked #1 on 3D Object Detection on Waymo vehicle (L1 mAP metric)

3D Object Detection Object Recognition

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

1 code implementation 6 Sep 2021 Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, Chunjing Xu

To resolve the problems, we propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.

Ranked #1 on 3D Object Detection on Waymo vehicle (AP metric)

3D Object Detection

Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift

1 code implementation 22 Aug 2021 Jiefeng Peng, Jiqi Zhang, Changlin Li, Guangrun Wang, Xiaodan Liang, Liang Lin

We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.

Neural Architecture Search

M3D-VTON: A Monocular-to-3D Virtual Try-On Network

1 code implementation 11 Aug 2021 Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, Xiaodan Liang

Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value.

Virtual Try-on

NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models

no code implementations 7 Aug 2021 Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li

Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.

Neural Architecture Search

WAS-VTON: Warping Architecture Search for Virtual Try-on Network

no code implementations 1 Aug 2021 Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang

Despite recent progress on image-based virtual try-on, current methods are constrained by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations.

Neural Architecture Search Virtual Try-on

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

no code implementations 30 Jul 2021 Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang

In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation 23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations 15 Jul 2021 Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.

Neural Architecture Search

Deep Learning for Embodied Vision Navigation: A Survey

no code implementations 7 Jul 2021 Fengda Zhu, Yi Zhu, Xiaodan Liang, Xiaojun Chang

Navigation is one of the fundamental features of an autonomous robot.

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

no code implementations ACL 2021 Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation 29 Jun 2021 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Zhen Li, BoWen Zhou, Shuguang Cui, Zhiting Hu

Neural text generation models are typically trained by maximizing log-likelihood with the sequence cross entropy loss, which encourages an exact token-by-token match between a target sequence and a generated sequence.

Machine Translation Style Transfer +2

One Million Scenes for Autonomous Driving: ONCE Dataset

1 code implementation 21 Jun 2021 Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu

To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.

3D Object Detection Autonomous Driving

SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving

no code implementations 21 Jun 2021 Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Aiming at facilitating a real-world, ever-evolving and scalable autonomous driving system, we present a large-scale benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data, which is the first and largest benchmark to date.

Autonomous Driving Object Detection

Prototypical Graph Contrastive Learning

no code implementations 17 Jun 2021 Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since for a query, its negatives are uniformly sampled from all graphs, existing methods suffer from the critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation.

Contrastive Learning Unsupervised Representation Learning

Towards Quantifiable Dialogue Coherence Evaluation

1 code implementation ACL 2021 Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards.

Coherence Evaluation Knowledge Distillation

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

1 code implementation 30 May 2021 Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.

Question Answering

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

no code implementations CVPR 2021 Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

no code implementations CVPR 2021 Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.

Visual Navigation

DAGN: Discourse-Aware Graph Network for Logical Reasoning

1 code implementation NAACL 2021 Yinya Huang, Meng Fang, Yu Cao, LiWei Wang, Xiaodan Liang

The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.

Dynamic Slimmable Network

1 code implementation CVPR 2021 Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang

Here, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims to achieve good hardware-efficiency via dynamically adjusting filter numbers of networks at test time with respect to different inputs, while keeping filters stored statically and contiguously in hardware to prevent the extra burden.

Fairness Model Compression

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

1 code implementation 23 Mar 2021 Changlin Li, Tao Tang, Guangrun Wang, Jiefeng Peng, Bing Wang, Xiaodan Liang, Xiaojun Chang

In this work, we present Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of inaccurate architecture rating caused by large weight-sharing space and biased supervision in previous methods.

Image Classification Neural Architecture Search

A Data-Centric Framework for Composable NLP Workflows

1 code implementation EMNLP 2020 Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Haoying Zhang, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, Zhiting Hu

Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization.

SparseBERT: Rethinking the Importance Analysis in Self-attention

no code implementations 25 Feb 2021 Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

1 code implementation ICLR 2021 Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li

For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges.

Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

2 code implementations 26 Jan 2021 Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang

Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e.g., sharing discrepant label granularity) without extensive re-training.

Graph Representation Learning Human Parsing +2

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

1 code implementation 20 Jan 2021 Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang

Recent advances in multi-agent reinforcement learning have been largely limited to training one model from scratch for every new task.

SMAC

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

no code implementations 9 Jan 2021 Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations 1 Jan 2021 Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

CAT-SAC: Soft Actor-Critic with Curiosity-Aware Entropy Temperature

no code implementations 1 Jan 2021 Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin

The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.

NASOA: Towards Faster Task-oriented Online Fine-tuning

no code implementations 1 Jan 2021 Hang Xu, Ning Kang, Gengwei Zhang, Xiaodan Liang, Zhenguo Li

The resulting model zoo is more training efficient than SOTA NAS models, e.g., 6x faster than RegNetY-16GF and 1.7x faster than EfficientNetB3.

Neural Architecture Search

TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search

no code implementations 1 Jan 2021 Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on one single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers

no code implementations ICLR 2021 Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang

Recent advances in multi-agent reinforcement learning have been largely limited to training one model from scratch for every new task.

SMAC

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

no code implementations 24 Dec 2020 Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin

It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA system's performance.

Question Answering

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

1 code implementation 22 Dec 2020 Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.

Dialogue Generation Meta-Learning

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

no code implementations 14 Dec 2020 Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Specifically, we generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs to disentangle the knowledge from other biases.

Question Answering Visual Question Answering

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

1 code implementation 30 Nov 2020 Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it's still challenging to apply it to real-world tasks, due to the poor sample efficiency.

Continuous Control

Towards Robust Medical Image Segmentation on Small-Scale Data with Incomplete Labels

no code implementations 28 Nov 2020 Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing

It is time to step back and think about the robustness of partially supervised methods and how to maximally utilize small-scale and partially labeled data for medical image segmentation tasks.

Data Augmentation Medical Image Segmentation

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

2 code implementations NeurIPS 2020 Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.

Instance Segmentation Panoptic Segmentation +1

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.

Natural Language Understanding

Iterative Graph Self-Distillation

no code implementations 23 Oct 2020 Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

How to discriminatively vectorize graphs is a fundamental challenge that has attracted increasing attention in recent years.

Contrastive Learning Graph Learning +1

MedDG: A Large-scale Medical Consultation Dataset for Building Medical Dialogue System

1 code implementation 15 Oct 2020 Wenge Liu, Jianheng Tang, Jinghui Qin, Lin Xu, Zhen Li, Xiaodan Liang

To push forward future research on building expert-sensitive medical dialogue systems, we propose two kinds of medical dialogue tasks based on the MedDG dataset.

Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems

1 code implementation EMNLP 2020 Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, Liang Lin

A practical automatic textual math word problems (MWPs) solver should be able to solve various textual MWPs while most existing works only focused on one-unknown linear MWPs.

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

1 code implementation EMNLP 2020 Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang

Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.

Dialogue Evaluation

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

1 code implementation ECCV 2020 Hang Xu, Shaoju Wang, Xinyue Cai, Wei Zhang, Xiaodan Liang, Zhenguo Li

In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-ranged coherent and accurate short-range curve information while unifying both architecture search and post-processing on curve lane predictions via point blending.

Autonomous Driving Lane Detection

Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation

no code implementations 6 Jun 2020 Mingjie Li, Fuyu Wang, Xiaojun Chang, Xiaodan Liang

Firstly, the regions of primary interest to radiologists are usually located in a small area of the global image, meaning that the remaining parts of the image could be considered as irrelevant noise in the training procedure.

Image Captioning Medical Report Generation +1

Bidirectional Graph Reasoning Network for Panoptic Segmentation

no code implementations CVPR 2020 Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.

Instance Segmentation Panoptic Segmentation

Linguistically Driven Graph Capsule Network for Visual Question Reasoning

no code implementations 23 Mar 2020 Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin

Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.

Question Answering Visual Question Answering

Vision-Dialog Navigation by Exploring Cross-modal Memory

1 code implementation CVPR 2020 Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang

Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to explore the memory of historical navigation decisions that is relevant to the current step.

Decision Making

Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis

no code implementations 14 Mar 2020 Junfan Lin, Ziliang Chen, Xiaodan Liang, Keze Wang, Liang Lin

To address this problem, this paper presents a propensity-based patient simulator (PBPS), which is capable of facilitating the training of MAD agents by generating informative counterfactual answers along with the disease diagnosis.

ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection

no code implementations 3 Mar 2020 Chenhan Jiang, Shaoju Wang, Hang Xu, Xiaodan Liang, Nong Xiao

Is a hand-crafted detection network tailored for natural images undoubtedly good enough over a discrepant medical lesion domain?

Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN

no code implementations 18 Feb 2020 Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li

Finally, an InterDomain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally.

Object Detection Transfer Learning

Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation

no code implementations 4 Feb 2020 Jinghui Qin, Zheng Ye, Jianheng Tang, Xiaodan Liang

Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations.

Blockwisely Supervised Neural Architecture Search with Knowledge Distillation

1 code implementation 29 Nov 2019 Changlin Li, Jiefeng Peng, Liuchun Yuan, Guangrun Wang, Xiaodan Liang, Liang Lin, Xiaojun Chang

Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture.

Knowledge Distillation Neural Architecture Search

SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection

no code implementations 22 Nov 2019 Lewei Yao, Hang Xu, Wei Zhang, Xiaodan Liang, Zhenguo Li

In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architecture for object detection.

Neural Architecture Search Object Detection

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

no code implementations CVPR 2020 Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.

Vision-Language Navigation

Heterogeneous Graph Learning for Visual Commonsense Reasoning

1 code implementation NeurIPS 2019 Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao

Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.

Graph Learning Visual Commonsense Reasoning

Layout-Graph Reasoning for Fashion Landmark Detection

no code implementations CVPR 2019 Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin

Each Layout-Graph Reasoning (LGR) layer aims to map feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over structural graph nodes to achieve global layout coherency via a layout-graph reasoning module, and then maps graph nodes back to enhance feature representations via a Node-to-Map module.

Graph Clustering Hierarchical structure

Meta R-CNN: Towards General Solver for Instance-level Few-shot Learning

no code implementations 28 Sep 2019 Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin

Resembling the rapid learning capability of humans, few-shot learning empowers vision systems to understand new concepts by training with few samples.

Few-Shot Learning Few-Shot Object Detection +1

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

no code implementations 23 Sep 2019 Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity (e.g., what is the dog that is near the girl playing with?).

Question Answering Visual Question Answering

Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks

1 code implementation CVPR 2019 Ziliang Chen, Jingyu Zhuang, Xiaodan Liang, Liang Lin

(Unsupervised) Domain Adaptation (DA) seeks to classify target instances when provided solely with source labeled and target unlabeled examples for training.

Multi-target Domain Adaptation Transfer Learning +1

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

1 code implementation 8 Jul 2019 Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin

A broad range of cross-$m$-domain generation research boils down to matching a joint distribution by deep generative models (DGMs).

Fashion Editing with Adversarial Parsing Learning

no code implementations CVPR 2020 Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value.

Human Parsing Image Manipulation

Learning Personalized Modular Network Guided by Structured Knowledge

no code implementations CVPR 2019 Xiaodan Liang

Learning semantic configurations and activation of modules to align well with structured knowledge can be regarded as a decision-making procedure, which is solved by a new graph-based reinforcement learning algorithm.

Decision Making Semantic Segmentation

Graph Transformer

no code implementations ICLR 2019 Yuan Li, Xiaodan Liang, Zhiting Hu, Yinbo Chen, Eric P. Xing

Graph neural networks (GNNs) have gained increasing research interest as a means to the challenging goal of robust and universal graph learning.

Few-Shot Learning General Classification +3

Graphonomy: Universal Human Parsing via Graph Transfer Learning

1 code implementation CVPR 2019 Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin

By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity.

Human Parsing Transfer Learning

Knowledge-driven Encode, Retrieve, Paraphrase for Medical Image Report Generation

no code implementations 25 Mar 2019 Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing

Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions.

Graph Learning Knowledge Graphs +2

Towards Multi-pose Guided Virtual Try-on Network

no code implementations ICCV 2019 Haoye Dong, Xiaodan Liang, Bochao Wang, Hanjiang Lai, Jia Zhu, Jian Yin

Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-on Network (MG-VTON) can generate a new person image after fitting the desired clothes into the input image and manipulating human poses.

Fashion Synthesis Human Parsing +2

End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

no code implementations 30 Jan 2019 Lin Xu, Qixian Zhou, Ke Gong, Xiaodan Liang, Jianheng Tang, Liang Lin

Besides the challenges for conversational dialogue systems (e.g., topic transition coherency and question understanding), automatic medical diagnosis further poses more critical requirements for the dialogue rationality in the context of medical knowledge and symptom-disease relations.

Decision Making Dialogue Management +4

Data-to-Text Generation with Style Imitation

1 code implementation Findings of the Association for Computational Linguistics 2020 Shuai Lin, Wentao Wang, Zichao Yang, Xiaodan Liang, Frank F. Xu, Eric Xing, Zhiting Hu

That is, the model learns to imitate the writing style of any given exemplar sentence, with automatic adaptations to faithfully describe the content record.

Data-to-Text Generation Style Transfer

Symbolic Graph Reasoning Meets Convolutions

1 code implementation NeurIPS 2018 Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P. Xing

To cooperate with local convolutions, each SGR is constituted by three modules: a) a primal local-to-semantic voting module, where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module that propagates information over the knowledge graph to achieve global semantic coherency; c) a dual semantic-to-local mapping module that learns new associations of the evolved symbolic nodes with local representations and accordingly enhances local features.

Image Classification Semantic Segmentation

Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis

no code implementations NeurIPS 2018 Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin

Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations.

Image Generation

AutoLoss: Learning Discrete Schedules for Alternate Optimization

1 code implementation 4 Oct 2018 Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.

Image Generation Machine Translation +1

Interpretable Visual Question Answering by Reasoning on Dependency Trees

no code implementations 6 Sep 2018 Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems.

Question Answering Visual Question Answering

RCAA: Relational Context-Aware Agents for Person Search

no code implementations ECCV 2018 Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann

In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.

Person Search

Generative Semantic Manipulation with Mask-Contrasting GAN

no code implementations ECCV 2018 Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing

Despite the promising results on paired/unpaired image-to-image translation achieved by Generative Adversarial Networks (GANs), prior works often only transfer the low-level information (e.g., color or texture changes), but fail to manipulate high-level semantic meanings (e.g., geometric structure or content) of different object regions.

Image-to-Image Translation

Adversarial Geometry-Aware Human Motion Prediction

no code implementations ECCV 2018 Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura

We explore an approach to forecasting human motion in a few milliseconds given an input 3D skeleton sequence based on a recurrent encoder-decoder framework.

Human motion prediction motion prediction

Jointly Deep Multi-View Learning for Clustering Analysis

no code implementations 19 Aug 2018 Bingqian Lin, Yuan Xie, Yanyun Qu, Cuihua Li, Xiaodan Liang

To the best of our knowledge, this is the first work to model multi-view clustering in a deep joint framework, which offers a meaningful perspective on unsupervised multi-view learning.

MULTI-VIEW LEARNING

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

1 code implementation 2 Aug 2018 Qixian Zhou, Xiaodan Liang, Ke Gong, Liang Lin

Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e.g., head, leg, dress).

Human Parsing Semantic Segmentation +3

Instance-level Human Parsing via Part Grouping Network

1 code implementation ECCV 2018 Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin

Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass.

Edge Detection Human Parsing +2

Reinforced Auto-Zoom Net: Towards Accurate and Fast Breast Cancer Segmentation in Whole-slide Images

no code implementations 29 Jul 2018 Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing

Motivated by the zoom-in operation of a pathologist using a digital microscope, RAZN learns a policy network to decide whether zooming is required in a given region of interest.

whole slide images

Toward Characteristic-Preserving Image-based Virtual Try-On Network

3 code implementations ECCV 2018 Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, Meng Yang

Second, to alleviate boundary artifacts of warped clothes and make the results more realistic, we employ a Try-On Module that learns a composition mask to integrate the warped clothes and the rendered image to ensure smoothness.

Geometric Matching Virtual Try-on

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

no code implementations ECCV 2018 Xiaodan Liang, Tairui Wang, Luona Yang, Eric Xing

To our knowledge, this is the first successful case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.

Imitation Learning

Geometric Generalization Based Zero-Shot Learning Dataset Infinite World: Simple Yet Powerful

no code implementations 10 Jul 2018 Rajesh Chidambaram, Michael Kampffmeyer, Willie Neiswanger, Xiaodan Liang, Thomas Lachmann, Eric Xing

Analogously, this paper introduces geometric generalization based zero-shot learning tests to measure the rapid learning ability and the internal consistency of deep generative models.

Zero-Shot Learning

Reinforcement Cutting-Agent Learning for Video Object Segmentation

no code implementations CVPR 2018 Junwei Han, Le Yang, Dingwen Zhang, Xiaojun Chang, Xiaodan Liang

In this paper, we formulate this problem as a Markov Decision Process, where agents learn to segment object regions under a deep reinforcement learning framework.

Decision Making Semantic Segmentation +2

Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation

no code implementations NeurIPS 2018 Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing

Experiments show that our approach achieves the state-of-the-art results on two medical report datasets, generating well-balanced structured sentences with robust coverage of heterogeneous medical report contents.

Decision Making

Image-derived generative modeling of pseudo-macromolecular structures - towards the statistical assessment of Electron CryoTomography template matching

no code implementations 12 May 2018 Kai Wen Wang, Xiangrui Zeng, Xiaodan Liang, Zhiguang Huo, Eric P. Xing, Min Xu

Cellular Electron CryoTomography (CECT) is a 3D imaging technique that captures information about the structure and spatial organization of macromolecular complexes within single cells, in near-native state and at sub-molecular resolution.

Template Matching Two-sample testing

Dilated Temporal Relational Adversarial Network for Generic Video Summarization

no code implementations 30 Apr 2018 Yu-jia Zhang, Michael Kampffmeyer, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing

Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with three-player loss in an adversarial manner.

Video Summarization Video Understanding

ConnNet: A Long-Range Relation-Aware Pixel-Connectivity Network for Salient Segmentation

no code implementations 20 Apr 2018 Michael Kampffmeyer, Nanqing Dong, Xiaodan Liang, Yu-jia Zhang, Eric P. Xing

We argue that semantic salient segmentation can instead be effectively resolved by reformulating it as a simple yet intuitive pixel-pair based connectivity prediction task.

Semantic Segmentation

Look into Person: Joint Body Parsing & Pose Estimation Network and A New Benchmark

3 code implementations 5 Apr 2018 Xiaodan Liang, Ke Gong, Xiaohui Shen, Liang Lin

To further explore and take advantage of the semantic correlation of these two tasks, we propose a novel joint human parsing and pose estimation network to explore efficient context modeling, which can simultaneously predict parsing and pose with extremely high quality.

Human Parsing Pose Estimation +1

Visual Question Reasoning on General Dependency Tree

no code implementations CVPR 2018 Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

This network comprises two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence.

Question Answering Visual Question Answering

Dynamic-structured Semantic Propagation Network

no code implementations CVPR 2018 Xiaodan Liang, Hongfei Zhou, Eric Xing

Moreover, we demonstrate that a universal segmentation model jointly trained on diverse datasets can surpass the performance of the common fine-tuning scheme for exploiting multiple domain knowledge.

Semantic Segmentation

Deep learning based supervised semantic segmentation of Electron Cryo-Subtomograms

no code implementations 12 Feb 2018 Chang Liu, Xiangrui Zeng, Ruogu Lin, Xiaodan Liang, Zachary Freyberg, Eric Xing, Min Xu

Cellular Electron Cryo-Tomography (CECT) is a powerful imaging technique for the 3D visualization of cellular structure and organization at submolecular resolution.

Semantic Segmentation

Real-to-Virtual Domain Unification for End-to-End Autonomous Driving

no code implementations ECCV 2018 Luona Yang, Xiaodan Liang, Tairui Wang, Eric Xing

In the spectrum of vision-based autonomous driving, vanilla end-to-end models are not interpretable and suboptimal in performance, while mediated perception models require additional intermediate representations such as segmentation masks or detection bounding boxes, whose annotation can be prohibitively expensive as we move to a larger scale.

Autonomous Driving

Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption

1 code implementation 5 Jan 2018 Peilun Li, Xiaodan Liang, Daoyuan Jia, Eric P. Xing

It presents two main contributions to traditional GANs: 1) a soft gradient-sensitive objective for keeping semantic boundaries; 2) a semantic-aware discriminator for validating the fidelity of personalized adaptions with respect to each semantic region.

Domain Adaptation Semantic Segmentation

Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder

no code implementations 2 Jan 2018 Yu-jia Zhang, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing

Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing volume of videos produced every day, yet the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has barely been touched.

Unsupervised Video Summarization

Learning to Segment Human by Watching YouTube

no code implementations 4 Oct 2017 Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan

An intuition on human segmentation is that when a human is moving in a video, the video-context (e.g., appearance and motion clues) may potentially infer reasonable mask information for the whole human body.

Human Detection Semantic Segmentation +2

Attention-Aware Face Hallucination via Deep Reinforcement Learning

no code implementations CVPR 2017 Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li

Face hallucination is a domain-specific super-resolution problem with the goal to generate high-resolution (HR) faces from low-resolution (LR) input images.

Face Hallucination Super-Resolution

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

no code implementations ICCV 2017 Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-yan Yeung, Abhinav Gupta

A common issue, however, is that objects of interest that are not involved in human actions are often absent from global action descriptions, a problem known as "missing label".

Object Recognition Video Object Detection +1

Generative Semantic Manipulation with Contrasting GAN

no code implementations 1 Aug 2017 Xiaodan Liang, Hao Zhang, Eric P. Xing

Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo$\rightarrow$sketch and artist painting style transfer.

Image-to-Image Translation Style Transfer

Dual Motion GAN for Future-Flow Embedded Video Prediction

no code implementations ICCV 2017 Xiaodan Liang, Lisa Lee, Wei Dai, Eric P. Xing

To make both synthesized future frames and flows indistinguishable from reality, a dual adversarial training method is proposed to ensure that the future-flow prediction is able to help infer realistic future-frames, while the future-frame prediction in turn leads to realistic optical flows.

Representation Learning Video Prediction

Recurrent 3D Pose Sequence Machines

no code implementations CVPR 2017 Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng

3D human articulated pose recovery from monocular image sequences is very challenging due to diverse appearances, viewpoints, and occlusions, and because the human 3D pose is inherently ambiguous from monocular imagery.

3D Pose Estimation

Perceptual Generative Adversarial Networks for Small Object Detection

no code implementations CVPR 2017 Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, Shuicheng Yan

In this work, we address the small object detection problem by developing a single architecture that internally lifts representations of small objects to "super-resolved" ones, achieving similar characteristics as large objects and thus more discriminative for detection.

Small Object Detection

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

no code implementations 11 Jun 2017 Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing

We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification.

Image Classification

SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays

no code implementations 26 Mar 2017 Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing

Through this adversarial process the critic network learns the higher order structures and guides the segmentation model to achieve realistic segmentation outcomes.

Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

no code implementations CVPR 2017 Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, Shuicheng Yan

We investigate a principled way to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problem.

Classification General Classification +1

Nonparametric Variational Auto-encoders for Hierarchical Representation Learning

no code implementations ICCV 2017 Prasoon Goyal, Zhiting Hu, Xiaodan Liang, Chenyu Wang, Eric Xing

In this work, we propose hierarchical nonparametric variational autoencoders, which combine tree-structured Bayesian nonparametric priors with VAEs, to enable infinite flexibility of the latent representation space.

Hierarchical structure Representation Learning +1

Recurrent Topic-Transition GAN for Visual Paragraph Generation

no code implementations ICCV 2017 Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.

Image Paragraph Captioning

ZM-Net: Real-time Zero-shot Image Manipulation Network

no code implementations 21 Mar 2017 Hao Wang, Xiaodan Liang, Hao Zhang, Dit-yan Yeung, Eric P. Xing

We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones).

Colorization Image Manipulation +1

Tree-Structured Reinforcement Learning for Sequential Object Localization

no code implementations NeurIPS 2016 Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan

Therefore, Tree-RL can better cover different objects with various scales which is quite appealing in the context of object proposal.

Object Localization

Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

1 code implementation CVPR 2017 Xiaodan Liang, Lisa Lee, Eric P. Xing

To capture such global interdependency, we propose a deep Variation-structured Reinforcement Learning (VRL) framework to sequentially discover object relationships and attributes in the whole image.

Image Classification Visual Relationship Detection

Interpretable Structure-Evolving LSTM

no code implementations CVPR 2017 Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing

Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization.

Small Data Image Classification

Toward Controlled Generation of Text

2 code implementations ICML 2017 Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing

Generic generation and manipulation of text is challenging and has limited success compared to recent deep generative modeling in visual domain.

Multi-stage Object Detection with Group Recursive Learning

no code implementations 18 Aug 2016 Jianan Li, Xiaodan Liang, Jianshu Li, Tingfa Xu, Jiashi Feng, Shuicheng Yan

Most existing detection pipelines treat object proposals independently and predict bounding box locations and classification scores over them separately.

Object Proposal Generation Semantic Segmentation

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

no code implementations 13 Aug 2016 Keze Wang, Shengfu Zhai, Hui Cheng, Xiaodan Liang, Liang Lin

In this paper, we propose a novel inference-embedded multi-task learning framework for predicting human pose from still depth images, which is implemented with a deep architecture of neural networks.

Multi-Task Learning Pose Estimation +1

Peak-Piloted Deep Network for Facial Expression Recognition

no code implementations 24 Jul 2016 Xiangyun Zhao, Xiaodan Liang, Luoqi Liu, Teng Li, Yugang Han, Nuno Vasconcelos, Shuicheng Yan

Objective functions for training of deep networks for face-related recognition tasks, such as facial expression recognition (FER), usually consider each sample independently.

Face Recognition Facial Expression Recognition +1

LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling

1 code implementation 18 Apr 2016 Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, Liang Lin

Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts.

Scene Labeling

Deep Structured Scene Parsing by Learning with Image Descriptions

no code implementations CVPR 2016 Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo

This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.

Scene Labeling Scene Understanding

Geometric Scene Parsing with Hierarchical LSTM

no code implementations 7 Apr 2016 Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

This paper addresses the problem of geometric scene parsing, i.e., simultaneously labeling geometric surfaces (e.g., sky, ground and vertical plane) and determining the interaction relations (e.g., layering, supporting, siding and affinity) between main regions.

3D Reconstruction Scene Labeling

Attentive Contexts for Object Detection

no code implementations 24 Mar 2016 Jianan Li, Yunchao Wei, Xiaodan Liang, Jian Dong, Tingfa Xu, Jiashi Feng, Shuicheng Yan

We provide preliminary answers to these questions through developing a novel Attention to Context Convolution Neural Network (AC-CNN) based object detection model.

Object Detection

Semantic Object Parsing with Graph LSTM

no code implementations 23 Mar 2016 Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan

By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.

Scale-aware Pixel-wise Object Proposal Networks

no code implementations 19 Jan 2016 Zequn Jie, Xiaodan Liang, Jiashi Feng, Wen Feng Lu, Eng Hock Francis Tay, Shuicheng Yan

In particular, in order to improve the localization accuracy, a fully convolutional network is employed which predicts locations of object proposals for each pixel.

Object Detection

Human Parsing With Contextualized Convolutional Neural Network

no code implementations ICCV 2015 Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Semantic Object Parsing with Local-Global Long Short-Term Memory

no code implementations CVPR 2016 Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, Shuicheng Yan

The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference benefiting from the memorization of previous dependencies in all positions along all dimensions.

Reversible Recursive Instance-level Object Segmentation

no code implementations CVPR 2016 Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan

By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.

Denoising Semantic Segmentation

Scale-aware Fast R-CNN for Pedestrian Detection

no code implementations 28 Oct 2015 Jianan Li, Xiaodan Liang, ShengMei Shen, Tingfa Xu, Jiashi Feng, Shuicheng Yan

Taking pedestrian detection as an example, we illustrate how we can leverage this philosophy to develop a Scale-Aware Fast R-CNN (SAF R-CNN) framework.

Pedestrian Detection

STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation

1 code implementation 10 Sep 2015 Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng, Jiashi Feng, Yao Zhao, Shuicheng Yan

Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations.

RGB Salient Object Detection Salient Object Detection +1

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

no code implementations CVPR 2015 Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan

Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.

Human Parsing

Deep Human Parsing with Active Template Regression

1 code implementation 9 Mar 2015 Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan

The first CNN network is with max-pooling, and designed to predict the template coefficients for each label mask, while the second CNN network is without max-pooling to preserve sensitivity to label mask position and accurately predict the active shape parameters.

Human Parsing

Recognizing Focal Liver Lesions in Contrast-Enhanced Ultrasound with Discriminatively Trained Spatio-Temporal Model

1 code implementation 3 Feb 2015 Xiaodan Liang, Qingxing Cao, Rui Huang, Liang Lin

The aim of this study is to provide an automatic computational framework to assist clinicians in diagnosing Focal Liver Lesions (FLLs) in Contrast-Enhanced Ultrasound (CEUS).

Complex Background Subtraction by Pursuing Dynamic Spatio-Temporal Models

no code implementations 2 Feb 2015 Liang Lin, Yuanlu Xu, Xiaodan Liang, Jian-Huang Lai

Although it has been widely discussed in video surveillance, background subtraction is still an open problem in the context of complex scenarios, e.g., dynamic backgrounds, illumination variations, and indistinct foreground objects.

Computational Baby Learning

no code implementations 11 Nov 2014 Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan

Then the concept detector can be fine-tuned based on these new instances.

Object Detection
