Search Results for author: Xiaodan Liang

Found 269 papers, 121 papers with code

Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

no code implementations • CVPR 2017 • Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, Shuicheng Yan

We investigate a principle way to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problems.

Classification General Classification +4

Paper
Add Code

Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation

no code implementations • NeurIPS 2018 • Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing

Experiments show that our approach achieves the state-of-the-art results on two medical report datasets, generating well-balanced structured sentences with robust coverage of heterogeneous medical report contents.

Decision Making Retrieval +1

Paper
Add Code

Image-derived generative modeling of pseudo-macromolecular structures - towards the statistical assessment of Electron CryoTomography template matching

no code implementations • 12 May 2018 • Kai Wen Wang, Xiangrui Zeng, Xiaodan Liang, Zhiguang Huo, Eric P. Xing, Min Xu

Cellular Electron CryoTomography (CECT) is a 3D imaging technique that captures information about the structure and spatial organization of macromolecular complexes within single cells, in near-native state and at sub-molecular resolution.

Generative Adversarial Network Template Matching +1

Paper
Add Code

Dilated Temporal Relational Adversarial Network for Generic Video Summarization

no code implementations • 30 Apr 2018 • Yu-jia Zhang, Michael Kampffmeyer, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing

Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with three-player loss in an adversarial manner.

Generative Adversarial Network Video Summarization +1

Paper
Add Code

ConnNet: A Long-Range Relation-Aware Pixel-Connectivity Network for Salient Segmentation

no code implementations • 20 Apr 2018 • Michael Kampffmeyer, Nanqing Dong, Xiaodan Liang, Yu-jia Zhang, Eric P. Xing

We argue that semantic salient segmentation can instead be effectively resolved by reformulating it as a simple yet intuitive pixel-pair based connectivity prediction task.

Relation Segmentation +1

Paper
Add Code

Visual Question Reasoning on General Dependency Tree

no code implementations • CVPR 2018 • Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

This network comprises of two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence.

Question Answering Visual Question Answering

Paper
Add Code

Dynamic-structured Semantic Propagation Network

no code implementations • CVPR 2018 • Xiaodan Liang, Hongfei Zhou, Eric Xing

Moreoever, we demonstrate a universal segmentation model that is jointly trained on diverse datasets can surpass the performance of the common fine-tuning scheme for exploiting multiple domain knowledge.

Ranked #61 on Semantic Segmentation on Cityscapes test

Segmentation Universal Segmentation

Paper
Add Code

Deep Structured Scene Parsing by Learning with Image Descriptions

no code implementations • CVPR 2016 • Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo

This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i. e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.

Descriptive Object +3

Paper
Add Code

Learning to Segment Human by Watching YouTube

no code implementations • 4 Oct 2017 • Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan

An intuition on human segmentation is that when a human is moving in a video, the video-context (e. g., appearance and motion clues) may potentially infer reasonable mask information for the whole human body.

Human Detection Segmentation +5

Paper
Add Code

Deep learning based supervised semantic segmentation of Electron Cryo-Subtomograms

no code implementations • 12 Feb 2018 • Chang Liu, Xiangrui Zeng, Ruogu Lin, Xiaodan Liang, Zachary Freyberg, Eric Xing, Min Xu

Cellular Electron Cryo-Tomography (CECT) is a powerful imaging technique for the 3D visualization of cellular structure and organization at submolecular resolution.

Decoder Segmentation +1

Paper
Add Code

Real-to-Virtual Domain Unification for End-to-End Autonomous Driving

no code implementations • ECCV 2018 • Luona Yang, Xiaodan Liang, Tairui Wang, Eric Xing

In the spectrum of vision-based autonomous driving, vanilla end-to-end models are not interpretable and suboptimal in performance, while mediated perception models require additional intermediate representations such as segmentation masks or detection bounding boxes, whose annotation can be prohibitively expensive as we move to a larger scale.

Autonomous Driving

Paper
Add Code

Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder

no code implementations • 2 Jan 2018 • Yu-jia Zhang, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing

Unsupervised video summarization plays an important role on digesting, browsing, and searching the ever-growing videos every day, and the underlying fine-grained semantic and motion information (i. e., objects of interest and their key motions) in online videos has been barely touched.

Object Unsupervised Video Summarization

Paper
Add Code

Nonparametric Variational Auto-encoders for Hierarchical Representation Learning

no code implementations • ICCV 2017 • Prasoon Goyal, Zhiting Hu, Xiaodan Liang, Chenyu Wang, Eric Xing

In this work, we propose hierarchical nonparametric variational autoencoders, which combines tree-structured Bayesian nonparametric priors with VAEs, to enable infinite flexibility of the latent representation space.

Clustering Representation Learning +1

Paper
Add Code

Attention-Aware Face Hallucination via Deep Reinforcement Learning

no code implementations • CVPR 2017 • Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li

Face hallucination is a domain-specific super-resolution problem with the goal to generate high-resolution (HR) faces from low-resolution (LR) input images.

Face Hallucination Hallucination +3

Paper
Add Code

Dual Motion GAN for Future-Flow Embedded Video Prediction

no code implementations • ICCV 2017 • Xiaodan Liang, Lisa Lee, Wei Dai, Eric P. Xing

To make both synthesized future frames and flows indistinguishable from reality, a dual adversarial training method is proposed to ensure that the future-flow prediction is able to help infer realistic future-frames, while the future-frame prediction in turn leads to realistic optical flows.

Representation Learning Video Prediction

Paper
Add Code

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

no code implementations • ICCV 2017 • Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-yan Yeung, Abhinav Gupta

A common issue, however, is that objects of interest that are not involved in human actions are often absent in global action descriptions known as "missing label".

Ranked #3 on Weakly Supervised Object Detection on Charades

Object object-detection +3

Paper
Add Code

Generative Semantic Manipulation with Contrasting GAN

no code implementations • 1 Aug 2017 • Xiaodan Liang, Hao Zhang, Eric P. Xing

Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo$\rightarrow$ sketch and artist painting style transfer.

Ranked #4 on Facial Expression Translation on CelebA

Image-to-Image Translation Style Transfer

Paper
Add Code

Recurrent 3D Pose Sequence Machines

no code implementations • CVPR 2017 • Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng

3D human articulated pose recovery from monocular image sequences is very challenging due to the diverse appearances, viewpoints, occlusions, and also the human 3D pose is inherently ambiguous from the monocular imagery.

Ranked #20 on 3D Human Pose Estimation on HumanEva-I

3D Human Pose Estimation 3D Pose Estimation

Paper
Add Code

Perceptual Generative Adversarial Networks for Small Object Detection

no code implementations • CVPR 2017 • Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, Shuicheng Yan

In this work, we address the small object detection problem by developing a single architecture that internally lifts representations of small objects to "super-resolved" ones, achieving similar characteristics as large objects and thus more discriminative for detection.

Generative Adversarial Network Object +2

Paper
Add Code

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

no code implementations • 11 Jun 2017 • Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing

We show that Poseidon enables Caffe and TensorFlow to achieve 15. 5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification.

Image Classification

Paper
Add Code

SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays

no code implementations • 26 Mar 2017 • Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing

Through this adversarial process the critic network learns the higher order structures and guides the segmentation model to achieve realistic segmentation outcomes.

Organ Segmentation Segmentation

Paper
Add Code

Recurrent Topic-Transition GAN for Visual Paragraph Generation

no code implementations • ICCV 2017 • Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.

Ranked #6 on Image Paragraph Captioning on Image Paragraph Captioning

Generative Adversarial Network Image Paragraph Captioning +1

Paper
Add Code

ZM-Net: Real-time Zero-shot Image Manipulation Network

no code implementations • 21 Mar 2017 • Hao Wang, Xiaodan Liang, Hao Zhang, Dit-yan Yeung, Eric P. Xing

We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones).

Colorization Descriptive +2

Paper
Add Code

Interpretable Structure-Evolving LSTM

no code implementations • CVPR 2017 • Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing

Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization.

Small Data Image Classification

Paper
Add Code

Tree-Structured Reinforcement Learning for Sequential Object Localization

no code implementations • NeurIPS 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan

Therefore, Tree-RL can better cover different objects with various scales which is quite appealing in the context of object proposal.

Object Object Localization +2

Paper
Add Code

Peak-Piloted Deep Network for Facial Expression Recognition

no code implementations • 24 Jul 2016 • Xiangyun Zhao, Xiaodan Liang, Luoqi Liu, Teng Li, Yugang Han, Nuno Vasconcelos, Shuicheng Yan

Objective functions for training of deep networks for face-related recognition tasks, such as facial expression recognition (FER), usually consider each sample independently.

Ranked #2 on Facial Expression Recognition (FER) on Oulu-CASIA

Face Recognition Facial Expression Recognition +2

Paper
Add Code

Multi-stage Object Detection with Group Recursive Learning

no code implementations • 18 Aug 2016 • Jianan Li, Xiaodan Liang, Jianshu Li, Tingfa Xu, Jiashi Feng, Shuicheng Yan

Most of existing detection pipelines treat object proposals independently and predict bounding box locations and classification scores over them separately.

Object object-detection +4

Paper
Add Code

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

no code implementations • 13 Aug 2016 • Keze Wang, Shengfu Zhai, Hui Cheng, Xiaodan Liang, Liang Lin

In this paper, we propose a novel inference-embedded multi-task learning framework for predicting human pose from still depth images, which is implemented with a deep architecture of neural networks.

Multi-Task Learning Pose Estimation +1

Paper
Add Code

Is Faster R-CNN Doing Well for Pedestrian Detection?

no code implementations • 24 Jul 2016 • Liliang Zhang, Liang Lin, Xiaodan Liang, Kaiming He

Detecting pedestrian has been arguably addressed as a special topic beyond general object detection.

Ranked #19 on Pedestrian Detection on Caltech

Object object-detection +3

Paper
Add Code

Scale-aware Pixel-wise Object Proposal Networks

no code implementations • 19 Jan 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Wen Feng Lu, Eng Hock Francis Tay, Shuicheng Yan

In particular, in order to improve the localization accuracy, a fully convolutional network is employed which predicts locations of object proposals for each pixel.

Object object-detection +2

Paper
Add Code

Scale-aware Fast R-CNN for Pedestrian Detection

no code implementations • 28 Oct 2015 • Jianan Li, Xiaodan Liang, ShengMei Shen, Tingfa Xu, Jiashi Feng, Shuicheng Yan

Taking pedestrian detection as an example, we illustrate how we can leverage this philosophy to develop a Scale-Aware Fast R-CNN (SAF R-CNN) framework.

Ranked #23 on Pedestrian Detection on Caltech

Pedestrian Detection Philosophy

Paper
Add Code

Geometric Scene Parsing with Hierarchical LSTM

no code implementations • 7 Apr 2016 • Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

This paper addresses the problem of geometric scene parsing, i. e. simultaneously labeling geometric surfaces (e. g. sky, ground and vertical plane) and determining the interaction relations (e. g. layering, supporting, siding and affinity) between main regions.

3D Reconstruction Scene Labeling

Paper
Add Code

Attentive Contexts for Object Detection

no code implementations • 24 Mar 2016 • Jianan Li, Yunchao Wei, Xiaodan Liang, Jian Dong, Tingfa Xu, Jiashi Feng, Shuicheng Yan

We provide preliminary answers to these questions through developing a novel Attention to Context Convolution Neural Network (AC-CNN) based object detection model.

Object object-detection +1

Paper
Add Code

Semantic Object Parsing with Graph LSTM

no code implementations • 23 Mar 2016 • Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan

By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.

Object Superpixels

Paper
Add Code

Reversible Recursive Instance-level Object Segmentation

no code implementations • CVPR 2016 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan

By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.

Denoising Object +2

Paper
Add Code

Semantic Object Parsing with Local-Global Long Short-Term Memory

no code implementations • CVPR 2016 • Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, Shuicheng Yan

The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference benefiting from the memorization of previous dependencies in all positions along all dimensions.

Memorization Position

Paper
Add Code

Proposal-free Network for Instance-level Object Segmentation

no code implementations • 9 Sep 2015 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Jianchao Yang, Liang Lin, Shuicheng Yan

Instance-level object segmentation is an important yet under-explored task.

Clustering Object +3

Paper
Add Code

Computational Baby Learning

no code implementations • 11 Nov 2014 • Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan

Then the concept detector can be fine-tuned based on these new instances.

object-detection Object Detection

Paper
Add Code

Matching-CNN Meets KNN: Quasi-Parametric Human Parsing

no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan

Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.

Human Parsing

Paper
Add Code

Complex Background Subtraction by Pursuing Dynamic Spatio-Temporal Models

no code implementations • 2 Feb 2015 • Liang Lin, Yuanlu Xu, Xiaodan Liang, Jian-Huang Lai

Although it has been widely discussed in video surveillance, background subtraction is still an open problem in the context of complex scenarios, e. g., dynamic backgrounds, illumination variations, and indistinct foreground objects.

Paper
Add Code

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

no code implementations • 1 Feb 2015 • Xiaodan Liang, Liang Lin, Liangliang Cao

Action recognition is an important problem in multimedia understanding.

Action Recognition Temporal Action Localization +1

Paper
Add Code

Deep Generative Models with Learnable Knowledge Constraints

no code implementations • NeurIPS 2018 • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric Xing

The broad set of deep generative models (DGMs) has achieved remarkable advances.

Image Generation Reinforcement Learning (RL) +1

Paper
Add Code

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

no code implementations • ECCV 2018 • Xiaodan Liang, Tairui Wang, Luona Yang, Eric Xing

To our knowledge, this is the first successful case of the learned driving policy through reinforcement learning in the high-fidelity simulator, which performs better-than supervised imitation learning.

Imitation Learning reinforcement-learning +1

Paper
Add Code

Geometric Generalization Based Zero-Shot Learning Dataset Infinite World: Simple Yet Powerful

no code implementations • 10 Jul 2018 • Rajesh Chidambaram, Michael Kampffmeyer, Willie Neiswanger, Xiaodan Liang, Thomas Lachmann, Eric Xing

Analogously, this paper introduces geometric generalization based zero-shot learning tests to measure the rapid learning ability and the internal consistency of deep generative models.

Zero-Shot Learning

Paper
Add Code

Unsupervised Domain Adaptation for Automatic Estimation of Cardiothoracic Ratio

no code implementations • 10 Jul 2018 • Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing

Specifically, we propose a model that enforces our intuition that prediction masks should be domain independent.

Organ Segmentation Segmentation +1

Paper
Add Code

Query-Conditioned Three-Player Adversarial Network for Video Summarization

no code implementations • 17 Jul 2018 • Yu-jia Zhang, Michael Kampffmeyer, Xiaodan Liang, Min Tan, Eric P. Xing

Video summarization plays an important role in video understanding by selecting key frames/shots.

Generative Adversarial Network Video Summarization +1

Paper
Add Code

Reinforced Auto-Zoom Net: Towards Accurate and Fast Breast Cancer Segmentation in Whole-slide Images

no code implementations • 29 Jul 2018 • Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing

Motivated by the zoom-in operation of a pathologist using a digital microscope, RAZN learns a policy network to decide whether zooming is required in a given region of interest.

whole slide images

Paper
Add Code

Jointly Deep Multi-View Learning for Clustering Analysis

no code implementations • 19 Aug 2018 • Bingqian Lin, Yuan Xie, Yanyun Qu, Cuihua Li, Xiaodan Liang

To our best knowledge, this is the first work to model the multi-view clustering in a deep joint framework, which will provide a meaningful thinking in unsupervised multi-view learning.

Clustering Multiview Clustering +1

Paper
Add Code

Interpretable Visual Question Answering by Reasoning on Dependency Trees

no code implementations • 6 Sep 2018 • Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems.

Question Answering valid +1

Paper
Add Code

Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis

no code implementations • NeurIPS 2018 • Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin

Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations.

Generative Adversarial Network Image Generation

Paper
Add Code

Texar: A Modularized, Versatile, and Extensible Toolbox for Text Generation

no code implementations • WS 2018 • Zhiting Hu, Zichao Yang, Tiancheng Zhao, Haoran Shi, Junxian He, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Lianhui Qin, Devendra Singh Chaplot, Bowen Tan, Xingjiang Yu, Eric Xing

The features make Texar particularly suitable for technique sharing and generalization across different text generation applications.

Image Captioning Machine Translation +3

Paper
Add Code

Reinforcement Cutting-Agent Learning for Video Object Segmentation

no code implementations • CVPR 2018 • Junwei Han, Le Yang, Dingwen Zhang, Xiaojun Chang, Xiaodan Liang

In this paper, we formulate this problem as a Markov Decision Process, where agents are learned to segment object regions under a deep reinforcement learning framework.

Decision Making Object +5

Paper
Add Code

RCAA: Relational Context-Aware Agents for Person Search

no code implementations • ECCV 2018 • Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann

In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.

Person Search

Paper
Add Code

Generative Semantic Manipulation with Mask-Contrasting GAN

no code implementations • ECCV 2018 • Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing

Despite the promising results on paired/unpaired image-to-image translation achieved by Generative Adversarial Networks (GANs), prior works often only transfer the low-level information (e. g. color or texture changes), but fail to manipulate high-level semantic meanings (e. g., geometric structure or content) of different object regions.

Image-to-Image Translation

Paper
Add Code

Adversarial Geometry-Aware Human Motion Prediction

no code implementations • ECCV 2018 • Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura

We explore an approach to forecasting human motion in a few milliseconds given an input 3D skeleton sequence based on a recurrent encoder-decoder framework.

Decoder Human motion prediction +1

Paper
Add Code

Graph Transformer

no code implementations • ICLR 2019 • Yuan Li, Xiaodan Liang, Zhiting Hu, Yinbo Chen, Eric P. Xing

Graph neural networks (GNN) have gained increasing research interests as a mean to the challenging goal of robust and universal graph learning.

Few-Shot Learning General Classification +3

Paper
Add Code

Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection

no code implementations • ICCV 2015 • Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan

Then the concept detector can be fine-tuned based on these new instances.

object-detection Weakly Supervised Object Detection

Paper
Add Code

Human Parsing With Contextualized Convolutional Neural Network

no code implementations • ICCV 2015 • Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.

Human Parsing

Paper
Add Code

Towards Multi-pose Guided Virtual Try-on Network

no code implementations • ICCV 2019 • Haoye Dong, Xiaodan Liang, Bochao Wang, Hanjiang Lai, Jia Zhu, Jian Yin

Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-on Network (MG-VTON) can generate a new person image after fitting the desired clothes into the input image and manipulating human poses.

Ranked #1 on Virtual Try-on on Deep-Fashion

Fashion Synthesis Generative Adversarial Network +3

Paper
Add Code

Knowledge-driven Encode, Retrieve, Paraphrase for Medical Image Report Generation

no code implementations • 25 Mar 2019 • Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing

Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions.

Graph Learning Knowledge Graphs +3

Paper
Add Code

Learning Personalized Modular Network Guided by Structured Knowledge

no code implementations • CVPR 2019 • Xiaodan Liang

Learning semantic configurations and activation of modules to align well with structured knowledge can be regarded as a decision-making procedure, which is solved by a new graph-based reinforcement learning algorithm.

Decision Making Semantic Segmentation

Paper
Add Code

Fashion Editing with Adversarial Parsing Learning

no code implementations • CVPR 2020 • Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value.

Decoder Generative Adversarial Network +2

Paper
Add Code

Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network

no code implementations • 23 Sep 2019 • Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin

Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity (e. g., what is the dog that is near the girl playing with?)

Question Answering Visual Question Answering

Paper
Add Code

Meta R-CNN : Towards General Solver for Instance-level Few-shot Learning

no code implementations • 28 Sep 2019 • Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin

Resembling the rapid learning capability of human, few-shot learning empowers vision systems to understand new concepts by training with few samples.

Ranked #19 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Learning Few-Shot Object Detection +3

Paper
Add Code

Layout-Graph Reasoning for Fashion Landmark Detection

no code implementations • CVPR 2019 • Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin

Each Layout-Graph Reasoning(LGR) layer aims to map feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over structural graph nodes to achieve global layout coherency via a layout-graph reasoning module, and then maps graph nodes back to enhance feature representations via a Node-to-Map module.

Attribute Clustering +1

Paper
Add Code

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

no code implementations • CVPR 2020 • Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.

Ranked #13 on Vision and Language Navigation on VLN Challenge

Navigate Vision-Language Navigation

Paper
Add Code

SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection

no code implementations • 22 Nov 2019 • Lewei Yao, Hang Xu, Wei zhang, Xiaodan Liang, Zhenguo Li

In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architecture for object detection.

Neural Architecture Search Object +2

Paper
Add Code

Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN

no code implementations • 18 Feb 2020 • Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li

Finally, an InterDomain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally.

Object object-detection +2

Paper
Add Code

Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis

1 code implementation • 14 Mar 2020 • Junfan Lin, Keze Wang, Ziliang Chen, Xiaodan Liang, Liang Lin

To eliminate this bias and inspired by the propensity score matching technique with causal diagram, we propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records; Bias (ii) inherently comes along with the passively collected data, and is one of the key obstacles for training the agent towards "learning how" rather than "remembering what".

Medical Diagnosis

Paper
Code

ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection

no code implementations • 3 Mar 2020 • Chenhan Jiang, Shaoju Wang, Hang Xu, Xiaodan Liang, Nong Xiao

Is a hand-crafted detection network tailored for natural image undoubtedly good enough over a discrepant medical lesion domain?

Lesion Detection medical image detection +1

Paper
Add Code

Linguistically Driven Graph Capsule Network for Visual Question Reasoning

no code implementations • 23 Mar 2020 • Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin

Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.

Question Answering Visual Question Answering

Paper
Add Code

Bidirectional Graph Reasoning Network for Panoptic Segmentation

no code implementations • CVPR 2020 • Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and intermodular relations within and between foreground things and background stuff classes.

Instance Segmentation Panoptic Segmentation +1

Paper
Add Code

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

no code implementations • ECCV 2020 • Xin Chen, Yawen Duan, Zewei Chen, Hang Xu, Zihao Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

In spite of its remarkable progress, many algorithms are restricted to particular search spaces.

Ranked #13 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)

Meta-Learning Meta Reinforcement Learning +3

Paper
Add Code

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations • 1 Jan 2021 • Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

Paper
Add Code

CAT-SAC: Soft Actor-Critic with Curiosity-Aware Entropy Temperature

no code implementations • 1 Jan 2021 • Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin

The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.

Reinforcement Learning (RL)

Paper
Add Code

NASOA: Towards Faster Task-oriented Online Fine-tuning

no code implementations • 1 Jan 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Xiaodan Liang, Zhenguo Li

The resulting model zoo is more training efficient than SOTA NAS models, e. g. 6x faster than RegNetY-16GF, and 1. 7x faster than EfficientNetB3.

Cloud Computing Neural Architecture Search

Paper
Add Code

UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers

no code implementations • ICLR 2021 • Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang

Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Iterative Graph Self-Distillation

no code implementations • 23 Oct 2020 • HANLIN ZHANG, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.

Contrastive Learning Graph Learning +1

Paper
Add Code

AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning

no code implementations • NeurIPS 2020 • Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing

Synchronization is a key step in data-parallel distributed machine learning (ML).

Transfer Learning

Paper
Add Code

Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data

no code implementations • 28 Nov 2020 • Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing

To bridge the methodological gaps in partially supervised learning (PSL) under data scarcity, we propose Vicinal Labels Under Uncertainty (VLUU), a simple yet efficient framework utilizing the human structure similarity for partially supervised medical image segmentation.

Data Augmentation Image Segmentation +5

Paper
Add Code

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation

no code implementations • 7 Dec 2020 • Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang

Panoptic segmentation that unifies instance segmentation and semantic segmentation has recently attracted increasing attention.

Ranked #17 on Panoptic Segmentation on COCO test-dev

Instance Segmentation Panoptic Segmentation +1

Paper
Add Code

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

no code implementations • 22 Dec 2020 • Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

no code implementations • 24 Dec 2020 • Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin

It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA systems performance.

Question Answering World Knowledge

Paper
Add Code

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

no code implementations • 9 Jan 2021 • Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.

Retrieval Sentence

Paper
Add Code

GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer

no code implementations • 1 Feb 2021 • Yukai Shi, Sen Zhang, Chenxing Zhou, Xiaodan Liang, Xiaojun Yang, Liang Lin

Non-parallel text style transfer has attracted increasing research interests in recent years.

Decoder Sentence +2

Paper
Add Code

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

no code implementations • 21 Jun 2021 • Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning with different downstream tasks (i. e., detection, semantic/instance segmentation) in autonomous driving domain.

Autonomous Driving Instance Segmentation +5

Paper
Add Code

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations • 15 Jul 2021 • Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.

Ranked #10 on Semantic Textual Similarity on MRPC

Inductive Bias Language Modelling +3

Paper
Add Code

WAS-VTON: Warping Architecture Search for Virtual Try-on Network

no code implementations • 1 Aug 2021 • Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang

Despite recent progress on image-based virtual try-on, current methods are constraint by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations.

Neural Architecture Search Virtual Try-on

Paper
Add Code

NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models

no code implementations • ICCV 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li

Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.

Cloud Computing Neural Architecture Search

Paper
Add Code

Deep Learning for Embodied Vision Navigation: A Survey

no code implementations • 7 Jul 2021 • Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang

A navigation agent is supposed to have various intelligent skills, such as visual perceiving, mapping, planning, exploring and reasoning, etc.

Autonomous Driving Navigate +1

Paper
Add Code

Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning

no code implementations • 11 Aug 2021 • Guangyi Liu, Yinghong Liao, Fuyu Wang, Bin Zhang, Lu Zhang, Xiaodan Liang, Xiang Wan, Shaolin Li, Zhen Li, Shuixing Zhang, Shuguang Cui

Medical imaging technologies, including computed tomography (CT) or chest X-Ray (CXR), are largely employed to facilitate the diagnosis of the COVID-19.

Computed Tomography (CT) Medical Report Generation

Paper
Add Code

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations • CVPR 2022 • Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

Paper
Add Code

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

no code implementations • Findings (EMNLP) 2021 • Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

Specifically, we unify a pre-trained acoustic model (wav2vec 2. 0) and a language model (BERT) into an end-to-end trainable framework.

Language Modelling Representation Learning +2

Paper
Add Code

Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering

no code implementations • ICCV 2021 • Qingxing Cao, Wentao Wan, Keze Wang, Xiaodan Liang, Liang Lin

The experimental results show that our proposed method can improve current VQA models on OOD split without losing performance on the in-domain test data.

Novel Concepts Question Answering +1

Paper
Add Code

Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation

no code implementations • ICCV 2021 • Yi Zhu, Yue Weng, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao

Vision-Dialog Navigation (VDN) requires an agent to ask questions and navigate following the human responses to find target objects.

Imitation Learning Navigate

Paper
Add Code

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

no code implementations • ICCV 2021 • Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei zhang, Zhenguo Li, Luc van Gool

Here we present a novel self-supervised 3D Object detection framework that seamlessly integrates the geometry-aware contrast and clustering harmonization to lift the unsupervised 3D representation learning, named GCC-3D.

3D Object Detection Clustering +4

Paper
Add Code

Role Diversity Matters: A Study of Cooperative Training Strategies for Multi-Agent RL

no code implementations • 29 Sep 2021 • Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang

In addition, role diversity can help to find a better training strategy and increase performance in cooperative MARL.

SMAC+ Starcraft +1

Paper
Add Code

Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis

no code implementations • 27 Oct 2021 • Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin

The integration of human parsing and appearance flow effectively guides the generation of video frames with realistic appearance.

Human Parsing Video Generation

Paper
Add Code

AutoLoss: Learning Discrete Schedule for Alternate Optimization

no code implementations • ICLR 2019 • Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.

Image Generation Machine Translation +3

Paper
Add Code

Revisiting Over-smoothing in BERT from the Perspective of Graph

no code implementations • ICLR 2022 • Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok

Recently over-smoothing phenomenon of Transformer-based models is observed in both vision and language fields.

Paper
Add Code

Modern Augmented Reality: Applications, Trends, and Future Directions

no code implementations • 18 Feb 2022 • Shervin Minaee, Xiaodan Liang, Shuicheng Yan

Augmented reality (AR) is one of the relatively old, yet trending areas in the intersection of computer vision and computer graphics with numerous applications in several areas, from gaming and entertainment, to education and healthcare.

Paper
Add Code

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

no code implementations • 15 Mar 2022 • Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu

One main reason that impedes the development of truly reliably self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.

Autonomous Driving Object +2

Paper
Add Code

elBERto: Self-supervised Commonsense Learning for Question Answering

no code implementations • 17 Mar 2022 • Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin

Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context.

Question Answering Representation Learning +1

Paper
Add Code

Laneformer: Object-aware Row-Column Transformers for Lane Detection

no code implementations • 18 Mar 2022 • Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, Xiaodan Liang

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection that is a long-standing research topic for visual perception in autonomous driving.

Autonomous Driving Decoder +2

Paper
Add Code

Dressing in the Wild by Watching Dance Videos

no code implementations • CVPR 2022 • Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang

While significant progress has been made in garment transfer, one of the most applicable directions of human-centric image generation, existing works overlook the in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details.

Image Generation Virtual Try-on

Paper
Add Code

ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts

no code implementations • CVPR 2022 • Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang

Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i. e., make instruction-asked actions sequentially in complex visual environments.

Vision-Language Navigation

Paper
Add Code

BodyGAN: General-Purpose Controllable Neural Human Body Generation

no code implementations • CVPR 2022 • Chaojie Yang, Hanhui Li, Shengjie Wu, Shengkai Zhang, Haonan Yan, Nianhong Jiao, Jie Tang, Runnan Zhou, Xiaodan Liang, Tianxiang Zheng

This is because current methods mainly rely on a single pose/appearance model, which is limited in disentangling various poses and appearance in human images.

Disentanglement Image Generation +1

Paper
Add Code

Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation

no code implementations • CVPR 2022 • Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan, Xiaodan Liang, Xiaojun Chang

To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure.

Clinical Knowledge Decoder +1

Paper
Add Code

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations • 17 Jun 2022 • Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.

Retrieval

Paper
Add Code

Discourse-Aware Graph Networks for Textual Logical Reasoning

no code implementations • 4 Jul 2022 • Yinya Huang, Lemao Liu, Kun Xu, Meng Fang, Liang Lin, Xiaodan Liang

In this work, we propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).

graph construction Logical Reasoning +3

Paper
Add Code

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

no code implementations • 1 Jun 2022 • Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang

In this study, we quantify the agent's behavior difference and build its relationship with the policy performance via {\bf Role Diversity}, a metric to measure the characteristics of MARL tasks.

SMAC+ Starcraft

Paper
Add Code

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

no code implementations • 18 Jul 2022 • Quande Liu, Youpeng Wen, Jianhua Han, Chunjing Xu, Hang Xu, Xiaodan Liang

To bridge the gap between supervised semantic segmentation and real-world applications that acquires one model to recognize arbitrary new concepts, recent zero-shot segmentation attracts a lot of attention by exploring the relationships between unseen and seen object categories, yet requiring large amounts of densely-annotated data with diverse base classes.

Clustering Online Clustering +3

Paper
Add Code

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

no code implementations • 27 Jul 2022 • Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xin Dong, Feida Zhu, Xiaodan Liang

In this work, we take a step forwards to explore versatile virtual try-on solutions, which we argue should possess three main properties, namely, they should support unsupervised training, arbitrary garment categories, and controllable garment editing.

Disentanglement Image Generation +1

Paper
Add Code

ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design

no code implementations • 11 Aug 2022 • Xujie Zhang, Yu Sha, Michael C. Kampffmeyer, Zhenyu Xie, Zequn Jie, Chengwen Huang, Jianqing Peng, Xiaodan Liang

ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook in its first stage and uses a Transformer to model the distribution of image tokens for a real image given the tokens of the control signals in its second stage.

Image Generation

Paper
Add Code

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

no code implementations • 19 Sep 2022 • Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Aiming towards a holistic understanding of multiple downstream tasks simultaneously, there is a need for extracting features with better transferability.

Autonomous Driving Multi-Task Learning +4

Paper
Add Code

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

no code implementations • 20 Sep 2022 • Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Chunjing Xu, Hang Xu

We further design a concept dictionary~(with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.

object-detection Open World Object Detection

Paper
Add Code

Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

no code implementations • 2 Nov 2022 • Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang

Inspired by the success of vision-language methods (VLMs) in zero-shot classification, recent works attempt to extend this line of work into object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner.

Object object-detection +5

Paper
Add Code

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

no code implementations • 12 Nov 2022 • Xipeng Chen, Guangrun Wang, Dizhong Zhu, Xiaodan Liang, Philip H. S. Torr, Liang Lin

In this paper, we propose a novel Neural Sewing Machine (NSM), a learning-based framework for structure-preserving 3D garment modeling, which is capable of learning representations for garments with diverse shapes and topologies and is successfully applied to 3D garment reconstruction and controllable manipulation.

Garment Reconstruction Representation Learning

Paper
Add Code

3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

no code implementations • 2 Dec 2022 • Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei zhang, Xiaojun Chang, Hang Xu

Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module.

3D Generation Contrastive Learning +2

Paper
Add Code

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

no code implementations • 4 Dec 2022 • ZiCheng Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Wei Ke

Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target.

Image Segmentation Semantic Segmentation +3

Paper
Add Code

NLIP: Noise-robust Language-Image Pre-training

no code implementations • 14 Dec 2022 • Runhui Huang, Yanxin Long, Jianhua Han, Hang Xu, Xiwen Liang, Chunjing Xu, Xiaodan Liang

Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e. g., zero-shot classification, retrieval and image captioning.

Image Captioning Memorization +3

Paper
Add Code

Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation

no code implementations • 13 Feb 2023 • Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position.

Re-Ranking Vision-Language Navigation

Paper
Add Code

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

no code implementations • CVPR 2023 • Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time.

Autonomous Driving Lane Detection +4

Paper
Add Code

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

no code implementations • CVPR 2023 • Yanxin Long, Youpeng Wen, Jianhua Han, Hang Xu, Pengzhen Ren, Wei zhang, Shen Zhao, Xiaodan Liang

Besides, our CapDet also achieves state-of-the-art performance on dense captioning tasks, e. g., 15. 44% mAP on VG V1. 2 and 13. 98% on the VG-COCO dataset.

Dense Captioning

Paper
Add Code

CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

no code implementations • 22 Mar 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.

Ranked #3 on Zero-shot 3D Point Cloud Classification on ScanNetV2

Zero-shot 3D Point Cloud Classification

Paper
Add Code

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

no code implementations • CVPR 2023 • Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Hang Xu

This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD).

Language Modelling object-detection +1

Paper
Add Code

CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

no code implementations • CVPR 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.

Paper
Add Code

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

no code implementations • 1 Jun 2023 • Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang

Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e. g., image-text semantic alignment) and image synthesis (e. g., text-to-image generation).

Contrastive Learning Retrieval +1

Paper
Add Code

Surfer: Progressive Reasoning with World Models for Robotic Manipulation

no code implementations • 20 Jun 2023 • Pengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Mas Ma, Xiaodan Liang

To conduct a comprehensive and systematic evaluation of the robot manipulation model in terms of language understanding and physical execution, we also created a robotic manipulation benchmark with progressive reasoning tasks, called SeaWave.

Decision Making Natural Language Understanding +2

Paper
Add Code

CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

no code implementations • 17 Jun 2023 • Xiwen Liang, Liang Ma, Shanshan Guo, Jianhua Han, Hang Xu, Shikui Ma, Xiaodan Liang

Understanding and following natural language instructions while navigating through complex, real-world environments poses a significant challenge for general-purpose robots.

Decision Making Instruction Following +4

Paper
Add Code

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

no code implementations • ICCV 2023 • Zhijian Huang, Sihao Lin, Guiyu Liu, Mukun Luo, Chaoqiang Ye, Hang Xu, Xiaojun Chang, Xiaodan Liang

Specifically, the gradients, produced by the task heads and used to update the shared backbone, will be calibrated at the backbone's last layer to alleviate the task conflict.

Autonomous Driving Multi-Task Learning

Paper
Add Code

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

no code implementations • ICCV 2023 • Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang

To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence.

Segmentation Semantic Segmentation +1

Paper
Add Code

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts

no code implementations • ICCV 2023 • BinBin Yang, Yi Luo, Ziliang Chen, Guangrun Wang, Xiaodan Liang, Liang Lin

Thanks to the rapid development of diffusion models, unprecedented progress has been witnessed in image synthesis.

Layout-to-Image Generation Object +1

Paper
Add Code

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

no code implementations • ICCV 2023 • Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei zhang, Hang Xu

DiffDis first formulates the image-text discriminative problem as a generative diffusion process of the text embedding from the text encoder conditioned on the image.

Image Generation Zero-Shot Learning

Paper
Add Code

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

no code implementations • ICCV 2023 • Haoyuan Li, Haoye Dong, Hanchao Jia, Dong Huang, Michael C. Kampffmeyer, Liang Lin, Xiaodan Liang

Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond.

Human Detection

Paper
Add Code

GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

no code implementations • ICCV 2023 • Xinchi Deng, Han Shi, Runhui Huang, Changlin Li, Hang Xu, Jianhua Han, James Kwok, Shen Zhao, Wei zhang, Xiaodan Liang

Compared with the existing methods, GrowCLIP improves 2. 3% average top-1 accuracy on zero-shot image classification of 9 downstream tasks.

Image Classification Image Retrieval +2

Paper
Add Code

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

no code implementations • ICCV 2023 • Xujie Zhang, BinBin Yang, Michael C. Kampffmeyer, Wenqing Zhang, Shiyue Zhang, Guansong Lu, Liang Lin, Hang Xu, Xiaodan Liang

Cross-modal garment synthesis and manipulation will significantly benefit the way fashion designers generate garments and modify their designs via flexible linguistic interfaces. Current approaches follow the general text-to-image paradigm and mine cross-modal relations via simple cross-attention modules, neglecting the structural correspondence between visual and textual representations in the fashion design domain.

Attribute Constituency Parsing +1

Paper
Add Code

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

no code implementations • ICCV 2023 • Cuican Yu, Guansong Lu, Yihan Zeng, Jian Sun, Xiaodan Liang, Huibin Li, Zongben Xu, Songcen Xu, Wei zhang, Hang Xu

In this paper, we propose a text-guided 3D faces generation method, refer as TG-3DFace, for generating realistic 3D faces using text guidance.

3D Shape Generation Contrastive Learning +2

Paper
Add Code

CLOMO: Counterfactual Logical Modification with Large Language Models

no code implementations • 29 Nov 2023 • Yinya Huang, Ruixin Hong, Hongming Zhang, Wei Shao, Zhicheng Yang, Dong Yu, ChangShui Zhang, Xiaodan Liang, Linqi Song

In this study, we delve into the realm of counterfactual reasoning capabilities of large language models (LLMs).

counterfactual Counterfactual Reasoning +1

Paper
Add Code

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

no code implementations • 6 Dec 2023 • Xujie Zhang, Xiu Li, Michael Kampffmeyer, Xin Dong, Zhenyu Xie, Feida Zhu, Haoye Dong, Xiaodan Liang

Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person.

Image Generation Virtual Try-on

Paper
Add Code

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

no code implementations • 5 Dec 2023 • Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

Especially for fidelity, our model has a powerful image retention ability and delivers the best results in UCF101 compared to other image-to-video models to our best knowledge.

Image to Video Generation

Paper
Add Code

Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model

no code implementations • 18 Dec 2023 • Zhenyu Xie, Yang Wu, Xuehao Gao, Zhongqian Sun, Wei Yang, Xiaodan Liang

Besides, we introduce a multi-denoiser framework for the advanced diffusion model to ease the learning of high-dimensional model and fully explore the generative potential of the diffusion model.

Denoising Motion Synthesis

Paper
Add Code

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

no code implementations • 14 Jan 2024 • Jiaqi Chen, Bingqian Lin, ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-making and generalization abilities across various tasks.

Decision Making Vision and Language Navigation

Paper
Add Code

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

no code implementations • 9 Feb 2024 • Haoyuan Li, Yanpeng Zhou, Yihan Zeng, Hang Xu, Xiaodan Liang

3D Shape represented as point cloud has achieve advancements in multimodal pre-training to align image and language descriptions, which is curial to object identification, classification, and retrieval.

Language Modelling Retrieval

Paper
Add Code

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

no code implementations • 9 Mar 2024 • Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

For encouraging the agent to well capture the difference brought by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings and perturbation-based counterparts.

Contrastive Learning Navigate +1

Paper
Add Code

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation

no code implementations • 13 Mar 2024 • ZiCheng Zhang, Tong Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Qixiang Ye, Wei Ke

To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.

Decoder Language Modelling +2

Paper
Add Code

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

no code implementations • 18 Mar 2024 • Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang, Renjing Pei, Guansong Lu, Songcen Xu, Wei zhang, Hang Xu

Specifically, an inter-layer attention module is designed to encourage information exchange and learning between layers, while a text-guided intra-layer attention module incorporates layer-specific prompts to direct the specific-content generation for each layer.

Image Generation Style Transfer

Paper
Add Code

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

no code implementations • 14 Apr 2024 • Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei zhang, Zhenguo Li, Dan Xu

This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.

Dense Captioning Language Modelling +4

Paper
Add Code

MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation

no code implementations • 1 May 2024 • Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang

Besides, to remove the segmentation dependency, MMTryon uses a parsing-free garment encoder and leverages a novel scalable data generation pipeline to convert existing VITON datasets to a form that allows MMTryon to be trained without requiring any explicit segmentation.

Segmentation Virtual Try-on

Paper
Add Code

Unbiased Math Word Problems Benchmark for Mitigating Solving Bias

2 code implementations • Findings (NAACL) 2022 • Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Xiaodan Liang

However, current solvers exist solving bias which consists of data bias and learning bias due to biased dataset and improper training strategy.

Math

Paper
Code

Monocular 3D Hand Mesh Recovery via Dual Noise Estimation

1 code implementation • 26 Dec 2023 • Hanhui Li, Xiaojian Lin, Xuan Huang, Zejun Yang, Zhisheng Wang, Xiaodan Liang

However, due to the fixed hand topology and complex hand poses, current models are hard to generate meshes that are aligned with the image well.

Noise Estimation

Paper
Code

Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

1 code implementation • 8 Dec 2021 • Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang

The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.

Contrastive Learning Navigate +1

Paper
Code

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

1 code implementation • CVPR 2022 • Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang

It is able to find top 0. 16\% and 0. 29\% architectures on average on two search spaces under the budget of only 50 models.

Neural Architecture Search Relation

Paper
Code

Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers

1 code implementation • 16 Oct 2022 • Tao Tang, Changlin Li, Guangrun Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang

Despite the success, its development and application on self-supervised vision transformers have been hindered by several barriers, including the high search cost, the lack of supervision, and the unsuitable search space.

Data Augmentation Image Retrieval +3

Paper
Code

ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency

1 code implementation • 31 Jan 2023 • Pengzhen Ren, Changlin Li, Hang Xu, Yi Zhu, Guangrun Wang, Jianzhuang Liu, Xiaojun Chang, Xiaodan Liang

Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.

Segmentation Semantic Segmentation

Paper
Code

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

1 code implementation • 30 Nov 2020 • Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it's still challenging to apply it to real-world tasks, due to the poor sample efficiency.

Continuous Control Reinforcement Learning (RL)

Paper
Code

Boosting Visual-Language Models by Exploiting Hard Samples

1 code implementation • 9 May 2023 • Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi

In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.

Retrieval Zero-Shot Learning

Paper
Code

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

1 code implementation • 8 Jul 2019 • Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin

A broad range of cross-$m$-domain generation researches boil down to matching a joint distribution by deep generative models (DGMs).

Paper
Code

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

1 code implementation • 14 Feb 2022 • Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei zhang, Xin Jiang, Chunjing Xu, Hang Xu

Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods.

Ranked #6 on Image Retrieval on MUGE Retrieval

Benchmarking Contrastive Learning +6

Paper
Code

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

1 code implementation • 27 Jul 2022 • Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei

Particularly, SiRi conveys a significant principle to the research of visual grounding, i. e., a better initialized vision-language encoder would help the model converge to a better local minimum, advancing the performance accordingly.

Visual Grounding

Paper
Code

3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands

1 code implementation • 2 Jan 2024 • Xuan Huang, Hanhui Li, Zejun Yang, Zhisheng Wang, Xiaodan Liang

Subsequently, a feature fusion module that exploits the visibility of query points and mesh vertices is introduced to adaptively merge features of both hands, enabling the recovery of features in unseen areas.

Paper
Code

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation • 29 Jun 2021 • Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e. g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +3

Paper
Code

LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

2 code implementations • 17 May 2022 • Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Liang Lin, Xiaodan Liang

To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP which consists of 11, 495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation.

Math Math Word Problem Solving

Paper
Code

Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation • NAACL 2022 • Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e. g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +2

Paper
Code

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

1 code implementation • 25 Nov 2022 • Zaiyu Huang, Hanhui Li, Zhenyu Xie, Michael Kampffmeyer, Qingling Cai, Xiaodan Liang

Existing methods are restricted in this setting as they estimate garment warping flows mainly based on 2D poses and appearance, which omits the geometric prior of the 3D human body shape.

Virtual Try-on

Paper
Code

Learning To Segment Every Referring Object Point by Point

1 code implementation • CVPR 2023 • Mengxue Qu, Yu Wu, Yunchao Wei, Wu Liu, Xiaodan Liang, Yao Zhao

Extensive experiments show that our model achieves 52. 06% in terms of accuracy (versus 58. 93% in fully supervised setting) on RefCOCO+@testA, when only using 1% of the mask annotations.

Object Referring Expression +1

Paper
Code

Speak Like a Native: Prompting Large Language Models in a Native Style

1 code implementation • 22 Nov 2023 • Zhicheng Yang, Yiwei Wang, Yinya Huang, Jing Xiong, Xiaodan Liang, Jing Tang

Specifically, with AlignedCoT, we observe an average +3. 2\% improvement for \texttt{gpt-3. 5-turbo} compared to the carefully handcrafted CoT on multi-step reasoning benchmarks. Furthermore, we use AlignedCoT to rewrite the CoT text style in the training set, which improves the performance of Retrieval Augmented Generation by 3. 6\%. The source code and dataset is available at https://github. com/yangzhch6/AlignedCoT

Common Sense Reasoning GSM8K +3

Paper
Code

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

1 code implementation • CVPR 2022 • BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller(GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.

Continual Learning Object +2

Paper
Code

TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models

1 code implementation • 16 Oct 2023 • Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, Chuanyang Zheng, Xiaodan Liang, Ming Zhang, Qun Liu

Automated theorem proving (ATP) has become an appealing domain for exploring the reasoning ability of the recent successful generative language models.

Automated Theorem Proving Benchmarking +1

Paper
Code

MLP Can Be A Good Transformer Learner

1 code implementation • 8 Apr 2024 • Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

We identify that regarding the attention layer in bottom blocks, their subsequent MLP layers, i. e. two feed-forward layers, can elicit the same entropy quantity.

Paper
Code

MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure

3 code implementations • 22 Oct 2022 • Yinya Huang, Hongming Zhang, Ruixin Hong, Xiaodan Liang, ChangShui Zhang, Dong Yu

To this end, we propose a comprehensive logical reasoning explanation form.

Logical Reasoning

Paper
Code

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

1 code implementation • 29 Apr 2024 • Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

To address this issue, we introduce TheaterGen, a training-free framework that integrates large language models (LLMs) and text-to-image (T2I) models to provide the capability of multi-turn image generation.

Denoising Image Generation +2

Paper
Code

Vision-Language Navigation with Random Environmental Mixup

1 code implementation • ICCV 2021 • Chong Liu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang, ZongYuan Ge, Yi-Dong Shen

Then, we cross-connect the key views of different scenes to construct augmented scenes.

Ranked #38 on Vision and Language Navigation on VLN Challenge

Data Augmentation Navigate +1

Paper
Code

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation • 23 Jul 2021 • Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

Paper
Code

Recognizing Focal Liver Lesions in Contrast-Enhanced Ultrasound with Discriminatively Trained Spatio-Temporal Model

1 code implementation • 3 Feb 2015 • Xiaodan Liang, Qingxing Cao, Rui Huang, Liang Lin

The aim of this study is to provide an automatic computational framework to assist clinicians in diagnosing Focal Liver Lesions (FLLs) in Contrast-Enhancement Ultrasound (CEUS).

Paper
Code

DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning

1 code implementation • 4 Oct 2023 • Jing Xiong, Zixuan Li, Chuanyang Zheng, Zhijiang Guo, Yichun Yin, Enze Xie, Zhicheng Yang, Qingxing Cao, Haiming Wang, Xiongwei Han, Jing Tang, Chengming Li, Xiaodan Liang

Dual Queries first query LLM to obtain LLM-generated knowledge such as CoT, then query the retriever to obtain the final exemplars via both question and the knowledge.

Dimensionality Reduction In-Context Learning +1

Paper
Code

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

1 code implementation • 12 Mar 2024 • Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.

Navigate Vision and Language Navigation

Paper
Code

Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems

1 code implementation • EMNLP 2020 • Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, Liang Lin

A practical automatic textual math word problems (MWPs) solver should be able to solve various textual MWPs while most existing works only focused on one-unknown linear MWPs.

Ranked #10 on Math Word Problem Solving on ALG514

Decoder Math +1

Paper
Code

Data-to-Text Generation with Style Imitation

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shuai Lin, Wentao Wang, Zichao Yang, Xiaodan Liang, Frank F. Xu, Eric Xing, Zhiting Hu

That is, the model learns to imitate the writing style of any given exemplar sentence, with automatic adaptions to faithfully describe the content record.

Data-to-Text Generation Sentence +1

Paper
Code

"My nose is running.""Are you also coughing?": Building A Medical Diagnosis Agent with Interpretable Inquiry Logics

1 code implementation • 29 Apr 2022 • Wenge Liu, Yi Cheng, Hao Wang, Jianheng Tang, Yafei Liu, Ruihui Zhao, Wenjie Li, Yefeng Zheng, Xiaodan Liang

In this paper, we explore how to bring interpretability to data-driven DSMD.

Medical Diagnosis

Paper
Code

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

1 code implementation • 27 Feb 2024 • Tao Tang, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin, Kaicheng Yu, Xiaodan Liang

Through extensive experiments across various datasets and scenes, we demonstrate the effectiveness of our approach in facilitating better interaction between LiDAR and camera modalities within a unified neural field.

Novel View Synthesis

Paper
Code

LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling

1 code implementation • 18 Apr 2016 • Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, Liang Lin

Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts.

Scene Labeling

Paper
Code

SparseBERT: Rethinking the Importance Analysis in Self-attention

1 code implementation • 25 Feb 2021 • Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.

Paper
Code

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

1 code implementation • CVPR 2021 • Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.

Ranked #5 on Visual Navigation on SOON Test

Attribute Navigate +2

Paper
Code

Exploring Inter-Channel Correlation for Diversity-preserved KnowledgeDistillation

1 code implementation • 8 Feb 2022 • Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang

Extensive experiments on two vision tasks, includ-ing ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consis-tently outperforms many existing methods, advancing thestate-of-the-art in the fields of Knowledge Distillation.

Knowledge Distillation

Paper
Code

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

1 code implementation • 26 Apr 2023 • Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang

In MOTOR, we combine two kinds of basic medical knowledge, i. e., general and specific knowledge, in a complementary manner to boost the general pretraining process.

Medical Visual Question Answering Question Answering +1

Paper
Code

RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment

1 code implementation • 31 May 2023 • Guian Fang, Zutao Jiang, Jianhua Han, Guansong Lu, Hang Xu, Shengcai Liao, Xiaodan Liang

Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions.

Caption Generation Language Modelling +3

Paper
Code

AutoLoss: Learning Discrete Schedules for Alternate Optimization

1 code implementation • 4 Oct 2018 • Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.

Image Generation Machine Translation +4

Paper
Code

Symbolic Graph Reasoning Meets Convolutions

1 code implementation • NeurIPS 2018 • Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P. Xing

To cooperate with local convolutions, each SGR is constituted by three modules: a) a primal local-to-semantic voting module where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module propagates information over knowledge graph to achieve global semantic coherency; c) a dual semantic-to-local mapping module learns new associations of the evolved symbolic nodes with local representations, and accordingly enhances local features.

Ranked #81 on Semantic Segmentation on ADE20K val

Image Classification Semantic Segmentation

Paper
Code

Vision-Dialog Navigation by Exploring Cross-modal Memory

1 code implementation • CVPR 2020 • Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang

Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to explore the memory about the decision making of historical navigation actions which is for the current step.

Decision Making

Paper
Code

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

2 code implementations • NeurIPS 2020 • Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.

Instance Segmentation Panoptic Segmentation +2

Paper
Code

Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift

1 code implementation • ICCV 2021 • Jiefeng Peng, Jiqi Zhang, Changlin Li, Guangrun Wang, Xiaodan Liang, Liang Lin

We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.

Attribute Neural Architecture Search

Paper
Code

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

1 code implementation • ACL 2022 • Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang

To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore the environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP).

Domain Adaptation Vision-Language Navigation

Paper
Code

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Jinyi Hu, HANLIN ZHANG, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.

Explanation Generation Natural Language Understanding

Paper
Code

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

1 code implementation • ACL 2021 • Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Decoder Math

Paper
Code

Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

2 code implementations • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.

text-classification Text Classification +1

Paper
Code

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations • CVPR 2021 • Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

Paper
Code

Knowledge Distillation via the Target-aware Transformer

2 code implementations • CVPR 2022 • Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang, Gang Wang

To this end, we propose a novel one-to-all spatial matching knowledge distillation approach.

Knowledge Distillation

Paper
Code

Prototypical Graph Contrastive Learning

1 code implementation • 17 Jun 2021 • Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since for a query, its negatives are uniformly sampled from all graphs, existing methods suffer from the critical sampling bias issue, i. e., the negatives likely having the same semantic structure with the query, leading to performance degradation.

Clustering Contrastive Learning +1

Paper
Code

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

1 code implementation • 14 Feb 2024 • Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang

Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving.

Automated Theorem Proving Language Modelling +3

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.