no code implementations • 11 Nov 2014 • Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan
Then the concept detector can be fine-tuned based on these new instances.
no code implementations • 1 Feb 2015 • Xiaodan Liang, Liang Lin, Liangliang Cao
Action recognition is an important problem in multimedia understanding.
no code implementations • 2 Feb 2015 • Liang Lin, Yuanlu Xu, Xiaodan Liang, Jian-Huang Lai
Although it has been widely discussed in video surveillance, background subtraction is still an open problem in the context of complex scenarios, e.g., dynamic backgrounds, illumination variations, and indistinct foreground objects.
1 code implementation • 3 Feb 2015 • Xiaodan Liang, Qingxing Cao, Rui Huang, Liang Lin
The aim of this study is to provide an automatic computational framework to assist clinicians in diagnosing Focal Liver Lesions (FLLs) in Contrast-Enhanced Ultrasound (CEUS).
1 code implementation • 9 Mar 2015 • Xiaodan Liang, Si Liu, Xiaohui Shen, Jianchao Yang, Luoqi Liu, Jian Dong, Liang Lin, Shuicheng Yan
The first CNN uses max-pooling and is designed to predict the template coefficients for each label mask, while the second CNN omits max-pooling to preserve sensitivity to label-mask position and accurately predict the active shape parameters.
no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan
Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image.
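The matching step described above can be sketched with a toy stand-in: slide the KNN region's features over the test image's features, score each position, and return the best score as a matching confidence with its displacement. This is a minimal illustration, not the paper's learned M-CNN; the function name and normalized-correlation scoring are assumptions.

```python
import numpy as np

def match_region(knn_region_feat, test_image_feat):
    """Toy stand-in for M-CNN matching: exhaustively slide the KNN region
    feature over the test-image feature map, score each position by
    normalized correlation, and return the best score (matching confidence)
    together with its displacement (top-left offset)."""
    kh, kw = knn_region_feat.shape
    th, tw = test_image_feat.shape
    best_score, best_disp = -np.inf, (0, 0)
    for y in range(th - kh + 1):
        for x in range(tw - kw + 1):
            patch = test_image_feat[y:y + kh, x:x + kw]
            denom = np.linalg.norm(patch) * np.linalg.norm(knn_region_feat)
            score = float((patch * knn_region_feat).sum() / (denom + 1e-8))
            if score > best_score:
                best_score, best_disp = score, (y, x)
    return best_score, best_disp
```

A learned network replaces this exhaustive scan with a single forward pass, but the interface — one confidence and one displacement per KNN semantic region — is the same.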
no code implementations • 9 Sep 2015 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Jianchao Yang, Liang Lin, Shuicheng Yan
Instance-level object segmentation is an important yet under-explored task.
1 code implementation • 10 Sep 2015 • Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng, Jiashi Feng, Yao Zhao, Shuicheng Yan
Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations.
no code implementations • 28 Oct 2015 • Jianan Li, Xiaodan Liang, ShengMei Shen, Tingfa Xu, Jiashi Feng, Shuicheng Yan
Taking pedestrian detection as an example, we illustrate how we can leverage this philosophy to develop a Scale-Aware Fast R-CNN (SAF R-CNN) framework.
Ranked #23 on Pedestrian Detection on Caltech
no code implementations • CVPR 2016 • Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, Shuicheng Yan
The long chains of sequential computation formed by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference, benefiting from the memorization of previous dependencies at all positions along all dimensions.
no code implementations • CVPR 2016 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan
By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.
no code implementations • ICCV 2015 • Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan
Then the concept detector can be fine-tuned based on these new instances.
no code implementations • ICCV 2015 • Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.
no code implementations • 19 Jan 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Wen Feng Lu, Eng Hock Francis Tay, Shuicheng Yan
In particular, in order to improve the localization accuracy, a fully convolutional network is employed which predicts locations of object proposals for each pixel.
no code implementations • 23 Mar 2016 • Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan
By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.
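One Graph LSTM update can be sketched as a chain-LSTM cell whose recurrent input is the average hidden state of a node's graph neighbors rather than a single sequential predecessor. This is a simplified illustration of the generalization described above; the shared gate weights `W`, `U` and the uniform neighbor averaging are assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_lstm_step(x, h, c, adj, W, U):
    """One illustrative Graph LSTM update over all nodes at once.
    x: (N, D) node inputs; h, c: (N, H) hidden/cell states;
    adj: (N, N) 0/1 adjacency; W: (D, 4H), U: (H, 4H) gate weights."""
    deg = adj.sum(axis=1, keepdims=True)
    h_nb = (adj @ h) / np.maximum(deg, 1)       # neighbor-averaged hidden state
    gates = x @ W + h_nb @ U                    # (N, 4H): input/forget/output/candidate
    H = h.shape[1]
    i, f, o = (sigmoid(gates[:, k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(gates[:, 3 * H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Setting `adj` to a path graph recovers something close to a bidirectional chain LSTM, which is the sense in which the graph form generalizes the sequential one.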
no code implementations • 24 Mar 2016 • Jianan Li, Yunchao Wei, Xiaodan Liang, Jian Dong, Tingfa Xu, Jiashi Feng, Shuicheng Yan
We provide preliminary answers to these questions through developing a novel Attention to Context Convolution Neural Network (AC-CNN) based object detection model.
no code implementations • 7 Apr 2016 • Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin
This paper addresses the problem of geometric scene parsing, i.e., simultaneously labeling geometric surfaces (e.g., sky, ground, and vertical plane) and determining the interaction relations (e.g., layering, supporting, siding, and affinity) between main regions.
no code implementations • CVPR 2016 • Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo
This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.
1 code implementation • 18 Apr 2016 • Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, Liang Lin
Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts.
no code implementations • 24 Jul 2016 • Liliang Zhang, Liang Lin, Xiaodan Liang, Kaiming He
Pedestrian detection has arguably been addressed as a special topic beyond general object detection.
Ranked #19 on Pedestrian Detection on Caltech
no code implementations • 24 Jul 2016 • Xiangyun Zhao, Xiaodan Liang, Luoqi Liu, Teng Li, Yugang Han, Nuno Vasconcelos, Shuicheng Yan
Objective functions for training of deep networks for face-related recognition tasks, such as facial expression recognition (FER), usually consider each sample independently.
Ranked #2 on Facial Expression Recognition (FER) on Oulu-CASIA
no code implementations • 13 Aug 2016 • Keze Wang, Shengfu Zhai, Hui Cheng, Xiaodan Liang, Liang Lin
In this paper, we propose a novel inference-embedded multi-task learning framework for predicting human pose from still depth images, which is implemented with a deep architecture of neural networks.
no code implementations • 18 Aug 2016 • Jianan Li, Xiaodan Liang, Jianshu Li, Tingfa Xu, Jiashi Feng, Shuicheng Yan
Most existing detection pipelines treat object proposals independently and predict bounding-box locations and classification scores for each of them separately.
3 code implementations • ICML 2017 • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing
Generic generation and manipulation of text is challenging and has seen limited success compared to recent deep generative modeling in the visual domain.
no code implementations • NeurIPS 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan
Therefore, Tree-RL can better cover objects of various scales, which is quite appealing in the context of object proposal generation.
1 code implementation • CVPR 2017 • Xiaodan Liang, Lisa Lee, Eric P. Xing
To capture such global interdependency, we propose a deep Variation-structured Reinforcement Learning (VRL) framework to sequentially discover object relationships and attributes in the whole image.
no code implementations • CVPR 2017 • Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing
Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network optimization.
1 code implementation • CVPR 2017 • Ke Gong, Xiaodan Liang, Dongyu Zhang, Xiaohui Shen, Liang Lin
Human parsing has recently attracted considerable research interest due to its broad application potential.
Ranked #13 on Semantic Segmentation on LIP val
no code implementations • 21 Mar 2017 • Hao Wang, Xiaodan Liang, Hao Zhang, Dit-yan Yeung, Eric P. Xing
We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones).
no code implementations • ICCV 2017 • Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.
Tasks: Generative Adversarial Network · Image Paragraph Captioning · +1
no code implementations • ICCV 2017 • Prasoon Goyal, Zhiting Hu, Xiaodan Liang, Chenyu Wang, Eric Xing
In this work, we propose hierarchical nonparametric variational autoencoders, which combine tree-structured Bayesian nonparametric priors with VAEs to enable infinite flexibility of the latent representation space.
no code implementations • CVPR 2017 • Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, Shuicheng Yan
We investigate a principled way to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problem.
no code implementations • 26 Mar 2017 • Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing
Through this adversarial process the critic network learns the higher order structures and guides the segmentation model to achieve realistic segmentation outcomes.
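The adversarial objective described above can be illustrated with a fixed toy "critic" that extracts multi-scale summaries of the image masked by a segmentation, and a loss that compares those summaries for the predicted and ground-truth masks. In the actual method the critic is a learned network trained to maximize this distance while the segmenter minimizes it; the fixed two-scale critic and L1 distance here are simplifying assumptions.

```python
import numpy as np

def critic_features(image, mask):
    """Toy 'critic': mask the image and summarize it at two scales
    (full resolution plus a 2x2 pooled map). A learned critic network
    would produce hierarchical features instead. Assumes even H and W."""
    masked = image * mask
    coarse = masked.reshape(2, masked.shape[0] // 2, 2, -1).mean(axis=(1, 3))
    return np.concatenate([masked.ravel(), coarse.ravel()])

def adversarial_seg_loss(image, pred_mask, true_mask):
    """Multi-scale L1 distance between critic features of the predicted
    and ground-truth maskings. The segmenter minimizes this; in the full
    min-max game the critic would be trained to maximize it."""
    fp = critic_features(image, pred_mask)
    ft = critic_features(image, true_mask)
    return float(np.abs(fp - ft).mean())
```

Because the critic sees the image through both masks at several scales, matching its features pushes the segmenter toward masks with realistic higher-order structure rather than merely correct per-pixel labels.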
no code implementations • 11 Jun 2017 • Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing
We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification.
no code implementations • CVPR 2017 • Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, Shuicheng Yan
In this work, we address the small object detection problem by developing a single architecture that internally lifts representations of small objects to "super-resolved" ones, achieving similar characteristics as large objects and thus more discriminative for detection.
no code implementations • CVPR 2017 • Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng
Recovering 3D articulated human pose from monocular image sequences is very challenging due to diverse appearances, viewpoints, and occlusions, and because 3D human pose is inherently ambiguous from monocular imagery.
Ranked #20 on 3D Human Pose Estimation on HumanEva-I
no code implementations • 1 Aug 2017 • Xiaodan Liang, Hao Zhang, Eric P. Xing
Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo$\rightarrow$sketch and artist-painting style transfer.
Ranked #4 on Facial Expression Translation on CelebA
no code implementations • ICCV 2017 • Xiaodan Liang, Lisa Lee, Wei Dai, Eric P. Xing
To make both synthesized future frames and flows indistinguishable from reality, a dual adversarial training method is proposed to ensure that the future-flow prediction is able to help infer realistic future-frames, while the future-frame prediction in turn leads to realistic optical flows.
no code implementations • ICCV 2017 • Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-yan Yeung, Abhinav Gupta
A common issue, however, is that objects of interest that are not involved in human actions are often absent from global action descriptions, a problem known as the "missing label" issue.
Ranked #3 on Weakly Supervised Object Detection on Charades
no code implementations • CVPR 2017 • Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li
Face hallucination is a domain-specific super-resolution problem with the goal to generate high-resolution (HR) faces from low-resolution (LR) input images.
no code implementations • 4 Oct 2017 • Xiaodan Liang, Yunchao Wei, Liang Lin, Yunpeng Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan
An intuition on human segmentation is that when a human is moving in a video, the video context (e.g., appearance and motion cues) may potentially infer reasonable mask information for the whole human body.
1 code implementation • NeurIPS 2017 • Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P. Xing
We study the problem of conditional generative modeling based on designated semantics or structures.
no code implementations • 2 Jan 2018 • Yu-jia Zhang, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing
Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing volume of daily videos, yet the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has been barely touched.
1 code implementation • 5 Jan 2018 • Peilun Li, Xiaodan Liang, Daoyuan Jia, Eric P. Xing
It makes two main contributions relative to traditional GANs: 1) a soft gradient-sensitive objective for keeping semantic boundaries; 2) a semantic-aware discriminator for validating the fidelity of personalized adaptations with respect to each semantic region.
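The soft gradient-sensitive objective can be illustrated as a boundary-aware reconstruction loss: pixels where the semantic map has a large spatial gradient (i.e., near semantic boundaries) are weighted more heavily, discouraging the generator from blurring them. The particular weighting `1 + |∇semantic|` below is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def soft_gradient_sensitive_loss(pred, target, semantic):
    """Boundary-aware L1 loss: weight each pixel's error by one plus the
    spatial gradient magnitude of the semantic label map, so errors on
    semantic boundaries cost more than errors in flat regions."""
    gy, gx = np.gradient(semantic.astype(float))
    weight = 1.0 + np.hypot(gy, gx)          # emphasize boundary pixels
    return float((weight * np.abs(pred - target)).mean())
```

In the full objective this term would be combined with the usual adversarial losses; the "soft" aspect is that boundary emphasis degrades gracefully with distance rather than using a hard boundary mask.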
no code implementations • ECCV 2018 • Luona Yang, Xiaodan Liang, Tairui Wang, Eric Xing
In the spectrum of vision-based autonomous driving, vanilla end-to-end models are not interpretable and suboptimal in performance, while mediated perception models require additional intermediate representations such as segmentation masks or detection bounding boxes, whose annotation can be prohibitively expensive as we move to a larger scale.
no code implementations • 12 Feb 2018 • Chang Liu, Xiangrui Zeng, Ruogu Lin, Xiaodan Liang, Zachary Freyberg, Eric Xing, Min Xu
Cellular Electron Cryo-Tomography (CECT) is a powerful imaging technique for the 3D visualization of cellular structure and organization at submolecular resolution.
no code implementations • CVPR 2018 • Xiaodan Liang, Hongfei Zhou, Eric Xing
Moreover, we demonstrate that a universal segmentation model jointly trained on diverse datasets can surpass the performance of the common fine-tuning scheme for exploiting multiple domain knowledge.
Ranked #61 on Semantic Segmentation on Cityscapes test
no code implementations • CVPR 2018 • Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin
This network comprises two collaborative modules: i) an adversarial attention module that exploits the local visual evidence for each word parsed from the question; ii) a residual composition module that composes the previously mined evidence.
3 code implementations • 5 Apr 2018 • Xiaodan Liang, Ke Gong, Xiaohui Shen, Liang Lin
To further explore and take advantage of the semantic correlation of these two tasks, we propose a novel joint human parsing and pose estimation network to explore efficient context modeling, which can simultaneously predict parsing and pose with extremely high quality.
Ranked #10 on Semantic Segmentation on LIP val
no code implementations • 20 Apr 2018 • Michael Kampffmeyer, Nanqing Dong, Xiaodan Liang, Yu-jia Zhang, Eric P. Xing
We argue that semantic salient segmentation can instead be effectively resolved by reformulating it as a simple yet intuitive pixel-pair based connectivity prediction task.
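The pixel-pair connectivity reformulation can be made concrete by showing how connectivity targets are derived from a binary mask: for each pixel and each 4-neighbour direction, the label is 1 iff both pixels are foreground. The real model predicts these maps with a CNN; this sketch only builds the targets, and the 4-neighbour choice is an assumption (the paper may use a different neighbourhood).

```python
import numpy as np

def connectivity_labels(mask):
    """Derive pixel-pair connectivity targets from a binary mask: one
    channel per 4-neighbour shift, labelled 1 iff both the pixel and
    that neighbour are foreground (out-of-image neighbours count as 0)."""
    h, w = mask.shape
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels = np.zeros((4, h, w), dtype=mask.dtype)
    for k, (dy, dx) in enumerate(shifts):
        shifted = np.zeros_like(mask)
        ys = slice(max(dy, 0), h + min(dy, 0))
        xs = slice(max(dx, 0), w + min(dx, 0))
        ys_src = slice(max(-dy, 0), h + min(-dy, 0))
        xs_src = slice(max(-dx, 0), w + min(-dx, 0))
        shifted[ys, xs] = mask[ys_src, xs_src]
        labels[k] = mask * shifted
    return labels
```

At inference time the segmentation is recovered by aggregating the predicted connectivity channels (e.g., taking a maximum or average over directions and thresholding).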
no code implementations • 30 Apr 2018 • Yu-jia Zhang, Michael Kampffmeyer, Xiaodan Liang, Dingwen Zhang, Min Tan, Eric P. Xing
Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with three-player loss in an adversarial manner.
no code implementations • 12 May 2018 • Kai Wen Wang, Xiangrui Zeng, Xiaodan Liang, Zhiguang Huo, Eric P. Xing, Min Xu
Cellular Electron CryoTomography (CECT) is a 3D imaging technique that captures information about the structure and spatial organization of macromolecular complexes within single cells, in near-native state and at sub-molecular resolution.
no code implementations • NeurIPS 2018 • Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing
Experiments show that our approach achieves the state-of-the-art results on two medical report datasets, generating well-balanced structured sentences with robust coverage of heterogeneous medical report contents.
3 code implementations • CVPR 2019 • Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yu-jia Zhang, Eric P. Xing
Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning.
no code implementations • CVPR 2018 • Junwei Han, Le Yang, Dingwen Zhang, Xiaojun Chang, Xiaodan Liang
In this paper, we formulate this problem as a Markov Decision Process, where agents are learned to segment object regions under a deep reinforcement learning framework.
no code implementations • NeurIPS 2018 • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric Xing
The broad set of deep generative models (DGMs) has achieved remarkable advances.
no code implementations • WS 2018 • Zhiting Hu, Zichao Yang, Tiancheng Zhao, Haoran Shi, Junxian He, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Lianhui Qin, Devendra Singh Chaplot, Bowen Tan, Xingjiang Yu, Eric Xing
The features make Texar particularly suitable for technique sharing and generalization across different text generation applications.
no code implementations • 10 Jul 2018 • Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing
Specifically, we propose a model that enforces our intuition that prediction masks should be domain independent.
no code implementations • 10 Jul 2018 • Rajesh Chidambaram, Michael Kampffmeyer, Willie Neiswanger, Xiaodan Liang, Thomas Lachmann, Eric Xing
Analogously, this paper introduces geometric generalization based zero-shot learning tests to measure the rapid learning ability and the internal consistency of deep generative models.
no code implementations • ECCV 2018 • Xiaodan Liang, Tairui Wang, Luona Yang, Eric Xing
To our knowledge, this is the first successful case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
no code implementations • 17 Jul 2018 • Yu-jia Zhang, Michael Kampffmeyer, Xiaodan Liang, Min Tan, Eric P. Xing
Video summarization plays an important role in video understanding by selecting key frames/shots.
5 code implementations • ECCV 2018 • Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, Meng Yang
Second, to alleviate boundary artifacts of warped clothes and make the results more realistic, we employ a Try-On Module that learns a composition mask to integrate the warped clothes and the rendered image to ensure smoothness.
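The composition step described above amounts to an alpha blend: a learned mask selects the warped clothes where appropriate and the rendered person image elsewhere. A minimal sketch (the mask is given here rather than learned, and the function name is hypothetical):

```python
import numpy as np

def try_on_compose(rendered, warped_clothes, mask):
    """Composition with a mask: mask * warped_clothes + (1 - mask) * rendered.
    rendered, warped_clothes: (H, W, 3) images; mask: (H, W) in [0, 1]."""
    mask = np.clip(mask, 0.0, 1.0)[..., None]   # broadcast over channels
    return mask * warped_clothes + (1.0 - mask) * rendered
```

Because the mask is continuous rather than binary, the blend can feather the clothing boundary, which is what alleviates the boundary artifacts of the warped clothes.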
no code implementations • 29 Jul 2018 • Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric P. Xing
Motivated by the zoom-in operation of a pathologist using a digital microscope, RAZN learns a policy network to decide whether zooming is required in a given region of interest.
1 code implementation • ECCV 2018 • Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin
Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass.
Ranked #6 on Human Part Segmentation on CIHP
1 code implementation • 2 Aug 2018 • Qixian Zhou, Xiaodan Liang, Ke Gong, Liang Lin
Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e.g., head, leg, dress).
no code implementations • 19 Aug 2018 • Bingqian Lin, Yuan Xie, Yanyun Qu, Cuihua Li, Xiaodan Liang
To the best of our knowledge, this is the first work to model multi-view clustering in a deep joint framework, which provides a meaningful direction for unsupervised multi-view learning.
no code implementations • ECCV 2018 • Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing
Despite the promising results on paired/unpaired image-to-image translation achieved by Generative Adversarial Networks (GANs), prior works often only transfer the low-level information (e.g., color or texture changes), but fail to manipulate high-level semantic meanings (e.g., geometric structure or content) of different object regions.
no code implementations • ECCV 2018 • Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann
In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.
no code implementations • ECCV 2018 • Liang-Yan Gui, Yu-Xiong Wang, Xiaodan Liang, Jose M. F. Moura
We explore an approach to forecasting human motion in a few milliseconds given an input 3D skeleton sequence based on a recurrent encoder-decoder framework.
4 code implementations • ACL 2019 • Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wangrong Zhu, Devendra Singh Sachan, Eric P. Xing
The versatile toolkit also fosters technique sharing across different text generation tasks.
no code implementations • 6 Sep 2018 • Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin
Collaborative reasoning for understanding image-question pairs is a very critical but underexplored topic in interpretable visual question answering systems.
no code implementations • ICLR 2019 • Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing
Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.
1 code implementation • 4 Oct 2018 • Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing
Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.
no code implementations • NeurIPS 2018 • Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin
Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations.
1 code implementation • NeurIPS 2018 • Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P. Xing
To cooperate with local convolutions, each SGR is constituted by three modules: a) a primal local-to-semantic voting module where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module propagates information over knowledge graph to achieve global semantic coherency; c) a dual semantic-to-local mapping module learns new associations of the evolved symbolic nodes with local representations, and accordingly enhances local features.
Ranked #81 on Semantic Segmentation on ADE20K val
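A toy pass through the three SGR modules can make the data flow concrete: (a) soft-vote local features onto symbolic nodes, (b) propagate node features over the knowledge-graph adjacency, (c) map the evolved nodes back to enhance local features. Learned projection weights are omitted and similarities use plain dot products, so this is a structural sketch under assumptions, not the paper's parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def symbolic_graph_reasoning(local_feats, node_embed, adj):
    """local_feats: (L, D) features at L spatial locations;
    node_embed: (N, D) embeddings of N symbolic nodes;
    adj: (N, N) knowledge-graph adjacency. Returns enhanced (L, D) features."""
    # (a) local-to-semantic voting: soft-assign each location to each node
    vote = softmax(local_feats @ node_embed.T, axis=1)   # (L, N)
    node_feats = vote.T @ local_feats                    # (N, D)
    # (b) graph reasoning: propagate over the knowledge graph
    deg = adj.sum(axis=1, keepdims=True)
    node_feats = np.tanh((adj @ node_feats) / np.maximum(deg, 1))
    # (c) semantic-to-local mapping: redistribute node features to locations
    assign = softmax(local_feats @ node_embed.T, axis=1)
    return local_feats + assign @ node_feats             # residual enhancement
```

The residual form lets the SGR layer cooperate with local convolutions: local features pass through unchanged except for the globally reasoned correction.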
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shuai Lin, Wentao Wang, Zichao Yang, Xiaodan Liang, Frank F. Xu, Eric Xing, Zhiting Hu
That is, the model learns to imitate the writing style of any given exemplar sentence, with automatic adaptations to faithfully describe the content record.
1 code implementation • 30 Jan 2019 • Lin Xu, Qixian Zhou, Ke Gong, Xiaodan Liang, Jianheng Tang, Liang Lin
Besides the challenges for conversational dialogue systems (e.g., topic transition coherency and question understanding), automatic medical diagnosis further poses more critical requirements for the dialogue rationality in the context of medical knowledge and symptom-disease relations.
no code implementations • ICCV 2019 • Haoye Dong, Xiaodan Liang, Bochao Wang, Hanjiang Lai, Jia Zhu, Jian Yin
Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-on Network (MG-VTON) can generate a new person image after fitting the desired clothes into the input image and manipulating human poses.
Ranked #1 on Virtual Try-on on Deep-Fashion
no code implementations • 25 Mar 2019 • Christy Y. Li, Xiaodan Liang, Zhiting Hu, Eric P. Xing
Generating long and semantic-coherent reports to describe medical images poses great challenges towards bridging visual and linguistic modalities, incorporating medical domain knowledge, and generating realistic and accurate descriptions.
1 code implementation • CVPR 2019 • Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin
By distilling universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up the complexity.
no code implementations • ICLR 2019 • Yuan Li, Xiaodan Liang, Zhiting Hu, Yinbo Chen, Eric P. Xing
Graph neural networks (GNNs) have gained increasing research interest as a means toward the challenging goal of robust and universal graph learning.
2 code implementations • ACL 2019 • Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, Zhiting Hu
We study the problem of imposing conversational goals on open-domain chat agents.
no code implementations • CVPR 2019 • Xiaodan Liang
Learning semantic configurations and activation of modules to align well with structured knowledge can be regarded as a decision-making procedure, which is solved by a new graph-based reinforcement learning algorithm.
no code implementations • CVPR 2020 • Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin
Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value.
1 code implementation • CVPR 2019 • Ziliang Chen, Jingyu Zhuang, Xiaodan Liang, Liang Lin
(Unsupervised) Domain Adaptation (DA) seeks for classifying target instances when solely provided with source labeled and target unlabeled examples for training.
Ranked #3 on Multi-target Domain Adaptation on Office-Home
1 code implementation • 8 Jul 2019 • Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin
A broad range of cross-$m$-domain generation researches boil down to matching a joint distribution by deep generative models (DGMs).
no code implementations • 23 Sep 2019 • Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin
Explanation and high-order reasoning capabilities are crucial for real-world visual question answering with diverse levels of inference complexity (e.g., what is the dog that is near the girl playing with?)
no code implementations • 28 Sep 2019 • Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin
Resembling the rapid learning capability of humans, few-shot learning empowers vision systems to understand new concepts by training with few samples.
Ranked #19 on Few-Shot Object Detection on MS-COCO (30-shot)
no code implementations • CVPR 2019 • Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin
Each Layout-Graph Reasoning(LGR) layer aims to map feature representations into structural graph nodes via a Map-to-Node module, performs reasoning over structural graph nodes to achieve global layout coherency via a layout-graph reasoning module, and then maps graph nodes back to enhance feature representations via a Node-to-Map module.
1 code implementation • NeurIPS 2019 • Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao
Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.
no code implementations • CVPR 2020 • Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang
In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.
Ranked #13 on Vision and Language Navigation on VLN Challenge
no code implementations • 22 Nov 2019 • Lewei Yao, Hang Xu, Wei zhang, Xiaodan Liang, Zhenguo Li
In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architecture for object detection.
1 code implementation • 29 Nov 2019 • Changlin Li, Jiefeng Peng, Liuchun Yuan, Guangrun Wang, Xiaodan Liang, Liang Lin, Xiaojun Chang
Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture.
Ranked #1 on Neural Architecture Search on CIFAR-100
1 code implementation • 4 Feb 2020 • Jinghui Qin, Zheng Ye, Jianheng Tang, Xiaodan Liang
Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations.
no code implementations • 18 Feb 2020 • Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li
Finally, an InterDomain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally.
no code implementations • 3 Mar 2020 • Chenhan Jiang, Shaoju Wang, Hang Xu, Xiaodan Liang, Nong Xiao
Is a hand-crafted detection network tailored for natural images undoubtedly good enough for a discrepant medical-lesion domain?
1 code implementation • 14 Mar 2020 • Junfan Lin, Keze Wang, Ziliang Chen, Xiaodan Liang, Liang Lin
To eliminate this bias and inspired by the propensity score matching technique with causal diagram, we propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records; Bias (ii) inherently comes along with the passively collected data, and is one of the key obstacles for training the agent towards "learning how" rather than "remembering what".
1 code implementation • CVPR 2020 • Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang
Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to exploit the memory of historical navigation decisions when making the decision at the current step.
no code implementations • 23 Mar 2020 • Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin
Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.
no code implementations • CVPR 2020 • Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin
We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
2 code implementations • 6 Jun 2020 • Mingjie Li, Fuyu Wang, Xiaojun Chang, Xiaodan Liang
Firstly, the regions of primary interest to radiologists are usually located in a small area of the global image, meaning that the remaining parts of the image can be considered irrelevant noise in the training procedure.
no code implementations • ECCV 2020 • Xin Chen, Yawen Duan, Zewei Chen, Hang Xu, Zihao Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li
In spite of its remarkable progress, many algorithms are restricted to particular search spaces.
Ranked #12 on Neural Architecture Search on NAS-Bench-201, CIFAR-10 (Accuracy (Val) metric)
1 code implementation • ECCV 2020 • Hang Xu, Shaoju Wang, Xinyue Cai, Wei zhang, Xiaodan Liang, Zhenguo Li
In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-ranged coherent and accurate short-range curve information while unifying both architecture search and post-processing on curve lane predictions via point blending.
Ranked #12 on Lane Detection on CurveLanes
1 code implementation • EMNLP 2020 • Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang
Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.
1 code implementation • EMNLP 2020 • Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, Liang Lin
A practical automatic textual math word problem (MWP) solver should be able to solve various textual MWPs, whereas most existing works focus only on one-unknown linear MWPs.
Ranked #10 on Math Word Problem Solving on ALG514
1 code implementation • 15 Oct 2020 • Wenge Liu, Jianheng Tang, Yi Cheng, Wenjie Li, Yefeng Zheng, Xiaodan Liang
To push forward future research on building expert-sensitive medical dialogue systems, we propose two kinds of medical dialogue tasks based on the MedDG dataset.
no code implementations • 23 Oct 2020 • Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing
Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.
1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang
In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
2 code implementations • NeurIPS 2020 • Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin
In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.
no code implementations • 28 Nov 2020 • Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing
To bridge the methodological gaps in partially supervised learning (PSL) under data scarcity, we propose Vicinal Labels Under Uncertainty (VLUU), a simple yet efficient framework utilizing the human structure similarity for partially supervised medical image segmentation.
1 code implementation • 30 Nov 2020 • Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin
Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it is still challenging to apply it to real-world tasks due to poor sample efficiency.
no code implementations • NeurIPS 2020 • Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing
Synchronization is a key step in data-parallel distributed machine learning (ML).
no code implementations • 7 Dec 2020 • Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang
Panoptic segmentation that unifies instance segmentation and semantic segmentation has recently attracted increasing attention.
Ranked #17 on Panoptic Segmentation on COCO test-dev
1 code implementation • 14 Dec 2020 • Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin
Specifically, we generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs to disentangle the knowledge from other biases.
no code implementations • 22 Dec 2020 • Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin
When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language.
Automatic Speech Recognition (ASR) +3
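The adaptive sampling idea can be sketched as follows; the softmax-over-losses rule and the `temperature` parameter are illustrative assumptions, not the paper's exact AMS formulation:

```python
import numpy as np

def sampling_probs(recent_losses, temperature=1.0):
    """Turn per-source-language losses into task sampling probabilities.

    Languages with higher recent loss (less mastered) are sampled more
    often; `temperature` controls how sharply probabilities concentrate.
    """
    losses = np.asarray(recent_losses, dtype=float)
    logits = losses / temperature
    exp = np.exp(logits - logits.max())  # subtract max for stability
    return exp / exp.sum()

# Three source languages with differing recent meta-training losses.
probs = sampling_probs([2.0, 1.0, 0.5])
lang = np.random.default_rng(0).choice(3, p=probs)  # sample next task
```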
1 code implementation • 22 Dec 2020 • Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin
Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.
no code implementations • 24 Dec 2020 • Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin
It is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on a QA system's performance.
2 code implementations • 1 Jan 2021 • Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li
While existing NAS methods mostly design architectures on one single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.
no code implementations • ICCV 2021 • Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei zhang, Zhenguo Li, Luc van Gool
Here we present a novel self-supervised 3D object detection framework that seamlessly integrates the geometry-aware contrast and clustering harmonization to lift the unsupervised 3D representation learning, named GCC-3D.
no code implementations • 1 Jan 2021 • Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin
The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.
no code implementations • 1 Jan 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Xiaodan Liang, Zhenguo Li
The resulting model zoo is more training efficient than SOTA NAS models, e.g., 6x faster than RegNetY-16GF and 1.7x faster than EfficientNetB3.
no code implementations • ICCV 2021 • Yi Zhu, Yue Weng, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao
Vision-Dialog Navigation (VDN) requires an agent to ask questions and navigate following the human responses to find target objects.
no code implementations • 1 Jan 2021 • Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin
To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.
no code implementations • ICLR 2021 • Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang
Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task.
no code implementations • ICCV 2021 • Qingxing Cao, Wentao Wan, Keze Wang, Xiaodan Liang, Liang Lin
The experimental results show that our proposed method can improve current VQA models on OOD split without losing performance on the in-domain test data.
2 code implementations • ICCV 2021 • Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang
Extensive experiments on two vision tasks, including ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consistently outperforms many existing methods, advancing the state-of-the-art in the fields of Knowledge Distillation.
Ranked #19 on Knowledge Distillation on ImageNet
no code implementations • 9 Jan 2021 • Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin
Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.
1 code implementation • 20 Jan 2021 • Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang
Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task.
2 code implementations • 26 Jan 2021 • Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang
Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e.g., sharing discrepant label granularity) without extensive re-training.
no code implementations • 1 Feb 2021 • Yukai Shi, Sen Zhang, Chenxing Zhou, Xiaodan Liang, Xiaojun Yang, Liang Lin
Non-parallel text style transfer has attracted increasing research interests in recent years.
1 code implementation • ICLR 2021 • Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li
For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges.
1 code implementation • 25 Feb 2021 • Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok
A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.
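A quick way to probe this observation is to zero out the diagonal of a softmax attention map and measure how the output changes; this toy numpy sketch (random tensors, a single head) is illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 8                                  # sequence length, head dim
q, k, v = rng.standard_normal((3, n, d))

attn = softmax(q @ k.T / np.sqrt(d))         # full attention map
out_full = attn @ v

# Drop the diagonal (each token attending to itself) and renormalize rows.
masked = attn * (1.0 - np.eye(n))
masked /= masked.sum(axis=-1, keepdims=True)
out_masked = masked @ v

# Per-token output change caused by removing diagonal attention.
delta = np.linalg.norm(out_full - out_masked, axis=-1)
```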
1 code implementation • EMNLP 2020 • Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Haoying Zhang, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, Zhiting Hu
Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization.
1 code implementation • ICCV 2021 • Changlin Li, Tao Tang, Guangrun Wang, Jiefeng Peng, Bing Wang, Xiaodan Liang, Xiaojun Chang
In this work, we present Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of inaccurate architecture rating caused by large weight-sharing space and biased supervision in previous methods.
1 code implementation • CVPR 2021 • Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang
Here, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims to achieve good hardware-efficiency via dynamically adjusting filter numbers of networks at test time with respect to different inputs, while keeping filters stored statically and contiguously in hardware to prevent the extra burden.
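The core slimming trick — selecting a leading, contiguous slice of statically stored filters at test time — can be sketched as a 1x1 convolution; the `ratio` gating policy itself is learned in DS-Net and is omitted here:

```python
import numpy as np

def slimmable_conv1x1(x, weight, ratio):
    """Apply a 1x1 conv using only the first `ratio` fraction of filters.

    Filters are stored contiguously, so slimming is just a slice of the
    leading output channels -- no weight copying or re-layout at test time.
    """
    out_ch = max(1, int(round(weight.shape[0] * ratio)))
    return x @ weight[:out_ch].T   # (n, in_ch) @ (in_ch, out_ch)

rng = np.random.default_rng(0)
w = rng.standard_normal((32, 16))          # 32 filters, 16 input channels
x = rng.standard_normal((4, 16))           # a batch of 4 feature vectors

y_full = slimmable_conv1x1(x, w, 1.0)      # all 32 filters
y_slim = slimmable_conv1x1(x, w, 0.25)     # only the first 8 filters
```

The slim output is exactly the leading columns of the full output, which is what lets the hardware keep one static weight layout.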
2 code implementations • NAACL 2021 • Yinya Huang, Meng Fang, Yu Cao, LiWei Wang, Xiaodan Liang
The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.
Ranked #24 on Reading Comprehension on ReClor
no code implementations • CVPR 2021 • Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang
In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.
Ranked #5 on Visual Navigation on SOON Test
1 code implementation • ACL 2021 • Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu
We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).
Ranked #1 on Mathematical Question Answering on GeoS
2 code implementations • CVPR 2021 • Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li
While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.
1 code implementation • Findings (ACL) 2021 • Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin
Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 4,998 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.
Ranked #4 on Mathematical Reasoning on PGPS9K
1 code implementation • ACL 2021 • Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang
To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards.
1 code implementation • ICCV 2021 • Chong Liu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang, ZongYuan Ge, Yi-Dong Shen
Then, we cross-connect the key views of different scenes to construct augmented scenes.
Ranked #38 on Vision and Language Navigation on VLN Challenge
1 code implementation • 17 Jun 2021 • Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang
However, since for a query its negatives are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., negatives that likely share the same semantic structure as the query, leading to performance degradation.
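One hedged sketch of mitigating this sampling bias is to filter out candidate negatives that are too similar to the query; the cosine-similarity threshold used here is an illustrative assumption, not the paper's exact debiasing scheme:

```python
import numpy as np

def debiased_negatives(query, candidates, threshold=0.9):
    """Reject candidate negatives whose cosine similarity to the query
    exceeds `threshold`, since they are likely false negatives sharing
    the query's semantic structure."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ q
    return candidates[sims < threshold]

rng = np.random.default_rng(0)
query = rng.standard_normal(16)
cands = rng.standard_normal((10, 16))
cands[0] = query + 0.01 * rng.standard_normal(16)  # a near-duplicate

negs = debiased_negatives(query, cands)  # the near-duplicate is dropped
```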
1 code implementation • 21 Jun 2021 • Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu
To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
no code implementations • 21 Jun 2021 • Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu
Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which give superior performance when fine-tuning on different downstream tasks (i.e., detection, semantic/instance segmentation) in the autonomous driving domain.
1 code implementation • 29 Jun 2021 • Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu
Such a training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noise, or when only weak sequence supervision is available.
1 code implementation • ACL 2021 • Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin
Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.
no code implementations • 7 Jul 2021 • Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang
A navigation agent is expected to possess various intelligent skills, such as visual perception, mapping, planning, exploration, and reasoning.
no code implementations • 15 Jul 2021 • Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li
Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.
Ranked #10 on Semantic Textual Similarity on MRPC
1 code implementation • 23 Jul 2021 • Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin
Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.
1 code implementation • ICCV 2021 • Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang
In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.
no code implementations • 1 Aug 2021 • Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang
Despite recent progress on image-based virtual try-on, current methods are constrained by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations.
no code implementations • ICCV 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li
Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.
1 code implementation • ICCV 2021 • Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, Xiaodan Liang
Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value.
no code implementations • 11 Aug 2021 • Guangyi Liu, Yinghong Liao, Fuyu Wang, Bin Zhang, Lu Zhang, Xiaodan Liang, Xiang Wan, Shaolin Li, Zhen Li, Shuixing Zhang, Shuguang Cui
Medical imaging technologies, including computed tomography (CT) and chest X-ray (CXR), are widely employed to facilitate the diagnosis of COVID-19.
1 code implementation • Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) 2021 • Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Xiaoyun Zhao, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yizhi Liu, Flora D Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang
Researchers have explored advanced methods from computer vision and natural language processing to incorporate medical domain knowledge for the generation of readable medical reports.
1 code implementation • ICCV 2021 • Jiefeng Peng, Jiqi Zhang, Changlin Li, Guangrun Wang, Xiaodan Liang, Liang Lin
We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.
1 code implementation • ICCV 2021 • Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, Chunjing Xu
To resolve the problems, we propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.
Ranked #2 on 3D Object Detection on waymo vehicle (AP metric)
1 code implementation • ICCV 2021 • Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu
We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.
Ranked #3 on 3D Object Detection on waymo vehicle (L1 mAP metric)
no code implementations • CVPR 2022 • Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang
Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.
1 code implementation • Findings (EMNLP) 2021 • Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang
In this paper, we have a critical insight that improving the feed-forward network (FFN) in BERT yields a higher gain than improving the multi-head attention (MHA), since the computational cost of the FFN is 2-3 times that of the MHA.
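The cost ratio can be checked with a back-of-the-envelope multiply-accumulate count for the projection layers. The projection-only count below gives a ratio of exactly 2 for the standard 4x FFN expansion; the paper's 2-3x figure additionally depends on sequence length and implementation details:

```python
# Rough per-token multiply-accumulate (MAC) counts for one Transformer
# layer, ignoring the sequence-length-dependent attention-score term.
def ffn_macs(d, expansion=4):
    return 2 * d * (expansion * d)        # two linear maps: d -> 4d -> d

def mha_proj_macs(d):
    return 4 * d * d                      # Q, K, V and output projections

d = 768                                   # BERT-base hidden size
ratio = ffn_macs(d) / mha_proj_macs(d)    # 2.0 for a 4x expansion
```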
no code implementations • Findings (EMNLP) 2021 • Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin
Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework.
1 code implementation • 21 Sep 2021 • Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang
Dynamic networks have shown their promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference.
no code implementations • 29 Sep 2021 • Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang
In addition, role diversity can help to find a better training strategy and increase performance in cooperative MARL.
1 code implementation • 25 Oct 2021 • Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.
Ranked #1 on Visual Question Answering (VQA) on IconQA
no code implementations • 27 Oct 2021 • Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin
The integration of human parsing and appearance flow effectively guides the generation of video frames with realistic appearance.
1 code implementation • ICCV 2021 • Haonan Yan, Jiaqi Chen, Xujie Zhang, Shengkai Zhang, Nianhong Jiao, Xiaodan Liang, Tianxiang Zheng
However, the popular DensePose-COCO dataset relies on a sophisticated manual annotation system, leading to severe limitations in acquiring the denser and more accurate annotated pose resources.
1 code implementation • ICLR 2022 • Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu
In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.
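The token-wise late interaction can be sketched in a few lines; this numpy version (random features, symmetric averaging of the two directions) is a simplified illustration of the mechanism, not the exact FILIP training objective:

```python
import numpy as np

def filip_similarity(img_tokens, txt_tokens):
    """Token-wise late interaction: for each visual token take its maximum
    cosine similarity over all text tokens, average them, and symmetrize
    with the text-to-image direction."""
    def norm(t):
        return t / np.linalg.norm(t, axis=-1, keepdims=True)
    sim = norm(img_tokens) @ norm(txt_tokens).T   # (n_img, n_txt)
    i2t = sim.max(axis=1).mean()   # image-to-text score
    t2i = sim.max(axis=0).mean()   # text-to-image score
    return (i2t + t2i) / 2

rng = np.random.default_rng(0)
score = filip_similarity(rng.standard_normal((5, 32)),
                         rng.standard_normal((7, 32)))
```

In contrastive training this scalar would replace the single global-feature dot product used by CLIP-style objectives.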
1 code implementation • NeurIPS 2021 • Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xiaodan Liang
Image-based virtual try-on is one of the most promising applications of human-centric image generation due to its tremendous real-world potential.
1 code implementation • 8 Dec 2021 • Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang
The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.
no code implementations • CVPR 2022 • Chaojie Yang, Hanhui Li, Shengjie Wu, Shengkai Zhang, Haonan Yan, Nianhong Jiao, Jie Tang, Runnan Zhou, Xiaodan Liang, Tianxiang Zheng
This is because current methods mainly rely on a single pose/appearance model, which is limited in disentangling various poses and appearance in human images.
1 code implementation • 8 Feb 2022 • Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang
Extensive experiments on two vision tasks, including ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consistently outperforms many existing methods, advancing the state-of-the-art in the fields of Knowledge Distillation.
1 code implementation • 14 Feb 2022 • Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei zhang, Xin Jiang, Chunjing Xu, Hang Xu
Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods.
Ranked #6 on Image Retrieval on MUGE Retrieval
no code implementations • ICLR 2022 • Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both vision and language fields.
no code implementations • 18 Feb 2022 • Shervin Minaee, Xiaodan Liang, Shuicheng Yan
Augmented reality (AR) is one of the relatively old, yet trending areas in the intersection of computer vision and computer graphics with numerous applications in several areas, from gaming and entertainment, to education and healthcare.
1 code implementation • ACL 2022 • Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang
To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore the environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP).
no code implementations • 15 Mar 2022 • Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu
One main reason that impedes the development of truly reliably self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.
no code implementations • 17 Mar 2022 • Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin
Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context.
no code implementations • 18 Mar 2022 • Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, Xiaodan Liang
We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic in visual perception for autonomous driving.
1 code implementation • CVPR 2022 • Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang
Recently, a surge of interest in visual transformers has focused on reducing the computational cost by limiting the calculation of self-attention to a local window.
1 code implementation • CVPR 2022 • Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang
First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.
no code implementations • CVPR 2022 • Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang
While significant progress has been made in garment transfer, one of the most applicable directions of human-centric image generation, existing works overlook the in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details.
1 code implementation • CVPR 2022 • Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang
It is able to find the top 0.16% and 0.29% architectures on average on two search spaces under a budget of only 50 models.
1 code implementation • 29 Apr 2022 • Wenge Liu, Yi Cheng, Hao Wang, Jianheng Tang, Yafei Liu, Ruihui Zhao, Wenjie Li, Yefeng Zheng, Xiaodan Liang
In this paper, we explore how to bring interpretability to data-driven DSMD.
1 code implementation • CVPR 2022 • BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang
To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller (GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.
2 code implementations • 17 May 2022 • Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Liang Lin, Xiaodan Liang
To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP, which consists of 11,495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation.
2 code implementations • Findings (NAACL) 2022 • Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Xiaodan Liang
However, current solvers exhibit solving bias, which consists of data bias and learning bias caused by biased datasets and improper training strategies.
1 code implementation • CVPR 2022 • Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang, Gang Wang
To this end, we propose a novel one-to-all spatial matching knowledge distillation approach.
2 code implementations • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong
In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.
1 code implementation • 30 May 2022 • Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu Xia, TingTing Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang, Xiaodan Liang, Bing Wang
There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR.
no code implementations • CVPR 2022 • Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang
Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i.e., make instruction-specified actions sequentially in complex visual environments.
no code implementations • 1 Jun 2022 • Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang
In this study, we quantify the agent's behavior difference and build its relationship with the policy performance via Role Diversity, a metric to measure the characteristics of MARL tasks.
no code implementations • CVPR 2022 • Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan, Xiaodan Liang, Xiaojun Chang
To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure.
no code implementations • 17 Jun 2022 • Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang
Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
no code implementations • 4 Jul 2022 • Yinya Huang, Lemao Liu, Kun Xu, Meng Fang, Liang Lin, Xiaodan Liang
In this work, we propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).
no code implementations • 18 Jul 2022 • Quande Liu, Youpeng Wen, Jianhua Han, Chunjing Xu, Hang Xu, Xiaodan Liang
To bridge the gap between supervised semantic segmentation and real-world applications that require a single model to recognize arbitrary new concepts, recent zero-shot segmentation has attracted much attention by exploring the relationships between unseen and seen object categories, yet it requires large amounts of densely-annotated data with diverse base classes.
1 code implementation • 27 Jul 2022 • Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei
Particularly, SiRi conveys a significant principle to the research of visual grounding, i. e., a better initialized vision-language encoder would help the model converge to a better local minimum, advancing the performance accordingly.
no code implementations • 27 Jul 2022 • Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xin Dong, Feida Zhu, Xiaodan Liang
In this work, we take a step forward to explore versatile virtual try-on solutions, which we argue should possess three main properties: they should support unsupervised training, arbitrary garment categories, and controllable garment editing.
1 code implementation • 1 Aug 2022 • Guangyi Liu, Zeyu Feng, Yuan Gao, Zichao Yang, Xiaodan Liang, Junwei Bao, Xiaodong He, Shuguang Cui, Zhen Li, Zhiting Hu
This paper proposes a new efficient approach for composable text operations in the compact latent space of text.
Ranked #2 on Unsupervised Text Style Transfer on Yelp