Search Results for author: Xiaodan Liang

Found 270 papers, 121 papers with code

Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation NAACL 2022 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e. g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +2

Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs

no code implementations20 May 2024 Siyu Lou, Yuntian Chen, Xiaodan Liang, Liang Lin, Quanshi Zhang

In this study, we propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by the large language model (LLM) for language generation.

Disentanglement Language Modelling +3

MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation

no code implementations1 May 2024 Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang

Besides, to remove the segmentation dependency, MMTryon uses a parsing-free garment encoder and leverages a novel scalable data generation pipeline to convert existing VITON datasets to a form that allows MMTryon to be trained without requiring any explicit segmentation.

Segmentation Virtual Try-on

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

1 code implementation29 Apr 2024 Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

To address this issue, we introduce TheaterGen, a training-free framework that integrates large language models (LLMs) and text-to-image (T2I) models to provide the capability of multi-turn image generation.

Denoising Image Generation +2

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

1 code implementation25 Apr 2024 Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

no code implementations14 Apr 2024 Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei zhang, Zhenguo Li, Dan Xu

This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.

Dense Captioning Language Modelling +4

MLP Can Be A Good Transformer Learner

1 code implementation8 Apr 2024 Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

We identify that regarding the attention layer in bottom blocks, their subsequent MLP layers, i. e. two feed-forward layers, can elicit the same entropy quantity.

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

no code implementations18 Mar 2024 Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang, Renjing Pei, Guansong Lu, Songcen Xu, Wei zhang, Hang Xu

Specifically, an inter-layer attention module is designed to encourage information exchange and learning between layers, while a text-guided intra-layer attention module incorporates layer-specific prompts to direct the specific-content generation for each layer.

Image Generation Style Transfer

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

1 code implementation13 Mar 2024 Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper.

Prompt Engineering Text-to-Image Generation

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation

no code implementations13 Mar 2024 ZiCheng Zhang, Tong Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Qixiang Ye, Wei Ke

To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.

Decoder Language Modelling +2

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

1 code implementation12 Mar 2024 Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.

Navigate Vision and Language Navigation

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

no code implementations9 Mar 2024 Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

For encouraging the agent to well capture the difference brought by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings and perturbation-based counterparts.

Contrastive Learning Navigate +1

DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions

1 code implementation2 Mar 2024 Guangrun Wang, Changlin Li, Liuchun Yuan, Jiefeng Peng, Xiaoyu Xian, Xiaodan Liang, Xiaojun Chang, Liang Lin

Addressing this problem, we modularize a large search space into blocks with small search spaces and develop a family of models with the distilling neural architecture (DNA) techniques.

Neural Architecture Search

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

1 code implementation27 Feb 2024 Tao Tang, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin, Kaicheng Yu, Xiaodan Liang

Through extensive experiments across various datasets and scenes, we demonstrate the effectiveness of our approach in facilitating better interaction between LiDAR and camera modalities within a unified neural field.

Novel View Synthesis

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

1 code implementation14 Feb 2024 Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang

Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving.

Automated Theorem Proving Language Modelling +3

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

no code implementations9 Feb 2024 Haoyuan Li, Yanpeng Zhou, Yihan Zeng, Hang Xu, Xiaodan Liang

3D Shape represented as point cloud has achieve advancements in multimodal pre-training to align image and language descriptions, which is curial to object identification, classification, and retrieval.

Language Modelling Retrieval

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

no code implementations14 Jan 2024 Jiaqi Chen, Bingqian Lin, ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-making and generalization abilities across various tasks.

Decision Making Vision and Language Navigation

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

1 code implementation2 Jan 2024 Xinpeng Ding, Jinahua Han, Hang Xu, Xiaodan Liang, Wei zhang, Xiaomeng Li

BEV-InMLLM integrates multi-view, spatial awareness, and temporal semantics to enhance MLLMs' capabilities on NuInstruct tasks.

Autonomous Driving

3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands

1 code implementation2 Jan 2024 Xuan Huang, Hanhui Li, Zejun Yang, Zhisheng Wang, Xiaodan Liang

Subsequently, a feature fusion module that exploits the visibility of query points and mesh vertices is introduced to adaptively merge features of both hands, enabling the recovery of features in unseen areas.

Monocular 3D Hand Mesh Recovery via Dual Noise Estimation

1 code implementation26 Dec 2023 Hanhui Li, Xiaojian Lin, Xuan Huang, Zejun Yang, Zhisheng Wang, Xiaodan Liang

However, due to the fixed hand topology and complex hand poses, current models are hard to generate meshes that are aligned with the image well.

Noise Estimation

Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model

no code implementations18 Dec 2023 Zhenyu Xie, Yang Wu, Xuehao Gao, Zhongqian Sun, Wei Yang, Xiaodan Liang

Besides, we introduce a multi-denoiser framework for the advanced diffusion model to ease the learning of high-dimensional model and fully explore the generative potential of the diffusion model.

Denoising Motion Synthesis

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

no code implementations5 Dec 2023 Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

Especially for fidelity, our model has a powerful image retention ability and delivers the best results in UCF101 compared to other image-to-video models to our best knowledge.

Image to Video Generation

Speak Like a Native: Prompting Large Language Models in a Native Style

1 code implementation22 Nov 2023 Zhicheng Yang, Yiwei Wang, Yinya Huang, Jing Xiong, Xiaodan Liang, Jing Tang

Specifically, with AlignedCoT, we observe an average +3. 2\% improvement for \texttt{gpt-3. 5-turbo} compared to the carefully handcrafted CoT on multi-step reasoning benchmarks. Furthermore, we use AlignedCoT to rewrite the CoT text style in the training set, which improves the performance of Retrieval Augmented Generation by 3. 6\%. The source code and dataset is available at https://github. com/yangzhch6/AlignedCoT

Common Sense Reasoning GSM8K +3

DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning

1 code implementation4 Oct 2023 Jing Xiong, Zixuan Li, Chuanyang Zheng, Zhijiang Guo, Yichun Yin, Enze Xie, Zhicheng Yang, Qingxing Cao, Haiming Wang, Xiongwei Han, Jing Tang, Chengming Li, Xiaodan Liang

Dual Queries first query LLM to obtain LLM-generated knowledge such as CoT, then query the retriever to obtain the final exemplars via both question and the knowledge.

Dimensionality Reduction In-Context Learning +1

LEGO-Prover: Neural Theorem Proving with Growing Libraries

1 code implementation1 Oct 2023 Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang

Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47. 1% to 50. 4%.

 Ranked #1 on Automated Theorem Proving on miniF2F-test (Pass@100 metric)

Automated Theorem Proving

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

no code implementations ICCV 2023 Xujie Zhang, BinBin Yang, Michael C. Kampffmeyer, Wenqing Zhang, Shiyue Zhang, Guansong Lu, Liang Lin, Hang Xu, Xiaodan Liang

Cross-modal garment synthesis and manipulation will significantly benefit the way fashion designers generate garments and modify their designs via flexible linguistic interfaces. Current approaches follow the general text-to-image paradigm and mine cross-modal relations via simple cross-attention modules, neglecting the structural correspondence between visual and textual representations in the fashion design domain.

Attribute Constituency Parsing +1

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

no code implementations ICCV 2023 Haoyuan Li, Haoye Dong, Hanchao Jia, Dong Huang, Michael C. Kampffmeyer, Liang Lin, Xiaodan Liang

Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond.

Human Detection

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

no code implementations ICCV 2023 Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei zhang, Hang Xu

DiffDis first formulates the image-text discriminative problem as a generative diffusion process of the text embedding from the text encoder conditioned on the image.

Image Generation Zero-Shot Learning

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

1 code implementation14 Aug 2023 Hongguang Zhu, Yunchao Wei, Xiaodan Liang, Chunjie Zhang, Yao Zhao

Regarding the growing nature of real-world data, such an offline training paradigm on ever-expanding data is unsustainable, because models lack the continual learning ability to accumulate knowledge constantly.

Continual Learning Continual Pretraining

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

no code implementations ICCV 2023 Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang

To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence.

Segmentation Semantic Segmentation +1

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

no code implementations ICCV 2023 Zhijian Huang, Sihao Lin, Guiyu Liu, Mukun Luo, Chaoqiang Ye, Hang Xu, Xiaojun Chang, Xiaodan Liang

Specifically, the gradients, produced by the task heads and used to update the shared backbone, will be calibrated at the backbone's last layer to alleviate the task conflict.

Autonomous Driving Multi-Task Learning

Fashion Matrix: Editing Photos by Just Talking

1 code implementation25 Jul 2023 Zheng Chong, Xujie Zhang, Fuwei Zhao, Zhenyu Xie, Xiaodan Liang

The utilization of Large Language Models (LLMs) for the construction of AI systems has garnered significant attention across diverse fields.

Semantic Segmentation

Surfer: Progressive Reasoning with World Models for Robotic Manipulation

no code implementations20 Jun 2023 Pengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Mas Ma, Xiaodan Liang

To conduct a comprehensive and systematic evaluation of the robot manipulation model in terms of language understanding and physical execution, we also created a robotic manipulation benchmark with progressive reasoning tasks, called SeaWave.

Decision Making Natural Language Understanding +2

CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

no code implementations17 Jun 2023 Xiwen Liang, Liang Ma, Shanshan Guo, Jianhua Han, Hang Xu, Shikui Ma, Xiaodan Liang

Understanding and following natural language instructions while navigating through complex, real-world environments poses a significant challenge for general-purpose robots.

Decision Making Instruction Following +4

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

no code implementations1 Jun 2023 Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang

Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e. g., image-text semantic alignment) and image synthesis (e. g., text-to-image generation).

Contrastive Learning Retrieval +1

RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment

1 code implementation31 May 2023 Guian Fang, Zutao Jiang, Jianhua Han, Guansong Lu, Hang Xu, Shengcai Liao, Xiaodan Liang

Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions.

Caption Generation Language Modelling +3

Boosting Visual-Language Models by Exploiting Hard Samples

1 code implementation9 May 2023 Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi

In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.

Retrieval Zero-Shot Learning

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

1 code implementation20 Apr 2023 Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

We address this challenge by formulating, to the best of our knowledge, the first differentiable end-to-end LiDAR rendering framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points.

3D Reconstruction Novel LiDAR View Synthesis +1

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

no code implementations CVPR 2023 Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Hang Xu

This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD).

Language Modelling object-detection +1

GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning

1 code implementation CVPR 2023 Zhenyu Xie, Zaiyu Huang, Xin Dong, Fuwei Zhao, Haoye Dong, Xijin Zhang, Feida Zhu, Xiaodan Liang

Specifically, compared with the previous global warping mechanism, LFGP employs local flows to warp garments parts individually, and assembles the local warped results via the global garment parsing, resulting in reasonable warped parts and a semantic-correct intact garment even with challenging inputs. On the other hand, our DGT training strategy dynamically truncates the gradient in the overlap area and the warped garment is no more required to meet the boundary constraint, which effectively avoids the texture squeezing problem.

Virtual Try-on

CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

no code implementations22 Mar 2023 Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.

Zero-shot 3D Point Cloud Classification

Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation

1 code implementation CVPR 2023 Mingjie Li, Bingqian Lin, Zicong Chen, Haokun Lin, Xiaodan Liang, Xiaojun Chang

To address the limitation, we propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning, named DCL.

Contrastive Learning Decoder +3

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

no code implementations CVPR 2023 Yanxin Long, Youpeng Wen, Jianhua Han, Hang Xu, Pengzhen Ren, Wei zhang, Shen Zhao, Xiaodan Liang

Besides, our CapDet also achieves state-of-the-art performance on dense captioning tasks, e. g., 15. 44% mAP on VG V1. 2 and 13. 98% on the VG-COCO dataset.

Dense Captioning

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

no code implementations CVPR 2023 Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time.

Autonomous Driving Lane Detection +4

Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation

no code implementations13 Feb 2023 Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position.

Re-Ranking Vision-Language Navigation

ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency

1 code implementation31 Jan 2023 Pengzhen Ren, Changlin Li, Hang Xu, Yi Zhu, Guangrun Wang, Jianzhuang Liu, Xiaojun Chang, Xiaodan Liang

Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.

Segmentation Semantic Segmentation

Learning To Segment Every Referring Object Point by Point

1 code implementation CVPR 2023 Mengxue Qu, Yu Wu, Yunchao Wei, Wu Liu, Xiaodan Liang, Yao Zhao

Extensive experiments show that our model achieves 52. 06% in terms of accuracy (versus 58. 93% in fully supervised setting) on RefCOCO+@testA, when only using 1% of the mask annotations.

Object Referring Expression +1

CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

no code implementations CVPR 2023 Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.

CTP:Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

1 code implementation ICCV 2023 Hongguang Zhu, Yunchao Wei, Xiaodan Liang, Chunjie Zhang, Yao Zhao

Regarding the growing nature of real-world data, such an offline training paradigm on ever-expanding data is unsustainable, because models lack the continual learning ability to accumulate knowledge constantly.

Continual Learning Continual Pretraining

NLIP: Noise-robust Language-Image Pre-training

no code implementations14 Dec 2022 Runhui Huang, Yanxin Long, Jianhua Han, Hang Xu, Xiwen Liang, Chunjing Xu, Xiaodan Liang

Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e. g., zero-shot classification, retrieval and image captioning.

Image Captioning Memorization +3

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

1 code implementation6 Dec 2022 Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang

Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which finally shows the reasoning ability can be improved on both two tasks by unifying formulation.

Geometry Problem Solving Logical Reasoning +1

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

no code implementations4 Dec 2022 ZiCheng Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Wei Ke

Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target.

Image Segmentation Semantic Segmentation +3

3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

no code implementations2 Dec 2022 Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei zhang, Xiaojun Chang, Hang Xu

Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module.

3D Generation Contrastive Learning +2

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

1 code implementation25 Nov 2022 Zaiyu Huang, Hanhui Li, Zhenyu Xie, Michael Kampffmeyer, Qingling Cai, Xiaodan Liang

Existing methods are restricted in this setting as they estimate garment warping flows mainly based on 2D poses and appearance, which omits the geometric prior of the 3D human body shape.

Virtual Try-on

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

no code implementations12 Nov 2022 Xipeng Chen, Guangrun Wang, Dizhong Zhu, Xiaodan Liang, Philip H. S. Torr, Liang Lin

In this paper, we propose a novel Neural Sewing Machine (NSM), a learning-based framework for structure-preserving 3D garment modeling, which is capable of learning representations for garments with diverse shapes and topologies and is successfully applied to 3D garment reconstruction and controllable manipulation.

Garment Reconstruction Representation Learning

Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

no code implementations2 Nov 2022 Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang

Inspired by the success of vision-language methods (VLMs) in zero-shot classification, recent works attempt to extend this line of work into object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner.

Object object-detection +5

Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers

1 code implementation16 Oct 2022 Tao Tang, Changlin Li, Guangrun Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang

Despite the success, its development and application on self-supervised vision transformers have been hindered by several barriers, including the high search cost, the lack of supervision, and the unsuitable search space.

Data Augmentation Image Retrieval +3

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

1 code implementation11 Oct 2022 Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang

A significant challenge facing researchers in the area of multi-agent reinforcement learning (MARL) pertains to the identification of a library that can offer fast and compatible development for multi-agent tasks and algorithm combinations, while obviating the need to consider compatibility issues.

Multi-agent Reinforcement Learning reinforcement-learning +1

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

1 code implementation9 Oct 2022 Yi Cheng, Wenge Liu, Wenjie Li, Jiashuo Wang, Ruihui Zhao, Bang Liu, Xiaodan Liang, Yefeng Zheng

Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions.

Dialogue Generation

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

no code implementations20 Sep 2022 Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Chunjing Xu, Hang Xu

We further design a concept dictionary~(with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.

object-detection Open World Object Detection

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

no code implementations19 Sep 2022 Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Aiming towards a holistic understanding of multiple downstream tasks simultaneously, there is a need for extracting features with better transferability.

Autonomous Driving Multi-Task Learning +4

ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design

no code implementations11 Aug 2022 Xujie Zhang, Yu Sha, Michael C. Kampffmeyer, Zhenyu Xie, Zequn Jie, Chengwen Huang, Jianqing Peng, Xiaodan Liang

ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook in its first stage and uses a Transformer to model the distribution of image tokens for a real image given the tokens of the control signals in its second stage.

Image Generation

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

1 code implementation27 Jul 2022 Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei

Particularly, SiRi conveys a significant principle to the research of visual grounding, i. e., a better initialized vision-language encoder would help the model converge to a better local minimum, advancing the performance accordingly.

Visual Grounding

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

no code implementations27 Jul 2022 Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xin Dong, Feida Zhu, Xiaodan Liang

In this work, we take a step forwards to explore versatile virtual try-on solutions, which we argue should possess three main properties, namely, they should support unsupervised training, arbitrary garment categories, and controllable garment editing.

Disentanglement Image Generation +1

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

no code implementations18 Jul 2022 Quande Liu, Youpeng Wen, Jianhua Han, Chunjing Xu, Hang Xu, Xiaodan Liang

To bridge the gap between supervised semantic segmentation and real-world applications that acquires one model to recognize arbitrary new concepts, recent zero-shot segmentation attracts a lot of attention by exploring the relationships between unseen and seen object categories, yet requiring large amounts of densely-annotated data with diverse base classes.

Clustering Online Clustering +3

Discourse-Aware Graph Networks for Textual Logical Reasoning

no code implementations4 Jul 2022 Yinya Huang, Lemao Liu, Kun Xu, Meng Fang, Liang Lin, Xiaodan Liang

In this work, we propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).

graph construction Logical Reasoning +3

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations17 Jun 2022 Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.


Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation

no code implementations CVPR 2022 Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan, Xiaodan Liang, Xiaojun Chang

To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure.

Clinical Knowledge Decoder +1

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

no code implementations1 Jun 2022 Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang

In this study, we quantify the agent's behavior difference and build its relationship with the policy performance via {\bf Role Diversity}, a metric to measure the characteristics of MARL tasks.

SMAC+ Starcraft

ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts

no code implementations CVPR 2022 Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang

Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i. e., make instruction-asked actions sequentially in complex visual environments.

Vision-Language Navigation

Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

2 code implementations25 May 2022 Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.

text-classification Text Classification +1

Unbiased Math Word Problems Benchmark for Mitigating Solving Bias

2 code implementations Findings (NAACL) 2022 Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Xiaodan Liang

However, current solvers exist solving bias which consists of data bias and learning bias due to biased dataset and improper training strategy.


LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

2 code implementations17 May 2022 Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Liang Lin, Xiaodan Liang

To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP which consists of 11, 495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation.

Math Math Word Problem Solving

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

1 code implementation CVPR 2022 BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller(GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.

Continual Learning Object +2

Dressing in the Wild by Watching Dance Videos

no code implementations CVPR 2022 Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang

While significant progress has been made in garment transfer, one of the most applicable directions of human-centric image generation, existing works overlook the in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details.

Image Generation Virtual Try-on

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation CVPR 2022 Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Beyond Fixation: Dynamic Window Visual Transformer

1 code implementation CVPR 2022 Pengzhen Ren, Changlin Li, Guangrun Wang, Yun Xiao, Qing Du, Xiaodan Liang, Xiaojun Chang

Recently, a surge of interest in visual transformers is to reduce the computational cost by limiting the calculation of self-attention to a local window.

Laneformer: Object-aware Row-Column Transformers for Lane Detection

no code implementations18 Mar 2022 Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, Xiaodan Liang

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection that is a long-standing research topic for visual perception in autonomous driving.

Autonomous Driving Decoder +2

elBERto: Self-supervised Commonsense Learning for Question Answering

no code implementations17 Mar 2022 Xunlin Zhan, Yuan Li, Xiao Dong, Xiaodan Liang, Zhiting Hu, Lawrence Carin

Commonsense question answering requires reasoning about everyday situations and causes and effects implicit in context.

Question Answering Representation Learning +1

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

no code implementations15 Mar 2022 Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu

One main reason that impedes the development of truly reliably self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.

Autonomous Driving Object +2

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

1 code implementation ACL 2022 Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang

To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which can self-explore the environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP).

Domain Adaptation Vision-Language Navigation

Modern Augmented Reality: Applications, Trends, and Future Directions

no code implementations18 Feb 2022 Shervin Minaee, Xiaodan Liang, Shuicheng Yan

Augmented reality (AR) is one of the relatively old, yet trending areas in the intersection of computer vision and computer graphics with numerous applications in several areas, from gaming and entertainment, to education and healthcare.

Exploring Inter-Channel Correlation for Diversity-preserved KnowledgeDistillation

1 code implementation8 Feb 2022 Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang

Extensive experiments on two vision tasks, includ-ing ImageNet classification and Pascal VOC segmentation, demonstrate the superiority of our ICKD, which consis-tently outperforms many existing methods, advancing thestate-of-the-art in the fields of Knowledge Distillation.

Knowledge Distillation

BodyGAN: General-Purpose Controllable Neural Human Body Generation

no code implementations CVPR 2022 Chaojie Yang, Hanhui Li, Shengjie Wu, Shengkai Zhang, Haonan Yan, Nianhong Jiao, Jie Tang, Runnan Zhou, Xiaodan Liang, Tianxiang Zheng

This is because current methods mainly rely on a single pose/appearance model, which is limited in disentangling various poses and appearance in human images.

Disentanglement Image Generation +1

Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

1 code implementation8 Dec 2021 Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang

The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.

Contrastive Learning Navigate +1

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN

1 code implementation NeurIPS 2021 Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xiaodan Liang

Image-based virtual try-on is one of the most promising applications of human-centric image generation due to its tremendous real-world potential.

Disentanglement Image Generation +1

FILIP: Fine-grained Interactive Language-Image Pre-Training

1 code implementation ICLR 2022 Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu

In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.

Image Classification Retrieval +2

UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model

1 code implementation ICCV 2021 Haonan Yan, Jiaqi Chen, Xujie Zhang, Shengkai Zhang, Nianhong Jiao, Xiaodan Liang, Tianxiang Zheng

However, the popular DensePose-COCO dataset relies on a sophisticated manual annotation system, leading to severe limitations in acquiring the denser and more accurate annotated pose resources.

3D Reconstruction

Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis

no code implementations27 Oct 2021 Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin

The integration of human parsing and appearance flow effectively guides the generation of video frames with realistic appearance.

Human Parsing Video Generation

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

1 code implementation25 Oct 2021 Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu

Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.

Arithmetic Reasoning Math Word Problem Solving +2

Role Diversity Matters: A Study of Cooperative Training Strategies for Multi-Agent RL

no code implementations29 Sep 2021 Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang

In addition, role diversity can help to find a better training strategy and increase performance in cooperative MARL.

SMAC+ Starcraft +1

DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers

1 code implementation21 Sep 2021 Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang

Dynamic networks have shown their promising capability in reducing theoretical computation complexity by adapting their architectures to the input during inference.

Fairness Model Compression

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

1 code implementation Findings (EMNLP) 2021 Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang

In this paper, we have a critical insight that improving the feed-forward network (FFN) in BERT has a higher gain than improving the multi-head attention (MHA) since the computational cost of FFN is 2$\sim$3 times larger than MHA.

Data Augmentation Knowledge Distillation

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations CVPR 2022 Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

Voxel Transformer for 3D Object Detection

1 code implementation ICCV 2021 Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu

We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.

Ranked #3 on 3D Object Detection on waymo vehicle (L1 mAP metric)

3D Object Detection Computational Efficiency +3

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

1 code implementation ICCV 2021 Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, Chunjing Xu

To resolve the problems, we propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.

3D Object Detection object-detection

M3D-VTON: A Monocular-to-3D Virtual Try-On Network

1 code implementation ICCV 2021 Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, Xiaodan Liang

Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value.

Virtual Try-on

WAS-VTON: Warping Architecture Search for Virtual Try-on Network

no code implementations1 Aug 2021 Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang

Despite recent progress on image-based virtual try-on, current methods are constraint by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations.

Neural Architecture Search Virtual Try-on

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

1 code implementation ICCV 2021 Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang

In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.


Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations15 Jul 2021 Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.

Inductive Bias Language Modelling +3

Deep Learning for Embodied Vision Navigation: A Survey

no code implementations7 Jul 2021 Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang

A navigation agent is supposed to have various intelligent skills, such as visual perceiving, mapping, planning, exploring and reasoning, etc.

Autonomous Driving Navigate +1

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

1 code implementation ACL 2021 Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Decoder Math

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

1 code implementation29 Jun 2021 Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Such training objective is sub-optimal when the target sequence is not perfect, e. g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available.

Machine Translation Style Transfer +3

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

no code implementations21 Jun 2021 Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning with different downstream tasks (i. e., detection, semantic/instance segmentation) in autonomous driving domain.

Autonomous Driving Instance Segmentation +5

One Million Scenes for Autonomous Driving: ONCE Dataset

1 code implementation21 Jun 2021 Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu

To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.

3D Object Detection Autonomous Driving +1

Prototypical Graph Contrastive Learning

1 code implementation17 Jun 2021 Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since for a query, its negatives are uniformly sampled from all graphs, existing methods suffer from the critical sampling bias issue, i. e., the negatives likely having the same semantic structure with the query, leading to performance degradation.

Clustering Contrastive Learning +1

Towards Quantifiable Dialogue Coherence Evaluation

1 code implementation ACL 2021 Zheng Ye, Liucun Lu, Lishan Huang, Liang Lin, Xiaodan Liang

To address these limitations, we propose Quantifiable Dialogue Coherence Evaluation (QuantiDCE), a novel framework aiming to train a quantifiable dialogue coherence metric that can reflect the actual human rating standards.

Coherence Evaluation Dialogue Evaluation +1

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

1 code implementation Findings (ACL) 2021 Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin

Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 4, 998 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.

Math Mathematical Reasoning +1

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations CVPR 2021 Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

1 code implementation CVPR 2021 Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.

Attribute Navigate +2

DAGN: Discourse-Aware Graph Network for Logical Reasoning

2 code implementations NAACL 2021 Yinya Huang, Meng Fang, Yu Cao, LiWei Wang, Xiaodan Liang

The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.

Logical Reasoning Sentence

Dynamic Slimmable Network

1 code implementation CVPR 2021 Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li, Xiaojun Chang

Here, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims to achieve good hardware-efficiency via dynamically adjusting filter numbers of networks at test time with respect to different inputs, while keeping filters stored statically and contiguously in hardware to prevent the extra burden.

Fairness Model Compression

BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

1 code implementation ICCV 2021 Changlin Li, Tao Tang, Guangrun Wang, Jiefeng Peng, Bing Wang, Xiaodan Liang, Xiaojun Chang

In this work, we present Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of inaccurate architecture rating caused by large weight-sharing space and biased supervision in previous methods.

Image Classification Neural Architecture Search +1

A Data-Centric Framework for Composable NLP Workflows

1 code implementation EMNLP 2020 Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Haoying Zhang, Xiaodan Liang, Teruko Mitamura, Eric P. Xing, Zhiting Hu

Empirical natural language processing (NLP) systems in application domains (e. g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization.

Retrieval Text Retrieval

SparseBERT: Rethinking the Importance Analysis in Self-attention

1 code implementation25 Feb 2021 Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

1 code implementation ICLR 2021 Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li

For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges.

Model Optimization object-detection +1

Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

2 code implementations26 Jan 2021 Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang

Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e. g., sharing discrepant label granularity) without extensive re-training.

Graph Representation Learning Human Parsing +2

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

1 code implementation20 Jan 2021 Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang

Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task.

reinforcement-learning Reinforcement Learning (RL) +1

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

no code implementations9 Jan 2021 Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both the highly-accurate medical term diagnosis and multiple heterogeneous forms of information including impression and findings.

Retrieval Sentence

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

no code implementations ICCV 2021 Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei zhang, Zhenguo Li, Luc van Gool

Here we present a novel self-supervised 3D Object detection framework that seamlessly integrates the geometry-aware contrast and clustering harmonization to lift the unsupervised 3D representation learning, named GCC-3D.

3D Object Detection Clustering +4

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations1 Jan 2021 Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations1 Jan 2021 Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on one single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

CAT-SAC: Soft Actor-Critic with Curiosity-Aware Entropy Temperature

no code implementations1 Jan 2021 Junfan Lin, Changxin Huang, Xiaodan Liang, Liang Lin

The curiosity is added to the target entropy to increase the entropy temperature for unfamiliar states and decrease the target entropy for familiar states.

Reinforcement Learning (RL)

NASOA: Towards Faster Task-oriented Online Fine-tuning

no code implementations1 Jan 2021 Hang Xu, Ning Kang, Gengwei Zhang, Xiaodan Liang, Zhenguo Li

The resulting model zoo is more training efficient than SOTA NAS models, e. g. 6x faster than RegNetY-16GF, and 1. 7x faster than EfficientNetB3.

Cloud Computing Neural Architecture Search