Search Results for author: Hang Xu

Found 143 papers, 66 papers with code

LaneCorrect: Self-supervised Lane Detection

no code implementations • 23 Apr 2024 • Ming Nie, Xinyue Cai, Hang Xu, Li Zhang

Lane detection has evolved to enable highly functional autonomous driving systems to understand driving scenes even in complex environments.

Autonomous Driving Lane Detection


Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

1 code implementation • 22 Apr 2024 • Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games.

counterfactual


DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

no code implementations • 14 Apr 2024 • Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei zhang, Zhenguo Li, Dan Xu

This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.

Dense Captioning Language Modelling +4


Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

no code implementations • 25 Mar 2024 • Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

Following this, the Reality Guidance Refinement (RGR) process refines artifacts by integrating this mask with realistic latent representations, improving alignment with the original image.

Super-Resolution


OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation

no code implementations • 18 Mar 2024 • Haochen Jiang, Yueming Xu, Yihan Zeng, Hang Xu, Wei zhang, Jianfeng Feng, Li Zhang

We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference.

3D Reconstruction 3D Scene Reconstruction +3


LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

no code implementations • 18 Mar 2024 • Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang, Renjing Pei, Guansong Lu, Songcen Xu, Wei zhang, Hang Xu

Specifically, an inter-layer attention module is designed to encourage information exchange and learning between layers, while a text-guided intra-layer attention module incorporates layer-specific prompts to direct the specific-content generation for each layer.

Image Generation Style Transfer


Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

no code implementations • 14 Mar 2024 • Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James T. Kwok, Yu Zhang

Multimodal large language models (MLLMs) have shown impressive reasoning abilities but are also more vulnerable to jailbreak attacks than their LLM predecessors.

Optical Character Recognition (OCR)


NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

1 code implementation • 12 Mar 2024 • Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.

Navigate Vision and Language Navigation


From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

no code implementations • 28 Feb 2024 • Yulong Liu, Yunlong Yuan, Chunwei Wang, Jianhua Han, Yongqiang Ma, Li Zhang, Nanning Zheng, Hang Xu

In this work, we introduce a novel tool invocation pipeline designed to control massive real-world APIs.

In-Context Learning


Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics

no code implementations • 18 Feb 2024 • Wang Jia, Hang Xu

Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems.


Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

2 code implementations • 13 Feb 2024 • Jiachen Lu, Renyuan Peng, Xinyue Cai, Hang Xu, Hongyang Li, Feng Wen, Wei zhang, Li Zhang

Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non-Euclidean data into an integer series called RoadNet Sequence.


GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

no code implementations • 9 Feb 2024 • Haoyuan Li, Yanpeng Zhou, Yihan Zeng, Hang Xu, Xiaodan Liang

3D shapes represented as point clouds have achieved advancements in multimodal pre-training to align image and language descriptions, which is crucial to object identification, classification, and retrieval.

Language Modelling Retrieval


Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

no code implementations • 8 Feb 2024 • Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.

Self-Supervised Learning


LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

2 code implementations • 31 Jan 2024 • Renyuan Peng, Xinyue Cai, Hang Xu, Jiachen Lu, Feng Wen, Wei zhang, Li Zhang

Accurate extraction of lane graphs relies on precisely estimating vertex and edge information within the DAG.

Autonomous Driving Language Modelling


Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

1 code implementation • 2 Jan 2024 • Xinpeng Ding, Jianhua Han, Hang Xu, Xiaodan Liang, Wei zhang, Xiaomeng Li

BEV-InMLLM integrates multi-view, spatial awareness, and temporal semantics to enhance MLLMs' capabilities on NuInstruct tasks.

Autonomous Driving


PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

no code implementations • 27 Dec 2023 • Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei zhang, Hang Xu

Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges.

Computational Efficiency Denoising +1


Rotational Augmented Noise2Inverse for Low-dose Computed Tomography Reconstruction

1 code implementation • 19 Dec 2023 • Hang Xu, Alessandro Perelli

In this work, we present a novel self-supervised method for Low Dose Computed Tomography (LDCT) reconstruction.


Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

no code implementations • 19 Dec 2023 • Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-yan Yeung, James T. Kwok, Yu Zhang

Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks.

Instruction Following Zero-shot Generalization


G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

1 code implementation • 18 Dec 2023 • Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, YuFei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong

We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships.

Language Modelling Large Language Model


DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

1 code implementation • 11 Dec 2023 • Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo

The priors are then regarded as input conditions to maintain reasonable geometries, in which conditional LoRA and weighted score are further proposed to optimize detailed textures.

3D Generation Text to 3D


Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

1 code implementation • 6 Dec 2023 • Ming Nie, Renyuan Peng, Chunwei Wang, Xinyue Cai, Jianhua Han, Hang Xu, Li Zhang

Large vision-language models (VLMs) have garnered increasing interest in autonomous driving areas, due to their advanced capabilities in complex reasoning tasks essential for highly autonomous vehicle behavior.

Autonomous Driving Decision Making


DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

no code implementations • 5 Dec 2023 • Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

In terms of fidelity especially, our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 among image-to-video models.

Image to Video Generation


BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

1 code implementation • 5 Dec 2023 • Fengyuan Shi, Jiaxi Gu, Hang Xu, Songcen Xu, Wei zhang, LiMin Wang

Now text-to-image foundation models are widely applied to various downstream image synthesis tasks, such as controllable image generation and image editing, while downstream video synthesis tasks are less explored for several reasons.

Image Generation Model Selection +3


VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model

1 code implementation • 29 Nov 2023 • Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Zuxuan Wu, Hang Xu, Yu-Gang Jiang

Identity-consistent video generation seeks to synthesize videos that are guided by both textual prompts and reference images of entities.

Ranked #1 on Video Generation on MSR-VTT

Denoising Image to Video Generation +1


Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

no code implementations • 25 Oct 2023 • Tianyi Lu, Xing Zhang, Jiaxi Gu, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu

In this way, temporal consistency is maintained by the video LDM while the high fidelity of the image LDM is also exploited.

Denoising Video Editing


Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

no code implementations • 16 Oct 2023 • Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu

The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.

Instruction Following


A Survey on Video Diffusion Models

1 code implementation • 16 Oct 2023 • Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.

Image Generation Video Editing +2


Implicit Concept Removal of Diffusion Models

no code implementations • 9 Oct 2023 • Zhili Liu, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James Kwok

To address this, we utilize the intrinsic geometric characteristics of implicit concepts and present Geom-Erasing, a novel concept removal method based on geometric-driven control.


CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

1 code implementation • NeurIPS 2023 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, which remains seldom explored in the literature.

3D Object Detection Object +3


TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

no code implementations • 29 Sep 2023 • Tianyu Huang, Yihan Zeng, Bowen Dong, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo

To this end, an NTFGen module is proposed to model general text latent code in noisy fields.

3D Generation


Baichuan 2: Open Large-scale Language Models

1 code implementation • 19 Sep 2023 • Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, WeiPeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering.

Feature Engineering GSM8K


HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving

no code implementations • 11 Sep 2023 • Xinpeng Ding, Jianhua Han, Hang Xu, Wei zhang, Xiaomeng Li

For the first time, we leverage singular multimodal large language models (MLLMs) to consolidate multiple autonomous driving tasks from videos, i.e., the Risk Object Localization and Intention and Suggestion Prediction (ROLISP) task.

Autonomous Driving Object Localization


Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

no code implementations • 7 Sep 2023 • Jiaxi Gu, Shicong Wang, Haoyu Zhao, Tianyi Lu, Xing Zhang, Zuxuan Wu, Songcen Xu, Wei zhang, Yu-Gang Jiang, Hang Xu

Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process.

Action Recognition Denoising +3


Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

no code implementations • ICCV 2023 • Cuican Yu, Guansong Lu, Yihan Zeng, Jian Sun, Xiaodan Liang, Huibin Li, Zongben Xu, Songcen Xu, Wei zhang, Hang Xu

In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance.

3D Shape Generation Contrastive Learning +2


Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

no code implementations • 31 Aug 2023 • Qingping Zheng, Yuanfan Guo, Jiankang Deng, Jianhua Han, Ying Li, Songcen Xu, Hang Xu

Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes.

Image Generation


GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

no code implementations • ICCV 2023 • Xinchi Deng, Han Shi, Runhui Huang, Changlin Li, Hang Xu, Jianhua Han, James Kwok, Shen Zhao, Wei zhang, Xiaodan Liang

Compared with existing methods, GrowCLIP improves average top-1 accuracy by 2.3% on zero-shot image classification across 9 downstream tasks.

Image Classification Image Retrieval +2


DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

no code implementations • ICCV 2023 • Xujie Zhang, BinBin Yang, Michael C. Kampffmeyer, Wenqing Zhang, Shiyue Zhang, Guansong Lu, Liang Lin, Hang Xu, Xiaodan Liang

Cross-modal garment synthesis and manipulation will significantly benefit the way fashion designers generate garments and modify their designs via flexible linguistic interfaces. Current approaches follow the general text-to-image paradigm and mine cross-modal relations via simple cross-attention modules, neglecting the structural correspondence between visual and textual representations in the fashion design domain.

Attribute Constituency Parsing +1


DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

no code implementations • ICCV 2023 • Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei zhang, Hang Xu

DiffDis first formulates the image-text discriminative problem as a generative diffusion process of the text embedding from the text encoder conditioned on the image.

Image Generation Zero-Shot Learning


MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

no code implementations • ICCV 2023 • Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang

To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence.

Segmentation Semantic Segmentation +1


PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

1 code implementation • ICCV 2023 • Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, Li Zhang

Recently, polar-based representation has shown promising properties in perceptual tasks.

3D Object Detection object-detection


FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

no code implementations • ICCV 2023 • Zhijian Huang, Sihao Lin, Guiyu Liu, Mukun Luo, Chaoqiang Ye, Hang Xu, Xiaojun Chang, Xiaodan Liang

Specifically, the gradients, produced by the task heads and used to update the shared backbone, will be calibrated at the backbone's last layer to alleviate the task conflict.

Autonomous Driving Multi-Task Learning


SUIT: Learning Significance-guided Information for 3D Temporal Detection

no code implementations • 4 Jul 2023 • Zheyuan Zhou, Jiachen Lu, Yihan Zeng, Hang Xu, Li Zhang

To this end, we propose to learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information as sparse features for information fusion across frames.

3D Object Detection Autonomous Driving +2


CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

no code implementations • 17 Jun 2023 • Xiwen Liang, Liang Ma, Shanshan Guo, Jianhua Han, Hang Xu, Shikui Ma, Xiaodan Liang

Understanding and following natural language instructions while navigating through complex, real-world environments poses a significant challenge for general-purpose robots.

Decision Making Instruction Following +4


SLAMB: Accelerated Large Batch Training with Sparse Communication

1 code implementation • The International Conference on Machine Learning (ICML) 2023 • Hang Xu, Wenxuan Zhang, Jiawei Fei, Yuzhe Wu, Tingwen Xie, Jun Huang, Yuchen Xie, Mohamed Elhoseiny, Panos Kalnis

Distributed training of large deep neural networks requires frequent exchange of massive data between machines, thus communication efficiency is a major concern.


RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment

1 code implementation • 31 May 2023 • Guian Fang, Zutao Jiang, Jianhua Han, Guansong Lu, Hang Xu, Shengcai Liao, Xiaodan Liang

Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions.

Caption Generation Language Modelling +3


DetGPT: Detect What You Need via Reasoning

1 code implementation • 23 May 2023 • Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines.

Autonomous Driving Object +2


Rethinking Boundary Discontinuity Problem for Oriented Object Detection

1 code implementation • 17 May 2023 • Hang Xu, Xinyuan Liu, Haonan Xu, Yike Ma, Zunjie Zhu, Chenggang Yan, Feng Dai

We decouple reversibility and joint-optim from a single smoothing function into two distinct entities, which for the first time achieves both objectives: correcting the angular boundary and blending the angle with other parameters. Extensive experiments on multiple datasets show that the boundary discontinuity problem is well addressed.

Object object-detection +2


Boosting Visual-Language Models by Exploiting Hard Samples

1 code implementation • 9 May 2023 • Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi

In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.

Retrieval Zero-Shot Learning


Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

1 code implementation • 26 Apr 2023 • Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang

In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process.

Medical Visual Question Answering Question Answering +1


Policy Resilience to Environment Poisoning Attacks on Reinforcement Learning

no code implementations • 24 Apr 2023 • Hang Xu, Xinghua Qu, Zinovi Rabinovich

This paper proposes such a policy-resilience mechanism based on an idea of knowledge sharing.

Meta-Learning reinforcement-learning +1


OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

1 code implementation • NeurIPS 2023 • Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei zhang, Hongyang Li

Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments.

3D Lane Detection


Graph-based Topology Reasoning for Driving Scenes

1 code implementation • 11 Apr 2023 • Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, Junchi Yan, Ping Luo, Hongyang Li

Understanding the road genome is essential to realize autonomous driving.

Ranked #5 on 3D Lane Detection on OpenLane-V2 val

3D Lane Detection Autonomous Driving +1


DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

no code implementations • CVPR 2023 • Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Hang Xu

This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD).

Language Modelling object-detection +1


Mixed Autoencoder for Self-supervised Visual Representation Learning

1 code implementation • CVPR 2023 • Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung

Specifically, our MixedAE outperforms MAE by +0.3% accuracy, +1.7 mIoU and +0.9 AP on ImageNet-1K, ADE20K and COCO respectively with a standard ViT-Base.

Contrastive Learning Data Augmentation +1


CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

no code implementations • 22 Mar 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.

Ranked #3 on Zero-shot 3D Point Cloud Classification on ScanNetV2

Zero-shot 3D Point Cloud Classification


Towards Universal Vision-language Omni-supervised Segmentation

no code implementations • 12 Mar 2023 • Bowen Dong, Jiaxi Gu, Jianhua Han, Hang Xu, WangMeng Zuo

To improve the open-world segmentation ability, we incorporate omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pair data) into training, thus enriching the open-world segmentation ability and achieving better segmentation accuracy.

Instance Segmentation object-detection +4


CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

no code implementations • CVPR 2023 • Yanxin Long, Youpeng Wen, Jianhua Han, Hang Xu, Pengzhen Ren, Wei zhang, Shen Zhao, Xiaodan Liang

Besides, our CapDet also achieves state-of-the-art performance on dense captioning tasks, e.g., 15.44% mAP on VG V1.2 and 13.98% on the VG-COCO dataset.

Dense Captioning


Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

no code implementations • CVPR 2023 • Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time.

Autonomous Driving Lane Detection +4


Entity-Level Text-Guided Image Manipulation

1 code implementation • 22 Feb 2023 • Yikai Wang, Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Wei zhang, Yanwei Fu

In the image manipulation phase, SeMani adopts a generative model to synthesize new images conditioned on the entity-irrelevant regions and target text descriptions.

Denoising Image Manipulation


ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency

1 code implementation • 31 Jan 2023 • Pengzhen Ren, Changlin Li, Hang Xu, Yi Zhu, Guangrun Wang, Jianzhuang Liu, Xiaojun Chang, Xiaodan Liang

Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.

Segmentation Semantic Segmentation


Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST

no code implementations • bioRxiv 2023 • Yahui Long, Kok Siong Ang, Mengwei Li, Kian Long Kelvin Chong, Raman Sethi, Chengwei Zhong, Hang Xu, Zhiwei Ong, Karishma Sachaphibulkij, Ao Chen, Zeng Li, Huazhu Fu, Min Wu, Hsiu Kim Lina Lim, Longqi Liu, Jinmiao Chen

Lastly, compared to other methods, GraphST’s cell type deconvolution achieved higher accuracy on simulated data and better captured spatial niches such as the germinal centers of the lymph node in experimentally acquired data.

Clustering Contrastive Learning


Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

no code implementations • ICCV 2023 • Jiachen Lu, Renyuan Peng, Xinyue Cai, Hang Xu, Hongyang Li, Feng Wen, Wei zhang, Li Zhang

The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections.


Gaussian Label Distribution Learning for Spherical Image Object Detection

no code implementations • CVPR 2023 • Hang Xu, Xinyuan Liu, Qiang Zhao, Yike Ma, Chenggang Yan, Feng Dai

Therefore, we propose GLDL-ATSS as a better training sample selection strategy for objects in spherical images, which can alleviate the scale-sample imbalance that is a drawback of IoU threshold-based strategies.

Object object-detection +2


PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval

no code implementations • ICCV 2023 • Peiyan Guan, Renjing Pei, Bin Shao, Jianzhuang Liu, Weimian Li, Jiaxi Gu, Hang Xu, Songcen Xu, Youliang Yan, Edmund Y. Lam

The parallel isomeric attention module is used as the video encoder, which consists of two parallel branches modeling the spatial-temporal information of videos from both patch and frame levels.

Ranked #3 on Video Retrieval on MSR-VTT-1kA

Representation Learning Retrieval +3


CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

no code implementations • CVPR 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.


ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

1 code implementation • CVPR 2023 • Benjin Zhu, Zhe Wang, Shaoshuai Shi, Hang Xu, Lanqing Hong, Hongsheng Li

We thus propose a Query Contrast mechanism to explicitly enhance queries towards their best-matched GTs over all unmatched query predictions.

3D Object Detection Object +1


NLIP: Noise-robust Language-Image Pre-training

no code implementations • 14 Dec 2022 • Runhui Huang, Yanxin Long, Jianhua Han, Hang Xu, Xiwen Liang, Chunjing Xu, Xiaodan Liang

Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e.g., zero-shot classification, retrieval and image captioning.

Image Captioning Memorization +3


3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

no code implementations • 2 Dec 2022 • Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei zhang, Xiaojun Chang, Hang Xu

Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module.

3D Generation Contrastive Learning +2


Generative Negative Text Replay for Continual Vision-Language Pretraining

no code implementations • 31 Oct 2022 • Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He

In this work, we focus on learning a VLP model with sequential chunks of image-text pair data.

Continual Learning Image Classification +5


DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

no code implementations • 20 Sep 2022 • Lewei Yao, Jianhua Han, Youpeng Wen, Xiaodan Liang, Dan Xu, Wei zhang, Zhenguo Li, Chunjing Xu, Hang Xu

We further design a concept dictionary (with descriptions) from various online sources and detection datasets to provide prior knowledge for each concept.

object-detection Open World Object Detection


Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

no code implementations • 19 Sep 2022 • Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang

Aiming towards a holistic understanding of multiple downstream tasks simultaneously, there is a need for extracting features with better transferability.

Autonomous Driving Multi-Task Learning +4


Exploring Visual Interpretability for Contrastive Language-Image Pre-training

1 code implementation • 15 Sep 2022 • Yi Li, Hualiang Wang, Yiqun Duan, Hang Xu, Xiaomeng Li

For this problem, we propose the Explainable Contrastive Language-Image Pre-training (ECLIP), which corrects the explainability via the Masked Max Pooling.

Retrieval text similarity

Paper
Code

DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction

1 code implementation • 14 Sep 2022 • Kaichen Zhou, Lanqing Hong, Changhao Chen, Hang Xu, Chaoqiang Ye, Qingyong Hu, Zhenguo Li

Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.

Depth Estimation

Paper
Code

RCLane: Relay Chain Prediction for Lane Detection

no code implementations • 19 Jul 2022 • Shenghua Xu, Xinyue Cai, Bin Zhao, Li Zhang, Hang Xu, Yanwei Fu, xiangyang xue

This is because most existing lane detection methods treat lane detection as either a dense prediction or a detection task; few of them consider the unique topologies (Y-shape, fork-shape, nearly horizontal lanes) of lane markers, which leads to sub-optimal solutions.

Lane Detection

Paper
Add Code

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

no code implementations • 18 Jul 2022 • Quande Liu, Youpeng Wen, Jianhua Han, Chunjing Xu, Hang Xu, Xiaodan Liang

To bridge the gap between supervised semantic segmentation and real-world applications that require one model to recognize arbitrary new concepts, recent zero-shot segmentation attracts a lot of attention by exploring the relationships between unseen and seen object categories, yet it requires large amounts of densely annotated data with diverse base classes.

Clustering Online Clustering +3

Paper
Add Code

CO^3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving

1 code implementation • 8 Jun 2022 • Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Zhenguo Li, Ping Luo

In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representation for outdoor-scene point clouds in an unsupervised manner.

Autonomous Driving Contrastive Learning +1

Paper
Code

Learning Ego 3D Representation as Ray Tracing

1 code implementation • 8 Jun 2022 • Jiachen Lu, Zheyuan Zhou, Xiatian Zhu, Hang Xu, Li Zhang

A self-driving perception model aims to extract 3D semantic representations from multiple cameras collectively into the bird's-eye-view (BEV) coordinate frame of the ego car in order to ground the downstream planner.

3D Object Detection Computational Efficiency +4

104

Paper
Code

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

no code implementations • 26 May 2022 • Zhili Liu, Jianhua Han, Lanqing Hong, Hang Xu, Kai Chen, Chunjing Xu, Zhenguo Li

On the other hand, for existing SSL methods, it is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.

Self-Supervised Learning

Paper
Add Code

Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

2 code implementations • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.

text-classification Text Classification +1

Paper
Code

MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

1 code implementation • 12 May 2022 • Xuesong Chen, Shaoshuai Shi, Benjin Zhu, Ka Chun Cheung, Hang Xu, Hongsheng Li

Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots.

Autonomous Driving object-detection +1

4,320

Paper
Code

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

1 code implementation • CVPR 2022 • BinBin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller (GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes.

Continual Learning Object +2

Paper
Code

ONCE-3DLanes: Building Monocular 3D Lane Detection

2 code implementations • CVPR 2022 • Fan Yan, Ming Nie, Xinyue Cai, Jianhua Han, Hang Xu, Zhen Yang, Chaoqiang Ye, Yanwei Fu, Michael Bi Mi, Li Zhang

We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotation in 3D space.

3D Lane Detection Autonomous Driving

394

Paper
Code

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

1 code implementation • CVPR 2022 • Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang

It is able to find the top 0.16% and 0.29% architectures on average on two search spaces under the budget of only 50 models.

Neural Architecture Search Relation

Paper
Code

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation

1 code implementation • CVPR 2022 • Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Chunjing Xu, Yanwei Fu

Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical application.

Image Generation Image Manipulation

Paper
Code

Point2Seq: Detecting 3D Objects as Sequences

1 code implementation • CVPR 2022 • Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei zhang, Xiaogang Wang, Xinchao Wang

We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words.

3D Object Detection Object +1

Paper
Code

Laneformer: Object-aware Row-Column Transformers for Lane Detection

no code implementations • 18 Mar 2022 • Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, Xiaodan Liang

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic for visual perception in autonomous driving.

Autonomous Driving Lane Detection +1

Paper
Add Code

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

no code implementations • 15 Mar 2022 • Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu

One main reason that impedes the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.

Autonomous Driving Object +2

Paper
Add Code

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

1 code implementation • ACL 2022 • Xiwen Liang, Fengda Zhu, Lingling Li, Hang Xu, Xiaodan Liang

To improve the ability of fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which self-explores environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP).

Domain Adaptation Vision-Language Navigation

Paper
Code

Revisiting Over-smoothing in BERT from the Perspective of Graph

no code implementations • ICLR 2022 • Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok

Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both vision and language fields.

Paper
Add Code

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

3 code implementations • 16 Feb 2022 • Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong

There is a growing interest in dataset generation recently due to the superior generative capacity of large pre-trained language models (PLMs).

Knowledge Distillation Natural Language Inference +5

Paper
Code

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

1 code implementation • 14 Feb 2022 • Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei zhang, Xin Jiang, Chunjing Xu, Hang Xu

Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods.

Ranked #6 on Image Retrieval on MUGE Retrieval

Benchmarking Contrastive Learning +6

Paper
Code

Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training

no code implementations • NeurIPS 2021 • Zeng Yihan, Chunwei Wang, Yunbo Wang, Hang Xu, Chaoqiang Ye, Zhen Yang, Chao Ma

First, 3D-CoCo is inspired by our observation that the bird's-eye-view (BEV) features are more transferable than low-level geometry features.

Cloud Detection Domain Adaptation

Paper
Add Code

FILIP: Fine-grained Interactive Language-Image Pre-Training

1 code implementation • ICLR 2022 • Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu

In this paper, we introduce a large-scale Fine-grained Interactive Language-Image Pre-training (FILIP) to achieve finer-level alignment through a cross-modal late interaction mechanism, which uses a token-wise maximum similarity between visual and textual tokens to guide the contrastive objective.

Image Classification Retrieval +2

649

Paper
Code

SOFT: Softmax-free Transformer with Linear Complexity

2 code implementations • NeurIPS 2021 • Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in superior trade-off between accuracy and complexity.

Computational Efficiency

292

Paper
Code

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

1 code implementation • Findings (EMNLP) 2021 • Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang

In this paper, we have a critical insight that improving the feed-forward network (FFN) in BERT has a higher gain than improving the multi-head attention (MHA) since the computational cost of FFN is 2 to 3 times larger than MHA.

Data Augmentation Knowledge Distillation

Paper
Code

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

1 code implementation • ICCV 2021 • Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, Chunjing Xu

To resolve the problems, we propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.

Ranked #2 on 3D Object Detection on waymo vehicle (AP metric)

3D Object Detection object-detection

Paper
Code

Voxel Transformer for 3D Object Detection

1 code implementation • ICCV 2021 • Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu

We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.

Ranked #3 on 3D Object Detection on waymo vehicle (L1 mAP metric)

3D Object Detection Computational Efficiency +3

235

Paper
Code

Adversarial Robustness for Unsupervised Domain Adaptation

no code implementations • ICCV 2021 • Muhammad Awais, Fengwei Zhou, Hang Xu, Lanqing Hong, Ping Luo, Sung-Ho Bae, Zhenguo Li

Extensive Unsupervised Domain Adaptation (UDA) studies have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with deep models.

Adversarial Robustness Unsupervised Domain Adaptation

Paper
Add Code

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

1 code implementation • ICCV 2021 • Kai Chen, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung

By pre-training on SODA10M, a large-scale autonomous driving dataset, MultiSiam exceeds the ImageNet pre-trained MoCo-v2, demonstrating the potential of domain-specific pre-training.

Autonomous Driving Image Clustering +2

Paper
Code

Unbiased IoU for Spherical Image Object Detection

no code implementations • 18 Aug 2021 • Qiang Zhao, Bin Chen, Hang Xu, Yike Ma, XiaoDong Li, Bailan Feng, Chenggang Yan, Feng Dai

In this paper, we first identify that spherical rectangles are unbiased bounding boxes for objects in spherical images, and then propose an analytical method for IoU calculation without any approximations.

Object object-detection +1

Paper
Add Code

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation

no code implementations • ICCV 2021 • Lewei Yao, Renjie Pi, Hang Xu, Wei zhang, Zhenguo Li, Tong Zhang

In this paper, we investigate the knowledge distillation (KD) strategy for object detection and propose an effective framework applicable to both homogeneous and heterogeneous student-teacher pairs.

Knowledge Distillation object-detection +1

Paper
Add Code

NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models

no code implementations • ICCV 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Chuanlong Xie, Xiaodan Liang, Zhenguo Li

Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.

Cloud Computing Neural Architecture Search

Paper
Add Code

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

1 code implementation • ICCV 2021 • Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang

In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories.

Retrieval

Paper
Code

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations • 15 Jul 2021 • Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.

Ranked #10 on Semantic Textual Similarity on MRPC

Inductive Bias Language Modelling +3

Paper
Add Code

One Million Scenes for Autonomous Driving: ONCE Dataset

1 code implementation • 21 Jun 2021 • Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei zhang, Zhenguo Li, Jie Yu, Hang Xu, Chunjing Xu

To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.

3D Object Detection Autonomous Driving +1

173

Paper
Code

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

no code implementations • 21 Jun 2021 • Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning on different downstream tasks (i.e., detection, semantic/instance segmentation) in the autonomous driving domain.

Autonomous Driving Instance Segmentation +5

Paper
Add Code

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

no code implementations • CVPR 2021 • Lewei Yao, Renjie Pi, Hang Xu, Wei zhang, Zhenguo Li, Tong Zhang

For student morphism, a weight inheritance strategy is adopted, allowing the student to flexibly update its architecture while fully utilizing the predecessor's weights, which considerably accelerates the search. To facilitate dynamic distillation, an elastic teacher pool is trained via an integrated progressive shrinking strategy, from which teacher detectors can be sampled without additional cost in subsequent searches.

Knowledge Distillation Neural Architecture Search +2

Paper
Add Code

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations • CVPR 2021 • Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on a single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

Paper
Code

DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning

1 code implementation • NeurIPS 2021 • Hang Xu, Kelly Kostopoulou, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis

DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end-user, to significantly lower the communication overhead.

Paper
Code

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening

no code implementations • 13 May 2021 • Wenqi Shao, Hang Yu, Zhaoyang Zhang, Hang Xu, Zhenguo Li, Ping Luo

To address this problem, we develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which can stochastically discard unimportant channels by modeling the probability of a channel being activated.

Paper
Add Code

Effective Sparsification of Neural Networks with Global Sparsity Constraint

1 code implementation • CVPR 2021 • Xiao Zhou, Weizhong Zhang, Hang Xu, Tong Zhang

Weight pruning is an effective technique to reduce the model size and inference time for deep neural networks in real-world deployments.

Ranked #6 on Network Pruning on ImageNet - ResNet 50 - 90% sparsity

Network Pruning

Paper
Code

Deeply Unsupervised Patch Re-Identification for Pre-training Object Detectors

no code implementations • 8 Mar 2021 • Jian Ding, Enze Xie, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia

Unsupervised pre-training aims at learning transferable features that are beneficial for downstream tasks.

Object object-detection +3

Paper
Add Code

SparseBERT: Rethinking the Importance Analysis in Self-attention

1 code implementation • 25 Feb 2021 • Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.

Paper
Code

L2E: Learning to Exploit Your Opponent

no code implementations • 18 Feb 2021 • Zhe Wu, Kai Li, Enmin Zhao, Hang Xu, Meng Zhang, Haobo Fu, Bo An, Junliang Xing

In this work, we propose a novel Learning to Exploit (L2E) framework for implicit opponent modeling.

Paper
Add Code

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

1 code implementation • ICLR 2021 • Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li

For object detection, the well-established classification and regression loss functions have been carefully designed by considering diverse learning challenges.

Model Optimization object-detection +1

Paper
Code

DetCo: Unsupervised Contrastive Learning for Object Detection

2 code implementations • ICCV 2021 • Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo

Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.

Contrastive Learning Image Classification +2

264

Paper
Code

DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning

1 code implementation • NeurIPS 2021 • Kelly Kostopoulou, Hang Xu, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis

This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored for distributed deep learning.

Paper
Code

Segmenting Transparent Object in the Wild with Transformer

2 code implementations • 21 Jan 2021 • Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.

Ranked #3 on Semantic Segmentation on Trans10K

Object Segmentation +2

1,185

Paper
Code

Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection

no code implementations • ICCV 2021 • Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei zhang, Zhenguo Li, Luc van Gool

Here we present a novel self-supervised 3D object detection framework, named GCC-3D, that seamlessly integrates geometry-aware contrast and clustering harmonization to lift unsupervised 3D representation learning.

3D Object Detection Clustering +4

Paper
Add Code

C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing

no code implementations • ICCV 2021 • Yanning Zhou, Hang Xu, Wei zhang, Bin Gao, Pheng-Ann Heng

Semi-supervised semantic segmentation methods utilize unlabeled data to increase feature discriminative ability, alleviating the burden of data annotation.

Contrastive Learning Data Augmentation +1

Paper
Add Code

TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search

2 code implementations • 1 Jan 2021 • Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

While existing NAS methods mostly design architectures on one single task, algorithms that look beyond single-task search are surging to pursue a more efficient and universal solution across various tasks.

Neural Architecture Search Transfer Learning

334

Paper
Code

NASOA: Towards Faster Task-oriented Online Fine-tuning

no code implementations • 1 Jan 2021 • Hang Xu, Ning Kang, Gengwei Zhang, Xiaodan Liang, Zhenguo Li

The resulting model zoo is more training efficient than SOTA NAS models, e.g., 6x faster than RegNetY-16GF and 1.7x faster than EfficientNetB3.

Cloud Computing Neural Architecture Search

Paper
Add Code

OpenHoldem: A Benchmark for Large-Scale Imperfect-Information Game Research

no code implementations • 11 Dec 2020 • Kai Li, Hang Xu, Enmin Zhao, Zhe Wu, Junliang Xing

Owing to the unremitting efforts of a few institutes, significant progress has recently been made in designing superhuman AIs in No-limit Texas Hold'em (NLTH), the primary testbed for large-scale imperfect-information game research.

Paper
Add Code

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation

no code implementations • 7 Dec 2020 • Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang

Panoptic segmentation that unifies instance segmentation and semantic segmentation has recently attracted increasing attention.

Ranked #17 on Panoptic Segmentation on COCO test-dev

Instance Segmentation Panoptic Segmentation +1

Paper
Add Code

VEGA: Towards an End-to-End Configurable AutoML Pipeline

1 code implementation • 3 Nov 2020 • Bochao Wang, Hang Xu, Jiajin Zhang, Chen Chen, Xiaozhi Fang, Yixing Xu, Ning Kang, Lanqing Hong, Chenhan Jiang, Xinyue Cai, Jiawei Li, Fengwei Zhou, Yong Li, Zhicheng Liu, Xinghao Chen, Kai Han, Han Shu, Dehua Song, Yunhe Wang, Wei zhang, Chunjing Xu, Zhenguo Li, Wenzhi Liu, Tong Zhang

Automated Machine Learning (AutoML) is an important industrial solution for the automatic discovery and deployment of machine learning models.

BIG-bench Machine Learning Data Augmentation +3

834

Paper
Code

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

2 code implementations • NeurIPS 2020 • Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm.

Instance Segmentation Panoptic Segmentation +2

Paper
Code

Driver Anomaly Detection: A Dataset and Contrastive Learning Approach

1 code implementation • 30 Sep 2020 • Okan Köpüklü, Jiapeng Zheng, Hang Xu, Gerhard Rigoll

For this task, we introduce a new video-based benchmark, the Driver Anomaly Detection (DAD) dataset, which contains normal driving videos together with a set of anomalous actions in its training set.

Anomaly Detection Contrastive Learning +1

111

Paper
Code

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending

1 code implementation • ECCV 2020 • Hang Xu, Shaoju Wang, Xinyue Cai, Wei zhang, Xiaodan Liang, Zhenguo Li

In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-ranged coherent and accurate short-range curve information while unifying both architecture search and post-processing on curve lane predictions via point blending.

Ranked #12 on Lane Detection on CurveLanes

Autonomous Driving Lane Detection

834

Paper
Code

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling

no code implementations • ECCV 2020 • Wenshuo Ma, Tingzhong Tian, Hang Xu, Yimin Huang, Zhenguo Li

By carefully analyzing the existing bounding box patterns on the feature hierarchy, we design a flexible and tight hyper-parameter space for anchor configurations.

Bayesian Optimization object-detection +1

Paper
Add Code

CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search

no code implementations • ECCV 2020 • Xin Chen, Yawen Duan, Zewei Chen, Hang Xu, Zihao Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li

In spite of its remarkable progress, many algorithms are restricted to particular search spaces.

Ranked #13 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)

Meta-Learning Meta Reinforcement Learning +3

Paper
Add Code

JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

1 code implementation • ECCV 2020 • Linpu Fang, Xingyan Liu, Li Liu, Hang Xu, Wenxiong Kang

The key ideas are two-fold: a) explicitly modeling the dependencies among joints and the relations between the pixels and the joints for better local feature representation learning; b) unifying the dense pixel-wise offset predictions and direct joint regression for end-to-end training.

3D Hand Pose Estimation regression +1

Paper
Code

ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection

no code implementations • 3 Mar 2020 • Chenhan Jiang, Shaoju Wang, Hang Xu, Xiaodan Liang, Nong Xiao

Is a hand-crafted detection network tailored for natural images undoubtedly good enough for a discrepant medical lesion domain?

Lesion Detection medical image detection +1

Paper
Add Code

EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement

no code implementations • 18 Feb 2020 • Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li

In this paper, we study the hybrid-supervised object detection problem, aiming to train a high-quality detector with only a limited amount of fully annotated data and fully exploiting cheap data with image-level labels.

Object object-detection +1

Paper
Add Code

Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN

no code implementations • 18 Feb 2020 • Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li

Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally.

Object object-detection +2

Paper
Add Code

SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection

no code implementations • 22 Nov 2019 • Lewei Yao, Hang Xu, Wei zhang, Xiaodan Liang, Zhenguo Li

In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architecture for object detection.

Neural Architecture Search Object +2

Paper
Add Code

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

1 code implementation • NeurIPS 2020 • Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang

In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.

Bayesian Optimization Neural Architecture Search

496

Paper
Code

AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results

2 code implementations • 4 Nov 2019 • Kai Zhang, Shuhang Gu, Radu Timofte, Zheng Hui, Xiumei Wang, Xinbo Gao, Dongliang Xiong, Shuai Liu, Ruipeng Gang, Nan Nan, Chenghua Li, Xueyi Zou, Ning Kang, Zhan Wang, Hang Xu, Chaofeng Wang, Zheng Li, Lin-Lin Wang, Jun Shi, Wenyu Sun, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Yazhe Niu, Peijin Zhuo, Xiangzhen Kong, Long Sun, Wenhao Wang

The challenge had 3 tracks.

Image Super-Resolution

416

Paper
Code

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification

no code implementations • ICCV 2019 • Hang Xu, Lewei Yao, Wei Zhang, Xiaodan Liang, Zhenguo Li

Neural architecture search (NAS) has shown great potential in automating the manual process of designing a good CNN architecture for image classification.

Classification General Classification +4

Paper
Add Code

Multi-objective Neural Architecture Search via Predictive Network Performance Optimization

no code implementations • 25 Sep 2019 • Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang

Inspired by the nature of the graph structure of a neural network, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with Graph Convolutional Network (GCN) predictor.

Bayesian Optimization Neural Architecture Search

Paper
Add Code

MANAS: Multi-Agent Neural Architecture Search

no code implementations • 3 Sep 2019 • Vasco Lopes, Fabio Maria Carlucci, Pedro M Esperança, Marco Singh, Victor Gabillon, Antoine Yang, Hang Xu, Zewei Chen, Jun Wang

The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective.

Neural Architecture Search

Paper
Add Code

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection

no code implementations • CVPR 2019 • Hang Xu, Chenhan Jiang, Xiaodan Liang, Liang Lin, Zhenguo Li

In this paper, we address the large-scale object detection problem with thousands of categories, which poses severe challenges due to long-tail data distributions, heavy occlusions, and class ambiguities.

Object object-detection +1

Paper
Add Code

Spatial-Aware Graph Relation Network for Large-Scale Object Detection

no code implementations • CVPR 2019 • Hang Xu, Chenhan Jiang, Xiaodan Liang, Zhenguo Li

How to properly encode high-order object relations in the detection system without any external knowledge?

Object object-detection +4

Paper
Add Code

Hybrid Knowledge Routed Modules for Large-scale Object Detection

1 code implementation • NeurIPS 2018 • Chenhan Jiang, Hang Xu, Xiangdan Liang, Liang Lin

The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene.

Object object-detection +1

103

Paper
Code
