Search Results for author: Zhongdao Wang

Found 32 papers, 15 papers with code

Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning

no code implementations 12 Nov 2024 Jianhao Li, Tianyu Sun, Xueqian Zhang, Zhongdao Wang, Bailan Feng, Hengshuang Zhao

To tackle this challenge, we find that a considerable portion of points in the accumulated point cloud is redundant, and discarding these points has minimal impact on perception accuracy.

3D Object Detection object-detection

QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model

1 code implementation 9 Oct 2024 Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma

Recent advancements in State Space Models, notably Mamba, have demonstrated superior performance over the dominant Transformer models, particularly in reducing the computational complexity from quadratic to linear.

Image Classification Instance Segmentation +6

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

2 code implementations 26 Sep 2024 Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Zhongdao Wang, Qingmin Liao, Li Wang, Tian Lu, Emad Barsoum

In this paper, we present DoSSR, a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process with low-resolution (LR) images.

Image Restoration Image Super-Resolution

ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning

no code implementations 26 Sep 2024 Song Wang, Zhongdao Wang, Jiawei Yu, Wentong Li, Bailan Feng, Junbo Chen, Jianke Zhu

In this paper, we conduct a comprehensive evaluation of existing semantic occupancy prediction models from a reliability perspective for the first time.

Autonomous Driving

LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

no code implementations 18 Jul 2024 Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia

Recent advances in text-to-image model customization have underscored the importance of integrating new concepts with a few examples.

VEON: Vocabulary-Enhanced Occupancy Prediction

no code implementations 17 Jul 2024 Jilai Zheng, Pin Tang, Zhongdao Wang, Guoqing Wang, Xiangxuan Ren, Bailan Feng, Chao Ma

Hence, instead of building our model from scratch, we try to blend 2D foundation models, specifically a depth model MiDaS and a semantic model CLIP, to lift the semantics to 3D space, thus fulfilling 3D occupancy.

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

no code implementations 16 Jul 2024 Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo

Unlike previous arts, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset.

Autonomous Driving

OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

no code implementations 23 Apr 2024 Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem.

3D Semantic Occupancy Prediction Autonomous Driving +1

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

no code implementations CVPR 2024 Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied.

Autonomous Driving

Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving

1 code implementation 13 Mar 2024 Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang

Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision.

3D Semantic Scene Completion Autonomous Driving

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

2 code implementations 7 Mar 2024 Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution.

4k Image Captioning +1

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation

no code implementations 28 Jan 2024 Zhenyu Wang, Enze Xie, Aoxue Li, Zhongdao Wang, Xihui Liu, Zhenguo Li

Given a complex text prompt containing multiple concepts including objects, attributes, and relationships, the LLM agent initially decomposes it, which entails the extraction of individual objects, their associated attributes, and the prediction of a coherent scene layout.

Attribute Language Modelling +3

DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

no code implementations CVPR 2024 Fei Xie, Zhongdao Wang, Chao Ma

To address this issue we cast visual tracking as a point set based denoising diffusion process and propose a novel generative learning based tracker dubbed DiffusionTrack.

Denoising Visual Object Tracking +1

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

3 code implementations 30 Sep 2023 Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

We hope PIXART-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.

Image Generation Language Modelling

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

1 code implementation 19 Apr 2023 Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation

no code implementations ICCV 2023 Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval

1 code implementation 24 Oct 2022 Zhaopeng Dou, Zhongdao Wang, Weihua Chen, YaLi Li, Shengjin Wang

(3) the data uncertainty and the model uncertainty are jointly learned in a unified network, and they serve as two fundamental criteria for the reliability assessment: if a probe is high-quality (low data uncertainty) and the model is confident in the prediction of the probe (low model uncertainty), the final ranking will be assessed as reliable.

Image Retrieval Retrieval

Self-Supervised Learning via Maximum Entropy Coding

1 code implementation 20 Oct 2022 Xin Liu, Zhongdao Wang, YaLi Li, Shengjin Wang

To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks.

Instance Segmentation object-detection +4

Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking

no code implementations 14 Dec 2021 Yunzhong Hou, Zhongdao Wang, Shengjin Wang, Liang Zheng

In this paper, we design experiments to verify such misfit between global re-ID feature distances and local matching in tracking, and propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.

How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?

1 code implementation 3 Dec 2021 Yuchi Liu, Zhongdao Wang, Tom Gedeon, Liang Zheng

To this end, we develop a protocol to automatically synthesize large scale MiE training data that allow us to train improved recognition models for real-world test data.

Face Generation Micro-Expression Recognition

Do Different Tracking Tasks Require Different Appearance Models?

1 code implementation NeurIPS 2021 Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto

We show how most tracking tasks can be solved within this framework, and that the same appearance model can be successfully used to obtain results that are competitive against specialised methods for most of the tasks considered.

Multi-Object Tracking Multi-Object Tracking and Segmentation +10

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

no code implementations 30 Jun 2021 Yuchi Liu, Zhongdao Wang, Xiangxin Zhou, Liang Zheng

We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaption techniques.

Domain Adaptation Multi-Object Tracking

CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions

no code implementations ECCV 2020 Zhongdao Wang, Jingwei Zhang, Liang Zheng, Yixuan Liu, Yifan Sun, Ya-Li Li, Shengjin Wang

This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.

Clustering Multi-Object Tracking +2

Circle Loss: A Unified Perspective of Pair Similarity Optimization

14 code implementations CVPR 2020 Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, Yichen Wei

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification +5

Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking

1 code implementation 27 Nov 2019 Yunzhong Hou, Liang Zheng, Zhongdao Wang, Shengjin Wang

Due to the continuity of target trajectories, tracking systems usually restrict their data association within a local neighborhood.

Multi-Object Tracking

Towards Real-Time Multi-Object Tracking

12 code implementations ECCV 2020 Zhongdao Wang, Liang Zheng, Yixuan Liu, Ya-Li Li, Shengjin Wang

In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model.

Multiple Object Tracking Multi-Task Learning +2

Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning

no code implementations 4 Aug 2019 Lanqing He, Zhongdao Wang, Ya-Li Li, Shengjin Wang

The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition.

Face Recognition Face Verification

Linkage Based Face Clustering via Graph Convolution Network

4 code implementations CVPR 2019 Zhongdao Wang, Liang Zheng, Ya-Li Li, Shengjin Wang

The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors.

Clustering Face Clustering +1

Query Adaptive Late Fusion for Image Retrieval

no code implementations 31 Oct 2018 Zhongdao Wang, Liang Zheng, Shengjin Wang

That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices.

Image Retrieval Person Recognition +2
