Search Results for author: Zhongdao Wang

Found 23 papers, 11 papers with code

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

no code implementations • 15 Apr 2024 • Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied.

Autonomous Driving

OccFiner: Offboard Occupancy Refinement with Hybrid Propagation

no code implementations • 13 Mar 2024 • Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Zhijian Zhao, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang

Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision.

3D Semantic Scene Completion

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

no code implementations • 7 Mar 2024 • Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution.

4k, Image Captioning (+1 more)

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation

no code implementations • 28 Jan 2024 • Zhenyu Wang, Enze Xie, Aoxue Li, Zhongdao Wang, Xihui Liu, Zhenguo Li

Given a complex text prompt containing multiple concepts including objects, attributes, and relationships, the LLM agent initially decomposes it, which entails the extraction of individual objects, their associated attributes, and the prediction of a coherent scene layout.

Attribute, Language Modelling (+3 more)

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

2 code implementations • 30 Sep 2023 • Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

We hope PIXART-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.

Image Generation, Language Modelling

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection, Autonomous Driving (+3 more)

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation

no code implementations • ICCV 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection, Autonomous Driving (+3 more)

Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval

1 code implementation • 24 Oct 2022 • Zhaopeng Dou, Zhongdao Wang, Weihua Chen, YaLi Li, Shengjin Wang

(3) the data uncertainty and the model uncertainty are jointly learned in a unified network, and they serve as two fundamental criteria for the reliability assessment: if a probe is high-quality (low data uncertainty) and the model is confident in the prediction of the probe (low model uncertainty), the final ranking will be assessed as reliable.

Image Retrieval, Retrieval

Self-Supervised Learning via Maximum Entropy Coding

1 code implementation • 20 Oct 2022 • Xin Liu, Zhongdao Wang, YaLi Li, Shengjin Wang

To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks.

Instance Segmentation, Object Detection (+4 more)

Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking

no code implementations • 14 Dec 2021 • Yunzhong Hou, Zhongdao Wang, Shengjin Wang, Liang Zheng

In this paper, we design experiments to verify such misfit between global re-ID feature distances and local matching in tracking, and propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.

How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?

1 code implementation • 3 Dec 2021 • Yuchi Liu, Zhongdao Wang, Tom Gedeon, Liang Zheng

To this end, we develop a protocol to automatically synthesize large scale MiE training data that allow us to train improved recognition models for real-world test data.

Face Generation, Micro-Expression Recognition

Do Different Tracking Tasks Require Different Appearance Models?

1 code implementation • NeurIPS 2021 • Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto

We show how most tracking tasks can be solved within this framework, and that the same appearance model can be successfully used to obtain results that are competitive against specialised methods for most of the tasks considered.

Multi-Object Tracking, Multi-Object Tracking and Segmentation (+10 more)

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

no code implementations • 30 Jun 2021 • Yuchi Liu, Zhongdao Wang, Xiangxin Zhou, Liang Zheng

We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques.

Domain Adaptation, Multi-Object Tracking

CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions

no code implementations • ECCV 2020 • Zhongdao Wang, Jingwei Zhang, Liang Zheng, Yixuan Liu, Yifan Sun, Ya-Li Li, Shengjin Wang

This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.

Clustering, Multi-Object Tracking (+2 more)

Circle Loss: A Unified Perspective of Pair Similarity Optimization

11 code implementations • CVPR 2020 • Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, Yichen Wei

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.

 Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition, Face Verification (+4 more)
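
As a rough illustration of the pair-similarity view in the Circle Loss entry above, the sketch below computes a circle-style loss for one anchor from a list of within-class similarities (s_p) and between-class similarities (s_n). The formula is recalled from the Circle Loss paper and is not stated in this listing. The function name `circle_loss` and the default values of `m` and `gamma` are illustrative assumptions, not the authors' reference implementation.

```python
import math


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def circle_loss(sp, sn, m=0.25, gamma=32.0):
    """Circle-style loss for one anchor (hypothetical helper).

    sp: within-class similarities, to be pushed up.
    sn: between-class similarities, to be pushed down.
    m, gamma: margin and scale; the defaults here are illustrative only.
    """
    optimum_p, optimum_n = 1.0 + m, -m   # targets for s_p and s_n
    margin_p, margin_n = 1.0 - m, m      # decision margins

    # Each similarity is weighted by how far it is from its optimum,
    # so poorly optimized pairs contribute more to the loss.
    logits_p = [-gamma * max(optimum_p - s, 0.0) * (s - margin_p) for s in sp]
    logits_n = [gamma * max(s - optimum_n, 0.0) * (s - margin_n) for s in sn]

    # log(1 + sum_j exp(logit_n_j) * sum_i exp(logit_p_i))
    return _softplus(_logsumexp(logits_n) + _logsumexp(logits_p))


if __name__ == "__main__":
    well_separated = circle_loss(sp=[0.9], sn=[0.1])
    poorly_separated = circle_loss(sp=[0.3], sn=[0.7])
    print(well_separated, poorly_separated)
```

With these assumed settings, a well-separated pair (high s_p, low s_n) yields a much smaller loss than a poorly separated one, which matches the stated aim of maximizing s_p while minimizing s_n.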

Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking

1 code implementation • 27 Nov 2019 • Yunzhong Hou, Liang Zheng, Zhongdao Wang, Shengjin Wang

Due to the continuity of target trajectories, tracking systems usually restrict their data association within a local neighborhood.

Multi-Object Tracking

Towards Real-Time Multi-Object Tracking

12 code implementations • ECCV 2020 • Zhongdao Wang, Liang Zheng, Yixuan Liu, Ya-Li Li, Shengjin Wang

In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model.

Multiple Object Tracking, Multi-Task Learning (+2 more)

Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning

no code implementations • 4 Aug 2019 • Lanqing He, Zhongdao Wang, Ya-Li Li, Shengjin Wang

The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition.

Face Recognition, Face Verification

Linkage Based Face Clustering via Graph Convolution Network

4 code implementations • CVPR 2019 • Zhongdao Wang, Liang Zheng, Ya-Li Li, Shengjin Wang

The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors.

Clustering, Face Clustering (+1 more)

Query Adaptive Late Fusion for Image Retrieval

no code implementations • 31 Oct 2018 • Zhongdao Wang, Liang Zheng, Shengjin Wang

That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices.

Image Retrieval, Person Recognition (+2 more)
