no code implementations • 12 Nov 2024 • Jianhao Li, Tianyu Sun, Xueqian Zhang, Zhongdao Wang, Bailan Feng, Hengshuang Zhao
To tackle this challenge, we find that a considerable portion of points in the accumulated point cloud is redundant, and discarding these points has minimal impact on perception accuracy.
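One common way to realize this kind of point dropping is voxel-grid downsampling, where each occupied voxel keeps a single representative point. The sketch below illustrates only that general idea, not the selection strategy used in the paper; the voxel size is an arbitrary assumption.

import numpy as np

def voxel_downsample(points, voxel_size=0.2):
    """Keep one representative point per occupied voxel.

    points: (N, 3) array of accumulated LiDAR points.
    voxel_size: edge length of the voxel grid in meters (assumed value).
    """
    # Map every point to an integer voxel index.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Keep the first point that falls into each voxel.
    _, keep = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(keep)]

# Example: 100k accumulated points reduced to per-voxel representatives.
pts = np.random.uniform(-50, 50, size=(100_000, 3))
print(voxel_downsample(pts).shape)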
1 code implementation • 9 Oct 2024 • Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma
Recent advances in State Space Models, notably Mamba, have demonstrated performance superior to the dominant Transformer models while reducing computational complexity from quadratic to linear.
2 code implementations • 26 Sep 2024 • Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Zhongdao Wang, Qingmin Liao, Li Wang, Tian Lu, Emad Barsoum
In this paper, we present DoSSR, a Domain Shift diffusion-based SR model that capitalizes on the generative power of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process from low-resolution (LR) images.
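Starting the reverse diffusion from a (noised, upsampled) LR image rather than from pure Gaussian noise is what saves sampling steps in this kind of setup. The snippet below sketches only that generic idea; the schedule, starting step, DDIM-style update, and the `denoiser` interface are hypothetical placeholders, not the DoSSR implementation.

import torch

def sr_from_lr_start(lr_up, denoiser, alphas_cumprod, start_t=250):
    """Begin reverse diffusion from a noised LR image (generic sketch).

    lr_up: bicubically upsampled LR image, shape (B, C, H, W), in [-1, 1].
    denoiser: callable (x_t, t) -> predicted noise (assumed interface).
    alphas_cumprod: cumulative alpha-bar schedule, shape (T,).
    start_t: intermediate timestep to start from instead of T - 1.
    """
    a_bar = alphas_cumprod[start_t]
    # Diffuse the LR image to timestep start_t instead of sampling pure noise.
    x = torch.sqrt(a_bar) * lr_up + torch.sqrt(1 - a_bar) * torch.randn_like(lr_up)
    for t in range(start_t, -1, -1):
        eps = denoiser(x, torch.full((x.shape[0],), t, device=x.device))
        a_bar_t = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else alphas_cumprod.new_tensor(1.0)
        # DDIM-style deterministic update (one common choice, not necessarily DoSSR's sampler).
        x0_pred = (x - torch.sqrt(1 - a_bar_t) * eps) / torch.sqrt(a_bar_t)
        x = torch.sqrt(a_bar_prev) * x0_pred + torch.sqrt(1 - a_bar_prev) * eps
    return x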
no code implementations • 26 Sep 2024 • Song Wang, Zhongdao Wang, Jiawei Yu, Wentong Li, Bailan Feng, Junbo Chen, Jianke Zhu
In this paper, we conduct a comprehensive evaluation of existing semantic occupancy prediction models from a reliability perspective for the first time.
no code implementations • 18 Jul 2024 • Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia
Recent advances in text-to-image model customization have underscored the importance of integrating new concepts with a few examples.
no code implementations • 17 Jul 2024 • Jilai Zheng, Pin Tang, Zhongdao Wang, Guoqing Wang, Xiangxuan Ren, Bailan Feng, Chao Ma
Hence, instead of building our model from scratch, we try to blend 2D foundation models, specifically the depth model MiDaS and the semantic model CLIP, to lift semantics into 3D space and thereby produce 3D occupancy predictions.
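At a high level, such a blend can be pictured as: per-pixel depth unprojects image pixels into 3D, and per-pixel semantics are carried along and accumulated into voxels. The pseudocode below is a simplified illustration of that lifting step under assumed camera intrinsics; it is not the paper's pipeline, and the grid parameters are hypothetical.

import numpy as np

def lift_to_occupancy(depth, sem_logits, K, voxel_size=0.4, grid=(200, 200, 16)):
    """Unproject per-pixel depth and semantics into a voxel grid (illustrative only).

    depth:      (H, W) metric depth, e.g. from a depth model such as MiDaS.
    sem_logits: (H, W, C) per-pixel class scores, e.g. from CLIP-based labeling.
    K:          (3, 3) camera intrinsics (assumed known).
    """
    H, W, C = sem_logits.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project pixels into camera-frame 3D points.
    pts = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    idx = np.floor(pts / voxel_size).astype(int) + np.array(grid) // 2
    occ = np.zeros((*grid, C))
    valid = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    # Accumulate per-pixel semantics into the voxels their points fall into.
    np.add.at(occ, tuple(idx[valid].T), sem_logits.reshape(-1, C)[valid])
    return occ.argmax(-1)  # per-voxel semantic label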
no code implementations • 16 Jul 2024 • Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo
Unlike previous arts, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset.
no code implementations • 23 Apr 2024 • Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma
Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation problem.
no code implementations • CVPR 2024 • Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma
Vision-based perception for autonomous driving requires explicit modeling of a 3D space, into which 2D latent representations are mapped and on which subsequent 3D operators are applied.
1 code implementation • 13 Mar 2024 • Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision.
2 code implementations • 7 Mar 2024 • Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
In this paper, we introduce PixArt-$\Sigma$, a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution.
no code implementations • 28 Jan 2024 • Zhenyu Wang, Enze Xie, Aoxue Li, Zhongdao Wang, Xihui Liu, Zhenguo Li
Given a complex text prompt containing multiple concepts, including objects, attributes, and relationships, the LLM agent first decomposes it, extracting the individual objects and their associated attributes and predicting a coherent scene layout.
no code implementations • CVPR 2024 • Fei Xie, Zhongdao Wang, Chao Ma
To address this issue, we cast visual tracking as a point-set-based denoising diffusion process and propose a novel generative-learning-based tracker dubbed DiffusionTrack.
3 code implementations • 30 Sep 2023 • Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li
We hope PIXART-$\alpha$ will provide the AIGC community and startups with new insights that accelerate building their own high-quality yet low-cost generative models from scratch.
2 code implementations • ICCV 2023 • Zhaopeng Dou, Zhongdao Wang, YaLi Li, Shengjin Wang
To overcome the barriers of data and annotation, we propose to utilize large-scale unsupervised data for training.
Generalizable Person Re-identification, Representation Learning
1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.
no code implementations • 7 Nov 2022 • Zhongdao Wang, Zhaopeng Dou, Jingwei Zhang, Liang Zheng, Yifan Sun, YaLi Li, Shengjin Wang
In this paper, we are interested in learning a generalizable person re-identification (re-ID) representation from unlabeled videos.
Domain Generalization, Generalizable Person Re-identification, +1
1 code implementation • 24 Oct 2022 • Zhaopeng Dou, Zhongdao Wang, Weihua Chen, YaLi Li, Shengjin Wang
(3) the data uncertainty and the model uncertainty are jointly learned in a unified network, and they serve as two fundamental criteria for the reliability assessment: if a probe is of high quality (low data uncertainty) and the model is confident in its prediction for the probe (low model uncertainty), the final ranking is assessed as reliable.
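The reliability criterion described in (3) can be summarized as a simple decision rule on the two uncertainty estimates. The thresholds below are arbitrary placeholders, and the rule is a schematic restatement of the criterion rather than the paper's exact scoring.

def ranking_is_reliable(data_uncertainty, model_uncertainty,
                        tau_data=0.3, tau_model=0.3):
    """Schematic reliability check: low data uncertainty means a high-quality
    probe, low model uncertainty means a confident prediction; only when both
    hold is the final ranking treated as reliable. Thresholds are assumed."""
    return data_uncertainty < tau_data and model_uncertainty < tau_model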
1 code implementation • 20 Oct 2022 • Xin Liu, Zhongdao Wang, YaLi Li, Shengjin Wang
To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks.
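One way to make "explicitly optimizing the structure of the representation" concrete is a log-determinant (coding-rate-style) surrogate for the entropy of batch features, which is the family of quantities MEC builds on. The sketch below shows that generic surrogate with an assumed distortion parameter; it is not the exact MEC objective or its Taylor-expansion approximation.

import torch

def logdet_entropy_surrogate(z, eps=0.06):
    """Log-determinant surrogate for the entropy of a batch of embeddings.

    A generic coding-rate-style quantity (not the exact MEC objective): it grows
    when the embedding dimensions are decorrelated and well spread out.

    z:   (N, d) L2-normalized embeddings.
    eps: assumed distortion parameter controlling coding precision.
    """
    n, d = z.shape
    cov = (z.T @ z) * (d / (n * eps ** 2))          # (d, d), positive semi-definite
    # I + cov is positive definite, so the log-determinant is well defined.
    return torch.logdet(torch.eye(d, device=z.device) + cov)

# A self-supervised loss would maximize this surrogate (minimize its negation)
# while also pulling embeddings of two views of the same image together.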
no code implementations • 14 Dec 2021 • Yunzhong Hou, Zhongdao Wang, Shengjin Wang, Liang Zheng
In this paper, we design experiments to verify this misfit between global re-ID feature distances and local matching in tracking, and propose a simple yet effective approach to adapt affinity estimates to the corresponding matching scopes in MTMCT.
1 code implementation • 3 Dec 2021 • Yuchi Liu, Zhongdao Wang, Tom Gedeon, Liang Zheng
To this end, we develop a protocol to automatically synthesize large-scale micro-expression (MiE) training data that allow us to train improved recognition models for real-world test data.
1 code implementation • NeurIPS 2021 • Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto
We show how most tracking tasks can be solved within this framework, and that the same appearance model can be successfully used to obtain results that are competitive with specialised methods for most of the tasks considered.
Ranked #2 on Video Object Segmentation on DAVIS 2017 (mIoU metric)
Multi-Object Tracking, Multi-Object Tracking and Segmentation, +10
no code implementations • 30 Jun 2021 • Yuchi Liu, Zhongdao Wang, Xiangxin Zhou, Liang Zheng
We show that, compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques.
no code implementations • ECCV 2020 • Zhongdao Wang, Jingwei Zhang, Liang Zheng, Yixuan Liu, Yifan Sun, Ya-Li Li, Shengjin Wang
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, a setting in which existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.
14 code implementations • CVPR 2020 • Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, Yichen Wei
This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.
Ranked #1 on Face Verification on IJB-C (training dataset metric)
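The entry above frames deep feature learning as pair-similarity optimization: push every within-class similarity $s_p$ up and every between-class similarity $s_n$ down. One standard unified form of such an objective (a generic formulation that losses like the triplet loss and softmax cross-entropy can be rewritten into, shown here as an illustration rather than the paper's final loss) is

$\mathcal{L}_{\mathrm{uni}} = \log\!\Big[\,1 + \sum_{i}\sum_{j} \exp\big(\gamma\,(s_n^{j} - s_p^{i} + m)\big)\Big]$,

where the sums run over an anchor's between-class scores $s_n^{j}$ and within-class scores $s_p^{i}$, $\gamma$ is a scale factor, and $m$ is a margin; minimizing it shrinks every $s_n^{j} - s_p^{i}$ gap.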
1 code implementation • 27 Nov 2019 • Yunzhong Hou, Liang Zheng, Zhongdao Wang, Shengjin Wang
Due to the continuity of target trajectories, tracking systems usually restrict their data association within a local neighborhood.
12 code implementations • ECCV 2020 • Zhongdao Wang, Liang Zheng, Yixuan Liu, Ya-Li Li, Shengjin Wang
In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model.
Ranked #4 on Multi-Object Tracking on HiEve
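The entry above describes a shared model that outputs detections and appearance embeddings in one forward pass. The skeleton below only illustrates that "shared backbone, two heads" structure; the module sizes and head designs are assumptions, not the released architecture.

import torch
import torch.nn as nn

class JointDetectionEmbedding(nn.Module):
    """Illustrative shared-backbone model: one branch predicts boxes/objectness,
    the other predicts an appearance embedding per location (sizes assumed)."""

    def __init__(self, emb_dim=128, num_anchors=4):
        super().__init__()
        self.backbone = nn.Sequential(                  # stand-in feature extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: 5 values (box offsets + objectness) per anchor.
        self.det_head = nn.Conv2d(256, num_anchors * 5, 1)
        # Embedding head: one appearance vector per spatial location.
        self.emb_head = nn.Conv2d(256, emb_dim, 1)

    def forward(self, images):
        feats = self.backbone(images)
        return self.det_head(feats), self.emb_head(feats)

model = JointDetectionEmbedding()
det, emb = model(torch.randn(1, 3, 256, 256))   # shared computation, two outputs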
no code implementations • 4 Aug 2019 • Lanqing He, Zhongdao Wang, Ya-Li Li, Shengjin Wang
The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition.
4 code implementations • CVPR 2019 • Zhongdao Wang, Liang Zheng, Ya-Li Li, Shengjin Wang
The key idea is that the local context in the feature space around an instance (face) contains rich information about the linkage relationships between this instance and its neighbors.
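Concretely, the "local context around an instance" can be read as the instance's k nearest neighbors in feature space together with their mutual similarities, which a learned classifier then consumes to predict whether each neighbor should be linked to the pivot. The snippet below only builds that local context; the classifier trained on top is omitted, and the value of k is an assumption.

import numpy as np

def local_context(features, pivot, k=10):
    """Gather the k-NN neighborhood of one instance as its 'local context'.

    features: (N, d) L2-normalized face embeddings.
    pivot:    index of the instance whose linkage we want to predict.
    Returns the neighbor indices and the pairwise similarity matrix among
    {pivot} U neighbors, which a learned classifier could consume.
    """
    sims = features @ features[pivot]              # cosine similarity to the pivot
    neighbors = np.argsort(-sims)[1:k + 1]         # k nearest neighbors, excluding self
    nodes = np.concatenate([[pivot], neighbors])
    context = features[nodes] @ features[nodes].T  # (k+1, k+1) affinity sub-matrix
    return neighbors, context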
no code implementations • 31 Oct 2018 • Zhongdao Wang, Liang Zheng, Shengjin Wang
That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices.
no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
In our vehicle ReID framework, we propose an orientation-invariant feature embedding module and a spatial-temporal regularization module.