no code implementations • 18 Feb 2025 • Jingtong Yue, Zhiwei Lin, Xin Lin, Xiaoyu Zhou, Xiangtai Li, Lu Qi, Yongtao Wang, Ming-Hsuan Yang
Specifically, we design a 3D Gaussian Expansion (3DGE) module to mitigate inaccuracies in radar points, including position, Radar Cross-Section (RCS), and velocity.
no code implementations • 7 Feb 2025 • Xiaoyu Zhou, Jingqi Wang, Yongtao Wang, Yufei Wei, Nan Dong, Ming-Hsuan Yang
Obtaining semantic 3D occupancy from raw sensor data without manual annotations remains an essential yet challenging task.
1 code implementation • 26 Nov 2024 • Zhongyu Xia, Jishuo Li, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Ming-Hsuan Yang
Moreover, we propose a vision-centric 3D open-world object detection baseline and further introduce an ensemble method by fusing general and specialized models to address the issue of lower precision in existing open-world methods for the OpenAD benchmark.
1 code implementation • 15 Oct 2024 • Zhiwei Lin, Hongbo Jin, Yongtao Wang, Yufei Wei, Nan Dong
In addition, the proposed temporal enhancement branch is a plug-and-play module that can be easily integrated into existing occupancy prediction methods to improve their performance.
no code implementations • 8 Oct 2024 • Zhiwei Lin, Yongtao Wang, Zhi Tang
Without additional training, we connect these two generalized models with attention maps as the prompts.
no code implementations • 8 Sep 2024 • Zhiwei Lin, Zhe Liu, Yongtao Wang, Le Zhang, Ce Zhu
Secondly, the CAMF module utilizes a deformable attention mechanism to align radar and camera BEV features and adopts channel and spatial fusion layers to fuse them.
Ranked #1 on 3D Object Detection on nuscenes Camera-Radar
1 code implementation • 28 Aug 2024 • ZhiHao Lin, Yongtao Wang, Jinhe Zhang, Xiaojie Chu, Haibin Ling
To address this gap, we propose a novel neural architecture search scheme for binary neural networks, named NAS-BNN.
1 code implementation • 3 Apr 2024 • Zhongyu Xia, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang
Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation.
1 code implementation • CVPR 2024 • Zhiwei Lin, Zhe Liu, Zhongyu Xia, Xinhao Wang, Yongtao Wang, Shengxiang Qi, Yang Dong, Nan Dong, Le Zhang, Ce Zhu
In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder are proposed to extract radar features, with an injection and extraction module to facilitate communication between the two encoders.
Ranked #4 on 3D Object Detection (RoI) on View-of-Delft (val)
no code implementations • 11 Feb 2024 • Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
1 code implementation • CVPR 2024 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.
no code implementations • ICCV 2023 • Xinzhu Ma, Yongtao Wang, Yinmin Zhang, Zhiyi Xia, Yuan Meng, Zhihui Wang, Haojie Li, Wanli Ouyang
In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection.
no code implementations • ICCV 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Recent novel view synthesis methods obtain promising results for relatively small scenes, e.g., indoor environments and scenes with a few objects, but tend to fail for unbounded outdoor scenes with a single image as input.
no code implementations • 19 Aug 2023 • Xiaoyu Ye, Hao Huang, Jiaqi An, Yongtao Wang
Stable Diffusion (SD) customization approaches enable users to personalize SD model outputs, greatly enhancing the flexibility and diversity of AI art.
1 code implementation • CVPR 2023 • ZhiHao Lin, Yongtao Wang, Jinhe Zhang, Xiaojie Chu
We also present a novel optimization strategy with an exiting criterion based on the detection losses for our dynamic detectors.
no code implementations • 23 Mar 2023 • Ziwei Liu, Yongtao Wang, Xiaojie Chu
Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model.
1 code implementation • 12 Dec 2022 • Zhiwei Lin, Yongtao Wang, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang
Based on the property of outdoor point clouds in autonomous driving scenarios, i.e., the point clouds of distant objects are more sparse, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection.
1 code implementation • CVPR 2023 • Hao Huang, Ziyan Chen, Huanran Chen, Yongtao Wang, Kevin Zhang
Then, we analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch to efficiently make use of the limited information and prevent the patch from overfitting.
1 code implementation • 24 Oct 2022 • Zhiwei Lin, Zengyu Yang, Yongtao Wang
Firstly, we present a foreground guidance strategy with an off-the-shelf UOD detector to highlight the foreground regions on the feature maps and then refine object locations in an iterative fashion.
no code implementations • CVPR 2023 • Xuanyang Zhang, Yonggang Li, Xiangyu Zhang, Yongtao Wang, Jian Sun
Differentiable architecture search (DARTS) has significantly promoted the development of NAS techniques because of its high search efficiency and effectiveness, but it suffers from performance collapse.
Ranked #13 on Neural Architecture Search on NAS-Bench-201, CIFAR-10
1 code implementation • 4 Jul 2022 • Zhiwei Lin, TingTing Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang
To address this issue, we propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
1 code implementation • 30 May 2022 • Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu Xia, TingTing Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang, Xiaodan Liang, Bing Wang
There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR.
2 code implementations • 27 May 2022 • TingTing Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, Zhi Tang
Fusing the camera and LiDAR information has become a de-facto standard for 3D object detection tasks.
1 code implementation • 6 Apr 2022 • Xiaojie Chu, Yongtao Wang
By combining the proposed IterVM with an iterative language modeling module, we further propose a powerful scene text recognizer called IterNet.
2 code implementations • 13 Mar 2022 • Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu
The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.
1 code implementation • 5 Jul 2021 • Zhiwei Lin, Yongtao Wang, Hongxiang Lin
In this paper, we make the first attempt to tackle the catastrophic forgetting problem in the mainstream self-supervised methods, i.e., contrastive learning methods.
4 code implementations • 1 Jul 2021 • TingTing Liang, Xiaojie Chu, Yudong Liu, Yongtao Wang, Zhi Tang, Wei Chu, Jingdong Chen, Haibin Ling
With multi-scale testing, we push the current best single model result to a new record of 60.1% box AP and 52.3% mask AP without using extra training data.
Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)
1 code implementation • 23 May 2021 • Hao Huang, Yongtao Wang, Zhaoyu Chen, Yuze Zhang, Yuheng Li, Zhi Tang, Wei Chu, Jingdong Chen, Weisi Lin, Kai-Kuang Ma
Then, we design a two-level perturbation fusion strategy to alleviate the conflict between the adversarial watermarks generated by different facial images and models.
1 code implementation • 23 Mar 2021 • Hao Huang, Yongtao Wang, Zhaoyu Chen, Zhi Tang, Wenqiang Zhang, Kai-Kuang Ma
Firstly, we propose a patch selection and refining scheme to find the pixels that matter most for the attack and gradually remove the inconsequential perturbations.
1 code implementation • CVPR 2021 • TingTing Liang, Yongtao Wang, Zhi Tang, Guosheng Hu, Haibin Ling
Encouraged by the success, we propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both searching efficiency and detection accuracy.
1 code implementation • 15 Sep 2020 • Jianwei Li, Yongtao Wang, Haihua Xie, Kai-Kuang Ma
Our proposed network is a single model approach that can be trained for handling a wide range of quality factors while consistently delivering superior or comparable image artifacts removal performance.
1 code implementation • 27 May 2020 • Zhuoying Wang, Yongtao Wang, Zhi Tang, Yangyan Li, Ying Chen, Haibin Ling, Weisi Lin
Existing CNN-based methods for pixel labeling heavily depend on multi-scale features to meet the requirements of both semantic comprehension and detail preservation.
1 code implementation • ECCV 2020 • Yonggang Li, Guosheng Hu, Yongtao Wang, Timothy Hospedales, Neil M. Robertson, Yongxin Yang
In this paper, we propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost.
Ranked #15 on Data Augmentation on ImageNet
no code implementations • 19 Jan 2020 • Kaiyu Shan, Yongtao Wang, Zhuoying Wang, TingTing Liang, Zhi Tang, Ying Chen, Yangyan Li
To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-art methods integrate 1D temporal convolution into a conventional 2D CNN backbone.
no code implementations • 20 Dec 2019 • Ting-Ting Liang, Yongtao Wang, Qijie Zhao, Huan Zhang, Zhi Tang, Haibin Ling
Feature pyramids are widely exploited in many detectors to solve the scale variation problem for object detection.
6 code implementations • 9 Sep 2019 • Yudong Liu, Yongtao Wang, Siwei Wang, Ting-Ting Liang, Qijie Zhao, Zhi Tang, Haibin Ling
In existing CNN-based detectors, the backbone network is a very important component for basic feature extraction, and the performance of the detectors highly depends on it.
Ranked #49 on Instance Segmentation on COCO test-dev
11 code implementations • 12 Nov 2018 • Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling
Finally, we gather up the decoder layers with equivalent scales (sizes) to develop a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels.
Ranked #146 on Object Detection on COCO test-dev
no code implementations • 31 Jul 2018 • Qijie Zhao, Feng Ni, Yang Song, Yongtao Wang, Zhi Tang
Specifically, a synthesizing method was proposed to generate well-annotated images containing barcode and QR code labels, which greatly reduces the annotation time.
1 code implementation • 26 Jun 2018 • Qijie Zhao, Tao Sheng, Yongtao Wang, Feng Ni, Ling Cai
The ability to detect small objects and the speed of the object detector are very important for autonomous driving applications. In this paper, we propose an effective yet efficient one-stage detector, which won second place in the Road Object Detection competition of the CVPR 2018 Workshop on Autonomous Driving (WAD).
no code implementations • ICCV 2017 • Yuan Liao, Xiaoqing Lu, Chengcui Zhang, Yongtao Wang, Zhi Tang
Mutual enhancement is also included in our frame propagation mechanism that improves logo detection by utilizing the continuity of logos across frames.