1 code implementation • 18 Apr 2024 • Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu
In this paper, we question whether the fine-tuning performance of extremely simple, small-scale ViTs can also benefit from this pre-training paradigm, which remains considerably less studied than the well-established lightweight architecture design methodology that introduces sophisticated components.
no code implementations • 6 Mar 2024 • Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li
It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO.
no code implementations • 30 Jun 2023 • Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie
In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.
1 code implementation • NeurIPS 2021 • Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng
Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.
no code implementations • 3 Dec 2022 • En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming Li, Shoudong Han, Wenbing Tao
VLM joins the information in the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual descriptions, which are domain-invariant across different tracking scenes.
2 code implementations • ICCV 2023 • HongYu Zhou, Zheng Ge, Zeming Li, Xiangyu Zhang
This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT.
Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)
3 code implementations • 21 Sep 2022 • Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li
To this end, we introduce an effective temporal stereo method that dynamically selects the scale of matching candidates, significantly reducing computation overhead.
Ranked #11 on 3D Object Detection on nuScenes Camera Only
no code implementations • 23 Aug 2022 • Jinrong Yang, En Yu, Zeming Li, Xiaoping Li, Wenbing Tao
Recent advanced works generally employ a series of object attributes, e.g., position, size, velocity, and appearance, to provide clues for association in 3D MOT.
no code implementations • 22 Aug 2022 • Zengran Wang, Chen Min, Zheng Ge, Yinhao Li, Zeming Li, Hongyu Yang, Di Huang
Instead of using a sole monocular depth method, in this work, we propose a novel Surround-view Temporal Stereo (STS) technique that leverages the geometry correspondence between frames across time to facilitate accurate depth learning.
no code implementations • 19 Aug 2022 • HongYu Zhou, Zheng Ge, Weixin Mao, Zeming Li
To address this problem, we revisit the generation of BEV representation and propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
1 code implementation • 22 Jul 2022 • Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng
Many point-based 3D detectors adopt point-feature sampling strategies to drop some points for efficient inference.
no code implementations • 21 Jul 2022 • Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
In this paper, we explore the performance of real-time models on this metric and endow the models with the capacity to predict the future, significantly improving the results for streaming perception.
no code implementations • 13 Jul 2022 • Shaoru Wang, Zeming Li, Jin Gao, Liang Li, Weiming Hu
However, when facing various resource budgets in real-world applications, pretraining multiple networks of various sizes one by one incurs a huge computational burden.
2 code implementations • 6 Jul 2022 • HongYu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li, Haiyan Yu, Jian Sun
To date, the most powerful semi-supervised object detectors (SS-OD) are based on pseudo-boxes, which need a sequence of post-processing with fine-tuned hyper-parameters.
2 code implementations • 21 Jun 2022 • Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li
In this research, we propose a new 3D object detector with trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection.
Ranked #4 on 3D Object Detection on Rope3D
1 code implementation • 1 Jun 2022 • Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia
To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space.
1 code implementation • CVPR 2022 • Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.
2 code implementations • 28 May 2022 • Shaoru Wang, Jin Gao, Zeming Li, Xiaoqin Zhang, Weiming Hu
We also point out some defects of such pre-training, e.g., failing to benefit from large-scale pre-training data and showing inferior performance on data-insufficient downstream tasks.
1 code implementation • CVPR 2022 • Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem.
Ranked #1 on Real-Time Object Detection on Argoverse-HD (Full-Stack, Val) (sAP metric, using extra training data)
2 code implementations • 22 Mar 2022 • Zhisheng Zhong, Jiequan Cui, Zeming Li, Eric Lo, Jian Sun, Jiaya Jia
Given the promising performance of contrastive learning, we propose Rebalanced Siamese Contrastive Mining (ResCom) to tackle imbalanced recognition.
Ranked #5 on Long-tail Learning on CIFAR-10-LT (ρ=10)
1 code implementation • 17 Aug 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.
1 code implementation • 27 Jul 2021 • Songyang Zhang, Lin Song, Songtao Liu, Zheng Ge, Zeming Li, Xuming He, Jian Sun
In this report, we introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
41 code implementations • 18 Jul 2021 • Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, Jian Sun
In this report, we present some experienced improvements to the YOLO series, forming a new high-performance detector -- YOLOX.
Ranked #1 on Real-Time Object Detection on Argoverse-HD (Detection-Only, Val) (using extra training data)
1 code implementation • CVPR 2021 • Zhibo Fan, Yuchen Ma, Zeming Li, Jian Sun
Recently, few-shot object detection has been widely adopted to deal with data-limited situations.
no code implementations • CVPR 2021 • Yuchen Ma, Songtao Liu, Zeming Li, Jian Sun
We propose a dense object detector with an instance-wise sampling strategy, named IQDet.
1 code implementation • CVPR 2021 • Songyang Zhang, Zeming Li, Shipeng Yan, Xuming He, Jian Sun
Motivated by our discovery, we propose a unified distribution alignment strategy for long-tail visual recognition.
Ranked #17 on Long-tail Learning on Places-LT
2 code implementations • CVPR 2021 • Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, Jian Sun
Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object.
Ranked #62 on Object Detection on COCO test-dev
1 code implementation • 19 Jan 2021 • Zeming Li, Songtao Liu, Jian Sun
The teacher's weights are a momentum update of the student's, and the teacher's BN statistics are a momentum update of the historical statistics.
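The momentum update mentioned above is commonly written as an exponential moving average (EMA). The following is a minimal sketch of that generic rule, not the authors' implementation: the function name `ema_update`, the momentum value, and the toy numbers are all assumptions chosen for illustration.

```python
def ema_update(teacher, student, momentum=0.9):
    """Move each teacher value toward the matching student value.

    Generic EMA rule (an assumption, not taken from the paper):
        teacher <- momentum * teacher + (1 - momentum) * student
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Hypothetical toy parameters standing in for real network weights.
teacher_weights = [0.0, 1.0]
student_weights = [1.0, 1.0]
teacher_weights = ema_update(teacher_weights, student_weights, momentum=0.9)

# The same rule can track statistics over time: here a running mean is
# updated with a hypothetical sequence of per-batch means.
running_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    running_mean = ema_update([running_mean], [batch_mean], momentum=0.5)[0]
```

With a momentum close to 1 the teacher changes slowly and smooths out noise in the student; a smaller momentum makes it follow the student more closely.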
1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Xiangyu Zhang, Hongbin Sun, Jian Sun, Nanning Zheng
The Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng
To this end, we propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance, which further unleashes the ability of multi-scale feature representation.
1 code implementation • CVPR 2021 • JianFeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng
Mainstream object detectors based on the fully convolutional network have achieved impressive performance.
6 code implementations • CVPR 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.
Ranked #1 on Panoptic Segmentation on COCO minival (SQ metric)
no code implementations • 27 Nov 2020 • Songtao Liu, Zeming Li, Jian Sun
Our Faster R-CNN (ResNet50-FPN) baseline achieves 39.8% mAP on COCO, which is on par with the state-of-the-art self-supervised methods pre-trained on ImageNet.
no code implementations • 6 Oct 2020 • Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun
In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion, first to detect instances then to obtain segmentation.
1 code implementation • 5 Oct 2020 • Benjin Zhu, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun
In this paper, we propose EqCo (Equivalent Rules for Contrastive Learning) to make self-supervised learning irrelevant to the number of negative samples in the contrastive learning framework.
2 code implementations • ECCV 2020 • Han Qiu, Yuchen Ma, Zeming Li, Songtao Liu, Jian Sun
In this paper, we propose a simple and efficient operator called Border-Align to extract "border features" from the extreme point of the border to enhance the point feature.
2 code implementations • 7 Jul 2020 • Benjin Zhu, Jian-Feng Wang, Zhengkai Jiang, Fuhang Zong, Songtao Liu, Zeming Li, Jian Sun
During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions.
4 code implementations • 26 Apr 2020 • Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
1 code implementation • CVPR 2020 • Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun
To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.
1 code implementation • NeurIPS 2019 • Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning Zheng
To this end, tree filtering modules are embedded to formulate a unified framework for semantic segmentation.
3 code implementations • 26 Aug 2019 • Benjin Zhu, Zhengkai Jiang, Xiangxin Zhou, Zeming Li, Gang Yu
This report presents our method, which won the nuScenes 3D Detection Challenge [17] held in the Workshop on Autonomous Driving (WAD, CVPR 2019).
Ranked #5 on 3D Object Detection on nuScenes LiDAR only
3 code implementations • 28 Mar 2019 • Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun
In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet.
Ranked #15 on Object Detection on PASCAL VOC 2007
no code implementations • ECCV 2018 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
(1) Recent object detectors like FPN and RetinaNet usually involve extra stages compared with the task of image classification to handle objects of various scales.
no code implementations • NeurIPS 2018 • Tong Yang, Xiangyu Zhang, Zeming Li, Wenqiang Zhang, Jian Sun
We propose a novel and flexible anchor mechanism named MetaAnchor for object detection frameworks.
2 code implementations • 17 Apr 2018 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection.
6 code implementations • CVPR 2018 • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun
The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new networks, new frameworks, or novel loss designs.
5 code implementations • 20 Nov 2017 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD in both speed and accuracy.