Search Results for author: Zeming Li

Found 47 papers, 34 papers with code

Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

1 code implementation 18 Apr 2024 Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

In this paper, we question whether the fine-tuning performance of extremely simple, small-scale ViTs can also benefit from this pre-training paradigm, which has received considerably less study than the well-established lightweight architecture design methodology that introduces sophisticated components.

Contrastive Learning Image Classification +2

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

no code implementations 6 Mar 2024 Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li

It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO.

PICO

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

no code implementations 30 Jun 2023 Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie

In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.

3D Object Detection Depth Estimation +3

Dynamic Grained Encoder for Vision Transformers

1 code implementation NeurIPS 2021 Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng

Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.

Image Classification Language Modelling +2

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

no code implementations 3 Dec 2022 En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming Li, Shoudong Han, Wenbing Tao

VLM joins the information in the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual descriptions, which are domain invariant across different tracking scenes.

Domain Generalization Multi-Object Tracking +1

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

2 code implementations ICCV 2023 HongYu Zhou, Zheng Ge, Zeming Li, Xiangyu Zhang

This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo

3 code implementations 21 Sep 2022 Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li

To this end, we introduce an effective temporal stereo method that dynamically selects the scale of matching candidates, significantly reducing computation overhead.

3D Object Detection Depth Estimation +1

Quality Matters: Embracing Quality Clues for Robust 3D Multi-Object Tracking

no code implementations 23 Aug 2022 Jinrong Yang, En Yu, Zeming Li, Xiaoping Li, Wenbing Tao

Recent advanced works generally employ a series of object attributes, e.g., position, size, velocity, and appearance, to provide the clues for the association in 3D MOT.

3D Multi-Object Tracking 3D Object Detection +2

STS: Surround-view Temporal Stereo for Multi-view 3D Detection

no code implementations 22 Aug 2022 Zengran Wang, Chen Min, Zheng Ge, Yinhao Li, Zeming Li, Hongyu Yang, Di Huang

Instead of using a sole monocular depth method, in this work, we propose a novel Surround-view Temporal Stereo (STS) technique that leverages the geometry correspondence between frames across time to facilitate accurate depth learning.

3D Object Detection Depth Estimation +4

PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View

no code implementations 19 Aug 2022 HongYu Zhou, Zheng Ge, Weixin Mao, Zeming Li

To address this problem, we revisit the generation of BEV representation and propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.

Autonomous Driving object-detection +1

DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection

1 code implementation 22 Jul 2022 Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng

Many point-based 3D detectors adopt point-feature sampling strategies to drop some points for efficient inference.

3D Object Detection object-detection

StreamYOLO: Real-time Object Detection for Streaming Perception

no code implementations 21 Jul 2022 Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

In this paper, we explore the performance of real time models on this metric and endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.

Autonomous Driving Object +2

DSPNet: Towards Slimmable Pretrained Networks based on Discriminative Self-supervised Learning

no code implementations 13 Jul 2022 Shaoru Wang, Zeming Li, Jin Gao, Liang Li, Weiming Hu

However, when facing various resource budgets in real-world applications, it costs a huge computation burden to pretrain multiple networks of various sizes one by one.

Knowledge Distillation Self-Supervised Learning

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

2 code implementations 6 Jul 2022 HongYu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li, Haiyan Yu, Jian Sun

To date, the most powerful semi-supervised object detectors (SS-OD) are based on pseudo-boxes, which need a sequence of post-processing with fine-tuned hyper-parameters.

object-detection Object Detection +2

BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection

2 code implementations 21 Jun 2022 Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li

In this research, we propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection.

3D Object Detection Depth Estimation +1

Unifying Voxel-based Representation with Transformer for 3D Object Detection

1 code implementation 1 Jun 2022 Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia

To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space.

3D Object Detection Object +3

Voxel Field Fusion for 3D Object Detection

1 code implementation CVPR 2022 Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.

3D Object Detection Data Augmentation +2

A Closer Look at Self-Supervised Lightweight Vision Transformers

2 code implementations 28 May 2022 Shaoru Wang, Jin Gao, Zeming Li, Xiaoqin Zhang, Weiming Hu

We also point out some defects of such pre-training, e.g., failing to benefit from large-scale pre-training data and showing inferior performance on data-insufficient downstream tasks.

Contrastive Learning Image Classification +1

Real-time Object Detection for Streaming Perception

1 code implementation CVPR 2022 Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem.

Ranked #1 on Real-Time Object Detection on Argoverse-HD (Full-Stack, Val) (sAP metric, using extra training data)

Autonomous Driving Object +2

Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition

2 code implementations 22 Mar 2022 Zhisheng Zhong, Jiequan Cui, Zeming Li, Eric Lo, Jian Sun, Jiaya Jia

Given the promising performance of contrastive learning, we propose Rebalanced Siamese Contrastive Mining (ResCom) to tackle imbalanced recognition.

Contrastive Learning Long-tail Learning +1

Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision

1 code implementation 17 Aug 2021 Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.

Panoptic Segmentation Segmentation +1

YOLOX: Exceeding YOLO Series in 2021

41 code implementations 18 Jul 2021 Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, Jian Sun

In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX.

Autonomous Driving Real-Time Object Detection

OTA: Optimal Transport Assignment for Object Detection

2 code implementations CVPR 2021 Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, Jian Sun

Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object.

Object object-detection +1

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

1 code implementation 19 Jan 2021 Zeming Li, Songtao Liu, Jian Sun

The teacher's weight is a momentum update of the student, and the teacher's BN statistics is a momentum update of those in history.

Self-Supervised Learning

Fine-Grained Dynamic Head for Object Detection

1 code implementation NeurIPS 2020 Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng

To this end, we propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance, which further releases the ability of multi-scale feature representation.

Object object-detection +1

Fully Convolutional Networks for Panoptic Segmentation

6 code implementations CVPR 2021 Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.

Panoptic Segmentation Segmentation

Self-EMD: Self-Supervised Object Detection without ImageNet

no code implementations 27 Nov 2020 Songtao Liu, Zeming Li, Jian Sun

Our Faster R-CNN (ResNet50-FPN) baseline achieves 39.8% mAP on COCO, which is on par with the state-of-the-art self-supervised methods pre-trained on ImageNet.

Object object-detection +2

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track

no code implementations 6 Oct 2020 Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun

In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion, first to detect instances then to obtain segmentation.

Instance Segmentation object-detection +3

EqCo: Equivalent Rules for Self-supervised Contrastive Learning

1 code implementation 5 Oct 2020 Benjin Zhu, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun

In this paper, we propose EqCo (Equivalent Rules for Contrastive Learning) to make self-supervised learning irrelevant to the number of negative samples in the contrastive learning framework.

Contrastive Learning Self-Supervised Learning

BorderDet: Border Feature for Dense Object Detection

2 code implementations ECCV 2020 Han Qiu, Yuchen Ma, Zeming Li, Songtao Liu, Jian Sun

In this paper, we propose a simple and efficient operator called Border-Align to extract "border features" from the extreme point of the border to enhance the point feature.

Dense Object Detection Object +1

AutoAssign: Differentiable Label Assignment for Dense Object Detection

2 code implementations 7 Jul 2020 Benjin Zhu, Jian-Feng Wang, Zhengkai Jiang, Fuhang Zong, Songtao Liu, Zeming Li, Jian Sun

During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions.

Dense Object Detection Object +1

Dynamic Scale Training for Object Detection

4 code implementations 26 Apr 2020 Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.

Instance Segmentation Model Optimization +4

Learning Dynamic Routing for Semantic Segmentation

1 code implementation CVPR 2020 Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.

Segmentation Semantic Segmentation

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection

3 code implementations 26 Aug 2019 Benjin Zhu, Zhengkai Jiang, Xiangxin Zhou, Zeming Li, Gang Yu

This report presents our method, which won the nuScenes 3D Detection Challenge [17] held in the Workshop on Autonomous Driving (WAD, CVPR 2019).

3D Object Detection Autonomous Driving +1

ThunderNet: Towards Real-time Generic Object Detection

3 code implementations 28 Mar 2019 Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun

In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet.

Object object-detection +1

DetNet: Design Backbone for Object Detection

no code implementations ECCV 2018 Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

(1) Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle the objects with various scales.

Classification General Classification +7

DetNet: A Backbone network for Object Detection

2 code implementations 17 Apr 2018 Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection.

Classification General Classification +7

MegDet: A Large Mini-Batch Object Detector

6 code implementations CVPR 2018 Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun

The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new network, new framework, or novel loss design.

Object object-detection +1

Light-Head R-CNN: In Defense of Two-Stage Object Detector

5 code implementations 20 Nov 2017 Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD on both speed and accuracy.

Vocal Bursts Valence Prediction
