Search Results for author: Hongyang Li

Found 76 papers, 56 papers with code

TAPTR: Tracking Any Point with Transformers as Detection

no code implementations • 19 Mar 2024 • Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang

Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.

object-detection Object Detection +2

SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception

no code implementations • 15 Mar 2024 • Yiheng Li, Hongyang Li, Zehao Huang, Hong Chang, Naiyan Wang

The versatility of SparseFusion is also validated in the temporal object detection task and 3D lane detection task.

3D Lane Detection 3D Object Detection +1

Generalized Predictive Model for Autonomous Driving

1 code implementation • 14 Mar 2024 • Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li

In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline.

Autonomous Driving Video Prediction

357

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

1 code implementation • 13 Mar 2024 • Yifei Zhang, Hao Zhao, Hongyang Li, Siheng Chen

As such, the core of our method is the stochastic spectral sampling of correspondence graph.

Point Cloud Registration

Embodied Understanding of Driving Scenarios

1 code implementation • 7 Mar 2024 • Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li

Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans.

Autonomous Driving Language Modelling +1

Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation

no code implementations • 5 Mar 2024 • Gang Liu, Hongyang Li, Zerui He, Shenjun Zhong

In this paper, we introduce a method that incorporates gradient-guided parameter perturbations to the visual encoder of the multimodality model during both pre-training and fine-tuning phases, to improve model generalization for downstream medical VQA tasks.

Data Augmentation Medical Visual Question Answering +2

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

2 code implementations • 13 Feb 2024 • Jiachen Lu, Renyuan Peng, Xinyue Cai, Hang Xu, Hongyang Li, Feng Wen, Wei Zhang, Li Zhang

Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non-Euclidean data into an integer series called RoadNet Sequence.

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

13,402

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

1 code implementation • 29 Dec 2023 • Zetong Yang, Li Chen, Yanan Sun, Hongyang Li

To resolve this, we bring up a new pre-training task termed as visual point cloud forecasting - predicting future point clouds from historical visual input.

Motion Forecasting

174

Fully Sparse 3D Occupancy Prediction

3 code implementations • 28 Dec 2023 • Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang

Occupancy prediction plays a pivotal role in autonomous driving.

Autonomous Driving

482

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

1 code implementation • 26 Dec 2023 • Tianyu Li, Peijin Jia, Bangjun Wang, Li Chen, Kun Jiang, Junchi Yan, Hongyang Li

A map, as crucial information for downstream applications of an autonomous driving system, is usually represented in lanelines or centerlines.

Autonomous Driving

185

DriveLM: Driving with Graph Visual Question Answering

1 code implementation • 21 Dec 2023 • Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li

The experiments demonstrate that Graph VQA provides a simple, principled framework for reasoning about a driving scene, and DriveLM-Data provides a challenging benchmark for this task.

Autonomous Driving Question Answering +1

618

A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

340

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

2 code implementations • 6 Dec 2023 • Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.

Autonomous Driving

357

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

230

Visual In-Context Prompting

3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation Visual Prompting

1,904

LLM4Drive: A Survey of Large Language Models for Autonomous Driving

1 code implementation • 2 Nov 2023 • Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan

Recently, large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.

Autonomous Driving Few-Shot Learning +1

595

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

1 code implementation • NeurIPS 2023 • Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert).

Ranked #6 on 3D Object Detection on nuScenes Camera Only

3D Object Detection object-detection

1,060

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

1 code implementation • ICCV 2023 • Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li

We find that even equipped with a SOTA perception model, directly letting the student model learn the required inputs of the teacher model leads to poor driving performance, which comes from the large distribution gap between predicted privileged inputs and the ground-truth.

Ranked #2 on CARLA longest6 on CARLA

Autonomous Driving CARLA longest6

149

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

no code implementations • ICCV 2023 • Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang

Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem.

3D Object Detection object-detection

Density-invariant Features for Distant Point Cloud Registration

2 code implementations • ICCV 2023 • Quan Liu, Hongzi Zhu, Yunsong Zhou, Hongyang Li, Shan Chang, Minyi Guo

Registration of distant outdoor LiDAR point clouds is crucial to extending the 3D vision of collaborative autonomous vehicles, and yet is challenging due to small overlapping area and a huge disparity between observed point densities.

Ranked #1 on Point Cloud Registration on nuScenes (Distant PCR)

Autonomous Vehicles Contrastive Learning +1

End-to-end Autonomous Driving: Challenges and Frontiers

1 code implementation • 29 Jun 2023 • Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction.

Autonomous Driving motion prediction

1,389

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

1,813

Scene as Occupancy

2 code implementations • ICCV 2023 • Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li

Human drivers can easily describe a complex traffic scene using their visual system.

Motion Planning

482

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

1 code implementation • 25 May 2023 • Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao

In this paper, we propose MUTR, a Multi-modal Unified Temporal transformer for Referring video object segmentation.

Ranked #1 on Referring Expression Segmentation on Referring Expressions for DAVIS 2016 & 2017

Object Referring Expression Segmentation +3

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving

1 code implementation • CVPR 2023 • Xiaosong Jia, Penghao Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi Yan, Hongyang Li

End-to-end autonomous driving has made impressive progress in recent years.

Ranked #4 on CARLA longest6 on CARLA

Autonomous Driving CARLA longest6 +1

175

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

647

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

1 code implementation • NeurIPS 2023 • Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei Zhang, Hongyang Li

Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments.

3D Lane Detection

483

Graph-based Topology Reasoning for Driving Scenes

1 code implementation • 11 Apr 2023 • Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, Junchi Yan, Ping Luo, Hongyang Li

Understanding the road genome is essential to realize autonomous driving.

Ranked #5 on 3D Lane Detection on OpenLane-V2 val

3D Lane Detection Autonomous Driving +1

231

Detection Transformer with Stable Matching

1 code implementation • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Position

175

Sparse Dense Fusion for 3D Object Detection

no code implementations • 9 Apr 2023 • Yulu Gao, Chonghao Sima, Shaoshuai Shi, Shangzhe Di, Si Liu, Hongyang Li

With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection.

3D Object Detection Object +1

Geometric-aware Pretraining for Vision-centric 3D Object Detection

1 code implementation • 6 Apr 2023 • Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Junchi Yan, Hongyang Li

We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach.

3D Object Detection Autonomous Driving +2

1,061

3D Data Augmentation for Driving Scenes on Camera

no code implementations • 18 Mar 2023 • Wenwen Tong, Jiangwei Xie, Tianyu Li, Hanming Deng, Xiangwei Geng, Ruoyi Zhou, Dingchen Yang, Bo Dai, Lewei Lu, Hongyang Li

The proposed data augmentation approach contributes to a gain of 1.7% and 1.4% in terms of detection accuracy, on Waymo and nuScenes respectively.

Autonomous Driving Data Augmentation +1

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

176

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

1 code implementation • 9 Mar 2023 • Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

To alleviate this, previous methods simply replace the pixel reconstruction targets of 75% masked tokens by encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning.

Contrastive Learning

452

Introducing Depth into Transformer-based 3D Object Detection

no code implementations • 25 Feb 2023 • Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang

To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.

3D Object Detection Auxiliary Learning +3

Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling

1 code implementation • 3 Jan 2023 • Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao

Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving.

Autonomous Driving Decision Making

106

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach

no code implementations • ICCV 2023 • Jiachen Lu, Renyuan Peng, Xinyue Cai, Hang Xu, Hongyang Li, Feng Wen, Wei Zhang, Li Zhang

The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections.

Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection

no code implementations • CVPR 2023 • Jia Zeng, Li Chen, Hanming Deng, Lewei Lu, Junchi Yan, Yu Qiao, Hongyang Li

Specifically, a set of queries are leveraged to locate the instance-level areas for masked feature generation, to intensify feature representation ability in these areas.

3D Object Detection Knowledge Distillation +2

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

no code implementations • CVPR 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

Planning-oriented Autonomous Driving

1 code implementation • CVPR 2023 • Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning.

Autonomous Driving Philosophy

2,780

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

2 code implementations • CVPR 2023 • Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.

Ranked #5 on 3D Object Detection on Rope3D

3D Object Detection

2,842

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation

1 code implementation • 11 Oct 2022 • Hongyang Li, Jiehong Lin, Kui Jia

Establishment of point correspondence between camera and object coordinate systems is a promising way to solve 6D object poses.

6D Pose Estimation 6D Pose Estimation using RGB +2

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

2,842

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

1 code implementation • 15 Jul 2022 • Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, Dacheng Tao

In particular, we propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously, which is called ST-P3.

Ranked #7 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation +1

266

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

1 code implementation • 16 Jun 2022 • Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao

The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step.

Ranked #3 on Autonomous Driving on CARLA Leaderboard

Autonomous Driving CARLA longest6 +1

Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot

no code implementations • 16 Jun 2022 • Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao

Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design.

Autonomous Driving

HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding

1 code implementation • 30 Apr 2022 • Xiaosong Jia, Penghao Wu, Li Chen, Yu Liu, Hongyang Li, Junchi Yan

Based on these observations, we propose Heterogeneous Driving Graph Transformer (HDGT), a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.

Autonomous Driving graph construction +2

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

Ranked #2 on Bird's-Eye View Semantic Segmentation on Lyft Level 5

3D Object Detection Autonomous Driving +2

2,842

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

2 code implementations • 21 Mar 2022 • Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan

Methods for 3D lane detection have been recently proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.).

Ranked #5 on 3D Lane Detection on Apollo Synthetic 3D Lane

3D Lane Detection Autonomous Driving +1

469

Align Representations With Base: A New Approach to Self-Supervised Learning

no code implementations • CVPR 2022 • Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang

Existing symmetric contrastive learning methods suffer from collapses (complete and dimensional) or quadratic complexity of objectives.

Contrastive Learning Self-Supervised Learning

Sparse Steerable Convolutions: An Efficient Learning of SE(3)-Equivariant Features for Estimation and Tracking of Object Poses in 3D Space

1 code implementation • NeurIPS 2021 • Jiehong Lin, Hongyang Li, Ke Chen, Jiangbo Lu, Kui Jia

In this paper, we propose a novel design of Sparse Steerable Convolution (SS-Conv) to address the shortcoming; SS-Conv greatly accelerates steerable convolution with sparse tensors, while strictly preserving the property of SE(3)-equivariance.

6D Pose Estimation Pose Tracking

Monocular 3D Object Detection: An Extrinsic Parameter Free Approach

no code implementations • CVPR 2021 • Yunsong Zhou, Yuan He, Hongzi Zhu, Cheng Wang, Hongyang Li, Qinhong Jiang

Due to the lack of insight in industrial application, existing methods on open datasets neglect the camera pose information, which inevitably results in the detector being susceptible to camera extrinsic parameters.

Ranked #9 on Monocular 3D Object Detection on KITTI Cars Moderate (using extra training data)

Autonomous Driving Monocular 3D Object Detection +2

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

no code implementations • CVPR 2021 • Yibo Yang, Shan You, Hongyang Li, Fei Wang, Chen Qian, Zhouchen Lin

Our method enables differentiable sparsification, and keeps the derived architecture equivalent to that of Engine-cell, which further improves the consistency between search and evaluation.

Neural Architecture Search

EnTranNAS: Towards Closing the Gap between the Architectures in Search and Evaluation

no code implementations • 1 Jan 2021 • Yibo Yang, Shan You, Hongyang Li, Fei Wang, Chen Qian, Zhouchen Lin

The Engine-cell is differentiable for architecture search, while the Transit-cell only transits the current sub-graph by architecture derivation.

Neural Architecture Search

Exploring intermediate representation for monocular vehicle pose estimation

1 code implementation • CVPR 2021 • Shichao Li, Zengqiang Yan, Hongyang Li, Kwang-Ting Cheng

The latter question motivates us to incorporate geometry knowledge with a new loss function based on a projective invariant.

Ranked #1 on Vehicle Pose Estimation on KITTI Cars Hard

3D Pose Estimation Representation Learning +1

166

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

1 code implementation • NeurIPS 2020 • Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

By doing so, our network for search at each update satisfies the sparsity constraint and is efficient to train.

Neural Architecture Search

Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation

1 code implementation • ECCV 2020 • Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin

A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.

Instance Segmentation Object +5

Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families

no code implementations • 23 Nov 2019 • Yibo Yang, Jianlong Wu, Hongyang Li, Xia Li, Tiancheng Shen, Zhouchen Lin

We establish a stability condition for ResNets with step sizes and weight parameters, and point out the effects of step sizes on the stability and performance.

SOGNet: Scene Overlap Graph Network for Panoptic Segmentation

1 code implementation • 18 Nov 2019 • Yibo Yang, Hongyang Li, Xia Li, Qijie Zhao, Jianlong Wu, Zhouchen Lin

In order to overcome the lack of supervision, we introduce a differentiable module to resolve the overlap between any pair of instances.

Ranked #8 on Panoptic Segmentation on Cityscapes test

Instance Segmentation Panoptic Segmentation +1

Deepsleep: Fast and Accurate Delineation of Sleep Arousals at Millisecond Resolution by Deep Learning

1 code implementation • Lancet 2019 • Hongyang Li, Yuanfang Guan

Background: Sleep arousals are transient periods of wakefulness punctuated into sleep.

Ranked #1 on Sleep Arousal Detection on You Snooze You Win - The PhysioNet Computing in Cardiology Challenge 2018

Sleep Arousal Detection Sleep Micro-event detection

Finding Task-Relevant Features for Few-Shot Learning by Category Traversal

1 code implementation • CVPR 2019 • Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang

Few-shot learning is an important area of research.

Few-Shot Learning Metric Learning

154

Feature Intertwiner for Object Detection

2 code implementations • ICLR 2019 • Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang

We argue that the reliable set could guide the feature learning of the less reliable set during training - in spirit of student mimicking teacher behavior and thus pushing towards a more compact class centroid in the feature space.

Ranked #134 on Object Detection on COCO test-dev

Object object-detection +1

107

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

1 code implementation • 5 Nov 2018 • Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Marc-Andre Weber, Abhishek Mahajan, Ujjwal Baid, Elizabeth Gerstner, Dongjin Kwon, Gagan Acharya, Manu Agarwal, Mahbubul Alam, Alberto Albiol, Antonio Albiol, Francisco J. Albiol, Varghese Alex, Nigel Allinson, Pedro H. A. Amorim, Abhijit Amrutkar, Ganesh Anand, Simon Andermatt, Tal Arbel, Pablo Arbelaez, Aaron Avery, Muneeza Azmat, Pranjal B., W Bai, Subhashis Banerjee, Bill Barth, Thomas Batchelder, Kayhan Batmanghelich, Enzo Battistella, Andrew Beers, Mikhail Belyaev, Martin Bendszus, Eze Benson, Jose Bernal, Halandur Nagaraja Bharath, George Biros, Sotirios Bisdas, James Brown, Mariano Cabezas, Shilei Cao, Jorge M. Cardoso, Eric N Carver, Adrià Casamitjana, Laura Silvana Castillo, Marcel Catà, Philippe Cattin, Albert Cerigues, Vinicius S. Chagas, Siddhartha Chandra, Yi-Ju Chang, Shiyu Chang, Ken Chang, Joseph Chazalon, Shengcong Chen, Wei Chen, Jefferson W. Chen, Zhaolin Chen, Kun Cheng, Ahana Roy Choudhury, Roger Chylla, Albert Clérigues, Steven Colleman, Ramiro German Rodriguez Colmeiro, Marc Combalia, Anthony Costa, Xiaomeng Cui, Zhenzhen Dai, Lutao Dai, Laura Alexandra Daza, Eric Deutsch, Changxing Ding, Chao Dong, Shidu Dong, Wojciech Dudzik, Zach Eaton-Rosen, Gary Egan, Guilherme Escudero, Théo Estienne, Richard Everson, Jonathan Fabrizio, Yong Fan, Longwei Fang, Xue Feng, Enzo Ferrante, Lucas Fidon, Martin Fischer, Andrew P. French, Naomi Fridman, Huan Fu, David Fuentes, Yaozong Gao, Evan Gates, David Gering, Amir Gholami, Willi Gierke, Ben Glocker, Mingming Gong, Sandra González-Villá, T. Grosges, Yuanfang Guan, Sheng Guo, Sudeep Gupta, Woo-Sup Han, Il Song Han, Konstantin Harmuth, Huiguang He, Aura Hernández-Sabaté, Evelyn Herrmann, Naveen Himthani, Winston Hsu, Cheyu Hsu, Xiaojun Hu, Xiaobin Hu, Yan Hu, Yifan Hu, Rui Hua, Teng-Yi Huang, Weilin Huang, Sabine Van Huffel, Quan Huo, Vivek HV, Khan M. Iftekharuddin, Fabian Isensee, Mobarakol Islam, Aaron S. Jackson, Sachin R. Jambawalikar, Andrew Jesson, Weijian Jian, Peter Jin, V Jeya Maria Jose, Alain Jungo, B Kainz, Konstantinos Kamnitsas, Po-Yu Kao, Ayush Karnawat, Thomas Kellermeier, Adel Kermi, Kurt Keutzer, Mohamed Tarek Khadir, Mahendra Khened, Philipp Kickingereder, Geena Kim, Nik King, Haley Knapp, Urspeter Knecht, Lisa Kohli, Deren Kong, Xiangmao Kong, Simon Koppers, Avinash Kori, Ganapathy Krishnamurthi, Egor Krivov, Piyush Kumar, Kaisar Kushibar, Dmitrii Lachinov, Tryphon Lambrou, Joon Lee, Chengen Lee, Yuehchou Lee, M Lee, Szidonia Lefkovits, Laszlo Lefkovits, James Levitt, Tengfei Li, Hongwei Li, Hongyang Li, Xiaochuan Li, Yuexiang Li, Heng Li, Zhenye Li, Xiaoyu Li, Zeju Li, Xiaogang Li, Wenqi Li, Zheng-Shen Lin, Fengming Lin, Pietro Lio, Chang Liu, Boqiang Liu, Xiang Liu, Mingyuan Liu, Ju Liu, Luyan Liu, Xavier Llado, Marc Moreno Lopez, Pablo Ribalta Lorenzo, Zhentai Lu, Lin Luo, Zhigang Luo, Jun Ma, Kai Ma, Thomas Mackie, Anant Madabushi, Issam Mahmoudi, Klaus H. Maier-Hein, Pradipta Maji, CP Mammen, Andreas Mang, B. S. Manjunath, Michal Marcinkiewicz, S McDonagh, Stephen McKenna, Richard McKinley, Miriam Mehl, Sachin Mehta, Raghav Mehta, Raphael Meier, Christoph Meinel, Dorit Merhof, Craig Meyer, Robert Miller, Sushmita Mitra, Aliasgar Moiyadi, David Molina-Garcia, Miguel A. B. Monteiro, Grzegorz Mrukwa, Andriy Myronenko, Jakub Nalepa, Thuyen Ngo, Dong Nie, Holly Ning, Chen Niu, Nicholas K Nuechterlein, Eric Oermann, Arlindo Oliveira, Diego D. C. Oliveira, Arnau Oliver, Alexander F. I. Osman, Yu-Nian Ou, Sebastien Ourselin, Nikos Paragios, Moo Sung Park, Brad Paschke, J. Gregory Pauloski, Kamlesh Pawar, Nick Pawlowski, Linmin Pei, Suting Peng, Silvio M. Pereira, Julian Perez-Beteta, Victor M. Perez-Garcia, Simon Pezold, Bao Pham, Ashish Phophalia, Gemma Piella, G. N. Pillai, Marie Piraud, Maxim Pisov, Anmol Popli, Michael P. Pound, Reza Pourreza, Prateek Prasanna, Vesna Prkovska, Tony P. Pridmore, Santi Puch, Élodie Puybareau, Buyue Qian, Xu Qiao, Martin Rajchl, Swapnil Rane, Michael Rebsamen, Hongliang Ren, Xuhua Ren, Karthik Revanuru, Mina Rezaei, Oliver Rippel, Luis Carlos Rivera, Charlotte Robert, Bruce Rosen, Daniel Rueckert, Mohammed Safwan, Mostafa Salem, Joaquim Salvi, Irina Sanchez, Irina Sánchez, Heitor M. Santos, Emmett Sartor, Dawid Schellingerhout, Klaudius Scheufele, Matthew R. Scott, Artur A. Scussel, Sara Sedlar, Juan Pablo Serrano-Rubio, N. Jon Shah, Nameetha Shah, Mazhar Shaikh, B. Uma Shankar, Zeina Shboul, Haipeng Shen, Dinggang Shen, Linlin Shen, Haocheng Shen, Varun Shenoy, Feng Shi, Hyung Eun Shin, Hai Shu, Diana Sima, M Sinclair, Orjan Smedby, James M. Snyder, Mohammadreza Soltaninejad, Guidong Song, Mehul Soni, Jean Stawiaski, Shashank Subramanian, Li Sun, Roger Sun, Jiawei Sun, Kay Sun, Yu Sun, Guoxia Sun, Shuang Sun, Yannick R Suter, Laszlo Szilagyi, Sanjay Talbar, Dacheng Tao, Zhongzhao Teng, Siddhesh Thakur, Meenakshi H Thakur, Sameer Tharakan, Pallavi Tiwari, Guillaume Tochon, Tuan Tran, Yuhsiang M. Tsai, Kuan-Lun Tseng, Tran Anh Tuan, Vadim Turlapov, Nicholas Tustison, Maria Vakalopoulou, Sergi Valverde, Rami Vanguri, Evgeny Vasiliev, Jonathan Ventura, Luis Vera, Tom Vercauteren, C. A. Verrastro, Lasitha Vidyaratne, Veronica Vilaplana, Ajeet Vivekanandan, Qian Wang, Chiatse J. Wang, Wei-Chung Wang, Duo Wang, Ruixuan Wang, Yuanyuan Wang, Chunliang Wang, Guotai Wang, Ning Wen, Xin Wen, Leon Weninger, Wolfgang Wick, Shaocheng Wu, Qiang Wu, Yihong Wu, Yong Xia, Yanwu Xu, Xiaowen Xu, Peiyuan Xu, Tsai-Ling Yang, Xiaoping Yang, Hao-Yu Yang, Junlin Yang, Haojin Yang, Guang Yang, Hongdou Yao, Xujiong Ye, Changchang Yin, Brett Young-Moxon, Jinhua Yu, Xiangyu Yue, Songtao Zhang, Angela Zhang, Kun Zhang, Xue-jie Zhang, Lichi Zhang, Xiaoyue Zhang, Yazhuo Zhang, Lei Zhang, Jian-Guo Zhang, Xiang Zhang, Tianhao Zhang, Sicheng Zhao, Yu Zhao, Xiaomei Zhao, Liang Zhao, Yefeng Zheng, Liming Zhong, Chenhong Zhou, Xiaobing Zhou, Fan Zhou, Hongtu Zhu, Jin Zhu, Ying Zhuge, Weiwei Zong, Jayashree Kalpathy-Cramer, Keyvan Farahani, Christos Davatzikos, Koen van Leemput, Bjoern Menze

This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018.

Brain Tumor Segmentation Survival Prediction +1

Paper
Code

Neural Network Encapsulation

2 code implementations • ECCV 2018 • Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang

Motivated by the routing mechanism, which makes higher capsules agree with lower capsules, we extend it to compensate for the rapid loss of information in nearby layers.

Paper
Code

Rethinking Feature Discrimination and Polymerization for Large-scale Recognition

1 code implementation • 2 Oct 2017 • Yu Liu, Hongyang Li, Xiaogang Wang

Feature matters.

Clustering Metric Learning

175

Paper
Code

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

1 code implementation • 13 Sep 2017 • Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

A key observation is that it is difficult to classify anchors of different sizes with the same set of features.

Ranked #2 on Region Proposal on COCO test-dev

object-detection Object Detection +1

Paper
Code

Recurrent Scale Approximation for Object Detection in CNN

1 code implementation • ICCV 2017 • Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, Xiaoou Tang

To further increase efficiency and accuracy, we (a): design a scale-forecast network to globally predict potential scales in the image since there is no need to compute maps on all levels of the pyramid.

Ranked #3 on Face Detection on Annotated Faces in the Wild

Face Detection Object +2

238

Paper
Code

Learning Deep Features via Congenerous Cosine Loss for Person Recognition

1 code implementation • 22 Feb 2017 • Yu Liu, Hongyang Li, Xiaogang Wang

Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance.

Person Recognition

175

Paper
Code

Zoom Out-and-In Network with Recursive Training for Object Proposal

1 code implementation • 19 Feb 2017 • Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a zoom-out-and-in network for generating object proposals.

Paper
Code

Dual Deep Network for Visual Tracking

1 code implementation • 19 Dec 2016 • Zhizhen Chi, Hongyang Li, Huchuan Lu, Ming-Hsuan Yang

In this paper, we propose a dual network to better utilize features among layers for visual tracking.

Visual Tracking

Paper
Code

Multi-Bias Non-linear Activation in Deep Neural Networks

no code implementations • 3 Apr 2016 • Hongyang Li, Wanli Ouyang, Xiaogang Wang

It provides great flexibility of selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers.

Paper
Add Code

Learning Deep Representation With Large-Scale Attributes

no code implementations • ICCV 2015 • Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang

Experimental results show that the attributes are helpful in learning better features and improving the object detection accuracy by 2.6% in mAP on the ILSVRC 2014 object detection dataset and 2.4% in mAP on the PASCAL VOC 2007 object detection dataset.

Attribute Clustering +3

Paper
Add Code

LCNN: Low-level Feature Embedded CNN for Salient Object Detection

no code implementations • 17 Aug 2015 • Hongyang Li, Huchuan Lu, Zhe Lin, Xiaohui Shen, Brian Price

In this paper, we propose a novel deep neural network framework embedded with low-level features (LCNN) for salient object detection in complex images.

object-detection RGB Salient Object Detection +1

Paper
Add Code

Inner and Inter Label Propagation: Salient Object Detection in the Wild

2 code implementations • 27 May 2015 • Hongyang Li, Huchuan Lu, Zhe Lin, Xiaohui Shen, Brian Price

For most natural images, some boundary superpixels serve as the background labels, and the saliency of the other superpixels is determined by ranking their similarities to the boundary labels based on an inner propagation scheme.

Computational Efficiency object-detection +4

Paper
Code
