Monocular 3D Object Detection

76 papers with code • 15 benchmarks • 5 datasets

Monocular 3D Object Detection is the task of drawing 3D bounding boxes around objects in a single 2D RGB image. It is a localization task that must be solved without any extra information such as depth maps, additional sensors, or multiple images.
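To make the task definition concrete, here is a minimal sketch of how a predicted 3D box relates to the 2D image. It assumes a pinhole camera with x pointing right, y down, and z forward, and a box described by a centre, a size, and a yaw angle about the vertical axis. The function names and the intrinsics (fx, fy, u0, v0) are illustrative assumptions, not taken from any listed paper or benchmark.

```python
import math


def box3d_corners(center, dims, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    center: (x, y, z) of the box centre.
    dims:   (length, height, width) along the box's own x, y, z axes.
    yaw:    rotation about the vertical (y) axis, in radians.
    All conventions here are assumptions made for illustration.
    """
    cx, cy, cz = center
    length, height, width = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (height / 2, -height / 2):
            for dz in (width / 2, -width / 2):
                # Rotate the offset about the y axis, then translate to the centre.
                x = c * dx + s * dz + cx
                y = dy + cy
                z = -s * dx + c * dz + cz
                corners.append((x, y, z))
    return corners


def project_point(point, fx, fy, u0, v0):
    """Project a camera-frame 3D point to pixel coordinates (pinhole model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + u0, fy * y / z + v0)


# Hypothetical intrinsics and box, chosen only for the example.
corners = box3d_corners((0.0, 0.0, 10.0), (4.0, 2.0, 2.0), 0.0)
pixels = [project_point(p, 700.0, 700.0, 600.0, 180.0) for p in corners]
```

A monocular detector has to predict the centre, size, and yaw from pixels alone; the projection above is the easy direction, which is why depth is the hard part of the problem.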

Benchmarks

These leaderboards track progress in Monocular 3D Object Detection.

Dataset	Best Model
KITTI Cars Moderate	CIE
SUN RGB-D	IM3D
KITTI Cars Hard	CIE
KITTI Pedestrian Hard	DD3D
KITTI Cars Easy	CIE
KITTI Pedestrian Easy	CMKD
KITTI Pedestrian Moderate	CMKD
Google Objectron	Lin2021
KITTI Cyclist Easy	CMKD
KITTI Cyclist Moderate	CMKD
KITTI Cyclist Hard	CMKD
KITTI Pedestrians Moderate val	CubifAE-3D
Virtual KITTI 2	CubifAE-3D
CoPerception-UAVs	Where2comm
OPV2V	Where2comm

Libraries

Use these libraries to find Monocular 3D Object Detection models and implementations

Library	Papers	Stars
open-mmlab/mmdetection3d	3	4,808
PaddlePaddle/Paddle3D	2	536
Owen-Liuyuxuan/visualDet3D	2	358

Most implemented papers

Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection

Xianpeng919/MonoCon • 9 Dec 2021

The paper presents MonoCon, a method that learns monocular contexts as auxiliary tasks during training to help monocular 3D object detection.

DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection

abhi1kumar/deviant • 21 Jul 2022

DEVIANT is equivariant to depth translations in the projective manifold, whereas vanilla networks are not.

Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps

mediabrain-sjtu/where2comm • 26 Sep 2022

Where2comm has two distinct advantages: i) it considers pragmatic compression and uses less communication to achieve higher perception performance by focusing on perceptually critical areas; and ii) it can handle varying communication bandwidth by dynamically adjusting spatial areas involved in communication.
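The idea of sending only perceptually critical areas under a bandwidth limit can be illustrated with a toy sketch. It assumes a per-cell confidence map is already available and simply keeps the highest-scoring cells up to a budget. This is an illustration of the general idea only, not the Where2comm implementation; `select_cells` and `budget` are hypothetical names.

```python
def select_cells(confidence, budget):
    """Return a binary mask that keeps the `budget` highest-confidence cells.

    confidence: 2D list of scores, one per spatial cell.
    budget:     number of cells that may be transmitted (a stand-in for
                the available communication bandwidth).
    """
    cells = [
        (score, r, c)
        for r, row in enumerate(confidence)
        for c, score in enumerate(row)
    ]
    # Highest confidence first; the sort is stable, so ties keep scan order.
    cells.sort(key=lambda item: -item[0])
    keep = {(r, c) for _, r, c in cells[: max(budget, 0)]}
    return [
        [1 if (r, c) in keep else 0 for c in range(len(row))]
        for r, row in enumerate(confidence)
    ]


# A smaller budget yields a sparser mask over the same confidence map.
confidence_map = [[0.9, 0.1], [0.2, 0.8]]
mask_wide = select_cells(confidence_map, 2)
mask_narrow = select_cells(confidence_map, 1)
```

Changing `budget` at run time is the toy analogue of adapting the transmitted spatial area to the available bandwidth.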

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

thusiyuan/holistic_scene_parsing • ECCV 2018

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model.

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

thusiyuan/cooperative_scene_parsing • NeurIPS 2018

Holistic 3D indoor scene understanding refers to jointly recovering (i) object bounding boxes, (ii) room layout, and (iii) camera pose, all in 3D.

Orthographic Feature Transform for Monocular 3D Object Detection

tom-roddick/oft • 20 Nov 2018

The orthographic feature transform maps image-based features into an orthographic 3D space. This allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.

MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization

Zengyi-Qin/MonoGRNet • 26 Nov 2018

We propose MonoGRNet for amodal 3D object detection from a monocular RGB image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension.

Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud

xinshuoweng/mono3D_PLiDAR • 23 Mar 2019

Following the pipeline of two-stage 3D detection algorithms, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LiDAR for each proposal.
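The frustum step described above can be sketched as follows. The sketch assumes a dense depth map has already been estimated from the image (the depth estimator itself is not shown), back-projects every pixel into a pseudo-LiDAR point with a pinhole model, and keeps the points whose pixels fall inside a 2D proposal box. The function names and intrinsics are illustrative assumptions, not the paper's code.

```python
def depth_to_points(depth, fx, fy, u0, v0):
    """Back-project a depth map into camera-frame 3D points.

    depth: 2D list where depth[v][u] is the depth in metres at pixel (u, v).
    Returns a list of (u, v, (x, y, z)); pixels with non-positive depth
    are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - u0) * z / fx
            y = (v - v0) * z / fy
            points.append((u, v, (x, y, z)))
    return points


def frustum_points(points, box2d):
    """Keep the 3D points whose source pixel lies inside a 2D proposal.

    box2d: (u_min, v_min, u_max, v_max), inclusive pixel bounds.
    """
    u_min, v_min, u_max, v_max = box2d
    return [
        xyz
        for (u, v, xyz) in points
        if u_min <= u <= u_max and v_min <= v <= v_max
    ]


# Tiny hypothetical depth map (2 rows, 3 columns) and unit focal lengths.
depth_map = [[10.0, 10.0, 0.0], [10.0, 5.0, 10.0]]
cloud = depth_to_points(depth_map, 1.0, 1.0, 1.0, 0.0)
frustum = frustum_points(cloud, (1, 1, 2, 1))
```

In a two-stage pipeline, a second-stage network would then regress the 3D box from the points in `frustum`; that stage is omitted here.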
