TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Object Tracking	CATER	Inferno	Top 1 Accuracy	71.7	# 6
Video Object Tracking	CATER	Inferno	Top 5 Accuracy	88.9	# 5
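
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how top-k accuracy is conventionally computed for a classification-style task. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from the paper or from the benchmark's evaluation code.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)   # shape (num_samples, num_classes)
    labels = np.asarray(labels)                # shape (num_samples,)
    # Indices of the k largest scores per row; order inside the top k is irrelevant.
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy example with 4 samples and 3 classes (hypothetical numbers).
scores = np.array([
    [0.10, 0.60, 0.30],
    [0.50, 0.20, 0.30],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])
labels = np.array([1, 2, 2, 1])
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Top 5 accuracy is always at least as high as Top 1 accuracy, since a prediction that is correct at rank 1 is also within the top 5.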

INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision

29 Sep 2021 · Lluis Castrejon, Nicolas Ballas, Aaron Courville

We propose INFERNO, a method to infer object-centric representations of visual scenes without relying on annotations. Our method learns to decompose a scene into multiple objects, each object having a structured representation that disentangles its shape, appearance and 3D pose. To impose this structure we rely on recent advances in neural 3D rendering. Each object representation defines a localized neural radiance field that is used to generate 2D views of the scene through a differentiable rendering process. Our model is subsequently trained by minimizing a reconstruction loss between inputs and corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect in the rendered output. Finally, we demonstrate the usefulness of our 3D object representations in a visual reasoning task using the CATER dataset.
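
The pipeline described in the abstract (per-object radiance fields, differentiable rendering, reconstruction loss) can be illustrated with a small NumPy sketch for a single camera ray. This is not the authors' implementation. It assumes the standard NeRF-style volume rendering quadrature, and it assumes that object fields are merged by summing densities and density-weighting colors, which is one common choice that the abstract does not confirm. All function names are hypothetical.

```python
import numpy as np

def combine_objects(densities, colors):
    """Merge K per-object fields sampled at S points along one ray.

    Assumption: densities add up, and colors are averaged weighted by density.
    densities: (K, S), colors: (K, S, 3).
    """
    densities = np.asarray(densities, dtype=float)
    colors = np.asarray(colors, dtype=float)
    total = densities.sum(axis=0)                       # (S,)
    safe_total = np.where(total > 0, total, 1.0)        # avoid division by zero
    mixed = (densities[..., None] * colors).sum(axis=0) / safe_total[:, None]
    return total, mixed

def composite_ray(density, color, deltas):
    """Alpha-composite S samples along a ray into one RGB pixel."""
    density = np.asarray(density, dtype=float)
    color = np.asarray(color, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of light that reaches sample i unabsorbed.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = transmittance * alpha
    return weights @ color

def reconstruction_loss(rendered, target):
    """Mean squared error between rendered and observed pixel values."""
    rendered = np.asarray(rendered, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((rendered - target) ** 2))

# Two hypothetical objects sampled at two points along one ray:
# a red object occupying the near sample, a green object the far sample.
densities = [[1e9, 0.0], [0.0, 5.0]]
colors = [[[1, 0, 0], [1, 0, 0]], [[0, 1, 0], [0, 1, 0]]]
density, color = combine_objects(densities, colors)
pixel = composite_ray(density, color, deltas=[1.0, 1.0])
print(pixel, reconstruction_loss(pixel, [1.0, 0.0, 0.0]))
```

In a trainable version, every operation above would be expressed in an automatic differentiation framework so that the reconstruction loss can be backpropagated to the networks that predict each object's shape, appearance, and pose.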
