We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to a different object.
We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering.
Relational reasoning, the ability to model interactions and relations between objects, is valuable for robust multi-object tracking and pivotal for trajectory prediction.
We develop a functional encoder-decoder approach to supervised meta-learning, where labeled data is encoded into an infinite-dimensional functional representation rather than a finite-dimensional one.
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning.
Ranked #1 on Image Generation on Multi-dSprites
The majority of contemporary object-tracking approaches do not model interactions between objects.
In the second stage, SCAE predicts parameters of a few object capsules, which are then used to reconstruct part poses.
Ranked #2 on Unsupervised MNIST on MNIST
Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances.
Conversely, training on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias leading to poor performance on a harder dataset.
It can reliably discover and track objects throughout the sequence of frames, and can also generate future frames conditioning on the current frame, thereby simulating expected motion of objects.
Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables.
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator.
Class-agnostic object tracking is particularly difficult in cluttered environments as target specific discriminative models cannot be learned a priori.