Omni3D re-purposes and combines existing datasets, resulting in 234k images annotated with more than 3 million instances spanning 97 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types.
This paper presents a framework that combines traditional keypoint-based camera pose optimization with an invertible neural rendering mechanism.
Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics.
We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner.
Efficiently reconstructing complex and intricate surfaces at scale is a long-standing goal in machine perception.
Recent advances in deep reinforcement learning require large amounts of training data and tend to produce representations that are over-specialized to the target task.
Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures.
2 code implementations • 13 Jun 2019 • Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven Lovegrove, Richard Newcombe
We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale.
We propose a system that uses a convolutional neural network (CNN) to estimate depth from a stereo pair, followed by volumetric fusion of the predicted depth maps to produce a 3D reconstruction of a scene.
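The volumetric-fusion step can be illustrated with the classic truncated signed distance function (TSDF) running-average update. This is a generic sketch of depth-map fusion, not the paper's exact pipeline; the grid extents, truncation distance, and function names here are illustrative assumptions.

```python
import numpy as np

def fuse_depth_maps(depths, intrinsics, poses, grid_shape=(32, 32, 32),
                    voxel_size=0.1, origin=(-1.6, -1.6, 0.0), trunc=0.3):
    """Fuse depth maps (e.g. CNN-predicted) into a TSDF volume.

    depths:     list of (H, W) depth maps
    intrinsics: (3, 3) camera matrix K
    poses:      list of (4, 4) camera-to-world transforms
    Returns the fused TSDF grid and per-voxel fusion weights.
    """
    tsdf = np.ones(grid_shape, dtype=np.float32)
    weight = np.zeros(grid_shape, dtype=np.float32)

    # World coordinates of every voxel centre, flattened in C order.
    ii, jj, kk = np.meshgrid(*[np.arange(n) for n in grid_shape], indexing="ij")
    pts = np.stack([ii, jj, kk], axis=-1).reshape(-1, 3) * voxel_size + np.asarray(origin)

    for depth, pose in zip(depths, poses):
        # Transform voxel centres into the camera frame.
        w2c = np.linalg.inv(pose)
        cam = pts @ w2c[:3, :3].T + w2c[:3, 3]
        z = cam[:, 2]
        zsafe = np.where(z > 1e-6, z, 1.0)  # avoid division by zero
        # Project into the image plane.
        uv = cam @ intrinsics.T
        u = np.round(uv[:, 0] / zsafe).astype(int)
        v = np.round(uv[:, 1] / zsafe).astype(int)
        h, w = depth.shape
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.where(ok, depth[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], 0.0)
        ok &= d > 0
        # Truncated signed distance along the ray; running-average update.
        sdf = np.clip((d - z) / trunc, -1.0, 1.0)
        upd = ok & (sdf > -1.0)  # skip voxels far behind the surface
        flat_t, flat_w = tsdf.reshape(-1), weight.reshape(-1)
        flat_t[upd] = (flat_t[upd] * flat_w[upd] + sdf[upd]) / (flat_w[upd] + 1.0)
        flat_w[upd] += 1.0

    return tsdf, weight
```

Averaging TSDF values across frames is what makes fusion robust to per-frame depth noise; the surface is then extracted as the zero level set (e.g. via marching cubes).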
13 code implementations • • Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra
We present Habitat, a platform for research in embodied artificial intelligence (AI).
Ranked #2 on PointGoal Navigation on the Gibson PointGoal Navigation benchmark
In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data.
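The representation queried by DeepSDF can be illustrated with analytic SDFs: a shape is a function mapping any continuous 3D point to its signed distance (negative inside, positive outside), and the surface is the zero level set. This toy sketch uses closed-form sphere and box SDFs in place of the learned network, and blends them directly where DeepSDF would instead interpolate latent codes; all names here are illustrative.

```python
import numpy as np

def sdf_sphere(p, r=0.5):
    """Signed distance to a sphere of radius r: negative inside, positive outside."""
    return np.linalg.norm(p, axis=-1) - r

def sdf_box(p, half=0.4):
    """Signed distance to an axis-aligned cube with half-extent `half`."""
    q = np.abs(p) - half
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(axis=-1), 0.0)
    return outside + inside

def interpolated_sdf(p, t):
    """Blend two shapes; DeepSDF interpolates latent codes, not SDF values."""
    return (1.0 - t) * sdf_sphere(p) + t * sdf_box(p)
```

Because the representation is continuous, it can be evaluated at arbitrary resolution; a learned version conditions one such function on a latent code per shape.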
We demonstrate how the sparsity of a driver's personal road network, in conjunction with a hierarchical topic model, enables data-driven predictions about destinations as well as likely road conditions.
To aid simultaneous localization and mapping (SLAM), future perception systems will incorporate forms of scene understanding.
Based on the small-variance limit of Bayesian nonparametric von Mises-Fisher (vMF) mixture distributions, we propose two new flexible and efficient k-means-like clustering algorithms for directional data such as surface normals.
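A simpler relative of these vMF-based algorithms is plain spherical k-means, which already shows the key idea for directional data: assignments use cosine similarity, and cluster means are re-normalized back onto the unit sphere. This sketch (with a deterministic farthest-point initialization) is illustrative, not the paper's DP-based algorithms.

```python
import numpy as np

def spherical_kmeans(X, k, iters=50):
    """Cluster unit vectors (e.g. surface normals) by cosine similarity."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Deterministic farthest-point initialization on the sphere.
    centers = [X[0]]
    for _ in range(1, k):
        sims = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(sims)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each point to the center with the largest dot product.
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                centers[j] = m / np.linalg.norm(m)  # back onto the unit sphere
    return labels, centers
```

The nonparametric variants in the paper additionally let the number of directional clusters grow with the data, which this fixed-k sketch does not.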
Point cloud alignment is a common problem in computer vision and robotics, with applications ranging from 3D object recognition to reconstruction.
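The textbook baseline for this problem, when point-to-point correspondences are known, is the closed-form Kabsch/Procrustes solution via SVD; global alignment methods like the one referenced here tackle the harder correspondence-free setting. This sketch shows only the classical baseline.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid alignment of matched point sets (Kabsch / orthogonal Procrustes).

    Finds rotation R and translation t minimizing sum ||R p_i + t - q_i||^2,
    assuming correspondences P[i] <-> Q[i] are known.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

ICP-style pipelines alternate this closed-form step with nearest-neighbor matching; correspondence-free global methods avoid that alternation entirely.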
This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models.
Traditional approaches to scene representation exploit the regular structure of man-made environments via the somewhat restrictive assumption that every plane is perpendicular to one of the axes of a single coordinate system.