Search Results for author: Thomas Funkhouser

Found 49 papers, 20 papers with code

Multiresolution Deep Implicit Functions for 3D Shape Representation

no code implementations ICCV 2021 Zhang Chen, Yinda Zhang, Kyle Genova, Sean Fanello, Sofien Bouaziz, Christian Haene, Ruofei Du, Cem Keskin, Thomas Funkhouser, Danhang Tang

To the best of our knowledge, MDIF is the first deep implicit function model that can at the same time (1) represent different levels of detail and allow progressive decoding; (2) support both encoder-decoder inference and decoder-only latent optimization, and fulfill multiple applications; (3) perform detailed decoder-only shape completion.

3D Reconstruction 3D Shape Representation

Contrastive Multimodal Fusion with TupleInfoNCE

no code implementations ICCV 2021 Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi

Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences.

Contrastive Learning Representation Learning

Spatial Intention Maps for Multi-Agent Mobile Manipulation

1 code implementation 23 Mar 2021 Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks.

IBRNet: Learning Multi-View Image-Based Rendering

no code implementations CVPR 2021 Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser

Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes.

Neural Rendering Novel View Synthesis

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

no code implementations 24 Dec 2020 Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong

Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks.

Contrastive Learning Representation Learning +1

Object-Centric Neural Scene Rendering

no code implementations 15 Dec 2020 Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

We present a method for composing photorealistic scenes from captured images of objects.

Forecasting Characteristic 3D Poses of Human Actions

no code implementations 30 Nov 2020 Christian Diller, Thomas Funkhouser, Angela Dai

We propose the task of forecasting characteristic 3D poses: from a monocular video observation of a person, to predict a future 3D pose of that person in a likely action-defining, characteristic pose - for instance, from observing a person reaching for a banana, predict the pose of the person eating the banana.

Human motion prediction motion prediction +1

Learning to Infer Semantic Parameters for 3D Shape Editing

no code implementations 9 Nov 2020 Fangyin Wei, Elena Sizikova, Avneesh Sud, Szymon Rusinkiewicz, Thomas Funkhouser

Many applications in 3D shape design and augmentation require the ability to make specific edits to an object's semantic parameters (e.g., the pose of a person's arm or the length of an airplane's wing) while preserving as much existing detail as possible.

Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection

no code implementations 24 Sep 2020 Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon

A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.

3D Object Detection Autonomous Driving +1

Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds

no code implementations CVPR 2021 Li Yi, Boqing Gong, Thomas Funkhouser

We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors.

Semantic Segmentation Unsupervised Domain Adaptation

Spatial Action Maps for Mobile Manipulation

1 code implementation 20 Apr 2020 Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.)

Q-Learning Value prediction

Local Implicit Grid Representations for 3D Scenes

no code implementations 19 Mar 2020 Chiyu Max Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser

Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops such that an interpolation of the decoded local shapes matches a partial or noisy observation.

3D Shape Representation

Adversarial Texture Optimization from RGB-D Scans

1 code implementation CVPR 2020 Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nießner, Thomas Funkhouser

In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views.

Texture Synthesis

Local Deep Implicit Functions for 3D Shape

no code implementations CVPR 2020 Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser

The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations.

3D Shape Representation

Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations

no code implementations 9 Dec 2019 Shuran Song, Andy Zeng, Johnny Lee, Thomas Funkhouser

A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions.

Rescan: Inductive Instance Segmentation for Indoor RGBD Scans

no code implementations ICCV 2019 Maciej Halber, Yifei Shi, Kai Xu, Thomas Funkhouser

In depth-sensing applications ranging from home robotics to AR/VR, it will be common to acquire 3D scans of interior spaces repeatedly at sparse time intervals (e.g., as part of regular daily use).

Instance Segmentation Semantic Segmentation

Neural Illumination: Lighting Prediction for Indoor Environments

no code implementations CVPR 2019 Shuran Song, Thomas Funkhouser

This paper addresses the task of estimating the light arriving from all directions to a 3D point observed at a selected pixel in an RGB image.

Learning Shape Templates with Structured Implicit Functions

no code implementations ICCV 2019 Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser

To allow for widely varying geometry and topology, we choose an implicit surface representation based on composition of local shape elements.

Semantic Segmentation

FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image

1 code implementation ICCV 2019 Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas Guibas

In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image.

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

no code implementations 27 Mar 2019 Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error.

TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes

1 code implementation CVPR 2019 Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, Leonidas Guibas

We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps).

3D Semantic Segmentation

Structure-Aware Shape Synthesis

no code implementations 4 Aug 2018 Elena Balashova, Vivek Singh, Jiangping Wang, Brian Teixeira, Terrence Chen, Thomas Funkhouser

We propose a new procedure to guide training of a data-driven shape generative model using a structure-aware loss function.

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations CVPR 2018 Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

4 code implementations 27 Mar 2018 Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g., pushing) and prehensile (e.g., grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free.


Deep Depth Completion of a Single RGB-D Image

1 code implementation CVPR 2018 Yinda Zhang, Thomas Funkhouser

The goal of our work is to complete the depth channel of an RGB-D image.

Depth Completion Depth Estimation

PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction

no code implementations ECCV 2018 Yifei Shi, Kai Xu, Matthias Nießner, Szymon Rusinkiewicz, Thomas Funkhouser

We introduce a novel RGB-D patch descriptor designed for detecting coplanar surfaces in SLAM reconstruction.

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

no code implementations 12 Dec 2017 Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.

MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments

2 code implementations 11 Dec 2017 Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun

We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments.

Interactive 3D Modeling with a Generative Adversarial Network

no code implementations 16 Jun 2017 Jerry Liu, Fisher Yu, Thomas Funkhouser

This paper proposes the idea of using a generative adversarial network (GAN) to assist a novice user in designing real-world shapes with a simple interface.


Dilated Residual Networks

2 code implementations CVPR 2017 Fisher Yu, Vladlen Koltun, Thomas Funkhouser

Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.

Classification General Classification +4

Learning Where to Look: Data-Driven Viewpoint Set Selection for 3D Scenes

no code implementations 7 Apr 2017 Kyle Genova, Manolis Savva, Angel X. Chang, Thomas Funkhouser

We provide a search algorithm that generates a sampling of likely candidate views according to the example distribution, and a set selection algorithm that chooses a subset of the candidates that jointly cover the example distribution.

Semantic Segmentation

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

no code implementations CVPR 2017 Yinda Zhang, Shuran Song, Ersin Yumer, Manolis Savva, Joon-Young Lee, Hailin Jin, Thomas Funkhouser

One of the bottlenecks in training for better representations is the amount of available per-pixel ground truth data that is required for core scene understanding tasks such as semantic segmentation, normal prediction, and object edge detection.

Boundary Detection Edge Detection +4

Semantic Scene Completion from a Single Depth Image

3 code implementations CVPR 2017 Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.

Fine-To-Coarse Global Registration of RGB-D Scans

no code implementations CVPR 2017 Maciej Halber, Thomas Funkhouser

RGB-D scanning of indoor environments is important for many applications, including real estate, interior design, and virtual reality.

Virtual Reality

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

1 code implementation CVPR 2017 Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser

To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.

3D Reconstruction Point Cloud Registration

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

3 code implementations 10 Jun 2015 Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao

While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.

Semantic Alignment of LiDAR Data at City Scale

no code implementations CVPR 2015 Fisher Yu, Jianxiong Xiao, Thomas Funkhouser

This paper describes an automatic algorithm for global alignment of LiDAR data collected with Google Street View cars in urban environments.

Pose Estimation Structure from Motion
