no code implementations • 11 Apr 2025 • Boyang Deng, Songyou Peng, Kyle Genova, Gordon Wetzstein, Noah Snavely, Leonidas Guibas, Thomas Funkhouser
We present a system using Multimodal LLMs (MLLMs) to analyze a large database with tens of millions of images captured at different times, with the aim of discovering patterns in temporal changes.
no code implementations • 8 Mar 2025 • Anh Thai, Songyou Peng, Kyle Genova, Leonidas Guibas, Thomas Funkhouser
Language-guided 3D scene understanding is important for advancing applications in robotics, AR/VR, and human-computer interaction, enabling models to comprehend and interact with 3D environments through natural language.
no code implementations • 4 Jan 2024 • Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng
3D panoptic segmentation is a challenging perception task, especially in autonomous driving.
no code implementations • 5 Dec 2023 • Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang
We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility.
1 code implementation • 9 May 2023 • Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser
For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios.
no code implementations • ICCV 2023 • Fangyin Wei, Thomas Funkhouser, Szymon Rusinkiewicz
Once clutter is removed, we inpaint geometry and texture in the resulting holes by merging inpainted RGB-D images.
no code implementations • 10 Mar 2023 • Hong-Xing Yu, Michelle Guo, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu
We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images.
no code implementations • CVPR 2023 • Xiaoshuai Zhang, Abhijit Kundu, Thomas Funkhouser, Leonidas Guibas, Hao Su, Kyle Genova
We address efficient and structure-aware 3D scene representation from images.
1 code implementation • 9 Feb 2023 • Guandao Yang, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie
We use this framework to design Fourier PNFs, which match state-of-the-art performance in signal representation tasks that use neural fields.
1 code implementation • CVPR 2023 • Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
Ranked #7 on 3D Open-Vocabulary Instance Segmentation on Replica
no code implementations • CVPR 2024 • Christian Diller, Thomas Funkhouser, Angela Dai
Thus, we design our method to only require 2D RGB data at inference time while being able to generate 3D human motion sequences.
1 code implementation • CVPR 2023 • Zhiqin Chen, Thomas Funkhouser, Peter Hedman, Andrea Tagliasacchi
Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views.
Ranked #2 on Novel View Synthesis on LLFF
no code implementations • CVPR 2022 • Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser
Our model builds a panoptic radiance field representation of any scene from just color images.
1 code implementation • 5 Apr 2022 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser
We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle.
2 code implementations • 4 Feb 2022 • Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, Hao Zhang
We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC).
no code implementations • NeurIPS 2021 • Boyang Deng, Charles R. Qi, Mahyar Najibi, Thomas Funkhouser, Yin Zhou, Dragomir Anguelov
Given the insight that SDE would benefit from more accurate geometry descriptions, we propose to represent objects as amodal contours, specifically amodal star-shaped polygons, and devise a simple model, StarPoly, to predict such contours.
no code implementations • CVPR 2022 • Konstantinos Rematas, Andrew Liu, Pratul P. Srinivasan, Jonathan T. Barron, Andrea Tagliasacchi, Thomas Funkhouser, Vittorio Ferrari
The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View).
1 code implementation • CVPR 2022 • Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi
In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass.
no code implementations • 21 Oct 2021 • Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser
With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras.
Ranked #9 on LIDAR Semantic Segmentation on nuScenes
no code implementations • ICCV 2021 • Zhang Chen, Yinda Zhang, Kyle Genova, Sean Fanello, Sofien Bouaziz, Christian Haene, Ruofei Du, Cem Keskin, Thomas Funkhouser, Danhang Tang
To the best of our knowledge, MDIF is the first deep implicit function model that can at the same time (1) represent different levels of detail and allow progressive decoding; (2) support both encoder-decoder inference and decoder-only latent optimization, and fulfill multiple applications; (3) perform detailed decoder-only shape completion.
1 code implementation • ICCV 2021 • Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi
Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences.
Ranked #81 on Semantic Segmentation on NYU Depth v2
1 code implementation • 23 Mar 2021 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser
The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks.
1 code implementation • CVPR 2021 • Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes.
no code implementations • 24 Dec 2020 • Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong
Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks.
no code implementations • 15 Dec 2020 • Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser
We present a method for composing photorealistic scenes from captured images of objects.
1 code implementation • CVPR 2021 • Siyan Dong, Qingnan Fan, He Wang, Ji Shi, Li Yi, Thomas Funkhouser, Baoquan Chen, Leonidas Guibas
Localizing the camera in a known indoor environment is a key building block for scene mapping, robot navigation, AR, etc.
no code implementations • CVPR 2022 • Christian Diller, Thomas Funkhouser, Angela Dai
To predict characteristic poses, we propose a probabilistic approach that models the possible multi-modality in the distribution of likely characteristic poses.
no code implementations • 9 Nov 2020 • Fangyin Wei, Elena Sizikova, Avneesh Sud, Szymon Rusinkiewicz, Thomas Funkhouser
Many applications in 3D shape design and augmentation require the ability to make specific edits to an object's semantic parameters (e.g., the pose of a person's arm or the length of an airplane's wing) while preserving as much existing detail as possible.
no code implementations • 24 Sep 2020 • Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon
A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.
1 code implementation • ECCV 2020 • Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru
Features from multiple per-view predictions are finally fused on 3D mesh vertices to predict mesh semantic segmentation labels.
Ranked #15 on Semantic Segmentation on ScanNet (test mIoU metric)
no code implementations • ECCV 2020 • Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A. Ross, Thomas Funkhouser, Alireza Fathi
We use a U-Net style 3D sparse convolution network to extract features for each frame's LiDAR point-cloud.
1 code implementation • ECCV 2020 • Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Thomas Funkhouser, Justin Solomon
We present a simple and flexible object detection framework optimized for autonomous driving.
no code implementations • CVPR 2021 • Li Yi, Boqing Gong, Thomas Funkhouser
We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors.
1 code implementation • 20 Apr 2020 • Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser
Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.)
no code implementations • CVPR 2020 • Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi
In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes.
1 code implementation • 19 Mar 2020 • Chiyu Max Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser
Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops such that an interpolation of the decoded local shapes matches a partial or noisy observation.
1 code implementation • CVPR 2020 • Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nießner, Thomas Funkhouser
In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views.
1 code implementation • CVPR 2020 • Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser
The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations.
no code implementations • 9 Dec 2019 • Shuran Song, Andy Zeng, Johnny Lee, Thomas Funkhouser
A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions.
no code implementations • ICCV 2019 • Maciej Halber, Yifei Shi, Kai Xu, Thomas Funkhouser
In depth-sensing applications ranging from home robotics to AR/VR, it will be common to acquire 3D scans of interior spaces repeatedly at sparse time intervals (e.g., as part of regular daily use).
no code implementations • CVPR 2019 • Shuran Song, Thomas Funkhouser
This paper addresses the task of estimating the light arriving from all directions to a 3D point observed at a selected pixel in an RGB image.
1 code implementation • ICCV 2019 • Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser
To allow for widely varying geometry and topology, we choose an implicit surface representation based on composition of local shape elements.
1 code implementation • ICCV 2019 • Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas Guibas
In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image.
no code implementations • 27 Mar 2019 • Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error.
1 code implementation • CVPR 2019 • Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, Leonidas Guibas
We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps).
Ranked #26 on Semantic Segmentation on ScanNet (test mIoU metric)
no code implementations • 4 Aug 2018 • Elena Balashova, Vivek Singh, Jiangping Wang, Brian Teixeira, Terrence Chen, Thomas Funkhouser
We propose a new procedure to guide training of a data-driven shape generative model using a structure-aware loss function.
1 code implementation • ECCV 2018 • Yinda Zhang, Sameh Khamis, Christoph Rhemann, Julien Valentin, Adarsh Kowdle, Vladimir Tankovich, Michael Schoenberg, Shahram Izadi, Thomas Funkhouser, Sean Fanello
In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems.
no code implementations • CVPR 2018 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser
We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.
4 code implementations • 27 Mar 2018 • Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g., pushing) and prehensile (e.g., grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free.
1 code implementation • CVPR 2018 • Yinda Zhang, Thomas Funkhouser
The goal of our work is to complete the depth channel of an RGB-D image.
no code implementations • ECCV 2018 • Yifei Shi, Kai Xu, Matthias Niessner, Szymon Rusinkiewicz, Thomas Funkhouser
We introduce a novel RGB-D patch descriptor designed for detecting coplanar surfaces in SLAM reconstruction.
2 code implementations • 22 Mar 2018 • Kevin Chen, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas Funkhouser, Silvio Savarese
To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes.
no code implementations • 12 Dec 2017 • Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser
We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image.
2 code implementations • 11 Dec 2017 • Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun
We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments.
1 code implementation • 17 Oct 2017 • Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, Narasimha Murthy, Bhargava Ramu, Bharadwaj Manda, M. Ramanathan, Gautam Kumar, P Preetham, Siddharth Srivastava, Swati Bhugra, Brejesh Lall, Christian Haene, Shubham Tulsiani, Jitendra Malik, Jared Lafer, Ramsey Jones, Siyuan Li, Jie Lu, Shi Jin, Jingyi Yu, Qi-Xing Huang, Evangelos Kalogerakis, Silvio Savarese, Pat Hanrahan, Thomas Funkhouser, Hao Su, Leonidas Guibas
We introduce a large-scale 3D shape understanding benchmark using data and annotation from the ShapeNet 3D object database.
3 code implementations • 3 Oct 2017 • Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois R. Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Chavan Dafle, Rachel Holladay, Isabella Morona, Prem Qu Nair, Druck Green, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez
Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional training data.
1 code implementation • 18 Sep 2017 • Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms.
no code implementations • 16 Jun 2017 • Jerry Liu, Fisher Yu, Thomas Funkhouser
This paper proposes the idea of using a generative adversarial network (GAN) to assist a novice user in designing real-world shapes with a simple interface.
3 code implementations • CVPR 2017 • Fisher Yu, Vladlen Koltun, Thomas Funkhouser
Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.
no code implementations • 7 Apr 2017 • Kyle Genova, Manolis Savva, Angel X. Chang, Thomas Funkhouser
We provide a search algorithm that generates a sampling of likely candidate views according to the example distribution, and a set selection algorithm that chooses a subset of the candidates that jointly cover the example distribution.
1 code implementation • CVPR 2017 • Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Nießner
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets.
Ranked #11 on Semantic Segmentation on ScanNetV2
no code implementations • CVPR 2017 • Yinda Zhang, Shuran Song, Ersin Yumer, Manolis Savva, Joon-Young Lee, Hailin Jin, Thomas Funkhouser
One of the bottlenecks in training for better representations is the amount of available per-pixel ground truth data that is required for core scene understanding tasks such as semantic segmentation, normal prediction, and object edge detection.
3 code implementations • CVPR 2017 • Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser
This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.
Ranked #2 on 3D Semantic Scene Completion on KITTI-360
no code implementations • CVPR 2017 • Maciej Halber, Thomas Funkhouser
RGB-D scanning of indoor environments is important for many applications, including real estate, interior design, and virtual reality.
2 code implementations • CVPR 2017 • Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser
To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions.
Ranked #2 on 3D Reconstruction on Scan2CAD
15 code implementations • 9 Dec 2015 • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects.
4 code implementations • 10 Jun 2015 • Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao
While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.
no code implementations • CVPR 2015 • Fisher Yu, Jianxiong Xiao, Thomas Funkhouser
This paper describes an automatic algorithm for global alignment of LiDAR data collected with Google Street View cars in urban environments.