no code implementations • CVPR 2022 • Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser
Our model builds a panoptic radiance field representation of any scene from just color images.
2 code implementations • 10 Feb 2022 • Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, Brian Curless
Recent methods use multiple networks to estimate optical flow or depth and a separate network dedicated to frame synthesis.
Ranked #2 on Video Frame Interpolation on Middlebury (SSIM metric)
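The entry above contrasts the paper's unified model with the common two-stage pipeline: estimate optical flow with one network, then synthesize the in-between frame with another. A minimal sketch of that two-stage baseline (not the paper's method) is below, using nearest-neighbor backward warping and a precomputed flow field; all function names here are illustrative.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a frame by a dense flow field
    (nearest-neighbor sampling, for illustration only)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def interpolate_midpoint(f0, f1, flow_0to1):
    """Two-stage baseline: warp both frames halfway along the flow,
    then 'synthesize' the middle frame by simple averaging."""
    half = 0.5 * flow_0to1
    w0 = warp(f0, half)    # f0 moved forward half a step
    w1 = warp(f1, -half)   # f1 moved backward half a step
    return 0.5 * (w0 + w1)
```

With zero flow this degenerates to plain frame averaging; a real synthesis network replaces the final blend.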
no code implementations • 21 Oct 2021 • Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser
With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras.
Ranked #8 on LIDAR Semantic Segmentation on nuScenes
no code implementations • 5 May 2021 • Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru
In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP (More Inclusive Annotations for People) subset, containing bounding boxes and attributes for all of the people visible in those images.
1 code implementation • ECCV 2020 • Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David Ross, Brian Brewington, Thomas Funkhouser, Caroline Pantofaru
Features from multiple per-view predictions are finally fused on 3D mesh vertices to predict mesh semantic segmentation labels.
Ranked #13 on Semantic Segmentation on ScanNet (test mIoU metric)
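The fusion step described above — aggregating per-view predictions onto mesh vertices — can be sketched as projecting each vertex into every view and averaging the sampled features. This is a simplified illustration assuming pinhole 3x4 projection matrices and nearest-pixel sampling, not the paper's exact procedure.

```python
import numpy as np

def fuse_vertex_features(vertices, view_feats, cam_mats):
    """For each mesh vertex, project into every view with a 3x4
    camera matrix, sample that view's (H, W, C) feature map at the
    nearest pixel, and average the samples over visible views."""
    n = len(vertices)
    accum = np.zeros((n, view_feats[0].shape[-1]))
    counts = np.zeros(n)
    for feats, P in zip(view_feats, cam_mats):
        h, w, _ = feats.shape
        homo = np.hstack([vertices, np.ones((n, 1))])  # (n, 4)
        proj = homo @ P.T                              # (n, 3)
        z = proj[:, 2]
        u = np.round(proj[:, 0] / np.maximum(z, 1e-9)).astype(int)
        v = np.round(proj[:, 1] / np.maximum(z, 1e-9)).astype(int)
        # Keep only vertices in front of the camera and inside the image.
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        accum[ok] += feats[v[ok], u[ok]]
        counts[ok] += 1
    return accum / np.maximum(counts[:, None], 1)
```

Vertices visible in no view keep a zero feature; a real system would also handle occlusion and bilinear sampling.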
no code implementations • ECCV 2020 • Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A. Ross, Thomas Funkhouser, Alireza Fathi
We use a U-Net style 3D sparse convolution network to extract features for each frame's LiDAR point-cloud.
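Sparse 3D convolution networks like the one mentioned above operate only on occupied voxels. A minimal sketch of the voxelization step that produces such a sparse input — hypothetical, not the paper's pipeline — quantizes LiDAR points to a grid and keeps one centroid feature per occupied cell:

```python
import numpy as np

def voxelize(points, voxel_size):
    """Quantize LiDAR points into sparse voxels: only occupied cells
    are stored, which is what makes sparse 3D convolution tractable."""
    keys = np.floor(points / voxel_size).astype(int)
    voxels = {}
    for key, p in zip(map(tuple, keys), points):
        voxels.setdefault(key, []).append(p)
    # One feature per occupied voxel: the centroid of its points.
    return {k: np.mean(v, axis=0) for k, v in voxels.items()}
```

The resulting dict of (i, j, k) → feature is the kind of sparse tensor a U-Net-style sparse convolution backbone consumes.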
1 code implementation • ECCV 2020 • Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Thomas Funkhouser, Justin Solomon
We present a simple and flexible object detection framework optimized for autonomous driving.
no code implementations • CVPR 2020 • Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi
In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes.
1 code implementation • 5 Jan 2019 • Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, Arkadiusz Stopczynski, Cordelia Schmid, Zhonghua Xi, Caroline Pantofaru
The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible.
Active Speaker Detection · Audio-Visual Active Speaker Detection
1 code implementation • 2 Aug 2018 • Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi
Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization.
Sound · Audio and Speech Processing
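For context on the endpointing task described above: a classic non-learned baseline thresholds short-time frame energy. The paper itself trains a model on real data; the sketch below is only the textbook baseline, with illustrative parameter names.

```python
def energy_vad(samples, frame_len, threshold):
    """Frame-level speech activity via short-time energy:
    a classic endpointing baseline. Returns one boolean
    (speech / non-speech) per non-overlapping frame."""
    decisions = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        decisions.append(energy > threshold)
    return decisions
```

Such baselines fail in noisy or far-field conditions, which motivates learned speech activity detectors.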
no code implementations • 31 May 2017 • Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy
In this paper, we present a system that associates faces with voices in a video by fusing information from the audio and visual signals.
9 code implementations • CVPR 2018 • Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #6 on Action Detection on UCF101-24
no code implementations • 7 Oct 2015 • Vinay Bettadapura, Irfan Essa, Caroline Pantofaru
We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization.
no code implementations • 1 Jul 2015 • Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang
We present a method for learning an embedding that places images of humans in similar poses nearby.
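A standard way to learn an embedding where similar poses land nearby is a triplet objective: pull an anchor toward a same-pose positive and push it from a different-pose negative. The sketch below shows the generic triplet loss as an illustration of this class of objective, not necessarily the exact loss used in the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Embedding objective: images of similar poses (anchor/positive)
    should be closer in embedding space than dissimilar ones
    (anchor/negative), by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is sufficiently farther than the positive, so training focuses on violating triplets.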
no code implementations • CVPR 2013 • Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification.
Ranked #7 on Room Layout Estimation on SUN RGB-D