Search Results for author: Caroline Pantofaru

Found 15 papers, 6 papers with code

FILM: Frame Interpolation for Large Motion

2 code implementations · 10 Feb 2022 · Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, Brian Curless

Recent methods use multiple networks to estimate optical flow or depth and a separate network dedicated to frame synthesis.

Optical Flow Estimation · Video Frame Interpolation
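The abstract contrasts FILM with pipelines that pair a flow or depth network with a separate synthesis network. As a rough illustration of the flow-based half of such a pipeline, the sketch below warps two frames toward the temporal midpoint along a precomputed flow field and averages the warps; the function name, nearest-neighbor sampling, and (dy, dx) flow convention are illustrative assumptions, not the paper's method, which learns the synthesis step.

```python
import numpy as np

def interpolate_midframe(frame0, frame1, flow01):
    """Hypothetical sketch: synthesize the t=0.5 frame by warping each
    input frame halfway along a precomputed optical flow field and
    averaging. Nearest-neighbor sampling keeps the sketch short; FILM
    instead learns the synthesis network end to end.

    frame0, frame1: (H, W) or (H, W, C) float arrays
    flow01: (H, W, 2) flow from frame0 to frame1, in (dy, dx) order
    """
    h, w = frame0.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Sample frame0 half a flow step "behind" each midpoint pixel.
    y0 = np.clip(np.round(ys - 0.5 * flow01[..., 0]).astype(int), 0, h - 1)
    x0 = np.clip(np.round(xs - 0.5 * flow01[..., 1]).astype(int), 0, w - 1)

    # Sample frame1 half a flow step "ahead" of each midpoint pixel.
    y1 = np.clip(np.round(ys + 0.5 * flow01[..., 0]).astype(int), 0, h - 1)
    x1 = np.clip(np.round(xs + 0.5 * flow01[..., 1]).astype(int), 0, w - 1)

    return 0.5 * (frame0[y0, x0] + frame1[y1, x1])
```

With zero flow this degrades to a plain average of the two frames; large motions are exactly where such hand-built warps break down and where FILM targets its improvements.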

Learning 3D Semantic Segmentation with only 2D Image Supervision

no code implementations · 21 Oct 2021 · Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser

With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras.

3D Semantic Segmentation · Autonomous Driving +1

A Step Toward More Inclusive People Annotations for Fairness

no code implementations · 5 May 2021 · Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru

In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP (More Inclusive Annotations for People) subset, containing bounding boxes and attributes for all of the people visible in those images.

Attribute · Fairness

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

1 code implementation · 2 Aug 2018 · Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization.

Sound · Audio and Speech Processing
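The abstract frames speech activity detection as a per-frame labeling problem. As a toy baseline, the sketch below thresholds each audio frame's RMS energy; the function name and threshold are hypothetical, and energy thresholds fail precisely on the conditions AVA-Speech was built to probe (speech over music or noise), which is why the dataset benchmarks learned detectors instead.

```python
import numpy as np

def detect_speech(samples, frame_len=400, threshold=0.02):
    """Toy energy-based speech activity detector (not the paper's
    models): split the waveform into fixed-length frames and label a
    frame True when its RMS energy exceeds a threshold."""
    n = len(samples) // frame_len
    frames = np.asarray(samples[: n * frame_len], dtype=float).reshape(n, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold
```

For 16 kHz audio, `frame_len=400` corresponds to 25 ms frames, a common analysis window for speech tasks.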

Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

no code implementations · 31 May 2017 · Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy

In this paper, we present a system that associates faces with voices in a video by fusing information from the audio and visual signals.
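One simple way to fuse audio and visual signals over time, sketched below, is to pick the face whose per-frame visual activity best correlates with the audio's speech activity; the function name and the idea of a scalar "motion score" per face are illustrative assumptions, not the paper's actual features or fusion model.

```python
import numpy as np

def active_speaker(audio_activity, face_motion):
    """Toy audio-visual fusion (not the paper's system): return the
    face id whose motion trace has the highest Pearson correlation
    with the audio speech-activity trace.

    audio_activity: (T,) per-frame speech score from the audio track
    face_motion: dict mapping face id -> (T,) per-frame motion score
    """
    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    return max(face_motion, key=lambda f: corr(audio_activity, face_motion[f]))
```

Correlating over a whole video, rather than per frame, is what makes this kind of association robust to moments when the speaker's face is occluded or off-screen.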

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

8 code implementations · CVPR 2018 · Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Action Detection +3

Egocentric Field-of-View Localization Using First-Person Point-of-View Devices

no code implementations · 7 Oct 2015 · Vinay Bettadapura, Irfan Essa, Caroline Pantofaru

We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization.
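A small piece of the FOV localization problem can be sketched with the orientation sensors such devices carry: given the camera's compass heading and horizontal field of view, decide whether a landmark at a known bearing is visible. The function name and default FOV below are hypothetical; the paper's full technique also fuses images and video.

```python
def in_field_of_view(heading_deg, bearing_deg, fov_deg=60.0):
    """Toy FOV test (not the paper's method): True if a landmark at
    compass bearing `bearing_deg` lies within a camera pointed at
    `heading_deg` with horizontal field of view `fov_deg`."""
    # Signed angular difference, wrapped into [-180, 180).
    diff = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0
```

The wrap-around normalization matters: a camera heading of 350° and a landmark bearing of 10° differ by only 20°, not 340°.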
