Search Results for author: Caroline Pantofaru

Found 15 papers, 6 papers with code

FILM: Frame Interpolation for Large Motion

2 code implementations · 10 Feb 2022 · Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, Brian Curless

Recent methods use multiple networks to estimate optical flow or depth and a separate network dedicated to frame synthesis.

Optical Flow Estimation · Video Frame Interpolation
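The abstract contrasts FILM with pipelines that pair a flow or depth network with a separate synthesis network. As a rough illustration of the flow-based half of such a pipeline, the sketch below warps two frames toward the temporal midpoint along a precomputed flow field and averages the warps; the function name, nearest-neighbor sampling, and (dy, dx) flow convention are illustrative assumptions, not the paper's method, which learns the synthesis step.

```python
import numpy as np

def interpolate_midframe(frame0, frame1, flow01):
    """Hypothetical sketch: synthesize the t=0.5 frame by warping each
    input frame halfway along a precomputed optical flow field and
    averaging. Nearest-neighbor sampling keeps the sketch short; FILM
    instead learns the synthesis network end to end.

    frame0, frame1: (H, W) or (H, W, C) float arrays
    flow01: (H, W, 2) flow from frame0 to frame1, in (dy, dx) order
    """
    h, w = frame0.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Sample frame0 half a flow step "behind" each midpoint pixel.
    y0 = np.clip(np.round(ys - 0.5 * flow01[..., 0]).astype(int), 0, h - 1)
    x0 = np.clip(np.round(xs - 0.5 * flow01[..., 1]).astype(int), 0, w - 1)

    # Sample frame1 half a flow step "ahead" of each midpoint pixel.
    y1 = np.clip(np.round(ys + 0.5 * flow01[..., 0]).astype(int), 0, h - 1)
    x1 = np.clip(np.round(xs + 0.5 * flow01[..., 1]).astype(int), 0, w - 1)

    return 0.5 * (frame0[y0, x0] + frame1[y1, x1])
```

With zero flow this degrades to a plain average of the two frames; large motions are exactly where such hand-built warps break down and where FILM targets its improvements.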

Learning 3D Semantic Segmentation with only 2D Image Supervision

no code implementations · 21 Oct 2021 · Kyle Genova, Xiaoqi Yin, Abhijit Kundu, Caroline Pantofaru, Forrester Cole, Avneesh Sud, Brian Brewington, Brian Shucker, Thomas Funkhouser

With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras.

3D Semantic Segmentation · Autonomous Driving +1

A Step Toward More Inclusive People Annotations for Fairness

no code implementations · 5 May 2021 · Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru

In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP (More Inclusive Annotations for People) subset, containing bounding boxes and attributes for all of the people visible in those images.

Attribute · Fairness

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

1 code implementation · 2 Aug 2018 · Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization.

Sound · Audio and Speech Processing
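The abstract frames speech activity detection as a per-frame labeling problem. As a toy baseline, the sketch below thresholds each audio frame's RMS energy; the function name and threshold are hypothetical, and energy thresholds fail precisely on the conditions AVA-Speech was built to probe (speech over music or noise), which is why the dataset benchmarks learned detectors instead.

```python
import numpy as np

def detect_speech(samples, frame_len=400, threshold=0.02):
    """Toy energy-based speech activity detector (not the paper's
    models): split the waveform into fixed-length frames and label a
    frame True when its RMS energy exceeds a threshold."""
    n = len(samples) // frame_len
    frames = np.asarray(samples[: n * frame_len], dtype=float).reshape(n, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold
```

For 16 kHz audio, `frame_len=400` corresponds to 25 ms frames, a common analysis window for speech tasks.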

Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

no code implementations · 31 May 2017 · Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy

In this paper, we present a system that associates faces with voices in a video by fusing information from the audio and visual signals.
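One simple way to fuse audio and visual signals over time, sketched below, is to pick the face whose per-frame visual activity best correlates with the audio's speech activity; the function name and the idea of a scalar "motion score" per face are illustrative assumptions, not the paper's actual features or fusion model.

```python
import numpy as np

def active_speaker(audio_activity, face_motion):
    """Toy audio-visual fusion (not the paper's system): return the
    face id whose motion trace has the highest Pearson correlation
    with the audio speech-activity trace.

    audio_activity: (T,) per-frame speech score from the audio track
    face_motion: dict mapping face id -> (T,) per-frame motion score
    """
    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    return max(face_motion, key=lambda f: corr(audio_activity, face_motion[f]))
```

Correlating over a whole video, rather than per frame, is what makes this kind of association robust to moments when the speaker's face is occluded or off-screen.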

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

8 code implementations · CVPR 2018 · Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Action Detection +3

Egocentric Field-of-View Localization Using First-Person Point-of-View Devices

no code implementations · 7 Oct 2015 · Vinay Bettadapura, Irfan Essa, Caroline Pantofaru

We present a technique that uses images, videos and sensor data taken from first-person point-of-view devices to perform egocentric field-of-view (FOV) localization.
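A small piece of the FOV localization problem can be sketched with the orientation sensors such devices carry: given the camera's compass heading and horizontal field of view, decide whether a landmark at a known bearing is visible. The function name and default FOV below are hypothetical; the paper's full technique also fuses images and video.

```python
def in_field_of_view(heading_deg, bearing_deg, fov_deg=60.0):
    """Toy FOV test (not the paper's method): True if a landmark at
    compass bearing `bearing_deg` lies within a camera pointed at
    `heading_deg` with horizontal field of view `fov_deg`."""
    # Signed angular difference, wrapped into [-180, 180).
    diff = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0
```

The wrap-around normalization matters: a camera heading of 350° and a landmark bearing of 10° differ by only 20°, not 340°.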
