Search Results for author: Bryan Russell

Found 35 papers, 15 papers with code

Segmenting Scenes by Matching Image Composites

no code implementations NeurIPS 2009 Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes.

Scene Segmentation Segmentation

Localizing 3D cuboids in single-view images

no code implementations NeurIPS 2012 Jianxiong Xiao, Bryan Russell, Antonio Torralba

In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes.

Understanding deep features with computer-generated imagery

no code implementations ICCV 2015 Mathieu Aubry, Bryan Russell

The rendered images are presented to a trained CNN and responses for different layers are studied with respect to the input scene factors.

Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views

no code implementations CVPR 2016 Francisco Massa, Bryan Russell, Mathieu Aubry

This paper presents an end-to-end convolutional neural network (CNN) for 2D-3D exemplar detection.

Marr Revisited: 2D-3D Alignment via Surface Normal Prediction

no code implementations CVPR 2016 Aayush Bansal, Bryan Russell, Abhinav Gupta

We introduce an approach that leverages surface normal predictions, along with appearance cues, to retrieve 3D models for objects depicted in 2D still images from a large CAD object library.

Object Pose Prediction +1

PixelNet: Towards a General Pixel-level Architecture

no code implementations 21 Sep 2016 Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection Semantic Segmentation +1

SURGE: Surface Regularized Geometry Estimation from a Single Image

no code implementations NeurIPS 2016 Peng Wang, Xiaohui Shen, Bryan Russell, Scott Cohen, Brian Price, Alan L. Yuille

This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image.

PixelNet: Representation of the pixels, by the pixels, and for the pixels

1 code implementation 21 Feb 2017 Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.

Edge Detection Segmentation +2

ActionVLAD: Learning spatio-temporal aggregation for action classification

no code implementations CVPR 2017 Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video.

Action Classification Classification +3

Localizing Moments in Video with Natural Language

2 code implementations ICCV 2017 Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding moment.

Natural Language Queries

Localizing Moments in Video with Temporal Language

1 code implementation EMNLP 2018 Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

To benchmark whether our model, and other recent video localization models, can effectively reason about temporal language, we collect the novel TEMPOral reasoning in video and language (TEMPO) dataset.

Natural Language Queries Retrieval +1

B-Script: Transcript-based B-roll Video Editing with Recommendations

no code implementations 28 Feb 2019 Bernd Huber, Hijung Valentina Shin, Bryan Russell, Oliver Wang, Gautham J. Mysore

In video production, inserting B-roll is a widely used technique to enrich the story and make a video more engaging.

Video Editing

Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces

no code implementations ICLR 2019 Senthil Purushwalkam, Abhinav Gupta, Danny M. Kaufman, Bryan Russell

To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices.

Finding Moments in Video Collections Using Natural Language

2 code implementations 30 Jul 2019 Victor Escorcia, Mattia Soldan, Josef Sivic, Bernard Ghanem, Bryan Russell

We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.

Moment Retrieval Re-Ranking +3

Neural Re-Simulation for Generating Bounces in Single Images

no code implementations ICCV 2019 Carlo Innamorati, Bryan Russell, Danny M. Kaufman, Niloy J. Mitra

We introduce a method to generate videos of dynamic virtual objects plausibly interacting via collisions with a still image's environment.

Telling Left from Right: Learning Spatial Correspondence of Sight and Sound

no code implementations CVPR 2020 Karren Yang, Bryan Russell, Justin Salamon

Self-supervised audio-visual learning aims to capture useful representations of video by leveraging correspondences between visual and audio inputs.

Audio-Visual Learning

Contact and Human Dynamics from Monocular Video

1 code implementation ECCV 2020 Davis Rempe, Leonidas J. Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, Jimei Yang

Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles.

Human Dynamics Pose Estimation

Editing Conditional Radiance Fields

1 code implementation ICCV 2021 Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, Bryan Russell

In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category.

Novel View Synthesis

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

1 code implementation ICCV 2021 Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.

Human-Object Interaction Detection Object +2

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations 20 Oct 2021 Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Contrastive Feature Loss for Image Prediction

1 code implementation 12 Nov 2021 Alex Andonian, Taesung Park, Bryan Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang

Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result.

Image Generation

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

no code implementations NeurIPS 2021 Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.

Focal Length and Object Pose Estimation via Render and Compare

2 code implementations CVPR 2022 Georgy Ponimatkin, Yann Labbé, Bryan Russell, Mathieu Aubry, Josef Sivic

We introduce FocalPose, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object.

Object Pose Estimation +1

Neural Volumetric Object Selection

no code implementations CVPR 2022 Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang

We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).

Object Segmentation

It's Time for Artistic Correspondence in Music and Video

no code implementations CVPR 2022 Didac Suris, Carl Vondrick, Bryan Russell, Justin Salamon

In order to capture the high-level concepts that are required to solve the task, we propose modeling the long-term temporal context of both the video and the music signals, using Transformer networks for each modality.

Retrieval

Monocular Dynamic View Synthesis: A Reality Check

1 code implementation 24 Oct 2022 Hang Gao, RuiLong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa

We study the recent progress on dynamic view synthesis (DVS) from monocular video.

Language-Guided Audio-Visual Source Separation via Trimodal Consistency

no code implementations CVPR 2023 Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.

Audio Source Separation Natural Language Queries

Conditional Generation of Audio from Video via Foley Analogies

1 code implementation CVPR 2023 Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens

Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like".

Language-Guided Music Recommendation for Video via Prompt Analogies

no code implementations CVPR 2023 Daniel McKee, Justin Salamon, Josef Sivic, Bryan Russell

A key challenge of this problem setting is that existing music video datasets provide the needed (video, music) training pairs, but lack text descriptions of the music.

Language Modelling Music Recommendation +1

FocalPose++: Focal Length and Object Pose Estimation via Render and Compare

1 code implementation 15 Nov 2023 Martin Cífka, Georgy Ponimatkin, Yann Labbé, Bryan Russell, Mathieu Aubry, Vladimir Petrik, Josef Sivic

We introduce FocalPose++, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object.

Object Pose Estimation
