no code implementations • 26 Nov 2024 • Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon
MultiFoley also allows users to choose reference audio from sound effects (SFX) libraries or partial videos for conditioning.
no code implementations • 19 Nov 2024 • Alejandro Pardo, Jui-Hsien Wang, Bernard Ghanem, Josef Sivic, Bryan Russell, Fabian Caba Heilbron
The objective of this work is to manipulate visual timelines (e.g., a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or potentially even disabled users.
no code implementations • 6 May 2024 • Jiacheng Cheng, Hijung Valentina Shin, Nuno Vasconcelos, Bryan Russell, Fabian Caba Heilbron
In this work, we consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
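A minimal sketch of how paraphrase robustness could be probed in a retrieval setting: run two paraphrased queries through a (stand-in) dual-encoder embedding space and measure how much their top-ranked image lists overlap. The encoder, dimensions, and overlap metric below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, image_embs, k=10):
    """Return indices of the top-k images by cosine similarity."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), image_embs, dim=-1)
    return sims.topk(k).indices

# Toy embeddings standing in for a text/image dual encoder (hypothetical dims).
image_embs = F.normalize(torch.randn(1000, 512), dim=-1)
query_a = F.normalize(torch.randn(512), dim=-1)                      # e.g. "a dog on the beach"
query_b = F.normalize(query_a + 0.1 * torch.randn(512), dim=-1)      # a paraphrase of the same query

top_a = set(retrieve(query_a, image_embs).tolist())
top_b = set(retrieve(query_b, image_embs).tolist())
# Overlap of the two result lists is one simple proxy for paraphrase consistency.
print("top-10 overlap:", len(top_a & top_b) / 10)
```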
no code implementations • CVPR 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko
Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.
no code implementations • 7 Dec 2023 • Joanna Materzynska, Josef Sivic, Eli Shechtman, Antonio Torralba, Richard Zhang, Bryan Russell
To avoid overfitting to the new custom motion, we introduce an approach for regularization over videos.
1 code implementation • 15 Nov 2023 • Martin Cífka, Georgy Ponimatkin, Yann Labbé, Bryan Russell, Mathieu Aubry, Vladimir Petrik, Josef Sivic
We introduce FocalPose++, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object.
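A heavily simplified sketch of the render-and-compare idea behind this line of work: iteratively refine the object translation and the focal length so that projected model points better match the observed 2D evidence. The pinhole projection, gradient-based update, and log-focal parameterization here are illustrative assumptions, not the FocalPose++ network.

```python
import torch

def project(points_3d, t, f):
    """Pinhole projection of object points translated by t, with focal length f."""
    cam = points_3d + t                       # camera-frame coordinates
    return f * cam[:, :2] / cam[:, 2:3]       # perspective divide

# Known object points and synthetic "observed" 2D projections.
pts = torch.randn(100, 3) * 0.1 + torch.tensor([0.0, 0.0, 2.0])
obs = project(pts, torch.tensor([0.05, -0.02, 0.3]), torch.tensor(600.0))

# "Compare" step: refine initial estimates by minimizing 2D reprojection error.
t = torch.zeros(3, requires_grad=True)
log_f = torch.tensor(500.0).log().requires_grad_()
opt = torch.optim.Adam([t, log_f], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = (project(pts, t, log_f.exp()) - obs).pow(2).mean()
    loss.backward()
    opt.step()
print("estimated focal:", log_f.exp().item(), "translation:", t.tolist())
```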
1 code implementation • CVPR 2023 • Chun-Hsiao Yeh, Bryan Russell, Josef Sivic, Fabian Caba Heilbron, Simon Jenni
Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications.
no code implementations • CVPR 2023 • Daniel McKee, Justin Salamon, Josef Sivic, Bryan Russell
A key challenge of this problem setting is that existing music video datasets provide the needed (video, music) training pairs, but lack text descriptions of the music.
1 code implementation • CVPR 2023 • Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens
Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like".
no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.
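One common way to set up language-conditioned separation, sketched below under strong simplifications: predict a mask over the mixture spectrogram conditioned on a query embedding, with a mix-and-separate style training signal built from unlabeled pairs. The network, conditioning scheme, and loss are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class QueryMaskNet(nn.Module):
    """Predict a separation mask over a mixture spectrogram, conditioned on a
    query embedding (a stand-in for the paper's model)."""
    def __init__(self, freq_bins=257, text_dim=256):
        super().__init__()
        self.audio = nn.Conv1d(freq_bins, 256, kernel_size=3, padding=1)
        self.film = nn.Linear(text_dim, 256)           # query conditioning
        self.head = nn.Conv1d(256, freq_bins, kernel_size=3, padding=1)

    def forward(self, mix_spec, query_emb):            # (B, F, T), (B, text_dim)
        h = torch.relu(self.audio(mix_spec))
        h = h * torch.sigmoid(self.film(query_emb)).unsqueeze(-1)
        return torch.sigmoid(self.head(h))              # mask in [0, 1]

# Mix-and-separate style signal on unlabeled pairs: mix two clips, then ask the
# model to recover one of them given its query embedding.
spec_a, spec_b = torch.rand(4, 257, 100), torch.rand(4, 257, 100)
query_a = torch.randn(4, 256)
mask = QueryMaskNet()(spec_a + spec_b, query_a)
loss = ((mask * (spec_a + spec_b)) - spec_a).pow(2).mean()
```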
1 code implementation • 24 Oct 2022 • Hang Gao, RuiLong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa
We study the recent progress on dynamic view synthesis (DVS) from monocular video.
no code implementations • CVPR 2022 • Didac Suris, Carl Vondrick, Bryan Russell, Justin Salamon
In order to capture the high-level concepts that are required to solve the task, we propose modeling the long-term temporal context of both the video and the music signals, using Transformer networks for each modality.
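A rough sketch of the two-tower idea described here: a separate Transformer encoder per modality, pooled into clip-level embeddings that are aligned with a contrastive matching loss. Dimensions, pooling, and the temperature are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Transformer over a sequence of per-frame (or per-audio-chunk) features."""
    def __init__(self, dim=256, layers=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, x):                      # x: (batch, time, dim)
        return self.encoder(x).mean(dim=1)     # temporal average pooling

video_enc, music_enc = ModalityEncoder(), ModalityEncoder()
video_feats = torch.randn(8, 64, 256)          # 64 video frames per clip
music_feats = torch.randn(8, 128, 256)         # 128 audio chunks per clip

v = F.normalize(video_enc(video_feats), dim=-1)
m = F.normalize(music_enc(music_feats), dim=-1)
logits = v @ m.t() / 0.07                      # similarity of every (video, music) pair
loss = F.cross_entropy(logits, torch.arange(8))  # InfoNCE-style matching loss
```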
no code implementations • CVPR 2022 • Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
2 code implementations • CVPR 2022 • Georgy Ponimatkin, Yann Labbé, Bryan Russell, Mathieu Aubry, Josef Sivic
We introduce FocalPose, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object.
no code implementations • NeurIPS 2021 • Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell
Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.
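A compact sketch of how narration-supervised spatial grounding can work in principle: attend over frame regions with the narration embedding, and train the attention-pooled clip embedding to match its own narration rather than others in the batch. Encoders, dimensions, and the temperature below are placeholders, not the paper's model.

```python
import torch
import torch.nn.functional as F

# Hypothetical features: region descriptors per clip and one narration embedding per clip.
region_feats = torch.randn(8, 49, 256)         # batch of 8 clips, 7x7 regions each
narration_emb = F.normalize(torch.randn(8, 256), dim=-1)

attn = torch.einsum('brd,bd->br', F.normalize(region_feats, dim=-1), narration_emb)
weights = F.softmax(attn / 0.07, dim=-1)       # where in the frame the narration "looks"
clip_emb = F.normalize((weights.unsqueeze(-1) * region_feats).sum(1), dim=-1)

# Self-supervision: each clip embedding should match its own narration,
# not the narrations of other clips in the batch (InfoNCE over the batch).
logits = clip_emb @ narration_emb.t() / 0.07
loss = F.cross_entropy(logits, torch.arange(8))
# At test time, `weights` reshaped to 7x7 gives a spatial localization map.
```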
1 code implementation • 12 Nov 2021 • Alex Andonian, Taesung Park, Bryan Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang
Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result.
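The kind of critic discussed here can be sketched as a patch-wise contrastive comparison: each spatial feature of the synthesized image should match the co-located feature of the ground truth rather than features from other locations. The toy feature extractor and loss below are assumptions for illustration, not the paper's critic.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in feature extractor; a learned or pretrained encoder would be used in practice.
feat = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(), nn.Conv2d(64, 128, 3, 2, 1))

def patch_contrastive_loss(pred, target, tau=0.07):
    """Each location of pred should match the co-located target feature (positive)
    rather than features at other locations (negatives)."""
    fp = F.normalize(feat(pred).flatten(2), dim=1)       # (B, C, HW)
    ft = F.normalize(feat(target).flatten(2), dim=1)
    logits = torch.einsum('bci,bcj->bij', fp, ft) / tau  # all location pairs
    labels = torch.arange(logits.size(-1)).expand(logits.size(0), -1)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())

pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
loss = patch_contrastive_loss(pred, target)
```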
no code implementations • 20 Oct 2021 • Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan Russell
Key to our approach is the ability to learn to spatially localize interactions with self-supervision on a large corpus of videos with accompanying transcribed narrations.
1 code implementation • ICCV 2021 • Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell
Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.
1 code implementation • ICCV 2021 • Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, Bryan Russell
In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category.
Ranked #1 on Novel View Synthesis on PhotoShape
1 code implementation • ECCV 2020 • Davis Rempe, Leonidas J. Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, Jimei Yang
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles.
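The constraint violations mentioned here are easy to detect in isolation; a toy check for ground penetration and extreme torso lean is sketched below. The thresholds and joint layout are hypothetical, and this is only a diagnostic, not the paper's physics-based refinement.

```python
import numpy as np

# Hypothetical joint indices for this sketch (y axis points up).
PELVIS, HEAD = 0, 1

def plausibility_flags(joints, ground_y=0.0, max_lean_deg=45.0):
    """joints: (J, 3) array of 3D joint positions. Flags two simple violations:
    feet below the ground plane and a torso leaning past a threshold angle."""
    foot_penetration = joints[:, 1].min() < ground_y - 0.02   # 2 cm tolerance
    torso = joints[HEAD] - joints[PELVIS]
    lean = np.degrees(np.arccos(torso[1] / (np.linalg.norm(torso) + 1e-8)))
    return {"feet_below_ground": bool(foot_penetration),
            "extreme_lean": bool(lean > max_lean_deg)}

pose = np.array([[0.0, 0.9, 0.0],     # pelvis
                 [0.3, 1.5, 0.0],     # head (leaning forward)
                 [0.1, -0.05, 0.0]])  # one foot slightly below the ground
print(plausibility_flags(pose))
```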
no code implementations • CVPR 2020 • Karren Yang, Bryan Russell, Justin Salamon
Self-supervised audio-visual learning aims to capture useful representations of video by leveraging correspondences between visual and audio inputs.
no code implementations • ICCV 2019 • Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, Thomas Brox
We show that methods trained on our dataset consistently perform well when tested on other datasets.
Ranked #24 on 3D Hand Pose Estimation on FreiHAND (PA-F@5mm metric)
no code implementations • ICCV 2019 • Carlo Innamorati, Bryan Russell, Danny M. Kaufman, Niloy J. Mitra
We introduce a method to generate videos of dynamic virtual objects plausibly interacting via collisions with a still image's environment.
2 code implementations • 30 Jul 2019 • Victor Escorcia, Mattia Soldan, Josef Sivic, Bernard Ghanem, Bryan Russell
We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.
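A minimal sketch of the corpus-level retrieval setting: score every candidate temporal segment across all videos against a language-query embedding and return the top-ranked moments. The embeddings and candidate enumeration below are placeholders, not the paper's model.

```python
import torch
import torch.nn.functional as F

def rank_moments(query_emb, moment_embs, moment_ids, k=5):
    """Rank candidate moments from the whole corpus by similarity to the query.
    moment_embs: (N, D) embeddings of temporal segments across all videos;
    moment_ids: list of (video_id, start_s, end_s) for each row."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), moment_embs, dim=-1)
    best = sims.topk(k).indices.tolist()
    return [(moment_ids[i], round(sims[i].item(), 3)) for i in best]

# Toy corpus: 3 videos x 4 candidate segments each (embeddings are random stand-ins).
ids = [(v, s, s + 5) for v in range(3) for s in range(0, 20, 5)]
embs = F.normalize(torch.randn(len(ids), 256), dim=-1)
query = F.normalize(torch.randn(256), dim=-1)    # e.g. "the dog jumps over the fence"
print(rank_moments(query, embs, ids))
```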
no code implementations • ICLR 2019 • Senthil Purushwalkam, Abhinav Gupta, Danny M. Kaufman, Bryan Russell
To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices.
no code implementations • 28 Feb 2019 • Bernd Huber, Hijung Valentina Shin, Bryan Russell, Oliver Wang, Gautham J. Mysore
In video production, inserting B-roll is a widely used technique to enrich the story and make a video more engaging.
1 code implementation • EMNLP 2018 • Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell
To benchmark whether our model, and other recent video localization models, can effectively reason about temporal language, we collect the novel TEMPOral reasoning in video and language (TEMPO) dataset.
2 code implementations • ECCV 2018 • Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
Human shape estimation is an important task for video editing, animation, and the fashion industry.
Ranked #3 on 3D Human Pose Estimation on Surreal (using extra training data)
1 code implementation • 8 Aug 2017 • Zoya Bylinskii, Nam Wook Kim, Peter O'Donovan, Sami Alsheikh, Spandan Madan, Hanspeter Pfister, Fredo Durand, Bryan Russell, Aaron Hertzmann
Our models are neural networks trained on human clicks and importance annotations on hundreds of designs.
2 code implementations • ICCV 2017 • Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell
A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding moment.
no code implementations • CVPR 2017 • Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell
In this work, we introduce a new video representation for action classification that aggregates local convolutional features across the entire spatio-temporal extent of the video.
Ranked #8 on Long-video Activity Recognition on Breakfast
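A compact sketch of VLAD-style aggregation over the full spatiotemporal extent of a video, in the spirit of the representation described above: local convolutional features are softly assigned to a small set of learned centers and their residuals are accumulated into one video-level descriptor. Cluster count and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADPool(nn.Module):
    """Softly assign local features to K learned centers and sum residuals."""
    def __init__(self, dim=512, k=32):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(k, dim) * 0.01)
        self.assign = nn.Linear(dim, k)

    def forward(self, feats):                      # feats: (N, dim) local descriptors
        a = F.softmax(self.assign(feats), dim=-1)  # (N, K) soft assignments
        resid = feats.unsqueeze(1) - self.centers  # (N, K, dim) residuals
        vlad = (a.unsqueeze(-1) * resid).sum(0)    # (K, dim) aggregated
        return F.normalize(vlad.flatten(), dim=0)  # single video-level vector

# Local conv features pooled over all frames and spatial positions of a video.
local_feats = torch.randn(30 * 7 * 7, 512)         # 30 frames, 7x7 feature map
video_descriptor = VLADPool()(local_feats)         # shape: (32 * 512,)
```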
1 code implementation • 21 Feb 2017 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.
no code implementations • NeurIPS 2016 • Peng Wang, Xiaohui Shen, Bryan Russell, Scott Cohen, Brian Price, Alan L. Yuille
This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image.
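One way to see the coupling between the two predictions: surface normals can be approximated from spatial derivatives of depth, so a consistency term can penalize disagreement between the predicted normal map and the normals implied by the predicted depth. The finite-difference construction and intrinsics below are simplifying assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def normals_from_depth(depth, fx=500.0, fy=500.0):
    """Approximate surface normals from a depth map via finite differences.
    depth: (B, 1, H, W); fx, fy are illustrative camera intrinsics."""
    dzdx = depth[..., :, 1:] - depth[..., :, :-1]      # horizontal depth gradient
    dzdy = depth[..., 1:, :] - depth[..., :-1, :]      # vertical depth gradient
    dzdx = F.pad(dzdx, (0, 1))                         # pad back to H x W
    dzdy = F.pad(dzdy, (0, 0, 0, 1))
    n = torch.cat([-dzdx * fx, -dzdy * fy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)

def consistency_loss(pred_normals, pred_depth):
    """Penalize disagreement between predicted normals and depth-implied normals."""
    implied = normals_from_depth(pred_depth)
    return (1 - F.cosine_similarity(pred_normals, implied, dim=1)).mean()

depth = torch.rand(2, 1, 64, 64) + 1.0
normals = F.normalize(torch.randn(2, 3, 64, 64), dim=1)
loss = consistency_loss(normals, depth)
```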
no code implementations • 21 Sep 2016 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.
no code implementations • CVPR 2016 • Aayush Bansal, Bryan Russell, Abhinav Gupta
We introduce an approach that leverages surface normal predictions, along with appearance cues, to retrieve 3D models for objects depicted in 2D still images from a large CAD object library.
no code implementations • CVPR 2016 • Francisco Massa, Bryan Russell, Mathieu Aubry
This paper presents an end-to-end convolutional neural network (CNN) for 2D-3D exemplar detection.
no code implementations • ICCV 2015 • Mathieu Aubry, Bryan Russell
The rendered images are presented to a trained CNN and responses for different layers are studied with respect to the input scene factors.
no code implementations • NeurIPS 2012 • Jianxiong Xiao, Bryan Russell, Antonio Torralba
In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes.
no code implementations • NeurIPS 2009 • Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman
In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes.