Search Results for author: Ruohan Gao

Found 25 papers, 10 papers with code

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

no code implementations 20 Dec 2023 Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

We propose a unified multi-modal framework -- Audio-Visual Conversational Attention (AV-CONV), for the joint prediction of conversation behaviors -- speaking and listening -- for both the camera wearer as well as all other social partners present in the egocentric video.

SoundCam: A Dataset for Finding Humans Using Room Acoustics

no code implementations NeurIPS 2023 Mason Wang, Samuel Clarke, Jui-Hsien Wang, Ruohan Gao, Jiajun Wu

A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions.

NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

no code implementations 2 Nov 2023 Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu

We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals.

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

1 code implementation 1 Jun 2023 Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.

Multi-Task Learning, Visual Navigation

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

no code implementations CVPR 2023 Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu

We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.

Benchmarking, Object +1

An Extensible Multimodal Multi-task Object Dataset with Materials

no code implementations 29 Apr 2023 Trevor Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, Silvio Savarese

For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image.

Attribute, Multi-Task Learning +1

Differentiable Physics Simulation of Dynamics-Augmented Neural Objects

no code implementations 17 Oct 2022 Simon Le Cleac'h, Hong-Xing Yu, Michelle Guo, Taylor A. Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, Mac Schwager

A robot can use this simulation to optimize grasps and manipulation trajectories of neural objects, or to improve the neural object models through gradient-based real-to-simulation transfer.

Friction, Object

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

1 code implementation CVPR 2022 Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu

We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects.

Visual Acoustic Matching

no code implementations CVPR 2022 Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
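A standard building block behind this kind of task is convolving dry audio with a room impulse response (RIR), so the clip sounds as if it were recorded in the RIR's room. A minimal pure-Python sketch of that operation (the signal and RIR values below are illustrative, not from the paper):

```python
def convolve(signal, rir):
    """Naive time-domain convolution: applies a room impulse
    response (RIR) to a dry signal, simulating how the signal
    would sound when played back in the RIR's room."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

# A unit-impulse RIR (direct path only) leaves the signal unchanged.
dry = [1.0, 0.5, -0.25]
assert convolve(dry, [1.0]) == dry

# An RIR with a delayed tap adds a quieter copy two samples later.
echoed = convolve(dry, [1.0, 0.0, 0.3])
```

Real systems do this with FFT-based convolution for speed; visual acoustic matching, as described above, effectively learns the target room's transformation from an image rather than measuring an RIR directly.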

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

no code implementations 21 Nov 2021 Rishabh Garg, Ruohan Gao, Kristen Grauman

Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings.

Multi-Task Learning, Room Impulse Response (RIR)

VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency

1 code implementation CVPR 2021 Ruohan Gao, Kristen Grauman

Given a video, the goal is to extract the speech associated with a face in spite of simultaneous background sounds and/or other human speakers.

Speech Separation

Learning to Set Waypoints for Audio-Visual Navigation

1 code implementation ICLR 2021 Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room).

Visual Navigation

VisualEchoes: Spatial Image Representation Learning through Echolocation

no code implementations ECCV 2020 Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman

Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.

Monocular Depth Estimation, Representation Learning +2

Co-Separating Sounds of Visual Objects

3 code implementations ICCV 2019 Ruohan Gao, Kristen Grauman

Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel.

Audio Denoising, Audio Source Separation +1

2.5D Visual Sound

2 code implementations CVPR 2019 Ruohan Gao, Kristen Grauman

We devise a deep convolutional neural network that learns to decode the monaural (single-channel) soundtrack into its binaural counterpart by injecting visual information about object and scene configurations.
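One common formulation of mono-to-binaural upmixing has the network predict the difference of the two channels, with the mono track taken as their sum; the left and right channels are then recovered algebraically. A sketch of that reconstruction step, with a hand-supplied difference signal standing in for the network output (this convention is an assumption for illustration, not a description of the paper's exact architecture):

```python
def binauralize(mono, diff):
    """Recover left/right channels from a mono mix and a predicted
    channel-difference signal, assuming mono = L + R and diff = L - R."""
    left = [(m + d) / 2 for m, d in zip(mono, diff)]
    right = [(m - d) / 2 for m, d in zip(mono, diff)]
    return left, right

# Round trip: mixing a known binaural pair and re-splitting it with
# the true difference recovers the original channels exactly.
L = [0.25, -0.5, 0.75]
R = [0.125, 0.25, -0.5]
mono = [l + r for l, r in zip(L, R)]
diff = [l - r for l, r in zip(L, R)]
assert binauralize(mono, diff) == (L, R)
```

Predicting the difference rather than both channels directly exploits the fact that the mono input already constrains their sum, so the network only has to learn the spatial part.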

Learning to Separate Object Sounds by Watching Unlabeled Video

2 code implementations ECCV 2018 Ruohan Gao, Rogerio Feris, Kristen Grauman

Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video.

Audio Denoising, Audio Source Separation +2

Im2Flow: Motion Hallucination from Static Images for Action Recognition

4 code implementations CVPR 2018 Ruohan Gao, Bo Xiong, Kristen Grauman

Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition.

Action Recognition, Hallucination +2

ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

no code implementations ECCV 2018 Dinesh Jayaraman, Ruohan Gao, Kristen Grauman

We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation.

Object, Object Recognition

On-Demand Learning for Deep Image Restoration

1 code implementation ICCV 2017 Ruohan Gao, Kristen Grauman

While machine learning approaches to image restoration offer great promise, current methods risk training models fixated on performing well only for image corruption of a particular level of difficulty, such as a certain level of noise or blur.
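The on-demand idea can be illustrated with a toy scheduler that allocates each training batch across corruption-difficulty levels in inverse proportion to the model's current per-level validation score, so the weakest levels get the most data. The function name and the PSNR-like numbers below are hypothetical, not from the paper:

```python
def on_demand_allocation(val_scores, batch_size):
    """Split a training batch across difficulty levels in proportion
    to how poorly the model currently handles each level: a lower
    validation score (e.g., PSNR) yields a larger share of samples."""
    # Inverse-performance weights; scores are assumed positive.
    weights = [1.0 / s for s in val_scores]
    total = sum(weights)
    # Rounding can leave the total slightly off batch_size; a real
    # scheduler would redistribute the remainder.
    return [round(batch_size * w / total) for w in weights]

# The model is worst on the last (hardest) level, so that level
# receives the largest share of the batch.
alloc = on_demand_allocation([30.0, 24.0, 15.0], batch_size=64)
```

This keeps training from fixating on one corruption level: as the model improves on easy inputs, their scores rise and their share of the batch shrinks automatically.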

Deblurring, Image Deblurring +3

Object-Centric Representation Learning from Unlabeled Videos

no code implementations 1 Dec 2016 Ruohan Gao, Dinesh Jayaraman, Kristen Grauman

Compared to existing temporal coherence methods, our idea has the advantage of lightweight preprocessing of the unlabeled video (no tracking required) while still being able to extract object-level regions from which to learn invariances.

Image Classification, Object +2
