Search Results for author: Changan Chen

Found 20 papers, 8 papers with code

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

no code implementations • 8 Apr 2024 • Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman

We propose a novel self-supervised embedding to learn how actions sound from narrated in-the-wild egocentric videos.

Paper
Add Code

F$^3$Loc: Fusion and Filtering for Floorplan Localization

no code implementations • 5 Mar 2024 • Changan Chen, Rui Wang, Christoph Vogel, Marc Pollefeys

In this paper we propose an efficient data-driven solution to self-localization within a floorplan.

Paper
Add Code

Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality

no code implementations • 14 Feb 2024 • Christian Marinoni, Riccardo Fosco Gramaccioni, Changan Chen, Aurelio Uncini, Danilo Comminiello

The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing, with a specific emphasis on 3D speech enhancement and 3D Sound Event Localization and Detection in Extended Reality applications.

Audio Signal Processing Sound Event Localization and Detection +1

Paper
Add Code

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations • 30 Nov 2023 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

Paper
Add Code

Measuring Acoustics with Collaborative Multiple Agents

no code implementations • 9 Oct 2023 • Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun

As humans, we hear sound every second of our life.

Paper
Add Code

Replay: Multi-modal Multi-view Acted Videos for Casual Holography

1 code implementation • ICCV 2023 • Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova

We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially.

3D Reconstruction Novel View Synthesis

Paper
Code

Novel-View Acoustic Synthesis

no code implementations • CVPR 2023 • Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?

Neural Rendering Novel View Synthesis

Paper
Add Code

Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation

Paper
Add Code

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

2 code implementations • 16 Jun 2022 • Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman

We introduce SoundSpaces 2. 0, a platform for on-the-fly geometry-based audio rendering for 3D environments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

311

Paper
Code

Few-Shot Audio-Visual Learning of Environment Acoustics

no code implementations • 8 Jun 2022 • Sagnik Majumder, Changan Chen, Ziad Al-Halah, Kristen Grauman

Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics.

audio-visual learning Room Impulse Response (RIR)

Paper
Add Code

Sound Adversarial Audio-Visual Navigation

1 code implementation • ICLR 2022 • Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu

In this work, we design an acoustically complex environment in which, besides the target sound, there exists a sound attacker playing a zero-sum game with the agent.

Navigate Visual Navigation

Paper
Code

Visual Acoustic Matching

no code implementations • CVPR 2022 • Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.

Paper
Add Code

Learning Audio-Visual Dereverberation

1 code implementation • 14 Jun 2021 • Changan Chen, Wei Sun, David Harwath, Kristen Grauman

We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed monaural sound and visual scene.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Semantic Audio-Visual Navigation

no code implementations • CVPR 2021 • Changan Chen, Ziad Al-Halah, Kristen Grauman

We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target.

Position Visual Navigation

Paper
Add Code

Learning to Set Waypoints for Audio-Visual Navigation

1 code implementation • ICLR 2021 • Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e. g., a phone ringing in another room).

Visual Navigation

311

Paper
Code

VisualEchoes: Spatial Image Representation Learning through Echolocation

no code implementations • ECCV 2020 • Ruohan Gao, Changan Chen, Ziad Al-Halah, Carl Schissler, Kristen Grauman

Several animal species (e. g., bats, dolphins, and whales) and even visually impaired humans have the remarkable ability to perform echolocation: a biological sonar used to perceive spatial layout and locate objects in the world.

Monocular Depth Estimation Representation Learning +2

Paper
Add Code

SoundSpaces: Audio-Visual Navigation in 3D Environments

2 code implementations • ECCV 2020 • Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf---restricted to solely their visual perception of the environment.

Navigate Visual Navigation

311

Paper
Code

Relational Graph Learning for Crowd Navigation

1 code implementation • 28 Sep 2019 • Changan Chen, Sha Hu, Payam Nikdel, Greg Mori, Manolis Savva

We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future.

Graph Learning Reinforcement Learning (RL)

121

Paper
Code

Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning

6 code implementations • 24 Sep 2018 • Changan Chen, Yuejiang Liu, Sven Kreiss, Alexandre Alahi

We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework.

Human Dynamics Navigate +3

538

Paper
Code

Constraint-Aware Deep Neural Network Compression

no code implementations • ECCV 2018 • Changan Chen, Frederick Tung, Naveen Vedula, Greg Mori

Deep neural network compression has the potential to bring modern resource-hungry deep networks to resource-limited devices.

Bayesian Optimization Neural Network Compression +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.