Search Results for author: Vamsi Krishna Ithapu

Found 20 papers, 5 papers with code

Hearing Loss Detection from Facial Expressions in One-on-one Conversations

no code implementations17 Jan 2024 Yufeng Yin, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Stavros Petridis, Yu-Hsiang Wu, Christi Miller

In this work, we build on this idea and introduce the problem of detecting hearing loss from an individual's facial expressions during a conversation.

Representation Learning

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

no code implementations20 Dec 2023 Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

We propose a unified multi-modal, multi-task framework -- Audio-Visual Conversational Attention (Av-CONV), for the joint prediction of conversation behaviors -- speaking and listening -- for both the camera wearer as well as all other social partners present in the egocentric video.

Egocentric Auditory Attention Localization in Conversations

no code implementations CVPR 2023 Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others.

Novel-View Acoustic Synthesis

no code implementations CVPR 2023 Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?

Neural Rendering Novel View Synthesis

Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

no code implementations CVPR 2023 Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu

Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way?

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

no code implementations20 Nov 2022 Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements.

Speech Enhancement Speech Synthesis

Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

no code implementations16 Nov 2022 Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu

During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin.

Speech Enhancement

Towards Improved Room Impulse Response Estimation for Speech Recognition

no code implementations8 Nov 2022 Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

2 code implementations17 Feb 2022 Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.

Speech Enhancement Unsupervised Domain Adaptation

Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

no code implementations7 Feb 2022 Alexander Richard, Peter Dodds, Vamsi Krishna Ithapu

Impulse response estimation in high noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem.

Representation Learning

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech

no code implementations15 Jul 2021 Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia

Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it was recorded in the same room as a reference recording, with applications both in audio post-production and augmented reality.

Room Impulse Response (RIR)

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

1 code implementation9 Jul 2021 Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer.

Speech Enhancement

Egocentric Pose Estimation from Human Vision Span

no code implementations ICCV 2021 Hao Jiang, Vamsi Krishna Ithapu

Existing approaches either use a narrow field of view front facing camera that barely captures the wearer, or an extruded head-mounted top-down camera for maximal wearer visibility.

Egocentric Pose Estimation Pose Estimation

SoundSpaces: Audio-Visual Navigation in 3D Environments

2 code implementations ECCV 2020 Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf---restricted to solely their visual perception of the environment.

Navigate Visual Navigation

Secost: Sequential co-supervision for large scale weakly labeled audio event detection

no code implementations25 Oct 2019 Anurag Kumar, Vamsi Krishna Ithapu

Weakly supervised learning algorithms are critical for scaling audio event detection to several hundreds of sound categories.

Event Detection Knowledge Distillation +2

Cannot find the paper you are looking for? You can Submit a new open access paper.