Search Results for author: Anurag Kumar

Found 49 papers, 15 papers with code

Few Shot Class Incremental Learning using Vision-Language models

no code implementations • 2 May 2024 • Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee

Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes.

Few-Shot Class-Incremental Learning Incremental Learning +1

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

no code implementations • 27 Mar 2024 • Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.

Few-Shot Learning Pose Tracking +1

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

no code implementations • 3 Mar 2024 • Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Self-supervised learning models have been found to be very effective for several speech tasks, such as automatic speech recognition, speaker identification, and keyword spotting.

Automatic Speech Recognition Keyword Spotting +5

Ambisonics Networks -- The Effect Of Radial Functions Regularization

no code implementations • 29 Feb 2024 • Bar Shaybet, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field.

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch.

Self-Supervised Learning Speech Enhancement +2

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

no code implementations • 27 Sep 2023 • Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment.

Room Impulse Response (RIR)

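To synthesize audio for an environment, a dry (anechoic) source signal is convolved with that environment's RIR. The sketch below shows only this standard convolution step, not the paper's neural-field model for rendering the RIR. It assumes both signals are mono NumPy arrays at the same sample rate, and the function name `auralize` is hypothetical.

```python
import numpy as np

def auralize(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Render a dry mono signal in a room described by its impulse response.

    Linear convolution y[n] = sum_k h[k] * x[n - k]; the output is
    len(dry) + len(rir) - 1 samples long, so the reverberant tail is kept.
    """
    return np.convolve(dry, rir, mode="full")

x = np.array([1.0, 0.5, -0.25])                      # toy "dry" signal
direct_only = auralize(x, np.array([1.0]))           # identity RIR: output equals x
with_echo = auralize(x, np.array([1.0, 0.0, 0.5]))   # adds a half-amplitude echo 2 samples later
```

For RIRs that are thousands of samples long, `scipy.signal.fftconvolve(dry, rir, mode="full")` computes the same result more efficiently.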

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

no code implementations • 31 Jul 2023 • Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner.

TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

no code implementations • 4 Apr 2023 • Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.

Egocentric Audio-Visual Object Localization

1 code implementation • CVPR 2023 • Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can be created as wearers shift their attention.

Object Object Localization

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

2 code implementations • 16 Feb 2023 • Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We propose an objective for perceptual quality based on temporal acoustic parameters.

Speaker Recognition Speech Enhancement

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

2 code implementations • 16 Feb 2023 • Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features.

Speech Enhancement Time Series +1

Rethinking complex-valued deep neural networks for monaural speech enhancement

no code implementations • 11 Jan 2023 • Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance.

Open-Ended Question Answering Speech Enhancement

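Comparing complex- and real-valued building blocks depends on how a complex-valued layer is built from real arithmetic. The sketch below uses a dense (fully connected) layer rather than the convolutional and recurrent blocks of GCRN studied in the paper, and the function name is hypothetical. A complex weight W = W_r + iW_i applied to x = x_r + ix_i expands into four real matrix products: (W_r x_r - W_i x_i) + i(W_r x_i + W_i x_r).

```python
import numpy as np

def complex_linear(w_r, w_i, x_r, x_i):
    """Complex matrix-vector product computed with real arithmetic only.

    (W_r + i W_i)(x_r + i x_i) = (W_r x_r - W_i x_i) + i (W_r x_i + W_i x_r)
    Returns the real and imaginary parts of the output separately.
    """
    out_r = w_r @ x_r - w_i @ x_i
    out_i = w_r @ x_i + w_i @ x_r
    return out_r, out_i

rng = np.random.default_rng(0)
w_r, w_i = rng.standard_normal((2, 4, 3))   # two (4, 3) real weight matrices
x_r, x_i = rng.standard_normal((2, 3))      # real and imaginary input parts
out_r, out_i = complex_linear(w_r, w_i, x_r, x_i)

# Sanity check against NumPy's native complex arithmetic.
ref = (w_r + 1j * w_i) @ (x_r + 1j * x_i)
assert np.allclose(out_r, ref.real) and np.allclose(out_i, ref.imag)
```

The complex layer is the same as a real layer acting on the stacked vector [x_r; x_i], except that its weight is constrained to the tied block form [[W_r, -W_i], [W_i, W_r]]. An unconstrained real-valued counterpart with the same input and output size has twice as many free parameters. Differences like this are what a block-by-block comparison has to account for.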

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

no code implementations • 20 Nov 2022 • Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements.

Speech Enhancement Speech Synthesis

Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

no code implementations • 16 Nov 2022 • Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu

During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin.

Speech Enhancement

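A standard way to train a submodel that predicts the variance of an error is to minimize a Gaussian negative log-likelihood. The sketch below is a simplified, hypothetical version of such a loss, not the paper's formulation. It assumes a diagonal covariance, treating the real and imaginary error components as independent, and parameterizes that covariance as a per-bin log-variance. The function name is illustrative.

```python
import numpy as np

def heteroscedastic_nll(clean, enhanced, log_var):
    """Gaussian negative log-likelihood of the enhancement error (constant dropped).

    clean, enhanced: complex STFTs of shape (T, F).
    log_var: predicted log-variance of shape (T, F, 2) for the real and
             imaginary error components (diagonal covariance assumed here).
    """
    err = clean - enhanced
    e = np.stack([err.real, err.imag], axis=-1)  # (T, F, 2)
    # Per component: 0.5 * (log sigma^2 + e^2 / sigma^2), averaged over all bins.
    return float(np.mean(0.5 * (log_var + e**2 * np.exp(-log_var))))
```

For a fixed error e, this loss is minimized when sigma^2 = e^2, so the variance predictor learns to flag time-frequency bins where the enhancer is unreliable. On those bins, a large predicted variance also down-weights the squared-error term.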

Improving Speech Enhancement through Fine-Grained Speech Characteristics

1 code implementation • 1 Jul 2022 • Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We first identify key acoustic parameters that have been found to correlate well with voice quality (e.g., jitter, shimmer, and spectral flux) and then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.

Speech Enhancement

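The snippet names jitter and shimmer as example features. The sketch below uses their common "local" definitions, as in Praat's voice report, computed from pitch-period and peak-amplitude sequences that are assumed to have been extracted already. The L1 feature-difference term is a hypothetical illustration of reducing the clean-versus-enhanced difference in these features. The paper's actual objective may differ, for example by using a differentiable estimator inside the training loop. All function names are illustrative.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods / mean period."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes / mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))

def acoustic_feature_loss(clean_feats, enhanced_feats):
    """Hypothetical L1 distance between clean and enhanced feature vectors."""
    c = np.asarray(clean_feats, dtype=float)
    e = np.asarray(enhanced_feats, dtype=float)
    return float(np.mean(np.abs(c - e)))

# Hypothetical measurements: periods in seconds, amplitudes on a linear scale.
clean = [local_jitter([0.010, 0.010, 0.010]), local_shimmer([0.8, 0.8, 0.8])]
enhanced = [local_jitter([0.010, 0.011, 0.010]), local_shimmer([0.8, 0.7, 0.8])]
loss = acoustic_feature_loss(clean, enhanced)
```

In training, a term like this would be added as a weighted auxiliary loss alongside the model's main enhancement objective.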

SAQAM: Spatial Audio Quality Assessment Metric

no code implementations • 24 Jun 2022 • Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

Audio quality assessment is critical for assessing the perceptual realism of sounds.

Multi-Task Learning Speech Enhancement

Speech Quality Assessment through MOS using Non-Matching References

1 code implementation • 24 Jun 2022 • Pranay Manocha, Anurag Kumar

Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals.

Self-Supervised Learning

Curriculum optimization for low-resource speech recognition

no code implementations • 17 Feb 2022 • Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers

Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text.

speech-recognition Speech Recognition

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

2 code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.

Ranked #4 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Speech Enhancement Unsupervised Domain Adaptation

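The teacher-student step can be sketched as follows. This assumes the teacher returns a (speech, noise) estimate for each mixture. Bootstrapped remixing is illustrated here by permuting the teacher's noise estimates across the batch and adding them to its speech estimates to form new training mixtures, with the teacher's estimates serving as the student's pseudo-targets. `toy_teacher` is a hypothetical stand-in for a pre-trained separation model, not the paper's network.

```python
import numpy as np

def toy_teacher(mixtures):
    """Hypothetical stand-in for a pre-trained teacher: splits each mixture
    into a crude 'speech' estimate and a residual 'noise' estimate."""
    kernel = np.ones(5) / 5.0  # placeholder low-pass filter, not a real model
    speech = np.stack([np.convolve(m, kernel, mode="same") for m in mixtures])
    noise = mixtures - speech
    return speech, noise

def bootstrapped_remix(mixtures, teacher, rng):
    """Build student training data from unlabeled in-domain mixtures."""
    est_speech, est_noise = teacher(mixtures)   # pseudo-targets, shape (B, L)
    perm = rng.permutation(len(mixtures))       # shuffle noises across the batch
    new_noise = est_noise[perm]
    remixed = est_speech + new_noise            # new artificial mixtures
    return remixed, est_speech, new_noise

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 16000))  # 4 toy in-domain mixtures (1 s at 16 kHz)
inputs, speech_targets, noise_targets = bootstrapped_remix(batch, toy_teacher, rng)
# A student would be trained to map `inputs` to (speech_targets, noise_targets).
```

In the continual self-training setting, the teacher is periodically refreshed from the student, so the pseudo-targets improve over time.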

The impact of removing head movements on audio-visual speech enhancement

no code implementations • 1 Feb 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar

This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE).

Speech Enhancement

Continual self-training with bootstrapped remixing for speech enhancement

1 code implementation • 19 Oct 2021 • Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.

Ranked #13 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Speech Enhancement Unsupervised Domain Adaptation

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

no code implementations • 14 Oct 2021 • Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.

Ranked #5 on Audio Classification on Balanced Audio Set

Audio Classification Representation Learning +1

Ego4D: Around the World in 3,000 Hours of Egocentric Video

7 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

1 code implementation • NeurIPS 2021 • Pranay Manocha, Buye Xu, Anurag Kumar

We show that neural networks trained using our framework produce scores that correlate well with subjective mean opinion scores (MOS) and are also competitive with methods such as DNSMOS, which explicitly relies on human MOS ratings for training.

Speech Enhancement

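Agreement with subjective MOS is typically reported as a correlation between a model's predicted scores and the per-clip MOS. The sketch below assumes a matrix of listener ratings on a 1 to 5 scale and computes Pearson correlation with `np.corrcoef`. The ratings and predictions are made-up toy values for illustration, not results from the paper, and the function names are hypothetical.

```python
import numpy as np

def mean_opinion_scores(ratings):
    """ratings: (num_clips, num_listeners) array of 1-5 scores -> per-clip MOS."""
    return np.asarray(ratings, dtype=float).mean(axis=1)

def pearson(a, b):
    """Pearson correlation coefficient between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical toy data: 4 clips rated by 3 listeners, plus model predictions.
ratings = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [1, 2, 1]]
predicted = [4.1, 2.6, 3.2, 1.5]
mos = mean_opinion_scores(ratings)
r = pearson(mos, predicted)
```

Rank correlation (for example, `scipy.stats.spearmanr`) is often reported alongside Pearson correlation, because it only checks whether the predicted scores order the clips the same way listeners do.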

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

no code implementations • 11 Sep 2021 • Yangyang Xia, Buye Xu, Anurag Kumar

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training.

Speech Enhancement

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model (0.8 dB for monaural inputs and 0.3 dB for binaural inputs) while reaching a real-time factor of 0.65.

blind source separation Speaker Separation

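The real-time factor (RTF) is processing time divided by the duration of the audio processed, so an RTF below 1 means the system keeps up with real time. The sketch below assumes a generic processing callable and a sample count plus sample rate; the function names are hypothetical. With an RTF of 0.65, for example, 10 s of 16 kHz audio would be processed in 6.5 s.

```python
import time

def real_time_factor(elapsed_seconds, num_samples, sample_rate):
    """RTF = processing time / duration of the processed audio (both in seconds)."""
    return elapsed_seconds / (num_samples / sample_rate)

def measure_rtf(process, audio, sample_rate):
    """Time a (hypothetical) processing callable on `audio` and return its RTF."""
    start = time.perf_counter()
    process(audio)
    return real_time_factor(time.perf_counter() - start, len(audio), sample_rate)

rtf = real_time_factor(6.5, 10 * 16000, 16000)  # 6.5 s to process 10 s of audio
```

For a streaming model, the same ratio can be measured per chunk, which also shows whether individual chunks ever exceed their real-time budget.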

Do sound event representations generalize to other audio tasks? A case study in audio transfer learning

no code implementations • 21 Jun 2021 • Anurag Kumar, Yun Wang, Vamsi Krishna Ithapu, Christian Fuegen

We also provide insights into the attributes of sound event representations that enable such efficient information transfer.

Event Detection Sound Event Detection +1

DPLM: A Deep Perceptual Spatial-Audio Localization Metric

no code implementations • 29 May 2021 • Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality.

Audio Synthesis

A Low-Delay MAC for IoT Applications: Decentralized Optimal Scheduling of Queues without Explicit State Information Sharing

no code implementations • 24 May 2021 • Avinash Mohan, Arpan Chattopadhyay, Shivam Vinayak Vatsa, Anurag Kumar

Limiting the policy to this class reduces the problem to obtaining a queue switching policy at queue emptiness instants.

Fairness Scheduling

A bandit approach to curriculum generation for automatic speech recognition

no code implementations • 6 Feb 2021 • Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

Automatic Speech Recognition (ASR) has been a challenging domain, especially in low-data scenarios with few audio examples.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

1 code implementation • 2 Sep 2020 • Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.

Audio and Speech Processing Sound

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition

no code implementations • ICML 2020 • Anurag Kumar, Vamsi Krishna Ithapu

An important problem in machine auditory perception is to recognize and detect sound events.

Ranked #35 on Audio Classification on AudioSet

Audio Classification Transfer Learning

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

no code implementations • 29 May 2020 • Haytham M. Fayek, Anurag Kumar

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception.

Ranked #26 on Audio Classification on AudioSet

Audio Classification

Secost: Sequential co-supervision for large scale weakly labeled audio event detection

no code implementations • 25 Oct 2019 • Anurag Kumar, Vamsi Krishna Ithapu

Weakly supervised learning algorithms are critical for scaling audio event detection to several hundreds of sound categories.

Event Detection Knowledge Distillation +2

Learning Sound Events From Webly Labeled Data

1 code implementation • 28th International Joint Conference on Artificial Intelligence 2019 • Anurag Kumar, Ankit Shah, Alex Hauptmann, Bhiksha Raj

In the last couple of years, weakly labeled learning for sound events has turned out to be an exciting approach for audio event detection.

Event Detection Sound Event Detection +1

A Closer Look at Weak Label Learning for Audio Events

1 code implementation • 24 Apr 2018 • Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN-based approach for weakly supervised training of audio events.

Audio Classification Event Detection +2

NELS - Never-Ending Learner of Sounds

no code implementations • NIPS Workshop on Machine Learning for Audio 2018 • Benjamin Elizalde, Rohan Badlani, Ankit Shah, Anurag Kumar, Bhiksha Raj

Sounds are essential to how humans perceive and interact with the world.

Retrieval

Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes

1 code implementation • 4 Nov 2017 • Anurag Kumar, Maksim Khadkevich, Christian Fugen

In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data.

Sound Multimedia Audio and Speech Processing

Framework for evaluation of sound event detection in web videos

no code implementations • 2 Nov 2017 • Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection Sound Event Detection

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

no code implementations • 9 Jul 2017 • Anurag Kumar, Bhiksha Raj

We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

no code implementations • 12 Nov 2016 • Anurag Kumar, Bhiksha Raj

In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data.

Scene Recognition Weakly-supervised Learning

Discovering Sound Concepts and Acoustic Relations In Text

no code implementations • 23 Sep 2016 • Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text.

Dependency Parsing

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection

Features and Kernels for Audio Event Recognition

no code implementations • 19 Jul 2016 • Anurag Kumar, Bhiksha Raj

One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.

Sound Multimedia

Classifier Risk Estimation under Limited Labeling Resources

no code implementations • 9 Jul 2016 • Anurag Kumar, Bhiksha Raj

In this paper we propose strategies for estimating the performance of a classifier when labels cannot be obtained for the whole test set.

Weakly Supervised Scalable Audio Content Analysis

no code implementations • 12 Jun 2016 • Anurag Kumar, Bhiksha Raj

Audio Event Detection is an important task for content analysis of multimedia data.

Event Detection Multiple Instance Learning +1

Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks

2 code implementations • 9 May 2016 • Anurag Kumar, Dinei Florencio

In this paper we consider the problem of speech enhancement in real-world-like conditions where multiple noises can simultaneously corrupt speech.

Sound

Audio Event Detection using Weakly Labeled Data

no code implementations • 9 May 2016 • Anurag Kumar, Bhiksha Raj

This helps in obtaining a complete description of the recording and is notable because weakly labeled data never provides temporal information in the first place.

Event Detection Multiple Instance Learning

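In the multiple-instance view of weak labels, a recording is a bag of segments that carries only a recording-level label. The sketch below assumes per-segment classifier scores are already available and aggregates them with max pooling, a common multiple-instance choice that is not necessarily the one used in the paper. The recording-level prediction is the maximum segment score. Segments above a threshold give temporal localization even though no timestamps were used in training. The function name and threshold are hypothetical.

```python
import numpy as np

def mil_predict(segment_scores, threshold=0.5):
    """segment_scores: (num_segments,) probabilities that an event is present.

    Returns the recording-level decision (the bag is positive if any instance
    is), the recording score, and the indices of segments where the event is
    detected.
    """
    s = np.asarray(segment_scores, dtype=float)
    recording_score = float(s.max())                 # max pooling over instances
    present = bool(recording_score >= threshold)
    detected_segments = np.flatnonzero(s >= threshold)
    return present, recording_score, detected_segments

present, score, where = mil_predict([0.1, 0.2, 0.9, 0.7, 0.05])
# present is True, and the event is localized to segments 2 and 3.
```

Multiplying the segment indices by the segment hop length converts them into approximate onset times in seconds.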

Unsupervised Fusion Weight Learning in Multiple Classifier Systems

no code implementations • 6 Feb 2015 • Anurag Kumar, Bhiksha Raj

We also introduce a novel metric for ranking instances based on an index which depends upon the rank of weighted scores of test points among the weighted scores of training points.

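The snippet describes ranking test instances by an index that depends on where each test point's weighted score falls among the weighted scores of the training points. The sketch below is one hypothetical reading of that idea, not the paper's method. It assumes a linear fusion of K classifier scores with weights that are already known; how the paper learns those weights without labels is not shown here. The index is the fraction of training points whose fused score lies below the test point's fused score. Function names are illustrative.

```python
import numpy as np

def fused_scores(scores, weights):
    """scores: (N, K) outputs of K classifiers; weights: (K,) fusion weights."""
    return np.asarray(scores, dtype=float) @ np.asarray(weights, dtype=float)

def rank_index(test_scores, train_scores, weights):
    """Fraction of training points whose fused score is below each test point's."""
    train = np.sort(fused_scores(train_scores, weights))
    test = fused_scores(test_scores, weights)
    # side="left" counts training scores strictly smaller than each test score.
    return np.searchsorted(train, test, side="left") / len(train)

train = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]
test = [[0.25, 0.25], [0.05, 0.05], [0.5, 0.5]]
idx = rank_index(test, train, weights=[0.5, 0.5])
```

Under this reading, test instances would be sorted by the index, and a higher value places an instance higher relative to the training distribution of fused scores.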
