Search Results for author: Anurag Kumar

Found 47 papers, 15 papers with code

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

no code implementations • 3 Mar 2024 • Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others.

Automatic Speech Recognition Keyword Spotting +5

Ambisonics Networks -- The Effect Of Radial Functions Regularization

no code implementations • 29 Feb 2024 • Bar Shaybet, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field.
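As a toy illustration of the spherical harmonic representation the abstract refers to, a plane wave from a given direction can be encoded into first-order Ambisonics (B-format) gains. The sketch below is a generic first-order encoder in ACN channel order, written for this listing as an assumption-laden illustration, not code from the paper:

```python
import numpy as np

def foa_encode(azimuth, elevation):
    # First-order Ambisonics encoding gains for a plane wave
    # arriving from (azimuth, elevation) in radians.
    # Channel order follows the ACN convention: W, Y, Z, X.
    w = 1.0                                  # omnidirectional component
    y = np.sin(azimuth) * np.cos(elevation)  # left-right
    z = np.sin(elevation)                    # up-down
    x = np.cos(azimuth) * np.cos(elevation)  # front-back
    return np.array([w, y, z, x])

# A source straight ahead excites only W and X.
gains = foa_encode(0.0, 0.0)
```

Higher-order Ambisonics extends this with higher-degree spherical harmonics; the radial functions the paper regularizes arise when such encodings are derived from real microphone arrays.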

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

no code implementations • 27 Sep 2023 • Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment.

Room Impulse Response (RIR)

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

no code implementations • 31 Jul 2023 • Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

We propose DAVIS, a Diffusion-model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner.

TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

no code implementations • 4 Apr 2023 • Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.

Egocentric Audio-Visual Object Localization

1 code implementation • CVPR 2023 • Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can arise as wearers shift their attention.

Object Localization

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

2 code implementations • 16 Feb 2023 • Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features.

Speech Enhancement Time Series +1
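The auxiliary-loss idea described in the abstract can be sketched generically: a main waveform loss plus a weighted penalty on the difference of acoustic-parameter features between enhanced and clean speech. The `acoustic_params` extractor below is a crude hypothetical stand-in, not the paper's phonetic-aligned acoustic parameters:

```python
import numpy as np

def acoustic_params(wav):
    # Hypothetical stand-in feature extractor; PAAPLoss uses
    # phonetic-aligned acoustic parameters, not these statistics.
    return np.array([wav.mean(), wav.std(), np.abs(np.diff(wav)).mean()])

def total_loss(enhanced, clean, lam=0.1):
    # Main waveform loss plus a weighted auxiliary feature-matching term.
    main = np.mean((enhanced - clean) ** 2)
    aux = np.mean((acoustic_params(enhanced) - acoustic_params(clean)) ** 2)
    return main + lam * aux

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
enhanced = clean + 0.1 * rng.standard_normal(16000)
loss = total_loss(enhanced, clean)
```

Because the auxiliary term only compares derived features, it can be bolted onto any model that produces a speech waveform, as the abstract notes.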

Rethinking complex-valued deep neural networks for monaural speech enhancement

no code implementations • 11 Jan 2023 • Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance.

Open-Ended Question Answering Speech Enhancement

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

no code implementations • 20 Nov 2022 • Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements.

Speech Enhancement Speech Synthesis

Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

no code implementations • 16 Nov 2022 • Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu

During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin.

Speech Enhancement
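Predicting the variance of the enhancement error at each time-frequency bin, as the abstract describes, is commonly trained with a Gaussian negative log-likelihood in which uncertain bins are down-weighted. The sketch below assumes a single predicted log-variance per bin and is a generic formulation, not the paper's exact loss:

```python
import numpy as np

def heteroscedastic_nll(est, target, log_var):
    # Per-bin Gaussian negative log-likelihood (up to constants).
    # exp(-log_var) down-weights bins the model marks as uncertain,
    # while the +log_var term penalizes claiming high uncertainty.
    err2 = (est - target) ** 2
    return np.mean(err2 * np.exp(-log_var) + log_var)

rng = np.random.default_rng(0)
target = rng.standard_normal((64, 100))          # e.g. spectrogram bins
est = target + 0.3 * rng.standard_normal((64, 100))
log_var = np.zeros_like(target)                  # unit variance everywhere
loss = heteroscedastic_nll(est, target, log_var)
```

With a constant unit variance this reduces to plain MSE, which is why the variance-predicting submodel can be temporary and dropped at inference.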

Improving Speech Enhancement through Fine-Grained Speech Characteristics

1 code implementation • 1 Jul 2022 • Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We first identify key acoustic parameters that have been found to correlate well with voice quality (e.g., jitter, shimmer, and spectral flux) and then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.

Speech Enhancement

Speech Quality Assessment through MOS using Non-Matching References

1 code implementation • 24 Jun 2022 • Pranay Manocha, Anurag Kumar

Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals.

Self-Supervised Learning

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

2 code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a teacher model, pre-trained on out-of-domain data, infers estimated pseudo-target signals for in-domain mixtures.

Speech Enhancement Unsupervised Domain Adaptation
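The bootstrapped-remixing loop described in the abstract can be sketched at a high level: the teacher produces pseudo-targets for unlabeled in-domain mixtures, and new training mixtures are synthesized by re-pairing estimated speech with permuted estimated noise. The `teacher_separate` function below is a fake placeholder for a pre-trained separation model, under assumptions of my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_separate(mixture):
    # Placeholder teacher: a real pre-trained model would estimate
    # speech and noise; here we fake a crude linear split so the
    # data flow of the scheme can be shown end to end.
    est_speech = 0.8 * mixture
    est_noise = mixture - est_speech
    return est_speech, est_noise

# In-domain mixtures with no clean references available.
mixtures = rng.standard_normal((4, 16000))

# 1) Teacher infers pseudo-targets for each in-domain mixture.
speech_est, noise_est = teacher_separate(mixtures)

# 2) Bootstrapped remixing: pair each estimated speech signal with a
#    permuted estimated noise signal to synthesize fresh mixtures.
perm = rng.permutation(len(mixtures))
new_mixtures = speech_est + noise_est[perm]

# 3) A student would train on (new_mixtures -> speech_est) pairs, and
#    the teacher is periodically refreshed from the student's weights.
```

The remixing step is what lets training continue indefinitely on unlabeled in-domain audio without ever observing clean targets.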

Curriculum optimization for low-resource speech recognition

no code implementations • 17 Feb 2022 • Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers

Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text.

speech-recognition Speech Recognition

Continual self-training with bootstrapped remixing for speech enhancement

1 code implementation • 19 Oct 2021 • Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.

Speech Enhancement Unsupervised Domain Adaptation

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

no code implementations • 14 Oct 2021 • Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.

Audio Classification Representation Learning +1

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

1 code implementation • NeurIPS 2021 • Pranay Manocha, Buye Xu, Anurag Kumar

We show that neural networks trained using our framework produce scores that correlate well with subjective mean opinion scores (MOS) and are also competitive to methods such as DNSMOS, which explicitly relies on MOS from humans for training networks.

Speech Enhancement

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

no code implementations • 11 Sep 2021 • Yangyang Xia, Buye Xu, Anurag Kumar

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training.

Speech Enhancement
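The parallel databases mentioned in the abstract are typically built by mixing clean speech with noise at controlled signal-to-noise ratios. A minimal sketch of that standard mixing step, assuming equal-length mono signals (my illustration, not the paper's pipeline):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    # Scale the noise so the mixture attains the requested SNR in dB,
    # then add it to the clean reference.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

The paper's point is precisely that such synthetic pairs differ from real-world noisy recordings, motivating the use of unpaired real noisy speech during training.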

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8 dB for monaural inputs and 0.3 dB for binaural inputs, while reaching a real-time factor of 0.65.

blind source separation Speaker Separation
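The real-time factor quoted in the abstract is conventionally the wall-clock processing time divided by the duration of the audio processed; values below 1 mean the system keeps up with real time. A minimal sketch of measuring it, with a trivial stand-in workload instead of a separation model:

```python
import time

def real_time_factor(process_fn, audio_seconds):
    # RTF = wall-clock processing time / audio duration.
    # RTF < 1 means the system runs faster than real time.
    start = time.perf_counter()
    process_fn()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds

# Toy workload standing in for a model's forward pass over 10 s of audio.
rtf = real_time_factor(lambda: sum(range(100000)), audio_seconds=10.0)
```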

DPLM: A Deep Perceptual Spatial-Audio Localization Metric

no code implementations • 29 May 2021 • Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality.

Audio Synthesis

A bandit approach to curriculum generation for automatic speech recognition

no code implementations • 6 Feb 2021 • Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

The Automatic Speech Recognition (ASR) task remains challenging, especially in low-data scenarios with few audio examples.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
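A bandit-driven curriculum can be sketched generically: treat candidate training buckets (e.g. difficulty tiers) as bandit arms and reward the arms whose data most improves the model. The epsilon-greedy sketch below with a simulated reward is a generic illustration of the idea, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical difficulty buckets of training utterances; an
# epsilon-greedy bandit chooses which bucket to sample next.
n_arms, eps = 3, 0.1
counts = np.zeros(n_arms)
values = np.zeros(n_arms)  # running mean reward per arm

def simulated_reward(arm):
    # Simulated per-bucket loss improvement, unknown to the bandit.
    return rng.normal([0.1, 0.3, 0.2][arm], 0.05)

for _ in range(500):
    if rng.random() < eps:
        arm = int(rng.integers(n_arms))      # explore
    else:
        arm = int(np.argmax(values))         # exploit best-known bucket
    r = simulated_reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
```

In an actual ASR curriculum the reward would come from measured training progress (e.g. validation loss change) rather than a simulator.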

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

1 code implementation • 2 Sep 2020 • Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.

Audio and Speech Processing Sound

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

no code implementations • 29 May 2020 • Haytham M. Fayek, Anurag Kumar

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception.

Audio Classification

SeCoST: Sequential Co-supervision for Large Scale Weakly Labeled Audio Event Detection

no code implementations • 25 Oct 2019 • Anurag Kumar, Vamsi Krishna Ithapu

Weakly supervised learning algorithms are critical for scaling audio event detection to several hundreds of sound categories.

Event Detection Knowledge Distillation +2

A Closer Look at Weak Label Learning for Audio Events

1 code implementation • 24 Apr 2018 • Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN based approach for weakly supervised training of audio events.

Audio Classification Event Detection +2

Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes

1 code implementation • 4 Nov 2017 • Anurag Kumar, Maksim Khadkevich, Christian Fugen

In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data.

Sound Multimedia Audio and Speech Processing

Framework for evaluation of sound event detection in web videos

no code implementations • 2 Nov 2017 • Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection Sound Event Detection

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

no code implementations • 9 Jul 2017 • Anurag Kumar, Bhiksha Raj

We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

no code implementations • 12 Nov 2016 • Anurag Kumar, Bhiksha Raj

In this paper we propose a novel learning framework called Supervised and Weakly Supervised Learning where the goal is to learn simultaneously from weakly and strongly labeled data.

Scene Recognition Weakly-supervised Learning

Discovering Sound Concepts and Acoustic Relations In Text

no code implementations • 23 Sep 2016 • Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text.

Dependency Parsing

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection

Features and Kernels for Audio Event Recognition

no code implementations • 19 Jul 2016 • Anurag Kumar, Bhiksha Raj

One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.

Sound Multimedia

Classifier Risk Estimation under Limited Labeling Resources

no code implementations • 9 Jul 2016 • Anurag Kumar, Bhiksha Raj

In this paper we propose strategies for estimating performance of a classifier when labels cannot be obtained for the whole test set.

Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks

2 code implementations • 9 May 2016 • Anurag Kumar, Dinei Florencio

In this paper we consider the problem of speech enhancement in real-world-like conditions where multiple noises can simultaneously corrupt speech.

Sound

Audio Event Detection using Weakly Labeled Data

no code implementations • 9 May 2016 • Anurag Kumar, Bhiksha Raj

This helps in obtaining a complete description of the recording and is notable because temporal information is not available in weakly labeled data in the first place.

Event Detection Multiple Instance Learning
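The multiple-instance-learning view tagged above is often realized with max pooling: a recording (bag) is labeled positive for an event if at least one of its segments (instances) scores high, which is how segment-level timing can emerge from recording-level labels. A minimal sketch of that pooling step (my illustration, not necessarily the paper's exact formulation):

```python
import numpy as np

def bag_score(instance_scores):
    # Max-pooling MIL: the recording-level score is the maximum
    # segment-level score, so one confident segment suffices to
    # flag the whole recording.
    return instance_scores.max(axis=-1)

# Segment-level event scores for two recordings, 5 segments each.
scores = np.array([[0.1, 0.05, 0.9, 0.2, 0.1],    # event in one segment
                   [0.1, 0.2, 0.15, 0.05, 0.1]])  # no strong evidence
recording_scores = bag_score(scores)
```

The argmax over segments then localizes the event in time, which is the "temporal information" the abstract highlights.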

Unsupervised Fusion Weight Learning in Multiple Classifier Systems

no code implementations • 6 Feb 2015 • Anurag Kumar, Bhiksha Raj

We also introduce a novel metric for ranking instances based on an index which depends upon the rank of weighted scores of test points among the weighted scores of training points.
