Search Results for author: Anurag Kumar

Found 31 papers, 6 papers with code

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

no code implementations 17 Feb 2022 Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a teacher model, pre-trained on out-of-domain data, infers estimated pseudo-target signals for in-domain mixtures.

Speech Enhancement, Unsupervised Domain Adaptation
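The bootstrapped-remixing step described above can be sketched as follows. This is an illustrative simplification with NumPy; `remix_pseudo_targets` and the stand-in `teacher` are hypothetical names, not the paper's code:

```python
import numpy as np

def remix_pseudo_targets(mixtures, teacher):
    """One bootstrapped-remixing step (illustrative): the teacher estimates
    pseudo-clean speech for each in-domain mixture; the residual noise
    estimates are shuffled and remixed with the speech estimates to create
    new (input, pseudo-target) pairs for training the student."""
    est_speech = np.stack([teacher(m) for m in mixtures])  # pseudo-targets
    est_noise = np.stack(mixtures) - est_speech            # residual noise estimates
    perm = np.random.permutation(len(mixtures))            # bootstrap: shuffle noises
    new_mixtures = est_speech + est_noise[perm]            # remixed student inputs
    return new_mixtures, est_speech
```

The student would then be trained to map `new_mixtures` back to `est_speech`, with the teacher periodically refreshed from the student.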

Curriculum optimization for low-resource speech recognition

no code implementations 17 Feb 2022 Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers

Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text.

Speech Recognition

Continual self-training with bootstrapped remixing for speech enhancement

no code implementations 19 Oct 2021 Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.

Speech Enhancement, Unsupervised Domain Adaptation

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

no code implementations 14 Oct 2021 Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.

Audio Classification, Representation Learning, +1

Ego4D: Around the World in 3,000 Hours of Egocentric Video

no code implementations 13 Oct 2021 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

1 code implementation NeurIPS 2021 Pranay Manocha, Buye Xu, Anurag Kumar

We show that neural networks trained using our framework produce scores that correlate well with subjective mean opinion scores (MOS) and are competitive with methods such as DNSMOS, which explicitly rely on human MOS for training.

Speech Enhancement

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

no code implementations 11 Sep 2021 Yangyang Xia, Buye Xu, Anurag Kumar

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training.

Speech Enhancement

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations 25 Jun 2021 Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8 dB for monaural inputs and 0.3 dB for binaural inputs, while reaching a real-time factor of 0.65.

Speaker Separation
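The real-time factor reported above is wall-clock processing time divided by the duration of the audio processed; values below 1.0 mean the separator keeps pace with the incoming stream. A minimal sketch (the `real_time_factor` helper is a hypothetical name, not the paper's code):

```python
import time

def real_time_factor(process, audio, sample_rate=16000):
    """Wall-clock processing time divided by the duration of the audio;
    an RTF below 1.0 means the system runs faster than real time."""
    start = time.perf_counter()
    process(audio)                          # run the (streaming) model once
    elapsed = time.perf_counter() - start
    return elapsed / (len(audio) / sample_rate)
```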

DPLM: A Deep Perceptual Spatial-Audio Localization Metric

no code implementations 29 May 2021 Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality.

Decentralized, Hybrid MAC Design with Reduced State Information Exchange for Low-Delay IoT Applications

no code implementations 24 May 2021 Avinash Mohan, Arpan Chattopadhyay, Shivam Vinayak Vatsa, Anurag Kumar

The theory developed to reduce delay is also shown to work with different traffic types (batch arrivals, for example) and even in the presence of transmission errors and fast fading.

Fairness

A bandit approach to curriculum generation for automatic speech recognition

no code implementations 6 Feb 2021 Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

Automatic Speech Recognition (ASR) remains challenging, especially in low-data scenarios with few audio examples.

Automatic Speech Recognition, reinforcement-learning, +1

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

1 code implementation 2 Sep 2020 Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.

Audio and Speech Processing, Sound
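Two of the interaural cues this paper aims to preserve can be measured directly. This sketch (a hypothetical helper, assuming NumPy) computes the interaural level difference in dB and a cross-correlation estimate of the interaural time difference in samples:

```python
import numpy as np

def interaural_cues(left, right, eps=1e-12):
    """Interaural level difference (ILD, dB) and a cross-correlation
    estimate of the interaural time difference (ITD, samples)."""
    ild = 10 * np.log10((np.mean(left**2) + eps) / (np.mean(right**2) + eps))
    lag = int(np.argmax(np.correlate(left, right, mode="full")))
    itd = lag - (len(right) - 1)  # positive: left lags right
    return ild, itd
```

A separation front end that distorts these quantities degrades the listener's ability to localize the separated sources.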

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

no code implementations 29 May 2020 Haytham M. Fayek, Anurag Kumar

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception.

Audio Classification

SeCoST: Sequential co-supervision for large scale weakly labeled audio event detection

no code implementations 25 Oct 2019 Anurag Kumar, Vamsi Krishna Ithapu

Weakly supervised learning algorithms are critical for scaling audio event detection to several hundreds of sound categories.

Event Detection, Knowledge Distillation, +1

Learning Sound Events From Webly Labeled Data

1 code implementation 25 Nov 2018 Anurag Kumar, Ankit Shah, Alex Hauptmann, Bhiksha Raj

In the last couple of years, weakly labeled learning for sound events has turned out to be an exciting approach for audio event detection.

Event Detection, Sound Event Detection, +1

A Closer Look at Weak Label Learning for Audio Events

1 code implementation 24 Apr 2018 Ankit Shah, Anurag Kumar, Alexander G. Hauptmann, Bhiksha Raj

In this work, we first describe a CNN based approach for weakly supervised training of audio events.

Audio Classification, Event Detection, +1

Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes

1 code implementation 4 Nov 2017 Anurag Kumar, Maksim Khadkevich, Christian Fuegen

In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data.

Sound, Multimedia, Audio and Speech Processing

Framework for evaluation of sound event detection in web videos

no code implementations 2 Nov 2017 Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection, Sound Event Detection

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data

no code implementations 9 Jul 2017 Anurag Kumar, Bhiksha Raj

We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data.

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data

no code implementations 12 Nov 2016 Anurag Kumar, Bhiksha Raj

In this paper we propose a novel learning framework, called Supervised and Weakly Supervised Learning, in which the goal is to learn simultaneously from weakly and strongly labeled data.

Scene Recognition
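One common way to combine strong (frame-level) and weak (recording-level) supervision is to pool frame scores into a recording score and sum the two losses. This is an illustrative surrogate, not the paper's actual objective; the helper name and `lam` weighting are placeholders:

```python
import numpy as np

def joint_weak_strong_loss(frame_scores, strong_labels, weak_label, lam=0.5):
    """Frame-level (strong) squared error plus a recording-level (weak)
    term; the recording prediction is obtained by max-pooling frame
    scores, a common surrogate for learning from weak labels."""
    strong_loss = np.mean((frame_scores - strong_labels) ** 2)
    weak_loss = (frame_scores.max() - weak_label) ** 2  # pooled prediction
    return strong_loss + lam * weak_loss
```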

Discovering Sound Concepts and Acoustic Relations In Text

no code implementations 23 Sep 2016 Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text.

Dependency Parsing

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations 20 Sep 2016 Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection

Features and Kernels for Audio Event Recognition

no code implementations 19 Jul 2016 Anurag Kumar, Bhiksha Raj

One of the most important problems in audio event detection research is the absence of benchmark results for comparison with any proposed method.

Sound, Multimedia

Classifier Risk Estimation under Limited Labeling Resources

no code implementations 9 Jul 2016 Anurag Kumar, Bhiksha Raj

In this paper we propose strategies for estimating performance of a classifier when labels cannot be obtained for the whole test set.

Weakly Supervised Scalable Audio Content Analysis

no code implementations 12 Jun 2016 Anurag Kumar, Bhiksha Raj

Audio Event Detection is an important task for content analysis of multimedia data.

Event Detection, Multiple Instance Learning

Audio Event Detection using Weakly Labeled Data

no code implementations 9 May 2016 Anurag Kumar, Bhiksha Raj

This helps in obtaining a complete description of the recording and is notable because temporal information was never available in the weakly labeled data in the first place.

Event Detection, Multiple Instance Learning
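The multiple-instance view behind recovering temporal information from weak labels can be sketched as follows: each recording is a bag of segments, the recording-level score is the max over segment scores, and the arg-max segment yields an onset estimate that the weak label itself never contained (hypothetical helper, assuming NumPy):

```python
import numpy as np

def localize_from_weak(segment_scores, hop_seconds=1.0):
    """Treat a recording as a bag of segments: the max segment score is
    the recording-level prediction, and the arg-max segment gives an
    onset estimate the recording-level (weak) label never contained."""
    best = int(np.argmax(segment_scores))
    return float(segment_scores[best]), best * hop_seconds
```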

Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks

2 code implementations 9 May 2016 Anurag Kumar, Dinei Florencio

In this paper we consider the problem of speech enhancement in real-world like conditions where multiple noises can simultaneously corrupt speech.

Sound
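Building such multi-noise training data typically means summing several noise signals and scaling them to a target SNR relative to the speech; a minimal sketch (the `mix_noises` helper is hypothetical, not the paper's pipeline):

```python
import numpy as np

def mix_noises(speech, noises, snr_db=5.0):
    """Corrupt speech with several simultaneous noises: sum the noise
    signals, then scale the sum so the combined noise sits at the
    requested SNR relative to the speech."""
    noise = np.sum(noises, axis=0)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12   # guard against silent noise
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```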

Unsupervised Fusion Weight Learning in Multiple Classifier Systems

no code implementations 6 Feb 2015 Anurag Kumar, Bhiksha Raj

We also introduce a novel metric for ranking instances, based on an index that depends on the rank of each weighted test-point score among the weighted scores of the training points.
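As a rough illustration of such a rank-based index (the helper name and the assumption that weights are already applied are placeholders, not the paper's construction), each weighted test score can be mapped to its normalized rank among the weighted training scores:

```python
import numpy as np

def rank_index(weighted_test, weighted_train):
    """Normalized rank of each weighted test score among the weighted
    training scores: the fraction of training scores it exceeds."""
    srt = np.sort(np.asarray(weighted_train))
    return np.searchsorted(srt, np.asarray(weighted_test)) / srt.size
```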
