Search Results for author: Aswin Shanmugam Subramanian

Found 9 papers, 0 papers with code

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

no code implementations · 7 Mar 2023 · Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly.

Ranked #1 on Speech Recognition on LibriCSS (using extra training data)

Action Detection · Activity Detection · +1

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

no code implementations · 14 Dec 2022 · Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into the three broad categories of speech, music, and sound effects (SFX, understood to include ambient noise and natural sound events).
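To make the three-way output format concrete, here is a minimal toy sketch in NumPy. The stems, signal lengths, and the oracle power-ratio "separator" are all illustrative stand-ins for a trained model, not the paper's method; the point is only that a cocktail-fork system emits one estimate per category (speech, music, SFX) and that the estimates can be made consistent with the input mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "soundtrack": sum of three hypothetical stems (speech, music, SFX).
speech = rng.standard_normal(16000)
music = 0.5 * rng.standard_normal(16000)
sfx = 0.3 * rng.standard_normal(16000)
mixture = speech + music + sfx

# A three-stem separator outputs one estimate per category. Here we
# stand in for the model with oracle power-ratio weights, purely to
# illustrate the three-way output format.
stems = {"speech": speech, "music": music, "sfx": sfx}
power = {name: np.mean(s ** 2) for name, s in stems.items()}
total = sum(power.values())
estimates = {name: (power[name] / total) * mixture for name in stems}

# The three estimates sum back to the mixture, a common
# mixture-consistency constraint in source separation.
print(np.allclose(sum(estimates.values()), mixture))  # True
```

Real systems estimate time-frequency masks or waveforms per stem; the mixture-consistency check at the end is the same.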

Action Detection · Activity Detection · +4

Reverberation as Supervision for Speech Separation

no code implementations · 15 Nov 2022 · Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation.
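The core idea can be sketched in a few lines: if the separated estimates are correct, linear filters should be able to map them to the reverberant observation at another microphone, so the reconstruction residual can serve as an unsupervised loss. The filter length, least-squares solver, and simple delay-based "reverberation" below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def delay(x, d):
    """Zero-padded delay by d samples (toy stand-in for a room impulse response)."""
    y = np.zeros_like(x)
    y[d:] = x[: len(x) - d]
    return y

def ras_loss(estimates, other_channel, filt_len=8):
    """Toy reverberation-as-supervision loss: fit one short FIR filter
    per separated estimate (least squares) so their filtered sum
    reconstructs a second microphone channel, and return the
    normalized residual power."""
    T = len(other_channel)
    cols = [delay(est, d) for est in estimates for d in range(filt_len)]
    A = np.stack(cols, axis=1)                      # (T, n_src * filt_len)
    coef, *_ = np.linalg.lstsq(A, other_channel, rcond=None)
    residual = other_channel - A @ coef
    return np.mean(residual ** 2) / np.mean(other_channel ** 2)

# Two dry sources; the second channel is their delayed, scaled sum.
s1, s2 = rng.standard_normal(2000), rng.standard_normal(2000)
ch2 = delay(s1, 3) + 0.8 * delay(s2, 5)

print(ras_loss([s1, s2], ch2) < 1e-6)  # True: correct estimates explain ch2
```

Because the true estimates lie in the span of the filtered columns, the residual vanishes; imperfect separation would leave unexplained energy and a larger loss.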

Speech Separation

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

no code implementations · 16 Feb 2021 · Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task.
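A common way to score a localization model's prediction error is mean absolute direction-of-arrival (DOA) error with angular wraparound; this small helper is a generic sketch of such a metric, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def doa_error_deg(pred, true):
    """Mean absolute DOA error in degrees, taking the shorter way
    around the circle (so 350 deg vs 10 deg counts as 20, not 340)."""
    diff = (np.asarray(pred) - np.asarray(true) + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(diff)))

print(doa_error_deg([350.0, 10.0], [10.0, 350.0]))  # 20.0
```

The `+180 / mod 360 / -180` trick maps any raw difference into [-180, 180) before averaging.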

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

no code implementations · 30 Oct 2020 · Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

no code implementations · 19 Apr 2019 · Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita

This report investigates how well E2E ASR extends from standard close-talk to far-field applications by encompassing the entire multichannel speech enhancement and ASR pipeline within the S2S model.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +4

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

no code implementations · 27 Mar 2018 · Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art yet simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository of the Kaldi speech recognition toolkit.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +5
