Search Results for author: Aswin Shanmugam Subramanian

Found 9 papers, 0 papers with code

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

no code implementations • 7 Mar 2023 • Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly.

Ranked #1 on Speech Recognition on LibriCSS (using extra training data)

Action Detection Activity Detection +1

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

no code implementations • 14 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into the three broad categories of speech, music, and sound effects (SFX - understood to include ambient noise and natural sound events).

Action Detection Activity Detection +4

Reverberation as Supervision for Speech Separation

no code implementations • 15 Nov 2022 • Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation.

Speech Separation

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

no code implementations • 9 Oct 2021 • Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-Yi Lee, Shinji Watanabe

We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

no code implementations • 16 Feb 2021 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

no code implementations • 30 Oct 2020 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

no code implementations • 20 Apr 2020 • Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).

speaker-diarization Speaker Diarization +4

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita

This report investigates the ability of E2E ASR from standard close-talk to far-field applications by encompassing entire multichannel speech enhancement and ASR components within the S2S model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

no code implementations • 27 Mar 2018 • Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art yet simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository in the Kaldi speech recognition toolkit.

Ranked #2 on Noisy Speech Recognition on CHiME real

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
