no code implementations • 25 Oct 2018 • Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita
The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech.
Automatic Speech Recognition (ASR) +1
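The mask-based separation the entry describes can be sketched minimally: a DNN produces a time-frequency mask, and element-wise multiplication with the mixture spectrogram splits it into target and background. Everything below (shapes, the random stand-in for the DNN output) is illustrative, not the paper's actual model.

```python
import numpy as np

def apply_mask(mixture_spec, mask):
    # Element-wise masking of a magnitude spectrogram: mask values
    # near 1 keep the target (keyword) energy, values near 0
    # suppress the background speech.
    return mask * mixture_spec

rng = np.random.default_rng(0)
mixture = rng.random((257, 100))   # freq bins x frames (illustrative sizes)
mask = rng.random((257, 100))      # would come from the DNN estimator
keyword = apply_mask(mixture, mask)
background = apply_mask(mixture, 1.0 - mask)
# With a soft mask, the two streams sum back to the mixture
assert np.allclose(keyword + background, mixture)
```

The complementary mask `1 - mask` yields the "remaining background speech" stream mentioned in the abstract.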
no code implementations • 19 Apr 2019 • Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung Tran, Yuya Fujita
This report investigates the ability of E2E ASR to move from standard close-talk to far-field applications by encompassing the entire multichannel speech enhancement and ASR pipeline within the S2S model.
Automatic Speech Recognition (ASR) +4
no code implementations • 8 Dec 2021 • Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida
Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data.
no code implementations • 21 Dec 2021 • Melikasadat Emami, Dung Tran, Kazuhito Koishida
Improving generalization is a major challenge in audio classification due to labeled data scarcity.
no code implementations • 5 Jan 2022 • Hieu Le, Hans Walker, Dung Tran, Peter Chin
Although deep neural networks achieve strong performance on classification tasks, recent studies have shown that well-trained networks can be fooled by adding subtle noise.
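One classic way to craft such subtle noise is the Fast Gradient Sign Method (FGSM): perturb the input by the sign of the loss gradient. The sketch below applies it to a toy softmax-linear classifier in NumPy; the model, sizes, and `eps` are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fgsm_noise(W, b, x, y, eps=0.1):
    # Cross-entropy gradient for a softmax-linear classifier:
    # dL/dlogits = p - onehot(y), so dL/dx = W^T (p - onehot).
    # FGSM takes the sign of this gradient, scaled by eps.
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)
    return eps * np.sign(grad_x)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), np.zeros(3)  # 3 classes, 4 features
x, y = rng.normal(size=4), 0
x_adv = x + fgsm_noise(W, b, x, y)           # adversarially perturbed input
```

Each feature moves by at most `eps`, so the perturbation stays visually or audibly subtle while still shifting the loss in the worst-case direction.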
1 code implementation • 19 Sep 2023 • Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi
Diffusion models power the vast majority of text-to-audio (TTA) generation methods.
Ranked #10 on Audio Generation on AudioCaps
no code implementations • 13 Feb 2024 • Chih-Yu Lai, Dung Tran, Kazuhito Koishida
Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit rates.
1 code implementation • 14 Mar 2024 • Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida
Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks.
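The MAE pretext task the entry refers to can be sketched as random patch masking: hide most patches, feed only the visible ones to the encoder, and train a decoder to reconstruct the rest from unlabeled data. The patch count, dimension, and 75% ratio below are common defaults used purely for illustration.

```python
import numpy as np

def random_mask_patches(patches, mask_ratio=0.75, seed=0):
    # Keep a random (1 - mask_ratio) fraction of patches; the encoder
    # sees only these, and reconstruction of the hidden patches is
    # the self-supervised training signal.
    n = patches.shape[0]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(perm[:n_keep])
    return patches[keep_idx], keep_idx

patches = np.arange(16 * 8, dtype=float).reshape(16, 8)  # 16 patches, dim 8
visible, keep_idx = random_mask_patches(patches)
assert visible.shape == (4, 8)  # 25% of 16 patches remain visible
```

Because the signal comes from reconstruction alone, no labels are needed at this stage — the labeled-data requirement the entry mentions only arises when adapting to downstream tasks.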