1 code implementation • 16 Oct 2020 • Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma
With these data, three neural TTS models -- Tacotron2, Transformer, and FastSpeech -- are applied for building bilingual and code-switched TTS.
1 code implementation • 3 Feb 2021 • Shengkui Zhao, Trung Hieu Nguyen, Bin Ma
In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features.
Ranked #1 on Speech Enhancement on DNS Challenge
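The core idea behind a complex attention block like CCBAM can be illustrated with channel attention over complex-valued features. The sketch below is a minimal numpy illustration, not the paper's implementation: attention weights are derived from the per-channel magnitude, passed through a toy bottleneck with fixed random weights (a real module would learn these), and the resulting gate rescales both the real and imaginary parts. All names are hypothetical.

```python
import numpy as np

def complex_channel_attention(real, imag, reduction=2, seed=0):
    """Illustrative channel attention on complex features (not the CCBAM API).

    real, imag: arrays of shape (channels, time, freq) holding the two
    parts of a complex feature map.
    """
    c = real.shape[0]
    mag = np.sqrt(real**2 + imag**2)            # per-channel magnitude
    pooled = mag.mean(axis=(1, 2))              # global average pool -> (c,)
    # toy bottleneck MLP; weights are random placeholders for illustration
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    hidden = np.maximum(pooled @ w1, 0.0)       # ReLU
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2))) # sigmoid gate in (0, 1)
    gate = gate[:, None, None]
    # the same gate scales real and imaginary parts, so phase is preserved
    return real * gate, imag * gate
```

Because one shared gate multiplies both parts, the phase of each time-frequency bin is unchanged; only the channel's magnitude is re-weighted.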
no code implementations • 3 Feb 2021 • Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma
Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches of the phonetic set and the speech prosody of different languages.
no code implementations • 2 Oct 2021 • Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma
We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement.
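One plausible reading of a dual-mask scheme is two bounded masks estimated over the mixture spectrogram, one targeting echo and one targeting noise, applied jointly. The snippet below is a hedged sketch of that formulation (element-wise product of two [0, 1] masks); the paper's actual mask definition and combination rule may differ, and the names are illustrative.

```python
import numpy as np

def dual_mask_enhance(mix_mag, echo_mask, noise_mask):
    """Jointly apply an echo-suppression mask and a noise-suppression mask
    to a magnitude spectrogram. Illustrative only: masks are clipped to
    [0, 1] and combined multiplicatively."""
    return mix_mag * np.clip(echo_mask, 0.0, 1.0) * np.clip(noise_mask, 0.0, 1.0)
```

With both masks bounded in [0, 1], the enhanced magnitude can never exceed the mixture, which keeps the suppression stage stable.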
1 code implementation • 23 Feb 2023 • Shengkui Zhao, Bin Ma
To address the indirect modelling of elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs full-computation self-attention on local chunks and a linearised, low-cost self-attention over the full sequence.
Ranked #2 on Speech Separation on WHAMR!
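The joint local/global idea can be sketched in a few lines: exact softmax attention inside each chunk, plus a kernel-based linearised attention over the whole sequence whose cost grows linearly in sequence length. This is a minimal numpy illustration of the general technique, not MossFormer's actual block (which adds gating and convolutions); the elu(x)+1 feature map is a common choice for linearised attention and is assumed here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_local_global_attention(q, k, v, chunk=4):
    """Sketch: full softmax attention within non-overlapping chunks,
    plus linearised attention over the full sequence. q, k, v: (T, d)."""
    t, d = q.shape
    # local branch: exact attention restricted to each chunk, O(T * chunk * d)
    local = np.zeros_like(v)
    for s in range(0, t, chunk):
        qs, ks, vs = q[s:s+chunk], k[s:s+chunk], v[s:s+chunk]
        local[s:s+chunk] = softmax(qs @ ks.T / np.sqrt(d)) @ vs
    # global branch: linearised attention, O(T * d^2) instead of O(T^2 * d)
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1, positive map
    qg, kg = phi(q), phi(k)
    kv = kg.T @ v                        # (d, d) summary of the whole sequence
    z = qg @ kg.sum(axis=0)              # per-query normaliser
    global_out = (qg @ kv) / z[:, None]
    return local + global_out
```

The global branch never materialises a T-by-T attention matrix, which is what makes full-sequence attention affordable for long inputs.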
1 code implementation • 20 May 2023 • Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling.
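Replacing temporal pooling with cross attention can be sketched as a small fixed set of learned queries attending over a variable number of frames, producing a fixed-size output regardless of utterance length. The numpy sketch below shows that general mechanism; all names are hypothetical and this is not the ACA-Net architecture itself.

```python
import numpy as np

def query_pooling(features, queries):
    """Cross-attention pooling sketch: a few learned queries (n_q, d)
    attend over T frames (T, d), giving an (n_q, d) output whose size
    does not depend on T."""
    d = features.shape[1]
    scores = queries @ features.T / np.sqrt(d)   # (n_q, T)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)      # softmax over time
    return attn @ features                       # (n_q, d)
```

Unlike mean or statistics pooling, each query can learn to focus on different parts of the utterance, while still yielding a fixed-size embedding.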
1 code implementation • 22 Sep 2023 • Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma
Dual-path is a popular architecture for speech separation models (e.g., Sepformer): long sequences are split into overlapping chunks, with intra-blocks modelling local features within each chunk and inter-blocks modelling global relationships across chunks.
Ranked #5 on Speech Separation on WSJ0-2mix
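The chunking step that dual-path models share can be sketched as splitting a (T, d) sequence into overlapping chunks and later reassembling it by overlap-add. The numpy sketch below illustrates that plumbing under simple assumptions (zero-padded tail, overlaps resolved by averaging); function names are illustrative.

```python
import numpy as np

def split_into_chunks(x, chunk_len, hop):
    """Split a (T, d) sequence into overlapping chunks -> (n, chunk_len, d).
    The tail is zero-padded; hop = chunk_len // 2 gives 50% overlap."""
    t, d = x.shape
    n = max(1, int(np.ceil((t - chunk_len) / hop)) + 1)
    pad = (n - 1) * hop + chunk_len - t
    xp = np.pad(x, ((0, pad), (0, 0)))
    return np.stack([xp[i * hop : i * hop + chunk_len] for i in range(n)])

def merge_chunks(chunks, hop, length):
    """Overlap-add (n, chunk_len, d) chunks back into a (length, d) sequence,
    averaging frames covered by more than one chunk."""
    n, c, d = chunks.shape
    total = (n - 1) * hop + c
    out = np.zeros((total, d))
    cnt = np.zeros((total, 1))
    for i in range(n):
        out[i * hop : i * hop + c] += chunks[i]
        cnt[i * hop : i * hop + c] += 1
    return (out / cnt)[:length]
```

Intra-blocks then operate on the chunk axis and inter-blocks on the across-chunk axis; with averaging on the overlaps, merge after split is an exact round trip.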