no code implementations • 26 Oct 2023 • Gene-Ping Yang, Hao Tang
We study two key properties that enable matching: whether cluster centroids of self-supervised representations reduce the variability of phone instances, and whether they respect the relationships among phones.
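As a rough illustration of the first property, one can quantize self-supervised frame features to their nearest k-means centroid and check whether the within-phone variance drops. This is a minimal sketch, not the paper's code; `reps` and `phone_ids` are hypothetical stand-ins for frame-level features and aligned phone labels.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
reps = rng.normal(size=(5000, 256))       # hypothetical frame-level features
phone_ids = rng.integers(0, 40, 5000)     # hypothetical aligned phone labels

# quantize each frame to its nearest k-means centroid
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(reps)
quantized = kmeans.cluster_centers_[kmeans.predict(reps)]

def within_phone_variance(feats, labels):
    # mean squared distance of frames to their phone's mean vector
    return np.mean([
        ((feats[labels == p] - feats[labels == p].mean(0)) ** 2).sum(1).mean()
        for p in np.unique(labels)
    ])

print("raw frames :", within_phone_variance(reps, phone_ids))
print("centroids  :", within_phone_variance(quantized, phone_ids))
```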
no code implementations • 6 Jul 2023 • Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu
Our approach uses a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, lightweight model, with dual-view cross-correlation distillation and the teacher's codebook as learning objectives.
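As a hedged sketch of what a cross-correlation distillation objective can look like (the paper's exact dual-view formulation may differ), a Barlow-Twins-style loss pushes the normalized cross-correlation matrix between student and teacher embeddings toward the identity:

```python
import torch

def cross_correlation_distill(student_z, teacher_z, off_diag_weight=5e-3):
    # standardize each embedding dimension over the batch
    s = (student_z - student_z.mean(0)) / (student_z.std(0) + 1e-6)
    t = (teacher_z - teacher_z.mean(0)) / (teacher_z.std(0) + 1e-6)
    n = s.shape[0]
    c = (s.T @ t) / n                        # (dim, dim) cross-correlation
    on_diag = ((torch.diagonal(c) - 1) ** 2).sum()
    off_diag = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()
    return on_diag + off_diag_weight * off_diag

# student and teacher embeddings for the same batch (illustrative shapes)
loss = cross_correlation_distill(torch.randn(32, 128), torch.randn(32, 128))
```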
no code implementations • 25 Apr 2022 • Gene-Ping Yang, Hao Tang
The attention mechanism in sequence-to-sequence models is designed to model the alignment between acoustic features and output tokens in speech recognition.
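A minimal sketch of this idea with scaled dot-product attention: each output token's query attends over the encoder's acoustic frames, and the resulting weight matrix can be read as a soft alignment (shapes are illustrative, not the paper's architecture).

```python
import torch
import torch.nn.functional as F

def attend(queries, keys, values):
    # queries: (tokens, dim); keys, values: (frames, dim)
    scores = queries @ keys.T / keys.shape[-1] ** 0.5
    align = F.softmax(scores, dim=-1)        # (tokens, frames) soft alignment
    return align @ values, align

enc = torch.randn(200, 256)                  # encoder acoustic frames
dec_q = torch.randn(20, 256)                 # decoder token queries
context, alignment = attend(dec_q, enc, enc)
```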
1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee
Speech separation is well developed, with the highly successful permutation invariant training (PIT) approach, although the frequent label assignment switching that occurs during PIT training remains a problem when better convergence speed and achievable performance are desired; a small sketch illustrating this switching follows below.
Ranked #6 on Speech Separation on Libri2Mix (using extra training data)
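To make the switching problem concrete, this hedged illustration (not the paper's method) records, for each utterance, which permutation of reference sources currently minimizes the loss, and counts how often that choice flips across epochs:

```python
import itertools
import torch

def best_perm(est, ref):
    # est, ref: (num_srcs, time); return the loss-minimizing permutation
    perms = list(itertools.permutations(range(ref.shape[0])))
    losses = torch.stack([
        sum(((est[i] - ref[p[i]]) ** 2).mean() for i in range(ref.shape[0]))
        for p in perms
    ])
    return perms[int(losses.argmin())]

torch.manual_seed(0)
refs = [torch.randn(2, 100) for _ in range(8)]   # 8 two-speaker utterances
prev, switches = {}, 0
for epoch in range(5):
    for utt, ref in enumerate(refs):
        est = ref + torch.randn_like(ref)        # stand-in for model output
        p = best_perm(est, ref)
        if utt in prev and prev[utt] != p:
            switches += 1                        # assignment flipped
        prev[utt] = p
print("label assignment switches:", switches)
```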
1 code implementation • 28 Oct 2019 • Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-Yi Lee, Lin-shan Lee
Permutation invariant training (PIT) has long been a stepping-stone method for training speech separation models, handling the label ambiguity problem; a minimal sketch of the PIT objective follows below.
Ranked #22 on Speech Separation on WSJ0-2mix
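For reference, a minimal sketch of the PIT objective: the loss for an utterance is the minimum, over all permutations of the reference sources, of the summed per-source reconstruction losses, which resolves the label ambiguity. MSE is used here purely for brevity.

```python
import itertools
import torch

def pit_mse(est, ref):
    # est, ref: (num_srcs, time)
    n = ref.shape[0]
    losses = torch.stack([
        sum(((est[i] - ref[p[i]]) ** 2).mean() for i in range(n))
        for p in itertools.permutations(range(n))
    ])
    return losses.min()

loss = pit_mse(torch.randn(2, 16000), torch.randn(2, 16000))
```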
1 code implementation • 16 Apr 2019 • Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee
Substantial effort has been reported on approaches over the spectrogram, the standard time-and-frequency cross-domain representation for speech signals.
Ranked #24 on Speech Separation on WSJ0-2mix
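For context, the spectrogram mentioned above is the magnitude of the short-time Fourier transform; a brief sketch (parameter values are illustrative):

```python
import numpy as np
from scipy.signal import stft

fs = 8000                                # WSJ0-2mix is commonly used at 8 kHz
x = np.random.randn(fs * 4)              # placeholder 4-second waveform
f, t, Z = stft(x, fs=fs, nperseg=256, noverlap=192)
spec = np.abs(Z)                         # (freq_bins, frames) magnitude
print(spec.shape)
```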