Search Results for author: Gene-Ping Yang

Found 6 papers, 3 papers with code

Towards Matching Phones and Speech Representations

no code implementations • 26 Oct 2023 • Gene-Ping Yang, Hao Tang

We study two key properties that enable matching, namely, whether cluster centroids of self-supervised representations reduce the variability of phone instances and respect the relationship among phones.

Self-Supervised Learning
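
The clustering property described in this entry can be probed with a simple analysis. Below is a minimal sketch, not the paper's code: it assumes frame-level self-supervised features with frame-aligned phone labels and uses k-means centroids purely for illustration.

```python
# Hypothetical probe: do k-means centroids of self-supervised frame features
# reduce phone variability? `features` (frames x dim) and integer `phone_labels`
# (frames,) are assumed inputs, e.g. from a forced-aligned corpus.
import numpy as np
from sklearn.cluster import KMeans

def centroid_phone_probe(features, phone_labels, n_clusters=100, seed=0):
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    cluster_ids = kmeans.fit_predict(features)
    quantized = kmeans.cluster_centers_[cluster_ids]  # each frame -> its centroid

    # Within-phone variability before and after quantizing frames to centroids.
    phones = np.unique(phone_labels)
    raw_var = np.mean([features[phone_labels == p].var(axis=0).sum() for p in phones])
    quant_var = np.mean([quantized[phone_labels == p].var(axis=0).sum() for p in phones])

    # Cluster purity: fraction of frames in each cluster sharing the majority phone.
    purities = []
    for c in range(n_clusters):
        mask = cluster_ids == c
        if mask.any():
            purities.append(np.bincount(phone_labels[mask]).max() / mask.sum())
    return raw_var, quant_var, float(np.mean(purities))
```

A drop from raw_var to quant_var together with high purity would be consistent with centroids that both reduce phone variability and respect phone identity.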

On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation

no code implementations • 6 Jul 2023 • Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu

Our approach used a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, light-weight model using dual-view cross-correlation distillation and the teacher's codebook as learning objectives.

Keyword Spotting Knowledge Distillation +1
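
The paper's "dual-view cross-correlation distillation" is not spelled out in this snippet; the sketch below is only a generic cross-correlation (Barlow Twins-style) distillation loss between teacher and student embeddings, with the tensor shapes and off-diagonal weight chosen as assumptions rather than taken from the paper.

```python
# Assumed sketch of a cross-correlation style distillation loss between teacher
# and student embeddings; not the paper's exact objective.
import torch

def cross_correlation_distill_loss(student_emb, teacher_emb, off_diag_weight=5e-3):
    # student_emb, teacher_emb: (batch, dim); the student is assumed to project
    # into the teacher's embedding dimension.
    s = (student_emb - student_emb.mean(0)) / (student_emb.std(0) + 1e-6)
    t = (teacher_emb - teacher_emb.mean(0)) / (teacher_emb.std(0) + 1e-6)

    c = s.T @ t / s.shape[0]                        # (dim, dim) cross-correlation matrix
    diag = torch.diagonal(c)
    on_diag = ((diag - 1.0) ** 2).sum()             # align matching dimensions
    off_diag = (c ** 2).sum() - (diag ** 2).sum()   # decorrelate the rest
    return on_diag + off_diag_weight * off_diag
```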

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition

no code implementations • 25 Apr 2022 • Gene-Ping Yang, Hao Tang

The attention mechanism in sequence-to-sequence models is designed to model the alignment between acoustic features and output tokens in speech recognition.

Speech Recognition
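
One common way to supervise attention, sketched below, is to penalize the divergence between the decoder's attention distribution and a reference alignment (e.g. from forced alignment). This is a hypothetical illustration under those assumptions, not necessarily the paper's exact formulation.

```python
# Hypothetical supervised-attention auxiliary loss: KL divergence between a
# reference alignment distribution and the model's attention weights.
import torch

def supervised_attention_loss(attn_weights, ref_alignment, eps=1e-8):
    # attn_weights:  (batch, num_tokens, num_frames), each row sums to 1
    # ref_alignment: (batch, num_tokens, num_frames), reference distribution
    kl = ref_alignment * (torch.log(ref_alignment + eps) - torch.log(attn_weights + eps))
    return kl.sum(dim=-1).mean()   # sum over frames, average over tokens and batch
```

Such a term would typically be added to the usual sequence-level training objective with a small weight.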

Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee

Speech separation is now well developed thanks to the very successful permutation invariant training (PIT) approach, although the frequent label-assignment switching that occurs during PIT training remains a problem when better convergence speed and higher achievable performance are desired.

Ranked #6 on Speech Separation on Libri2Mix (using extra training data)

Speaker Separation Speech Enhancement +1
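
For context on the label-assignment switching mentioned in the abstract, a minimal utterance-level PIT loss is sketched below; the MSE criterion and tensor shapes are assumptions, not the paper's exact setup.

```python
# Minimal utterance-level permutation invariant training (PIT) loss sketch:
# evaluate the loss under every source permutation and keep the best one.
from itertools import permutations
import torch

def pit_mse_loss(estimates, targets):
    # estimates, targets: (batch, num_sources, time)
    num_sources = estimates.shape[1]
    per_perm = []
    for perm in permutations(range(num_sources)):
        permuted = estimates[:, list(perm), :]                 # reorder estimated sources
        per_perm.append(((permuted - targets) ** 2).mean(dim=(1, 2)))
    per_perm = torch.stack(per_perm, dim=1)                    # (batch, num_permutations)
    min_loss, best_perm = per_perm.min(dim=1)                  # best assignment per utterance
    return min_loss.mean(), best_perm
```

The returned best_perm can change from step to step for the same utterance during training, which is exactly the label-assignment switching that the self-supervised pre-training in this paper aims to stabilize.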

Interrupted and cascaded permutation invariant training for speech separation

1 code implementation • 28 Oct 2019 • Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-Yi Lee, Lin-shan Lee

Permutation Invariant Training (PIT) has long been a stepping-stone method for training speech separation models to handle the label ambiguity problem.

Speech Separation

Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering

1 code implementation • 16 Apr 2019 • Gene-Ping Yang, Chao-I Tuan, Hung-Yi Lee, Lin-shan Lee

Substantial effort has been reported on approaches over the spectrogram, which is well known as the standard time-and-frequency cross-domain representation for speech signals.

Clustering Speech Separation
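
For reference, the spectrogram mentioned in this entry is the magnitude of a short-time Fourier transform; a minimal sketch using torchaudio follows, with the window parameters chosen only for illustration.

```python
# Minimal magnitude-spectrogram sketch; n_fft and hop_length are illustrative.
import torchaudio

def magnitude_spectrogram(waveform, n_fft=512, hop_length=128):
    # waveform: (channels, time) float tensor
    transform = torchaudio.transforms.Spectrogram(
        n_fft=n_fft, hop_length=hop_length, power=1.0  # power=1.0 -> magnitude
    )
    return transform(waveform)  # (channels, n_fft // 2 + 1, num_frames)
```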
