Search Results for author: Yist Y. Lin

Found 8 papers, 5 papers with code

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

3 code implementations · 7 Apr 2021 · Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-Yi Lee

AUTOVC used a d-vector to extract speaker information, while self-supervised learning (SSL) features such as wav2vec 2.0 are used in FragmentVC to extract phonetic content information.

Self-Supervised Learning · Voice Conversion
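The snippet above describes a factorization shared by these conversion systems: speaker identity comes from a fixed-size d-vector, phonetic content from frame-level SSL features. A minimal sketch of that split, with toy stand-in extractors (the real ones are neural networks; all shapes here are illustrative assumptions):

```python
import numpy as np

def extract_content_features(waveform, frame_len=160):
    # Toy stand-in for SSL (e.g. wav2vec 2.0) content features:
    # one feature vector per frame of the source utterance.
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    return frames.mean(axis=1, keepdims=True)          # (n_frames, 1)

def extract_dvector(waveform):
    # Toy stand-in for a d-vector: a fixed-size speaker summary.
    return np.array([waveform.mean(), waveform.std(),
                     waveform.min(), waveform.max()])  # (4,)

source = np.linspace(-1.0, 1.0, 1600)          # 0.1 s at 16 kHz (content)
target = np.sin(np.linspace(0.0, 10.0, 1600))  # target speaker's utterance

content = extract_content_features(source)  # frame-level phonetic content
speaker = extract_dvector(target)           # utterance-level speaker identity
print(content.shape, speaker.shape)         # (10, 1) (4,)
```

The conversion model then consumes the frame-level content sequence conditioned on the speaker embedding, so the same content can be re-synthesized in any target voice.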

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

2 code implementations · 27 Oct 2020 · Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-Yi Lee, Lin-shan Lee

Any-to-any voice conversion aims to convert voices between arbitrary speakers, including speakers unseen during training; this is much more challenging than the one-to-one or many-to-many settings, but far more attractive in real-world scenarios.

Disentanglement · Speaker Verification · +1

Defending Your Voice: Adversarial Attack on Voice Conversion

1 code implementation · 18 May 2020 · Chien-yu Huang, Yist Y. Lin, Hung-Yi Lee, Lin-shan Lee

We introduce human-imperceptible noise into the utterances of a speaker whose voice is to be defended.

Adversarial Attack · Voice Conversion
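The defense described above boils down to adding a small, norm-bounded perturbation to the waveform. A minimal FGSM-style sketch (a hypothetical form; the paper optimizes the noise against the voice-conversion model itself, and `grad` here is only a placeholder gradient):

```python
import numpy as np

def defend_utterance(utterance, grad, eps=1e-3):
    # Add an eps-bounded perturbation in the gradient-sign direction,
    # keeping the waveform in the valid [-1, 1] amplitude range.
    perturbed = utterance + eps * np.sign(grad)
    return np.clip(perturbed, -1.0, 1.0)

utterance = 0.5 * np.sin(np.linspace(0.0, 20.0, 1600))
grad = np.random.default_rng(0).standard_normal(1600)  # placeholder gradient

defended = defend_utterance(utterance, grad)
max_change = float(np.max(np.abs(defended - utterance)))
print(max_change)  # bounded by eps = 1e-3
```

Because the per-sample change never exceeds `eps`, the perturbation stays inaudible to listeners while it degrades the conversion model's ability to mimic the defended speaker.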

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

no code implementations · 28 Oct 2022 · Yist Y. Lin, Tao Han, HaiHua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma

One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance degrades when train and test utterance lengths are mismatched.

Action Detection · Activity Detection · +4
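The title's "random utterance concatenation" addresses that length mismatch by stitching several short training utterances into one longer sample. A minimal sketch (the pair format, sample count, and join rule are illustrative assumptions, not the paper's exact recipe):

```python
import random
import numpy as np

def random_concat(dataset, max_utts=3, rng=random):
    # Pick 1..max_utts random (audio, transcript) pairs and concatenate
    # them into a single longer training example, so the model also sees
    # utterance lengths beyond those of the original short clips.
    k = rng.randint(1, max_utts)
    picks = [rng.choice(dataset) for _ in range(k)]
    audio = np.concatenate([a for a, _ in picks])
    text = " ".join(t for _, t in picks)
    return audio, text

dataset = [(np.zeros(800), "hello"), (np.ones(1200), "short video")]
random.seed(0)
audio, text = random_concat(dataset)
print(len(audio), text)
```

Applied on the fly during training, this exposes the model to the longer utterance lengths it may face at test time without collecting any new data.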

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

no code implementations · 9 Jun 2023 · Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma

In this paper, we improve the frame-level classifier for word timings in an E2E system by introducing label priors into the connectionist temporal classification (CTC) loss, adopted from prior works, and by combining low-level Mel-scale filter banks with the high-level ASR encoder output as the input feature.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2
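The abstract names two ingredients: dividing frame posteriors by label priors (done by subtraction in the log domain, which counteracts CTC's blank-dominated "peaky" behavior) and fusing low-level with high-level features. Each can be sketched in isolation; the shapes, the prior weight, and the uniform toy distributions below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def apply_label_prior(log_posteriors, log_prior, weight=1.0):
    # Divide frame posteriors by label priors (log-domain subtraction),
    # discouraging the blank-dominated peaks typical of CTC outputs.
    return log_posteriors - weight * log_prior

def fuse_features(fbank, encoder_out):
    # Concatenate low-level filter-bank frames with high-level ASR
    # encoder output along the feature dimension (frames are assumed
    # to be already aligned to the same frame rate).
    return np.concatenate([fbank, encoder_out], axis=-1)

T, F, D, V = 50, 80, 256, 30          # frames, fbank dim, encoder dim, vocab
fbank = np.zeros((T, F))
encoder_out = np.zeros((T, D))
features = fuse_features(fbank, encoder_out)

log_post = np.log(np.full((T, V), 1.0 / V))  # uniform toy posteriors
log_prior = np.log(np.full(V, 1.0 / V))      # uniform toy label prior
adjusted = apply_label_prior(log_post, log_prior)
print(features.shape, adjusted.shape)        # (50, 336) (50, 30)
```

The fused feature gives the frame-level classifier both spectral detail and linguistic context, while the prior-adjusted posteriors spread probability mass across frames, yielding sharper word-boundary timings.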
