no code implementations • 1 Mar 2024 • Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee
This forces the model to learn a speaker distribution disentangled from the semantic content.
no code implementations • 18 Jan 2024 • Jiachen Lian, Gopala Anumanchipalli
Speech disfluency modeling is the bottleneck for both speech therapy and language learning.
no code implementations • 20 Dec 2023 • Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli
Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word and phonetic levels.
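To make this concrete, here is a minimal Python sketch of what a time-accurate, silence-aware two-tier transcription could look like; the `TimedUnit` structure, the `SIL` token, and the example timings are illustrative assumptions, not the paper's actual format.

```python
from dataclasses import dataclass

@dataclass
class TimedUnit:
    """One transcription unit with explicit timing; SIL marks silence."""
    label: str   # word or phoneme, or "SIL" for silence
    start: float # onset in seconds
    end: float   # offset in seconds

# Hypothetical dysfluent utterance "p- please" with an explicit pause:
# silence is kept as a first-class unit rather than collapsed away.
word_tier = [
    TimedUnit("p-", 0.00, 0.18),    # partial-word repetition
    TimedUnit("SIL", 0.18, 0.52),   # silence preserved with timing
    TimedUnit("please", 0.52, 0.95),
]
phone_tier = [
    TimedUnit("P", 0.00, 0.18),
    TimedUnit("SIL", 0.18, 0.52),
    TimedUnit("P", 0.52, 0.60),
    TimedUnit("L", 0.60, 0.70),
    TimedUnit("IY", 0.70, 0.85),
    TimedUnit("Z", 0.85, 0.95),
]
```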
1 code implementation • 5 Jul 2023 • Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W Black, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli
Finally, through a series of ablations, we show that the proposed MRI representation is more comprehensive than EMA and identify the most suitable MRI feature subset for articulatory synthesis.
no code implementations • 10 Feb 2023 • Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli
Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems.
no code implementations • 29 Oct 2022 • Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli
In this work, we propose a novel articulatory representation decomposition algorithm that takes advantage of guided factor analysis to derive articulatory-specific factors and factor scores.
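As a rough illustration of the guided idea only (not the paper's algorithm, which fits all factors jointly under a loading prior), the sketch below restricts each factor to the feature dimensions of one articulator via a user-supplied guide; the function name, the guide format, and the toy data are assumptions.

```python
import numpy as np

def guided_factors(X, guide):
    """Simplified guided-factor-analysis sketch.

    X:     (T, D) articulatory features (e.g., EMA trajectories).
    guide: dict mapping an articulator name to the column indices of X
           that its factor is allowed to load on (an assumed format).
    Returns, per articulator, a loading vector and (T,) factor scores.
    """
    X = X - X.mean(axis=0)
    factors = {}
    for name, dims in guide.items():
        sub = X[:, dims]                   # articulator-specific block
        # Leading principal direction of the block = that factor's loadings.
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        loadings = vt[0]
        scores = sub @ loadings            # per-frame factor scores
        factors[name] = (loadings, scores)
    return factors

# Toy usage: 100 frames of 6-dim "EMA"; dims 0-2 tongue, 3-5 lips (assumed).
X = np.random.randn(100, 6)
out = guided_factors(X, {"tongue": [0, 1, 2], "lips": [3, 4, 5]})
```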
no code implementations • 6 Jun 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu
We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development.
1 code implementation • 11 May 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu
In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve much better phoneme classification accuracy, stabilized vocalization, and better zero-shot VC performance compared with the competitive DSVAE baseline.
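One standard way to measure such a phoneme classification accuracy is a linear probe on frame-level content embeddings; the sketch below uses placeholder arrays, dimensions, and labels rather than actual DSVAE outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder stand-ins for frame-level content embeddings from the
# (conditional) DSVAE encoder and their frame-level phoneme labels.
Z = np.random.randn(5000, 64)        # (frames, content dim), assumed shape
y = np.random.randint(0, 40, 5000)   # placeholder phoneme IDs

Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
print("phoneme probe accuracy:", probe.score(Z_te, y_te))
```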
1 code implementation • 1 Apr 2022 • Jiachen Lian, Alan W Black, Louis Goldstein, Gopala Krishna Anumanchipalli
Most research on data-driven speech representation learning has focused on raw audio in an end-to-end manner, paying little attention to its internal phonological or gestural structure.
1 code implementation • 30 Mar 2022 • Jiachen Lian, Chunlei Zhang, Dong Yu
Zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder.
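A minimal sketch of this decoding step, assuming toy dimensions and a stand-in GRU decoder rather than the paper's architecture: the speaker embedding is broadcast over time and concatenated with the content embeddings before decoding to mel frames.

```python
import torch
import torch.nn as nn

class VCDecoder(nn.Module):
    """Toy decoder mixing a content sequence with a broadcast speaker embedding."""
    def __init__(self, content_dim=64, spk_dim=128, mel_dim=80):
        super().__init__()
        self.net = nn.GRU(content_dim + spk_dim, 256, batch_first=True)
        self.out = nn.Linear(256, mel_dim)

    def forward(self, content, spk):
        # content: (B, T, content_dim); spk: (B, spk_dim)
        spk_seq = spk.unsqueeze(1).expand(-1, content.size(1), -1)
        h, _ = self.net(torch.cat([content, spk_seq], dim=-1))
        return self.out(h)  # (B, T, mel_dim) mel-spectrogram frames

decoder = VCDecoder()
content = torch.randn(1, 120, 64)  # content embeddings from source speech
spk = torch.randn(1, 128)          # embedding of an unseen target speaker
mel = decoder(content, spk)        # converted mel, ready for a vocoder
```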
1 code implementation • 9 Nov 2020 • Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh
We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs.
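The exact MMP formulation is in the paper; the code below is only a hedged sketch of the general direction, a proxy-based loss where each embedding's own proxy is excluded from the negative set and harder (more similar) negatives are weighted more heavily. All names, scales, and dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_proxy_sketch(emb, labels, proxies, scale=30.0):
    """Hedged sketch of a masked-proxy-style loss (not the paper's exact MMP).

    emb:     (B, D) speaker embeddings.
    labels:  (B,) integer speaker IDs.
    proxies: (C, D) learnable class proxies.
    """
    emb = F.normalize(emb, dim=-1)
    proxies = F.normalize(proxies, dim=-1)
    sim = scale * emb @ proxies.t()            # (B, C) scaled cosine sims

    pos = sim.gather(1, labels.view(-1, 1))    # similarity to own proxy
    neg_mask = torch.ones_like(sim).scatter_(1, labels.view(-1, 1), 0.0)
    neg_sim = sim.masked_fill(neg_mask == 0, float("-inf"))

    # Hardness weighting: harder (more similar) negatives count more.
    w = torch.softmax(neg_sim.detach(), dim=1)
    neg = torch.logsumexp(neg_sim + torch.log(w + 1e-8), dim=1, keepdim=True)
    return F.softplus(neg - pos).mean()

# Toy usage with assumed batch size, embedding dim, and speaker count.
emb = torch.randn(8, 192)
labels = torch.randint(0, 100, (8,))
proxies = torch.randn(100, 192, requires_grad=True)
loss = masked_proxy_sketch(emb, labels, proxies)
```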