no code implementations • 12 Oct 2023 • Ju-chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli
However, in the field of language modeling, very little effort has been made to model speech and text jointly.
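The core idea lends itself to a short sketch: discrete speech units and text tokens share a single vocabulary, and an ordinary causal language model is trained on mixed sequences. A minimal sketch in PyTorch, assuming an illustrative vocabulary layout and model size (positional encodings omitted for brevity); none of this reflects the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Toy joint LM: text tokens and discrete speech units share one vocabulary.
# Sizes and token layout are illustrative, not the paper's configuration.
TEXT_VOCAB = 1000                    # hypothetical subword vocabulary size
NUM_UNITS = 500                      # hypothetical number of speech units
VOCAB = TEXT_VOCAB + NUM_UNITS       # unit ids live in [TEXT_VOCAB, VOCAB)

class JointLM(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        T = tokens.size(1)
        # Causal mask so each position attends only to the past.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.head(h)          # next-token logits over the joint vocabulary

# A mixed sequence: text ids followed by speech-unit ids (offset by TEXT_VOCAB).
text = torch.randint(0, TEXT_VOCAB, (1, 8))
units = torch.randint(0, NUM_UNITS, (1, 12)) + TEXT_VOCAB
logits = JointLM()(torch.cat([text, units], dim=1))
print(logits.shape)                  # torch.Size([1, 20, 1500])
```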
no code implementations • 9 Oct 2023 • Chung-Ming Chien, Mingjiamei Zhang, Ju-chieh Chou, Karen Livescu
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space.
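A minimal sketch of the shared-space idea, assuming placeholder encoders and feature dimensions: each modality is projected into one embedding space, and paired speech/text inputs are aligned with a contrastive objective. This is a generic illustration, not the pre-training recipe from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two modality encoders project into the same dimension; paired inputs are
# pulled together with an InfoNCE-style loss. All sizes are placeholders.
class PooledEncoder(nn.Module):
    def __init__(self, in_dim, shared_dim=128):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                  nn.Linear(256, shared_dim))

    def forward(self, x):                  # x: (batch, time, in_dim)
        return F.normalize(self.proj(x.mean(dim=1)), dim=-1)

speech_enc = PooledEncoder(in_dim=80)      # e.g. log-mel frames
text_enc = PooledEncoder(in_dim=64)        # e.g. token embeddings

speech = torch.randn(4, 200, 80)           # 4 utterances paired with...
text = torch.randn(4, 30, 64)              # ...their 4 transcripts
s, t = speech_enc(speech), text_enc(text)

logits = s @ t.T / 0.07                    # pairwise similarities in shared space
labels = torch.arange(4)                   # matched pairs sit on the diagonal
loss = F.cross_entropy(logits, labels)
print(loss.item())
```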
no code implementations • 14 Sep 2023 • Ju-chieh Chou, Chung-Ming Chien, Karen Livescu
In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data.
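Resynthesis-based enhancement, in general terms, encodes the noisy audio-visual input into a representation and then generates a clean waveform from it with a vocoder. The sketch below uses hypothetical stand-in modules and shapes to show the data flow only; it is not AV2Wav's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a resynthesis-style enhancement pipeline:
# fuse noisy audio and visual features, then vocode a waveform from the
# fused representation. Shapes and modules are purely illustrative.
class AVEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.audio = nn.Conv1d(80, dim, kernel_size=3, padding=1)   # log-mel in
        self.video = nn.Conv1d(512, dim, kernel_size=3, padding=1)  # lip features in

    def forward(self, mel, lips):          # (B, 80, T), (B, 512, T)
        return self.audio(mel) + self.video(lips)   # fused (B, dim, T)

class ToyVocoder(nn.Module):
    def __init__(self, dim=256, hop=160):
        super().__init__()
        self.up = nn.ConvTranspose1d(dim, 1, kernel_size=hop, stride=hop)

    def forward(self, rep):                # (B, dim, T) -> (B, 1, T * hop)
        return torch.tanh(self.up(rep))

mel = torch.randn(1, 80, 100)              # noisy speech features
lips = torch.randn(1, 512, 100)            # time-aligned visual features
rep = AVEncoder()(mel, lips)
wav = ToyVocoder()(rep)                    # waveform generated from the representation
print(wav.shape)                           # torch.Size([1, 1, 16000])
```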
1 code implementation • 10 Jul 2021 • Ankita Pasad, Ju-chieh Chou, Karen Livescu
Recently proposed self-supervised learning approaches have been successful in pre-training speech representation models.
Automatic Speech Recognition (ASR) +2
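For layer-wise analysis of this kind, per-layer hidden states can be pulled from a pretrained model. A minimal sketch, assuming the HuggingFace transformers library and the public facebook/wav2vec2-base checkpoint (used here purely as an example):

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Pull per-layer hidden states from a pretrained wav2vec 2.0 model so that
# each layer's representation can be probed separately.
name = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name).eval()

wav = torch.randn(16000)  # 1 second of placeholder audio at 16 kHz
inputs = extractor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors shaped (1, frames, dim)
for i, h in enumerate(out.hidden_states):
    print(f"layer {i}: mean activation norm {h.norm(dim=-1).mean():.2f}")
```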
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
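A minimal usage sketch with one of SpeechBrain's pretrained interfaces; the LibriSpeech checkpoint name is assumed here for illustration:

```python
from speechbrain.pretrained import EncoderDecoderASR

# Load a pretrained ASR pipeline and transcribe a file (model id assumed;
# in recent SpeechBrain versions the import path is speechbrain.inference).
asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("example.wav"))  # path to a 16 kHz mono WAV file
```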
11 code implementations • 10 Apr 2019 • Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee
Recently, voice conversion (VC) without parallel data has been successfully adapted to the multi-target scenario, in which a single model is trained to convert the input voice to many different speakers.
1 code implementation • 9 Aug 2018 • Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-Yi Lee, Lin-shan Lee
In this way, the frame-level length constraint of conventional conversion is removed, offering rhythm-flexible voice conversion without requiring parallel data.
Sound • Audio and Speech Processing
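One way to remove the frame-level length constraint is a recognition-synthesis pipeline: map the source speech to a linguistic sequence, then let a synthesizer choose its own output duration. The toy sketch below illustrates only that decoupling; all modules and sizes are hypothetical, not the paper's system.

```python
import torch
import torch.nn as nn

# Toy recognition-synthesis sketch: the output length is chosen freely by the
# synthesizer rather than being tied frame-by-frame to the input.
class Recognizer(nn.Module):
    def __init__(self, n_phones=40):
        super().__init__()
        self.rnn = nn.GRU(80, 128, batch_first=True)
        self.out = nn.Linear(128, n_phones)

    def forward(self, mel):                     # (B, T_in, 80)
        h, _ = self.rnn(mel)
        return self.out(h).argmax(-1)           # frame-level phone ids (toy stand-in)

class Synthesizer(nn.Module):
    def __init__(self, n_phones=40):
        super().__init__()
        self.embed = nn.Embedding(n_phones, 128)
        self.rnn = nn.GRU(128, 128, batch_first=True)
        self.out = nn.Linear(128, 80)

    def forward(self, phones, t_out):           # generate t_out frames, any length
        ctx = self.embed(phones).mean(dim=1, keepdim=True)  # pooled linguistic context
        h, _ = self.rnn(ctx.repeat(1, t_out, 1))
        return self.out(h)                      # (B, t_out, 80) output mel

mel_in = torch.randn(1, 120, 80)                # source utterance, 120 frames
phones = Recognizer()(mel_in)
mel_out = Synthesizer()(phones, t_out=90)       # different rhythm: 90 frames
print(mel_out.shape)                            # torch.Size([1, 90, 80])
```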
3 code implementations • 9 Apr 2018 • Ju-chieh Chou, Cheng-chieh Yeh, Hung-Yi Lee, Lin-shan Lee
The decoder then takes the speaker-independent latent representation and the target speaker embedding as input, generating the voice of the target speaker with the linguistic content of the source utterance.
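The interface described above can be sketched directly: a content encoder produces an (ideally) speaker-independent representation, and the decoder combines it with a target speaker embedding. The toy below assumes arbitrary dimensions and omits the adversarial training the paper uses to enforce disentanglement.

```python
import torch
import torch.nn as nn

# Toy encoder-decoder with the interface described above: the decoder takes a
# speaker-independent content representation plus a target speaker embedding.
class ContentEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.rnn = nn.GRU(80, dim, batch_first=True)

    def forward(self, mel):                 # (B, T, 80) source utterance
        h, _ = self.rnn(mel)
        return h                            # content representation

class Decoder(nn.Module):
    def __init__(self, dim=128, n_speakers=100):
        super().__init__()
        self.spk = nn.Embedding(n_speakers, dim)   # target speaker embedding table
        self.rnn = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 80)

    def forward(self, content, speaker_id):
        spk = self.spk(speaker_id)[:, None, :].expand(-1, content.size(1), -1)
        h, _ = self.rnn(torch.cat([content, spk], dim=-1))
        return self.out(h)                  # target-speaker mel, source content

mel = torch.randn(1, 100, 80)
content = ContentEncoder()(mel)
converted = Decoder()(content, torch.tensor([7]))  # convert to speaker 7
print(converted.shape)                      # torch.Size([1, 100, 80])
```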
1 code implementation • EMNLP 2017 • Peng-Hsuan Li, Ruo-Ping Dong, Yu-Siang Wang, Ju-chieh Chou, Wei-Yun Ma
Motivated by the observation that named entities are highly related to linguistic constituents, we propose a constituent-based BRNN-CNN for named entity recognition.
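As a rough illustration of the BRNN-CNN combination, the sketch below pairs a character-level CNN with a bidirectional LSTM over words. Note that the paper classifies parse-tree constituents rather than tagging tokens directly, and every size here is a placeholder.

```python
import torch
import torch.nn as nn

# Generic BiRNN + char-CNN tagger sketch; the actual model works over parse
# constituents, which this toy omits. All dimensions are illustrative.
class CharCNN(nn.Module):
    def __init__(self, n_chars=100, dim=30, out=50):
        super().__init__()
        self.embed = nn.Embedding(n_chars, dim)
        self.conv = nn.Conv1d(dim, out, kernel_size=3, padding=1)

    def forward(self, chars):              # (B, T, L) char ids per token
        B, T, L = chars.shape
        x = self.embed(chars).view(B * T, L, -1).transpose(1, 2)
        return self.conv(x).max(dim=-1).values.view(B, T, -1)  # (B, T, out)

class BRNNCNNTagger(nn.Module):
    def __init__(self, n_words=5000, n_tags=9):
        super().__init__()
        self.words = nn.Embedding(n_words, 100)
        self.chars = CharCNN()
        self.rnn = nn.LSTM(150, 100, batch_first=True, bidirectional=True)
        self.out = nn.Linear(200, n_tags)

    def forward(self, word_ids, char_ids):
        x = torch.cat([self.words(word_ids), self.chars(char_ids)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                 # per-token entity-tag logits

words = torch.randint(0, 5000, (2, 12))    # batch of 2 sentences, 12 tokens
chars = torch.randint(0, 100, (2, 12, 8))  # 8 chars per token
print(BRNNCNNTagger()(words, chars).shape) # torch.Size([2, 12, 9])
```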