We demonstrate the effectiveness of our approach in language family classification, speech recognition, and speech synthesis tasks.
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
2 code implementations • 15 Jul 2021 • Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private.
In this work, we propose algorithms for cross-modal generalization: a learning paradigm for training a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality.
Existing work on multilingual speech NLP focuses on a relatively small subset of languages; as a result, current linguistic understanding of languages predominantly stems from classical approaches.
Modern federated networks, such as those composed of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day.