We demonstrate the effectiveness of our approach in language family classification, speech recognition, and speech synthesis tasks.
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
2 code implementations • 15 Jul 2021 • Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private.
In this work, we propose algorithms for cross-modal generalization: a learning paradigm for training a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality.
Existing work on multilingual speech NLP focuses on a relatively small subset of languages; as a result, current linguistic understanding of languages predominantly stems from classical approaches.
Modern federated networks, such as those composed of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day.