no code implementations • 25 Mar 2024 • Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar
However, their per-task parameter overhead is still considered high when the number of downstream tasks to adapt for is large.
no code implementations • 25 Oct 2022 • Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
We present a streaming-based approach to produce an acceptable delay, with minimal loss in speech conversion quality, when compared to a reference state-of-the-art non-streaming approach.
no code implementations • 15 Sep 2022 • Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yinghui Huang, Jesse Emond, Pedro Moreno Mengibar
For ASR augmentation, it is necessary that the VC model be robust to a wide range of input speech.
Automatic Speech Recognition (ASR)
no code implementations • 23 Mar 2022 • Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno
We also show that learning a speaker-embedding space can scale further and reduce the amount of personalization training data required per speaker.
1 code implementation • 1 Mar 2022 • Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy
We present two methods of real-time magnitude spectrogram inversion: streaming Griffin-Lim (GL) and streaming MelGAN.
no code implementations • EMNLP 2021 • Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy
We demonstrate this on two speech adaptation tasks (atypical and accented speech) and for two state-of-the-art ASR architectures.
Automatic Speech Recognition (ASR)
1 code implementation • 12 Apr 2019 • Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu
We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.