2 code implementations • 28 Oct 2021 • Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi
This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain.
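As a hedged illustration of these building blocks, the sketch below loads a file and extracts log-mel features using TorchAudio's documented `load`, `Resample`, and `MelSpectrogram` APIs; the file path is a placeholder.

```python
import torch
import torchaudio

# Load a waveform; torchaudio.load returns (waveform, sample_rate).
# "speech.wav" is a placeholder path.
waveform, sample_rate = torchaudio.load("speech.wav")

# Compose feature extraction from TorchAudio's building blocks:
# resample to 16 kHz, then compute a mel spectrogram.
resample = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)

features = mel(resample(waveform))    # shape: (channels, n_mels, time)
log_mel = torch.log(features + 1e-6)  # log compression for numerical stability
```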
no code implementations • ACL 2021 • Yun Tang, Juan Pino, Xian Li, Changhan Wang, Dmitriy Genzel
Pretraining and multitask learning are widely used to improve speech-to-text translation performance.
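The excerpt does not give the paper's exact objective; as a generic sketch of multitask training, the hypothetical loss below combines a primary speech-translation term with an ASR auxiliary term, with an assumed weighting.

```python
import torch

def multitask_loss(st_logits, st_targets, asr_logits, asr_targets, asr_weight=0.3):
    """Hypothetical multitask objective: speech translation (primary) plus an
    ASR auxiliary task; asr_weight is an assumed hyperparameter, not the paper's."""
    ce = torch.nn.functional.cross_entropy
    # Logits are (batch, time, vocab); cross_entropy wants (batch, vocab, time).
    st_loss = ce(st_logits.transpose(1, 2), st_targets)
    asr_loss = ce(asr_logits.transpose(1, 2), asr_targets)
    return st_loss + asr_weight * asr_loss
```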
no code implementations • 15 Apr 2021 • Hongyu Gong, Xian Li, Dmitriy Genzel
Based on these insights, we propose an adaptive and sparse architecture for multilingual modeling, and train the model to learn shared and language-specific parameters, improving positive transfer and mitigating interference.
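The excerpt does not spell out the architecture; the sketch below is one hypothetical instance of the shared/language-specific split it describes, using per-language residual adapters on top of a shared layer.

```python
import torch
import torch.nn as nn

class SharedWithLanguageAdapters(nn.Module):
    """Shared trunk plus language-specific parameters: a hypothetical
    instance of the shared/language-specific split described above."""
    def __init__(self, d_model, languages):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)  # shared across all languages
        self.adapters = nn.ModuleDict({            # language-specific parameters
            lang: nn.Linear(d_model, d_model) for lang in languages
        })

    def forward(self, x, lang):
        h = torch.relu(self.shared(x))
        return h + self.adapters[lang](h)          # residual language adapter

model = SharedWithLanguageAdapters(d_model=512, languages=["en", "fr", "de"])
out = model(torch.randn(8, 512), lang="fr")
```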
no code implementations • 21 Oct 2020 • Yun Tang, Juan Pino, Changhan Wang, Xutai Ma, Dmitriy Genzel
We demonstrate that representing text input as phoneme sequences can reduce the difference between speech and text inputs and enhance knowledge transfer from text corpora to speech-to-text tasks (see the sketch after this entry).
Automatic Speech Recognition (ASR) +5
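As a hedged illustration of the phoneme-sequence idea above, the sketch below converts text to phonemes with the `g2p_en` package, one off-the-shelf grapheme-to-phoneme tool and not necessarily what the paper used.

```python
# Map text to a phoneme sequence so it resembles the speech-side input.
from g2p_en import G2p

g2p = G2p()
phonemes = g2p("speech to text")
print(phonemes)  # e.g. ['S', 'P', 'IY1', 'CH', ' ', 'T', 'UW1', ' ', 'T', 'EH1', 'K', 'S', 'T']
```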
no code implementations • EMNLP (NLP-COVID19) 2020 • Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federmann, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur
Further, the team is converting the test and development data into translation memories (TMX files) that localizers can use to translate from and to any of the languages.
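To make the TMX format concrete, here is a minimal sketch that builds a one-unit translation memory with Python's standard library; the language pair, segment text, and header fields are placeholders, not the project's actual data.

```python
import xml.etree.ElementTree as ET

# Minimal sketch of a TMX translation memory with one translation unit;
# languages, segments, and header fields are placeholders.
tmx = ET.Element("tmx", version="1.4")
ET.SubElement(tmx, "header", {"srclang": "en", "segtype": "sentence",
                              "datatype": "plaintext"})
body = ET.SubElement(tmx, "body")
tu = ET.SubElement(body, "tu")
for lang, text in [("en", "source segment"), ("fr", "segment cible")]:
    tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
    ET.SubElement(tuv, "seg").text = text

print(ET.tostring(tmx, encoding="unicode"))
```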
no code implementations • 25 Sep 2019 • Qing Sun, James Cross, Dmitriy Genzel
Sequence-to-sequence models such as Transformers, now used across a wide variety of NLP tasks, typically need very high capacity to perform well.
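To make "very high capacity" concrete, the sketch below compares parameter counts for small and large configurations of PyTorch's stock `nn.Transformer`; the sizes are illustrative and unrelated to the paper's models.

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# Illustrative sizes only: a small vs. a large seq2seq Transformer.
small = nn.Transformer(d_model=256, nhead=4, num_encoder_layers=3,
                       num_decoder_layers=3, dim_feedforward=1024)
large = nn.Transformer(d_model=1024, nhead=16, num_encoder_layers=12,
                       num_decoder_layers=12, dim_feedforward=4096)

print(f"small: {n_params(small):,} parameters")
print(f"large: {n_params(large):,} parameters")
```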