no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas
However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.
no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky
Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.
Automatic Speech Recognition (ASR)
no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky
In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.
Automatic Speech Recognition (ASR)
no code implementations • 4 Oct 2018 • Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori
In this work, we attempt to use data from 10 BABEL languages to build a multi-lingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach.
no code implementations • 7 Aug 2018 • Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister
In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.
Automatic Speech Recognition (ASR)
no code implementations • 14 Feb 2017 • Angel Mario Castro Martinez, Sri Harish Mallidi, Bernd T. Meyer
Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause of the gain from this combination is still unknown.
Automatic Speech Recognition (ASR)