no code implementations • 30 Apr 2019 • Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol
We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers that are not in the training set.
no code implementations • 10 May 2019 • Oguz H. Elibol, Gokce Keskin, Anil Thomas
We present a rapid design methodology that combines automated hyper-parameter tuning with semi-supervised training to build highly accurate and robust models for voice command classification.
no code implementations • 9 May 2019 • Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran
Due to the use of a single encoder, our method can generalize to converting the voice of out-of-training speakers to speakers in the training dataset.
no code implementations • 20 Jun 2019 • Stephen J Tarsa, Chit-Kwan Lin, Gokce Keskin, Gautham Chinya, Hong Wang
CPU branch prediction has hit a wall: existing techniques achieve near-perfect accuracy on 99% of static branches, and yet the mispredictions that remain hide major performance gains.
no code implementations • 30 Sep 2019 • Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol
In this work we introduce a semi-supervised approach to the voice conversion problem, in which speech from a source speaker is converted into speech of a target speaker.
no code implementations • 14 Dec 2020 • Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas
Accent mismatch is a critical problem for end-to-end ASR.
no code implementations • 12 May 2021 • Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas
The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers.
no code implementations • 4 Jun 2021 • Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
An ASR model that operates on both primary and auxiliary data can achieve better accuracy than a primary-only solution, and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.