1 code implementation • 24 Nov 2020 • Yushi Hu, Shane Settle, Karen Livescu
In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages.
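Below is a minimal sketch of embedding-based QbE with span embeddings: embed the query span and each candidate segment into fixed-dimensional vectors and rank candidates by cosine similarity. The `embed_span` encoder here is a placeholder assumption, not the paper's trained ASE model.

```python
# Hypothetical sketch of QbE search with acoustic span embeddings (ASE).
import numpy as np

def embed_span(features: np.ndarray) -> np.ndarray:
    """Placeholder ASE encoder: maps a variable-length feature sequence
    (num_frames x feat_dim) to a fixed-dimensional vector."""
    # A real encoder would be a trained neural network; mean pooling is a stand-in.
    return features.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def qbe_search(query_feats, candidate_segments, top_k=5):
    """Rank candidate segments by cosine similarity to the embedded query."""
    q = embed_span(query_feats)
    scores = [(i, cosine(q, embed_span(seg))) for i, seg in enumerate(candidate_segments)]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]

# Example with random features standing in for acoustic frames.
rng = np.random.default_rng(0)
query = rng.normal(size=(80, 39))   # multi-word (arbitrary-length) query span
segments = [rng.normal(size=(rng.integers(40, 120), 39)) for _ in range(100)]
print(qbe_search(query, segments))
```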
1 code implementation • 1 Jul 2020 • Bowen Shi, Shane Settle, Karen Livescu
We find that pre-training the acoustic segment representation with AWEs reduces word error rate by a large margin, and that pre-training the word prediction layer with AGWEs yields additional (smaller) gains.
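The following PyTorch sketch illustrates the pre-training idea under simplifying assumptions: a GRU stands in for the acoustic segment encoder (initialized from an AWE model) and a linear layer stands in for the word prediction layer (initialized from AGWE vectors). Module names and shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

vocab_size, feat_dim, hidden = 5000, 80, 256

# Acoustic segment encoder (stand-in for the paper's segmental encoder).
encoder = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)

# Word prediction layer: scores each word by a dot product with its embedding.
word_prediction = nn.Linear(hidden, vocab_size, bias=False)

# 1) Pre-train `encoder` as part of an AWE model (not shown), then reuse its weights.
awe_encoder_state = encoder.state_dict()        # pretend this came from AWE pre-training
encoder.load_state_dict(awe_encoder_state)

# 2) Pre-train AGWEs (one vector per word, not shown), then copy them into the output layer.
agwe_vectors = torch.randn(vocab_size, hidden)  # pretend these are trained AGWEs
with torch.no_grad():
    word_prediction.weight.copy_(agwe_vectors)

# The combined model is then trained end-to-end on transcribed speech.
segment_feats = torch.randn(4, 120, feat_dim)   # batch of acoustic segments
_, h = encoder(segment_feats)
logits = word_prediction(h[-1])                 # (4, vocab_size) word scores
```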
1 code implementation • 24 Jun 2020 • Yushi Hu, Shane Settle, Karen Livescu
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
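A hedged sketch of the two usage modes follows: applying a pre-trained multilingual AWE encoder zero-shot to an unseen language, or fine-tuning it on a small low-resource training set. The encoder class and checkpoint path are hypothetical.

```python
import torch
import torch.nn as nn

class AWEEncoder(nn.Module):
    """Toy stand-in for a multilingual acoustic word embedding encoder."""
    def __init__(self, feat_dim=80, hidden=256, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, x):                      # x: (batch, frames, feat_dim)
        _, h = self.rnn(x)
        h = torch.cat([h[-2], h[-1]], dim=-1)  # final states of both directions
        return self.proj(h)

model = AWEEncoder()
# model.load_state_dict(torch.load("multilingual_awe.pt"))  # hypothetical checkpoint

# Zero-resource use: embed segments from the unseen language with no further training.
embeddings = model(torch.randn(8, 100, 80))

# Low-resource use: fine-tune on a small labeled set, typically with a small learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```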
no code implementations • 29 Mar 2019 • Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train and more efficient to decode with than sub-word systems.
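One way to see why decoding is simple, assuming a CTC-trained A2W model (an assumption made for this sketch): greedy decoding is a frame-wise argmax over the word vocabulary, collapsing repeats and dropping blanks, with no pronunciation lexicon or decoding graph.

```python
import numpy as np

BLANK = 0  # CTC blank index

def greedy_a2w_decode(logits: np.ndarray, id2word: dict) -> list:
    """logits: (num_frames, vocab_size) word scores from the A2W model."""
    best = logits.argmax(axis=1)
    words, prev = [], BLANK
    for idx in best:
        if idx != BLANK and idx != prev:   # collapse repeats, drop blanks
            words.append(id2word[int(idx)])
        prev = idx
    return words

# Toy example: 6 frames, vocabulary of {blank, "hello", "world"}.
id2word = {1: "hello", 2: "world"}
logits = np.array([[5, 1, 0], [0, 6, 1], [0, 6, 1],
                   [5, 0, 0], [0, 1, 7], [0, 1, 7]], dtype=float)
print(greedy_a2w_decode(logits, id2word))  # ['hello', 'world']
```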
1 code implementation • 12 Jun 2017 • Shane Settle, Keith Levin, Herman Kamper, Karen Livescu
Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.
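For reference, here is a minimal DTW sketch of that baseline: align the query feature sequence to a candidate segment and use the length-normalized alignment cost as a match score. Euclidean frame distance is assumed for illustration.

```python
import numpy as np

def dtw_cost(query: np.ndarray, segment: np.ndarray) -> float:
    """query: (n, d), segment: (m, d) acoustic feature sequences."""
    n, m = len(query), len(segment)
    dist = np.linalg.norm(query[:, None, :] - segment[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[n, m] / (n + m)   # length-normalized cost (lower = better match)

rng = np.random.default_rng(0)
query, segment = rng.normal(size=(60, 39)), rng.normal(size=(80, 39))
print(dtw_cost(query, segment))
```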
1 code implementation • 23 Mar 2017 • Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.
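A hedged sketch of that supervision setup: an off-the-shelf image tagger provides soft word labels for each image, and those labels supervise a keyword predictor on the paired (untranscribed) spoken caption. `image_tagger` and the speech model below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

vocab_size, feat_dim = 1000, 40

def image_tagger(images: torch.Tensor) -> torch.Tensor:
    """Placeholder for a pretrained vision system returning per-word probabilities."""
    return torch.sigmoid(torch.randn(images.shape[0], vocab_size))

class KeywordPredictor(nn.Module):
    """Maps spoken-caption features to per-word keyword scores."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True)
        self.out = nn.Linear(256, vocab_size)

    def forward(self, x):
        _, h = self.rnn(x)
        return self.out(h[-1])                # keyword logits per utterance

model = KeywordPredictor()
criterion = nn.BCEWithLogitsLoss()            # multi-label: each word present/absent
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(4, 3, 224, 224)          # batch of images
captions = torch.randn(4, 300, feat_dim)      # paired spoken captions (features)

targets = image_tagger(images)                # visual "labels" stand in for transcriptions
loss = criterion(model(captions), targets)
loss.backward()
optimizer.step()
```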
no code implementations • 8 Nov 2016 • Shane Settle, Karen Livescu
Acoustic word embeddings, fixed-dimensional vector representations of variable-length spoken word segments, have begun to be considered for tasks such as speech recognition and query-by-example search.
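A minimal sketch of the idea, using an assumed bidirectional-GRU encoder rather than the paper's exact architecture: encode a variable-length segment of acoustic frames into a fixed-dimensional vector and compare segments by cosine similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AcousticWordEmbedder(nn.Module):
    def __init__(self, feat_dim=39, hidden=128, embed_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, frames):                 # frames: (batch, num_frames, feat_dim)
        _, h = self.rnn(frames)                # h: (2, batch, hidden)
        return self.proj(torch.cat([h[0], h[1]], dim=-1))

embedder = AcousticWordEmbedder()
seg_a = torch.randn(1, 70, 39)                 # one spoken word segment (70 frames)
seg_b = torch.randn(1, 55, 39)                 # a different-length segment
sim = F.cosine_similarity(embedder(seg_a), embedder(seg_b))
print(sim.item())                              # same word -> high similarity after training
```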