1 code implementation • NeurIPS 2023 • Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering.
no code implementations • 28 Sep 2020 • Tianxing He, Jingzhao Zhang, Zhiming Zhou, James R. Glass
The exposure bias problem refers to the incrementally distorted generation induced by the training-generation discrepancy in teacher-forcing training for auto-regressive neural network language models (LMs).
no code implementations • ACL 2017 • David Harwath, James R. Glass
Given a collection of images and spoken audio captions, we present a method for discovering word-like acoustic units in the continuous speech signal and grounding them to semantically relevant image regions.
Automatic Speech Recognition (ASR)