Search Results for author: Björn Hoffmeister

Found 9 papers, 1 paper with code

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

1 code implementation · 4 Jan 2024 · David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%.

Attribute · Automatic Speech Recognition · +4

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

no code implementations · 6 Jan 2023 · David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains challenging, primarily due to reduced data availability (necessitating increased data collection) and rapidly shifting data distributions (requiring more frequent model fine-tuning).

Domain Adaptation · speech-recognition · +1

Multi-Modal Pre-Training for Automated Speech Recognition

no code implementations · 12 Oct 2021 · David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.

Language Modelling · Masked Language Modeling · +3

DiPCo -- Dinner Party Corpus

no code implementations · 30 Sep 2019 · Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.

Benchmarking

End-to-end Anchored Speech Recognition

no code implementations · 6 Feb 2019 · Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister

The anchored segment refers to the wake-up word part of an audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise.

Multi-Task Learning · speech-recognition · +1
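The anchored-segment idea summarized above lends itself to a short illustration: extract a speaker "anchor" embedding from the wake-word frames and condition the encoder on it so that non-matching (interfering) speech can be suppressed. The sketch below is a minimal, hypothetical rendering of that idea; the mean-pooled anchor extractor, layer sizes, and feature dimensions are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch of anchored speech recognition: derive a speaker anchor from the
# wake-word segment and condition the stream encoder on it. All shapes, layer
# sizes, and the mean-pool anchor extractor are illustrative assumptions.
import torch
import torch.nn as nn


class AnchoredEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=256, anchor_dim=128):
        super().__init__()
        self.anchor_proj = nn.Linear(feat_dim, anchor_dim)  # hypothetical anchor extractor
        self.encoder = nn.LSTM(feat_dim + anchor_dim, hidden_dim, batch_first=True)

    def forward(self, wake_word_feats, stream_feats):
        # Mean-pool the wake-word frames into a single speaker anchor vector.
        anchor = self.anchor_proj(wake_word_feats).mean(dim=1)            # (B, anchor_dim)
        # Broadcast the anchor to every frame of the remaining audio stream so the
        # encoder can down-weight frames that do not match the anchor speaker.
        anchor_tiled = anchor.unsqueeze(1).expand(-1, stream_feats.size(1), -1)
        encoder_in = torch.cat([stream_feats, anchor_tiled], dim=-1)
        outputs, _ = self.encoder(encoder_in)                             # (B, T, hidden_dim)
        return outputs


# Toy usage: one utterance with 30 wake-word frames and 200 stream frames of 80-dim features.
model = AnchoredEncoder()
wake = torch.randn(1, 30, 80)
stream = torch.randn(1, 200, 80)
print(model(wake, stream).shape)  # torch.Size([1, 200, 256])
```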

LSTM-based Whisper Detection

no code implementations · 20 Sep 2018 · Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister

We show that, with enough data, the LSTM model is as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE and features engineered for separating whisper from normal speech.

Benchmarking
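For readers unfamiliar with the setup above, a whisper detector of this kind can be reduced to a small recurrent binary classifier over LFBE (log filter-bank energy) frames. The sketch below is a hypothetical minimal version; the LFBE dimension, hidden size, and single-layer topology are assumptions, not the paper's reported configuration.

```python
# Minimal sketch of an LSTM whisper/normal-speech classifier over LFBE frames,
# analogous in spirit to the model contrasted with an MLP baseline above.
# Feature and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class WhisperDetector(nn.Module):
    def __init__(self, lfbe_dim=64, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(lfbe_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)       # whisper vs. normal speech

    def forward(self, lfbe_frames):
        _, (h_n, _) = self.lstm(lfbe_frames)             # final hidden state summarizes the utterance
        return torch.sigmoid(self.classifier(h_n[-1]))   # P(whisper)


detector = WhisperDetector()
utterance = torch.randn(4, 300, 64)                      # batch of 4 utterances, 300 LFBE frames each
print(detector(utterance).shape)                         # torch.Size([4, 1])
```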

Device-directed Utterance Detection

no code implementations · 7 Aug 2018 · Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1
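A device-directedness classifier of the kind described above can be pictured as a simple fusion of an utterance-level acoustic embedding with a handful of ASR decoder statistics. The sketch below is a hypothetical illustration; the specific input features, their dimensions, and the two-layer fusion network are assumptions rather than the paper's exact setup.

```python
# Minimal sketch of a device-directed utterance classifier that fuses an
# utterance-level acoustic embedding with a few ASR decoder statistics
# (e.g., confidence scores). Features and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class DeviceDirectedClassifier(nn.Module):
    def __init__(self, acoustic_dim=128, decoder_feat_dim=4, hidden_dim=64):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(acoustic_dim + decoder_feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, acoustic_embedding, decoder_features):
        fused = torch.cat([acoustic_embedding, decoder_features], dim=-1)
        return torch.sigmoid(self.fusion(fused))         # P(device-directed)


clf = DeviceDirectedClassifier()
acoustic = torch.randn(8, 128)                           # e.g., mean-pooled encoder states
decoder_stats = torch.randn(8, 4)                        # e.g., confidence, n-best entropy, ...
print(clf(acoustic, decoder_stats).shape)                # torch.Size([8, 1])
```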
