no code implementations • 12 Oct 2021 • David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister
Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.
no code implementations • dialdoc (ACL) 2022 • Greyson Gerhard-Young, Raviteja Anantha, Srinivas Chappidi, Björn Hoffmeister
Recent work building open-domain chatbots has demonstrated that increasing model size improves performance.
no code implementations • 30 Sep 2019 • Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas
We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.
no code implementations • 6 Feb 2019 • Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister
The anchored segment refers to the wake-up word part of an audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise.
no code implementations • 5 Jan 2019 • Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister
For real-world speech recognition applications, noise robustness is still a challenge.
no code implementations • 20 Sep 2018 • Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister
We prove that, with enough data, the LSTM model is indeed as capable of learning whisper characteristics from LFBE features alone compared to a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech.
no code implementations • 7 Aug 2018 • Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister
In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.