Search Results for author: Roland Maas

Found 18 papers, 0 papers with code

VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

no code implementations22 Feb 2022 Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

Results show that the proposed method achieves a 20% relative computation cost reduction on LibriSpeech and the Microsoft Speech Language Translation long-form corpus while maintaining WER performance compared to the best-performing overlapping inference algorithm.

Action Detection Activity Detection +1
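The snippet above combines voice-activity detection with overlapping inference: long audio is cut into decoding windows, preferring VAD-detected silences as boundaries so that expensive overlapped re-decoding is only needed where no pause exists. A minimal sketch of that segmentation idea (the function name, frame-level boolean mask, and window parameters are illustrative assumptions, not the paper's implementation):

```python
def vad_segment(speech_mask, max_len, overlap):
    """Split a long utterance into decoding windows.

    speech_mask: per-frame booleans, True where VAD detects speech.
    max_len: maximum window length in frames.
    overlap: overlap (in frames) used only when no silence cut is found.
    Returns a list of (start, end) frame windows covering the utterance.
    """
    n = len(speech_mask)
    windows = []
    start = 0
    while start < n:
        end = min(start + max_len, n)
        if end == n:
            windows.append((start, end))
            break
        # Prefer to cut at a VAD-detected non-speech frame: adjacent
        # windows then need no overlapping re-decoding at the boundary.
        cut = next((i for i in range(end, start, -1)
                    if not speech_mask[i - 1]), None)
        if cut is not None and cut > start:
            windows.append((start, cut))
            start = cut
        else:
            # No silence inside the window: fall back to overlap.
            windows.append((start, end))
            start = end - overlap
    return windows
```

Cutting at silences is what yields the computation saving: overlapped frames are decoded twice, so every silence-aligned boundary removes `overlap` frames of redundant decoding.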

SynthASR: Unlocking Synthetic Data for Speech Recognition

no code implementations14 Jun 2021 Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

Our observations show that SynthASR holds great promise in training the state-of-the-art large-scale E2E ASR models for new applications while reducing the costs and dependency on production data.

Automatic Speech Recognition Continual Learning +1

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations4 Jun 2021 Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy than a primary-only solution, and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition

Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

no code implementations12 May 2021 Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers.

Speech Recognition

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations, in a way similar to a VQ-VAE model.

Quantization Representation Learning +1
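The consistency idea described above can be sketched numerically: features are snapped to their nearest codebook entry, and a small reconstruction network maps the quantized codes back to the inputs, with the reconstruction error acting as a regularizer. This is a minimal NumPy sketch of that VQ-VAE-style term only (a single nearest-neighbour codebook and a linear reconstruction map are simplifying assumptions; the paper's model is a full neural network):

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour vector quantization of row vectors z."""
    # Squared Euclidean distance from every z row to every codeword.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(1)
    return codebook[idx], idx

def consistency_loss(x, z, codebook, W):
    """VQ-VAE-style consistency term: a linear 'reconstruction network'
    W maps quantized codes back to the input features; the mean squared
    reconstruction error regularizes the quantizer."""
    q, _ = quantize(z, codebook)
    x_hat = q @ W
    return float(((x - x_hat) ** 2).mean())
```

When the codes carry enough information to rebuild the inputs, this loss is small; pushing it down during training discourages the quantizer from collapsing to uninformative codes.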

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations27 Jul 2020 Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, our proposed method re-calculates and sums the scores of all possible alignments for each hypothesis in an N-best list.

Speech Recognition

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

no code implementations30 Jun 2020 Maarten Van Segbroeck, Harish Mallidi, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time.

Automatic Speech Recognition
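The multi-view idea behind the title is to process the spectral axis with several parallel "views" over overlapping frequency bands before the time-domain LSTM stack. A minimal sketch of that band splitting (band sizes and overlap are illustrative choices, not the paper's configuration):

```python
import numpy as np

def frequency_views(feats, band_sizes, overlap):
    """Split a (time, freq) feature matrix into overlapping frequency
    bands, one per view; each view would feed its own frequency LSTM."""
    views = []
    start = 0
    for size in band_sizes:
        views.append(feats[:, start:start + size])
        start += size - overlap
    return views
```

Each view sees a different resolution of the spectrum; their outputs are then concatenated as the frontend for the acoustic model.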

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

no code implementations1 Jun 2020 Chander Chandak, Zeynab Raeesy, Ariya Rastrow, Yuzong Liu, Xiangyang Huang, Siyu Wang, Dong Kwon Joo, Roland Maas

A common approach to solve multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language.

Language Identification Speech Recognition

DiPCo -- Dinner Party Corpus

no code implementations30 Sep 2019 Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.

LSTM-based Whisper Detection

no code implementations20 Sep 2018 Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister

We prove that, with enough data, the LSTM model is as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech.

Device-directed Utterance Detection

no code implementations7 Aug 2018 Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.

Automatic Speech Recognition

Estimating parameters of nonlinear systems using the elitist particle filter based on evolutionary strategies

no code implementations14 Apr 2016 Christian Huemmer, Christian Hofmann, Roland Maas, Walter Kellermann

In this article, we present the elitist particle filter based on evolutionary strategies (EPFES) as an efficient approach for nonlinear system identification.

Acoustic echo cancellation

The NLMS algorithm with time-variant optimum stepsize derived from a Bayesian network perspective

no code implementations18 Nov 2014 Christian Huemmer, Roland Maas, Walter Kellermann

In this article, we derive a new stepsize adaptation for the normalized least mean square algorithm (NLMS) by describing the task of linear acoustic echo cancellation from a Bayesian network perspective.

Acoustic echo cancellation
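The NLMS algorithm referenced above is the workhorse of linear acoustic echo cancellation; the paper's contribution is a Bayesian-network derivation of a time-variant optimum stepsize. A minimal sketch of the underlying NLMS update only (a fixed stepsize `mu` stands in for the paper's time-variant one, and all names here are illustrative):

```python
import numpy as np

def nlms(x, d, L, mu=0.5, delta=1e-6):
    """Normalized LMS adaptive filter.

    x: far-end (loudspeaker) signal, d: microphone signal,
    L: filter length, mu: stepsize (fixed here; the paper derives a
    time-variant optimum stepsize instead), delta: regularization
    guarding against division by a near-zero input norm.
    Returns the final weight vector and the error (echo-cancelled) signal.
    """
    w = np.zeros(L)
    e = np.zeros(len(d))
    for n in range(L, len(d)):
        xn = x[n - L:n][::-1]                     # most recent L samples
        y = w @ xn                                # echo estimate
        e[n] = d[n] - y                           # a-priori error
        w += mu * e[n] * xn / (xn @ xn + delta)   # normalized update
    return w, e
```

The normalization by the input energy is what makes the effective stepsize input-scale invariant; the Bayesian view in the paper replaces the fixed `mu` with one adapted per sample.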

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

no code implementations9 Oct 2014 Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann

We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments.

Automatic Speech Recognition

A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

no code implementations11 Oct 2013 Roland Maas, Christian Huemmer, Armin Sehr, Walter Kellermann

This article provides a unifying Bayesian network view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition.

Automatic Speech Recognition Robust Speech Recognition
