Search Results for author: Roland Maas

Found 23 papers, 0 papers with code

Estimating parameters of nonlinear systems using the elitist particle filter based on evolutionary strategies

no code implementations • 14 Apr 2016 • Christian Huemmer, Christian Hofmann, Roland Maas, Walter Kellermann

In this article, we present the elitist particle filter based on evolutionary strategies (EPFES) as an efficient approach for nonlinear system identification.

Acoustic echo cancellation, Evolutionary Algorithms

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

no code implementations • 9 Oct 2014 • Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann

We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

The NLMS algorithm with time-variant optimum stepsize derived from a Bayesian network perspective

no code implementations • 18 Nov 2014 • Christian Huemmer, Roland Maas, Walter Kellermann

In this article, we derive a new stepsize adaptation for the normalized least mean square algorithm (NLMS) by describing the task of linear acoustic echo cancellation from a Bayesian network perspective.
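
For reference, the standard NLMS update that this abstract builds on can be sketched in Python with NumPy as follows. It uses a fixed stepsize `mu`; the paper's contribution, a time-variant optimum stepsize derived from a Bayesian network view, is not reproduced here. The function name `nlms`, the toy echo path, and all parameter values are illustrative assumptions.

```python
import numpy as np

def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Estimate an FIR echo path with the NLMS algorithm.

    x: far-end (loudspeaker) signal, d: microphone signal.
    Assumption: a FIXED stepsize mu, not the paper's time-variant optimum stepsize.
    Returns the final filter estimate and the a-priori error (echo residual).
    """
    h_hat = np.zeros(num_taps)
    buf = np.zeros(num_taps)                 # most recent input samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - h_hat @ buf            # a-priori error
        h_hat += mu * e[n] * buf / (buf @ buf + eps)   # update normalized by input energy
    return h_hat, e

# Hypothetical noise-free echo path with an exponentially decaying impulse response
rng = np.random.default_rng(0)
h_true = rng.standard_normal(16) * np.exp(-0.3 * np.arange(16))
x = rng.standard_normal(5000)
d = np.convolve(x, h_true)[: len(x)]         # d[n] = sum_k h_true[k] * x[n - k]
h_hat, e = nlms(x, d, num_taps=16)
```

With white input and no noise, the estimate converges to the true echo path; the choice of `mu` trades convergence speed against steady-state error, which is the trade-off an adaptive stepsize aims to resolve.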

Acoustic echo cancellation

A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

no code implementations • 11 Oct 2013 • Roland Maas, Christian Huemmer, Armin Sehr, Walter Kellermann

This article provides a unifying Bayesian network view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

Device-directed Utterance Detection

no code implementations • 7 Aug 2018 • Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

LSTM-based Whisper Detection

no code implementations • 20 Sep 2018 • Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister

We prove that, with enough data, the LSTM model is as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE features and features engineered to separate whispered from normal speech.

Benchmarking

Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

no code implementations • 5 Jan 2019 • Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister

For real-world speech recognition applications, noise robustness is still a challenge.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

DiPCo -- Dinner Party Corpus

no code implementations • 30 Sep 2019 • Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.

Benchmarking

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

no code implementations • 1 Jun 2020 • Chander Chandak, Zeynab Raeesy, Ariya Rastrow, Yuzong Liu, Xiangyang Huang, Siyu Wang, Dong Kwon Joo, Roland Maas

A common approach to solve multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language.

Language Identification, speech-recognition

Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

no code implementations • 30 Jun 2020 • Maarten Van Segbroeck, Harish Mallidih, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

Streaming End-to-End Bilingual ASR Systems with Joint Language Identification

no code implementations • 8 Jul 2020 • Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann

Experiments show that for English-Spanish, the bilingual joint ASR-LID architecture matches monolingual ASR and acoustic-only LID accuracies.

Language Identification

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations • 27 Jul 2020 • Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, our proposed method re-calculates and sums the scores of all possible alignments for each hypothesis in the N-best lists.
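
A minimal sketch of the N-best minimum word error rate (MWER) objective in its common generic form: hypothesis scores are renormalized over the N-best list and weighted by each hypothesis's word errors, with the mean error subtracted as a baseline. The per-hypothesis log-probabilities are taken as given inputs; in the paper they come from summing over all RNN-T alignments of each hypothesis, which this sketch does not implement. Function names and the toy N-best list are hypothetical.

```python
import numpy as np

def word_edit_distance(hyp, ref):
    """Levenshtein distance between two word sequences (single-row DP)."""
    dp = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        prev, dp[0] = dp[0], i
        for j, r in enumerate(ref, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # extra hypothesis word (insertion)
                        dp[j - 1] + 1,      # missing reference word (deletion)
                        prev + (h != r))    # substitution, or match at no cost
            prev = cur
    return dp[-1]

def mwer_loss(nbest_logprobs, nbest_words, ref_words):
    """Expected word errors over an N-best list, minus the mean error as a baseline.

    Assumption: nbest_logprobs already hold each hypothesis's total log-probability.
    """
    logp = np.asarray(nbest_logprobs, dtype=float)
    p = np.exp(logp - logp.max())
    p /= p.sum()                            # renormalize over the N-best list only
    errs = np.array([word_edit_distance(h, ref_words) for h in nbest_words], dtype=float)
    return float(np.sum(p * (errs - errs.mean())))

ref = "turn on the lights".split()
nbest = ["turn on the lights".split(), "turn on the light".split(), "turn the lights".split()]
loss = mwer_loss([-1.0, -1.5, -3.0], nbest, ref)
```

Minimizing this loss shifts probability mass toward hypotheses with fewer word errors than the N-best average, which is why it is negative when the best-scoring hypothesis is also the most accurate one.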

speech-recognition, Speech Recognition

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

no code implementations • 14 Dec 2020 • Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas

Accent mismatch is a critical problem for end-to-end ASR.

Clustering

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization, Representation Learning

Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

no code implementations • 12 May 2021 • Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers.
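
As background for the superdirective baseline, a classical (non-neural) superdirective beamformer computes, per frequency bin, MVDR weights w = Γ⁻¹d / (dᴴΓ⁻¹d) under a spherically isotropic (diffuse) noise model, whose inter-microphone coherence is sin(kr)/(kr). The sketch below assumes far-field propagation, a speed of sound of 343 m/s, and a small diagonal loading term; the array geometry is a made-up example, and this is not the paper's neural superdirective-based beamformer.

```python
import numpy as np

def superdirective_weights(mic_pos, look_dir, freq, c=343.0, diag_load=1e-2):
    """Classical superdirective beamformer weights for a single frequency bin.

    Assumptions: far-field (plane-wave) source, spherically isotropic noise,
    speed of sound c in m/s, and a fixed diagonal loading term for robustness.
    """
    mic_pos = np.asarray(mic_pos, dtype=float)          # (M, 3) microphone positions in metres
    u = np.asarray(look_dir, dtype=float)
    u = u / np.linalg.norm(u)                           # unit vector pointing toward the source
    k = 2 * np.pi * freq / c                            # wavenumber
    d = np.exp(1j * k * (mic_pos @ u))                  # far-field steering vector
    # Pairwise microphone distances and diffuse-noise coherence sin(k r) / (k r)
    dist = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    gamma = np.sinc(k * dist / np.pi)                   # np.sinc(x) = sin(pi x) / (pi x)
    gamma = gamma + diag_load * np.eye(len(mic_pos))    # diagonal loading
    g_inv_d = np.linalg.solve(gamma, d)                 # Gamma^{-1} d
    return g_inv_d / (d.conj() @ g_inv_d)               # normalize so that w^H d = 1

# Hypothetical 4-microphone linear array with 5 cm spacing, steered along the array axis
mics = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
w = superdirective_weights(mics, look_dir=[1.0, 0.0, 0.0], freq=1000.0)
```

The normalization enforces a distortionless response toward the look direction; larger `diag_load` values trade directivity for robustness to microphone mismatch, particularly at low frequencies where Γ is ill-conditioned.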

speech-recognition, Speech Recognition

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations • 4 Jun 2021 • Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy than a primary-only solution, and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

SynthASR: Unlocking Synthetic Data for Speech Recognition

no code implementations • 14 Jun 2021 • Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

Our observations show that SynthASR holds great promise for training state-of-the-art large-scale E2E ASR models for new applications while reducing costs and dependency on production data.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

no code implementations • 22 Feb 2022 • Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

Results show that the proposed method can achieve a 20% relative computation cost reduction on the LibriSpeech and Microsoft Speech Language Translation long-form corpora while maintaining WER performance, compared to the best-performing overlapping inference algorithm.

Action Detection, Activity Detection

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

no code implementations • 16 Jul 2022 • Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER).
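
The title names elastic weight consolidation (EWC) as the way to constrain such fine-tuning. In its standard form, EWC adds a quadratic penalty (λ/2) Σᵢ Fᵢ (θᵢ - θ*ᵢ)² that keeps parameters with a high diagonal Fisher information Fᵢ close to their pre-fine-tuning values θ*ᵢ. Below is a minimal NumPy sketch with hypothetical function names and toy numbers; how the paper estimates the Fisher information or chooses λ is not stated in this excerpt.

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    """Diagonal Fisher estimate: mean of squared per-example gradients.

    Assumption: per_example_grads has shape (num_examples, num_params) and was
    computed on the original (pre-fine-tuning) data.
    """
    g = np.asarray(per_example_grads, dtype=float)
    return np.mean(g ** 2, axis=0)

def ewc_penalty(theta, theta_star, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2, added to the fine-tuning loss."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_star, dtype=float)
    return 0.5 * lam * float(np.sum(fisher * diff ** 2))

def ewc_penalty_grad(theta, theta_star, fisher, lam):
    """Gradient of the penalty with respect to theta."""
    return lam * fisher * (np.asarray(theta, dtype=float) - np.asarray(theta_star, dtype=float))

theta_star = np.array([1.0, -2.0, 0.5])                        # parameters before fine-tuning
fisher = diagonal_fisher([[0.1, 2.0, 0.0], [0.3, 1.0, 0.0]])   # toy per-example gradients
theta = np.array([1.5, -2.0, 3.0])                             # parameters during fine-tuning
penalty = ewc_penalty(theta, theta_star, fisher, lam=10.0)
```

In this toy case the third parameter moves far from θ* but costs nothing, because its Fisher value is zero; parameters the original data depends on are the ones held in place.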

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

Guided contrastive self-supervised pre-training for automatic speech recognition

no code implementations • 22 Oct 2022 • Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.
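
CPC is usually trained with the InfoNCE loss, whose minimization maximizes a lower bound on the mutual information (I ≥ log N - L for N candidates). The sketch below is the generic InfoNCE loss with a bilinear score and in-batch negatives, following the original CPC formulation; it does not implement the guided variant proposed in this paper, and the shapes and names are assumptions.

```python
import numpy as np

def info_nce(context, targets, W):
    """Generic InfoNCE loss for one prediction step, using in-batch negatives.

    context: (B, Dc) context vectors c_t; targets: (B, Dz) future latents z_{t+k};
    W: (Dz, Dc) bilinear scoring matrix. Row b's positive is targets[b];
    the other B - 1 rows act as negatives (an assumption of this sketch).
    """
    pred = context @ W.T                                    # (B, Dz) predicted future latents
    scores = pred @ targets.T                               # scores[b, j] = z_j . (W c_b)
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))            # cross-entropy on the positives

rng = np.random.default_rng(0)
c = rng.standard_normal((8, 16))          # hypothetical batch of 8 context vectors
z = rng.standard_normal((8, 12))          # matching future latents
W = 0.1 * rng.standard_normal((12, 16))
loss = info_nce(c, z, W)
```

With uninformative scores the loss equals log B, so the bound on mutual information is zero; it falls toward zero only as the positive pair outscores all negatives.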

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

Leveraging Redundancy in Multiple Audio Signals for Far-Field Speech Recognition

no code implementations • 1 Mar 2023 • Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas

We augment a conformer transducer model with the MC fusion networks and train it end-to-end.

Acoustic echo cancellation, Automatic Speech Recognition

Cross-utterance ASR Rescoring with Graph-based Label Propagation

no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.
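
The propagation step itself can be illustrated with classic label spreading (Zhou et al., 2004) on a similarity graph. How the paper builds nodes and labels from N-best hypotheses is not described in this excerpt, so this sketch only shows the generic algorithm on a toy classification graph: RBF similarities stand in for cross-utterance acoustic similarity, and `sigma`, `alpha`, and the feature vectors are assumptions.

```python
import numpy as np

def label_spreading(features, labels, num_classes, sigma=1.0, alpha=0.9, iters=50):
    """Generic label spreading: F <- alpha * S F + (1 - alpha) * Y.

    features: (N, D) node features (assumed here to be utterance-level acoustic embeddings);
    labels: length-N ints, -1 for unlabeled nodes. Returns (N, num_classes) label scores.
    """
    X = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))                     # RBF affinity between nodes
    np.fill_diagonal(W, 0.0)                               # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]      # D^-1/2 W D^-1/2
    Y = np.zeros((len(X), num_classes))
    labeled = labels >= 0
    Y[labeled, labels[labeled]] = 1.0                      # one-hot seeds for labeled nodes
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y                # spread, then pull back toward seeds
    return F

# Two well-separated toy clusters, each with a single labeled node
feats = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
F = label_spreading(feats, [0, -1, -1, 1, -1, -1], num_classes=2)
```

Unlabeled nodes inherit the label of the acoustically similar labeled node; `alpha` controls how far information spreads versus how strongly the seed labels are retained.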

Fairness, Language Modelling

Two-pass Endpoint Detection for Speech Recognition

no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.

speech-recognition, Speech Recognition
