Search Results for author: Niko Moritz

Found 15 papers, 0 papers with code

Streaming Audio-Visual Speech Recognition with Alignment Regularization

no code implementations • 3 Nov 2022 • Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic

The audio and the visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution.

Audio-Visual Speech Recognition Automatic Speech Recognition +4
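As a rough illustration of the two streaming ingredients named in the abstract above, the sketch below builds a chunk-wise self-attention mask and a causal 1-D convolution in NumPy. The function names, the `left_chunks` parameter, and the choice to let a frame see its whole current chunk plus a fixed number of past chunks are assumptions made for this example; the listing does not state the paper's actual configuration.

```python
import numpy as np


def chunk_attention_mask(num_frames: int, chunk_size: int, left_chunks: int = 1) -> np.ndarray:
    """Boolean mask where mask[q, k] is True if query frame q may attend to key frame k.

    Assumption for this sketch: a frame sees every frame in its own chunk
    (including later frames of that chunk) and in `left_chunks` preceding chunks.
    """
    chunk_id = np.arange(num_frames) // chunk_size
    diff = chunk_id[:, None] - chunk_id[None, :]  # query chunk index minus key chunk index
    return (diff >= 0) & (diff <= left_chunks)


def causal_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1-D convolution whose output at time t depends only on x[t-k+1 .. t]."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])  # left padding only, so no look-ahead
    return np.array([np.dot(padded[t:t + k], kernel) for t in range(len(x))])


mask = chunk_attention_mask(num_frames=6, chunk_size=2, left_chunks=1)
smoothed = causal_conv1d(np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 1.0]))
```

With a mask like this, the look-ahead of the attention layers is bounded by the chunk size, which is what makes the encoder usable in a streaming setting.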

Anchored Speech Recognition with Neural Transducers

no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

Anchored speech recognition refers to a class of methods that use information from an anchor segment (e.g., wake-words) to recognize device-directed speech while ignoring interfering background speech/noise.

speech-recognition Speech Recognition

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

no code implementations • 19 Apr 2022 • Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen

The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are RNN-Transducer (RNN-T) and connectionist temporal classification (CTC).

Automatic Speech Recognition speech-recognition

Sequence Transduction with Graph-based Supervision

no code implementations • 1 Nov 2021 • Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production.

Automatic Speech Recognition speech-recognition

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition Language Modelling +1

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

no code implementations • 2 Jul 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks.

Automatic Speech Recognition speech-recognition

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.

Automatic Speech Recognition speech-recognition
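The mean teacher idea mentioned in the abstract above is commonly realized by keeping one model as an exponential moving average of the other. The sketch below shows that update on plain NumPy parameter dictionaries. The function name `momentum_update`, the `alpha` value, and the assumption that the offline model tracks the online model in this way are illustrative and not taken from the listing.

```python
import numpy as np


def momentum_update(offline: dict, online: dict, alpha: float = 0.999) -> dict:
    """Return new offline parameters as a moving average of the online parameters.

    Assumption for this sketch: both models share the same parameter names and shapes.
    """
    return {
        name: alpha * offline[name] + (1.0 - alpha) * online[name]
        for name in offline
    }


offline_params = {"w": np.array([0.0, 0.0])}
online_params = {"w": np.array([1.0, 2.0])}
offline_params = momentum_update(offline_params, online_params, alpha=0.9)
```

In a setup like this, the slowly moving offline model would produce the pseudo-labels while the online model is trained on them by gradient descent; a value of `alpha` close to 1 keeps the pseudo-labels stable between updates.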

Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers

no code implementations • 19 Apr 2021 • Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention.

Automatic Speech Recognition speech-recognition

Capturing Multi-Resolution Context by Dilated Self-Attention

no code implementations • 7 Apr 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.

Automatic Speech Recognition Machine Translation +2
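The combination of restricted and dilated attention described in the abstract above can be sketched as follows: each query attends to a small window of neighboring frames at full resolution, plus a set of low-resolution summaries of the whole sequence. Mean pooling over fixed blocks, the `window` and `pool` parameters, and the use of the raw input as queries, keys, and values (no learned projections) are all simplifying assumptions for this example, not details from the listing.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()


def dilated_self_attention(x, window: int = 2, pool: int = 4) -> np.ndarray:
    """Single-head attention over local frames plus mean-pooled block summaries.

    x has shape (T, d). Summaries also cover the blocks that overlap the
    local window, which keeps the sketch short.
    """
    x = np.asarray(x, dtype=float)
    num_frames, dim = x.shape
    # Low-resolution view: one mean vector per non-overlapping block of `pool` frames.
    starts = range(0, num_frames, pool)
    summaries = np.stack([x[s:s + pool].mean(axis=0) for s in starts])
    out = np.empty_like(x)
    for t in range(num_frames):
        lo, hi = max(0, t - window), min(num_frames, t + window + 1)
        keys = np.concatenate([x[lo:hi], summaries], axis=0)  # high-res neighbors + summaries
        weights = softmax(keys @ x[t] / np.sqrt(dim))
        out[t] = weights @ keys
    return out


frames = np.random.default_rng(0).normal(size=(16, 8))
context = dilated_self_attention(frames, window=2, pool=4)
```

Per query, the number of attended positions is roughly `2 * window + 1 + T / pool` instead of `T`, which is the source of the cost saving for long inputs.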

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux

The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.

Automatic Speech Recognition Pseudo Label +2

Semi-Supervised Speech Recognition via Graph-based Temporal Classification

no code implementations • 29 Oct 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model.

Automatic Speech Recognition Classification +3

Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR

no code implementations • 14 Feb 2020 • Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux

We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).

Automatic Speech Recognition speech-recognition

Streaming automatic speech recognition with the transformer model

no code implementations • 8 Jan 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition speech-recognition
