Search Results for author: Hynek Hermansky

Found 13 papers, 1 paper with code

Self-supervised Learning with Speech Modulation Dropout

no code implementations • 22 Mar 2023 • Samik Sadhu, Hynek Hermansky

We show that training a multi-headed self-attention-based deep network to predict deleted, information-dense 2-8 Hz speech modulations over a 1.5-second section of a speech utterance is an effective way to make machines learn to extract speech modulations using time-domain contextual information.

Automatic Speech Recognition Self-Supervised Learning +2

Stabilized training of joint energy-based models and their practical applications

no code implementations • 7 Mar 2023 • Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

The JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).
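
As a rough illustration of how Langevin-style sampling can draw "negative examples" from a modeled distribution, the sketch below runs the update x ← x − (ε/2)·∇E(x) + √ε·noise on a toy one-dimensional energy. The quadratic energy, the step size, and the names `grad_energy` and `sgld_sample` are assumptions made for this example and are not taken from the paper. In JEM the energy would come from a neural network and the gradient would be a noisy estimate, whereas the exact gradient is used here.

```python
import math
import random


def grad_energy(x, mean=0.0):
    """Gradient of the toy energy E(x) = 0.5 * (x - mean)^2.

    With this energy, p(x) proportional to exp(-E(x)) is a unit-variance
    Gaussian centred at `mean` (an assumption chosen for illustration).
    """
    return x - mean


def sgld_sample(grad_fn, x0, n_steps=200, step_size=0.1, rng=None):
    """Run a Langevin chain from x0 and return its final state."""
    rng = rng or random.Random(0)
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Drift towards low energy plus Gaussian noise scaled by sqrt(step_size).
        x = x - 0.5 * step_size * grad_fn(x) + math.sqrt(step_size) * noise
    return x


# Draw negative examples, each from an independent chain with a random start.
rng = random.Random(0)
samples = [
    sgld_sample(grad_energy, x0=rng.uniform(-5.0, 5.0), rng=rng)
    for _ in range(2000)
]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

With a finite step size the chain is only approximately distributed according to the target, so the sample variance lands close to, but not exactly at, 1.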

Blind Signal Dereverberation for Machine Speech Recognition

no code implementations • 30 Sep 2022 • Samik Sadhu, Hynek Hermansky

We present a method to remove unknown convolutive noise introduced to speech by reverberations of recording environments, utilizing some amount of training speech data from the reverberant environment, and any available non-reverberant speech data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives

no code implementations • 31 Mar 2022 • Samik Sadhu, Hynek Hermansky

How important are different temporal speech modulations for speech recognition?

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Radically Old Way of Computing Spectra: Applications in End-to-End ASR

2 code implementations • 25 Mar 2021 • Samik Sadhu, Hynek Hermansky

We propose a technique to compute spectrograms using Frequency Domain Linear Prediction (FDLP) that uses all-pole models to fit the squared Hilbert envelope of speech in different frequency sub-bands.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR

no code implementations • 5 Feb 2021 • Ruizhi Li, Gregory Sell, Hynek Hermansky

Performance degradation of an Automatic Speech Recognition (ASR) system is commonly observed when the test acoustic condition is different from training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A practical two-stage training strategy for multi-stream end-to-end speech recognition

no code implementations • 23 Oct 2019 • Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky

The multi-stream paradigm of audio processing, in which several sources are simultaneously considered, has been an active research area for information fusion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multi-Stream End-to-End Speech Recognition

no code implementations • 17 Jun 2019 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

Two representative frameworks have been proposed and discussed: the Multi-Encoder Multi-Resolution (MEM-Res) framework and the Multi-Encoder Multi-Array (MEM-Array) framework.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Performance Monitoring for End-to-End Speech Recognition

no code implementations • 9 Apr 2019 • Ruizhi Li, Gregory Sell, Hynek Hermansky

Measuring performance of an automatic speech recognition (ASR) system without ground-truth could be beneficial in many scenarios, especially with data from unseen domains, where performance can be highly inconsistent.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploring Methods for the Automatic Detection of Errors in Manual Transcription

no code implementations • 8 Apr 2019 • Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky

Quality of data plays an important role in most deep learning tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Multi-encoder multi-resolution framework for end-to-end speech recognition

no code implementations • 12 Nov 2018 • Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky

In this work, we present a novel Multi-Encoder Multi-Resolution (MEMR) framework based on the joint CTC/Attention model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Stream attention-based multi-array end-to-end speech recognition

no code implementations • 12 Nov 2018 • Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky

Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in far-field robustness.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

no code implementations • NeurIPS 2008 • Daphna Weinshall, Hynek Hermansky, Alon Zweig, Jie Luo, Holly Jimison, Frank Ohl, Misha Pavel

We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies.

Novelty Detection Object Recognition +2
