3 code implementations • 18 Feb 2016 • Gil Keren, Björn Schuller
Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.
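As a rough illustration of the sentence above, the sketch below computes one feature map of a 1-D convolutional layer: each patch of the input goes through an affine function (dot product plus bias) followed by a non-linearity. The function names, the choice of ReLU, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def relu(z: float) -> float:
    """Non-linearity applied to the affine response (ReLU chosen as an example)."""
    return max(0.0, z)


def conv1d_feature(signal: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> List[float]:
    """Slide a window over `signal`; for each patch return relu(w . patch + b)."""
    k = len(weights)
    out = []
    for i in range(len(signal) - k + 1):
        patch = signal[i:i + k]
        affine = sum(w * x for w, x in zip(weights, patch)) + bias
        out.append(relu(affine))
    return out


# Toy input with a hypothetical 2-tap filter.
features = conv1d_feature([1.0, 3.0, 2.0, 5.0], [-1.0, 1.0], 0.5)
print(features)
```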
no code implementations • 23 Nov 2016 • Gil Keren, Sivan Sabato, Björn Schuller
We propose incorporating this idea of tunable sensitivity for hard examples in neural network learning, using a new generalization of the cross-entropy gradient step, which can be used in place of the gradient in any gradient-based training method.
2 code implementations • 29 May 2017 • Gil Keren, Sivan Sabato, Björn Schuller
Our experiments show that, in almost all cases, losses that are aligned with the Principle of Logit Separation obtain at least 20% relative accuracy improvement in the Single Logit Classification (SLC) task compared to losses that are not aligned with it, and sometimes considerably more.
no code implementations • ICLR 2018 • Gil Keren, Sivan Sabato, Björn Schuller
In contrast, there are known loss functions, as well as novel batch loss functions that we propose, which are aligned with this principle.
no code implementations • 10 Jan 2018 • Gil Keren, Maximilian Schmitt, Thomas Kehrenberg, Björn Schuller
Neural network models that are not conditioned on class identities were shown to facilitate knowledge transfer between classes and to be well-suited for one-shot learning tasks.
1 code implementation • 26 Mar 2018 • Gil Keren, Nicholas Cummins, Björn Schuller
Despite their clear advantage in accuracy, contemporary neural networks can generally be regarded as poorly calibrated, and as such do not produce reliable output probability estimates.
no code implementations • 26 Oct 2018 • Gil Keren, Jing Han, Björn Schuller
We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations.
no code implementations • 24 Jun 2019 • Shuo Liu, Gil Keren, Björn Schuller
We present a novel source separation model to decompose a single-channel speech signal into two speech segments belonging to two different speakers.
1 code implementation • 16 Nov 2019 • Shuo Liu, Gil Keren, Björn Schuller
N-HANS is a Python toolkit for in-the-wild audio enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression.
Sound • Audio and Speech Processing
no code implementations • 4 Jun 2020 • Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf
By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata.
Automatic Speech Recognition (ASR)
no code implementations • 5 Nov 2020 • Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer
There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications.
Automatic Speech Recognition (ASR)
no code implementations • 16 Nov 2020 • Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer
End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks.
Automatic Speech Recognition (ASR)
no code implementations • 5 Apr 2021 • Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer
How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area.
no code implementations • 10 Nov 2021 • Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed
With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 15 Dec 2022 • Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le
Experiments on Librispeech and in-house data show relative WER reductions (WERRs) from 3% to 5% with a slight increase in model size and negligible extra token emission latency compared with fast-slow encoder based transducer.
Automatic Speech Recognition (ASR)
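For readers unfamiliar with the metric, a relative WER reduction (WERR) such as the 3% to 5% quoted above is conventionally the drop in word error rate divided by the baseline word error rate. The helper below is a minimal sketch of that arithmetic; the function name and the WER values are made up for illustration and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return (baseline - new) / baseline, i.e. the fractional WER reduction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a baseline at 10.0% WER and a new model at 9.6% WER.
werr = relative_wer_reduction(10.0, 9.6)
print(f"WERR: {werr:.1%}")  # roughly a 4% relative reduction
```

Note that a relative reduction is not the same as an absolute one: in the hypothetical case above the absolute WER change is only 0.4 percentage points.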
no code implementations • 28 Feb 2023 • Gil Keren
Standard Recurrent Neural Network Transducer (RNN-T) decoding algorithms for speech recognition iterate over the time axis, such that one time step is decoded before moving on to the next time step.
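The time-axis iteration described above can be sketched as a greedy, time-synchronous decoding loop: at each time step, tokens are emitted until the model predicts blank, and only then does decoding advance to the next step. This is a simplified sketch of the standard procedure, not the method proposed in the paper; the `joint` callback, the blank id of 0, the per-frame symbol cap, and the toy scorer are all assumptions for illustration.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumed id of the blank symbol


def greedy_rnnt_decode(num_frames: int,
                       joint: Callable[[int, Sequence[int]], Sequence[float]],
                       max_symbols_per_frame: int = 5) -> List[int]:
    """Greedy decoding: finish time step t completely before moving to t + 1."""
    hyp: List[int] = []
    for t in range(num_frames):                  # outer loop over the time axis
        for _ in range(max_symbols_per_frame):   # emit until blank (or cap)
            scores = joint(t, hyp)
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break                            # blank: advance to next frame
            hyp.append(token)
    return hyp


# Hypothetical toy scorer standing in for encoder + predictor + joiner:
# it emits one fixed token per frame and then prefers blank.
TARGET = [3, 1, 2]


def toy_joint(t: int, hyp: Sequence[int]) -> List[float]:
    scores = [0.0, 0.0, 0.0, 0.0]
    want = TARGET[t] if len(hyp) <= t else BLANK
    scores[want] = 1.0
    return scores


print(greedy_rnnt_decode(len(TARGET), toy_joint))
```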
no code implementations • 22 May 2023 • Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen
In this work, we explore text augmentation for ASR using large-scale pre-trained neural networks, and systematically compare them with traditional text augmentation methods.
Automatic Speech Recognition (ASR)
no code implementations • 30 May 2023 • Shuo Liu, Leda Sari, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli
This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model.
Automatic Speech Recognition (ASR)