Search Results for author: Gil Keren

Found 14 papers, 4 papers with code

Scaling ASR Improves Zero and Few Shot Learning

no code implementations · 10 Nov 2021 · Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.

Automatic Speech Recognition · Few-Shot Learning

Deep Shallow Fusion for RNN-T Personalization

no code implementations · 16 Nov 2020 · Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks.

Automatic Speech Recognition

Alignment Restricted Streaming Recurrent Neural Network Transducer

no code implementations · 5 Nov 2020 · Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer

There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications.

Automatic Speech Recognition

Contextual RNN-T For Open Domain ASR

no code implementations · 4 Jun 2020 · Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf

By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata.

Automatic Speech Recognition
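The "relative improvement" figure quoted above is a ratio of error-rate reductions, which can be made concrete with a small sketch. The numbers below are hypothetical and chosen only to reproduce a 16% relative reduction; they are not the paper's reported absolute WER-NE values.

```python
def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Relative improvement in Word Error Rate, as a fraction of the baseline."""
    return (wer_baseline - wer_new) / wer_baseline

# Hypothetical example: a baseline WER-NE of 25.0% reduced to 21.0%
# is a 16% relative improvement, even though the absolute drop is 4 points.
print(relative_wer_reduction(25.0, 21.0))  # → 0.16
```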

N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System

1 code implementation · 16 Nov 2019 · Shuo Liu, Gil Keren, Björn Schuller

N-HANS is a Python toolkit for in-the-wild audio enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression.

Sound · Audio and Speech Processing

Single-Channel Speech Separation with Auxiliary Speaker Embeddings

no code implementations · 24 Jun 2019 · Shuo Liu, Gil Keren, Björn Schuller

We present a novel source separation model to decompose a single-channel speech signal into two speech segments belonging to two different speakers.

Speech Separation

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

no code implementations · 26 Oct 2018 · Gil Keren, Jing Han, Björn Schuller

We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations.

Speech Enhancement · Speech Recognition

Calibrated Prediction Intervals for Neural Network Regressors

1 code implementation · 26 Mar 2018 · Gil Keren, Nicholas Cummins, Björn Schuller

Despite their advantages with respect to accuracy, contemporary neural networks are generally poorly calibrated and as such do not produce reliable output probability estimates.

Prediction Intervals
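What "calibrated" means for a regression interval can be illustrated with a minimal coverage check; this is a generic sketch of the concept, not the paper's calibration method, and the predictor and data below are hypothetical.

```python
import random

random.seed(0)

# Two-sided 90% quantile of the standard normal distribution: a calibrated
# model's nominal 90% interval mu ± 1.645*sigma should contain ~90% of targets.
Z_90 = 1.645

def coverage(preds, targets, z=Z_90):
    """Fraction of targets falling inside the nominal prediction interval,
    given per-example (mean, standard deviation) predictions."""
    hits = sum(1 for (mu, sigma), y in zip(preds, targets)
               if mu - z * sigma <= y <= mu + z * sigma)
    return hits / len(targets)

# Hypothetical, perfectly calibrated case: the model predicts mu=0, sigma=1
# and the targets really are standard normal, so coverage lands near 0.90.
targets = [random.gauss(0.0, 1.0) for _ in range(10000)]
preds = [(0.0, 1.0)] * len(targets)
print(coverage(preds, targets))
```

A miscalibrated model would show empirical coverage well below (overconfident) or above (underconfident) the nominal 90%.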

Weakly Supervised One-Shot Detection with Attention Similarity Networks

no code implementations · 10 Jan 2018 · Gil Keren, Maximilian Schmitt, Thomas Kehrenberg, Björn Schuller

Neural network models that are not conditioned on class identities were shown to facilitate knowledge transfer between classes and to be well-suited for one-shot learning tasks.

One-Shot Learning · Transfer Learning

The Principle of Logit Separation

no code implementations · ICLR 2018 · Gil Keren, Sivan Sabato, Björn Schuller

In contrast, there are known loss functions, as well as novel batch loss functions that we propose, which are aligned with this principle.

Image Retrieval

Fast Single-Class Classification and the Principle of Logit Separation

2 code implementations · 29 May 2017 · Gil Keren, Sivan Sabato, Björn Schuller

Our experiments show that indeed in almost all cases, losses that are aligned with the Principle of Logit Separation obtain at least 20% relative accuracy improvement in the SLC task compared to losses that are not aligned with it, and sometimes considerably more.

General Classification · Image Retrieval

Tunable Sensitivity to Large Errors in Neural Network Training

no code implementations · 23 Nov 2016 · Gil Keren, Sivan Sabato, Björn Schuller

We propose incorporating this idea of tunable sensitivity for hard examples in neural network learning, using a new generalization of the cross-entropy gradient step, which can be used in place of the gradient in any gradient-based training method.
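The idea of generalizing the cross-entropy gradient step can be sketched for the binary case. The exponent-based rule below is a hypothetical stand-in chosen only to show what a tunable drop-in replacement for the gradient looks like; it is not the paper's actual formulation.

```python
import math

def ce_grad(p: float, y: float) -> float:
    """Standard cross-entropy gradient w.r.t. the logit
    (sigmoid output p in [0, 1], binary label y in {0, 1})."""
    return p - y

def tunable_grad(p: float, y: float, k: float = 1.0) -> float:
    """Hypothetical tunable variant: since |p - y| <= 1, raising the error
    magnitude to the power k changes the relative weight of hard examples
    (large |p - y|) versus easy ones. k = 1 recovers the standard step;
    k > 1 increases sensitivity to large errors relative to small ones."""
    e = p - y
    return math.copysign(abs(e) ** k, e)
```

Because the rule only reshapes the per-example gradient, it can be used in place of the cross-entropy gradient inside any gradient-based training method, which matches the drop-in property described above.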

Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data

3 code implementations · 18 Feb 2016 · Gil Keren, Björn Schuller

Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.

Audio Classification
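The traditional patch-wise computation that this paper builds on can be shown in a few lines: each output feature is a non-linearity applied to an affine function of one input patch. The 1-D kernel and signal below are toy values for illustration only.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def conv1d_feature(signal, w, b):
    """Traditional 1-D convolutional features: for each length-k patch of
    the input, apply a non-linearity (ReLU) to the affine map w·patch + b."""
    k = len(w)
    return [relu(sum(wi * xi for wi, xi in zip(w, signal[t:t + k])) + b)
            for t in range(len(signal) - k + 1)]

signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
# A difference kernel fires only where the signal decreases.
print(conv1d_feature(signal, w=[1.0, -1.0], b=0.0))  # → [0.0, 0.0, 0.0, 1.0, 1.0]
```

The titular Convolutional RNN presumably replaces this fixed affine map over each patch with a recurrent computation; the sketch covers only the traditional baseline described in the snippet above.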
