Search Results for author: Gil Keren

Found 18 papers, 4 papers with code

Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data

3 code implementations • 18 Feb 2016 • Gil Keren, Björn Schuller

Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.

Audio Classification
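The snippet above describes the standard convolutional layer that the Convolutional RNN paper builds on: a non-linearity applied to an affine function of each input patch. A minimal sketch of that baseline operation (names and the ReLU choice are illustrative, not taken from the paper's code):

```python
import numpy as np

def conv1d_layer(x, W, b):
    """One traditional 1-D convolutional layer: for each patch of the
    input, apply a non-linearity (ReLU here) to an affine function
    (W @ patch + b) of that patch."""
    k = W.shape[1]                      # patch (kernel) width
    out = []
    for t in range(len(x) - k + 1):
        patch = x[t:t + k]              # sliding window over the sequence
        out.append(np.maximum(0.0, W @ patch + b))  # ReLU(affine)
    return np.array(out)

x = np.arange(6, dtype=float)           # toy sequence
W = np.ones((2, 3)) * 0.5               # 2 filters, patch width 3
b = np.zeros(2)
feats = conv1d_layer(x, W, b)
print(feats.shape)                      # (4, 2): 4 patches, 2 filters
```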

Tunable Sensitivity to Large Errors in Neural Network Training

no code implementations • 23 Nov 2016 • Gil Keren, Sivan Sabato, Björn Schuller

We propose incorporating this idea of tunable sensitivity for hard examples in neural network learning, using a new generalization of the cross-entropy gradient step, which can be used in place of the gradient in any gradient-based training method.
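One simple way such a tunable drop-in replacement for the cross-entropy gradient could look. The exact generalization is defined in the paper; this sketch merely reweights each example's standard cross-entropy gradient (softmax minus one-hot) by a power of its error, so that a sensitivity parameter `k = 1` recovers plain cross-entropy and larger `k` emphasizes hard examples. The reweighting form is an assumption for illustration:

```python
import numpy as np

def tunable_ce_grad(logits, y, k=1.0):
    """Illustrative (not the paper's exact) tunable gradient step:
    scale the per-example cross-entropy gradient by
    (1 - p_correct)**(k - 1). k = 1 gives the standard CE gradient;
    k > 1 increases sensitivity to large-error (hard) examples."""
    p = np.exp(logits - logits.max())
    p /= p.sum()                          # softmax probabilities
    one_hot = np.zeros_like(p)
    one_hot[y] = 1.0
    weight = (1.0 - p[y]) ** (k - 1.0)    # error-dependent sensitivity
    return weight * (p - one_hot)         # drop-in for the CE gradient

g1 = tunable_ce_grad(np.array([2.0, 0.5, 0.1]), y=0, k=1.0)
g3 = tunable_ce_grad(np.array([2.0, 0.5, 0.1]), y=0, k=3.0)
# with k > 1, an already well-classified example contributes a smaller step
```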

Fast Single-Class Classification and the Principle of Logit Separation

2 code implementations • 29 May 2017 • Gil Keren, Sivan Sabato, Björn Schuller

Our experiments show that indeed in almost all cases, losses that are aligned with the Principle of Logit Separation obtain at least 20% relative accuracy improvement in the SLC task compared to losses that are not aligned with it, and sometimes considerably more.

Binary Classification • Classification +2

The Principle of Logit Separation

no code implementations • ICLR 2018 • Gil Keren, Sivan Sabato, Björn Schuller

In contrast, there are known loss functions, as well as novel batch loss functions that we propose, which are aligned with this principle.

Image Retrieval

Weakly Supervised One-Shot Detection with Attention Similarity Networks

no code implementations • 10 Jan 2018 • Gil Keren, Maximilian Schmitt, Thomas Kehrenberg, Björn Schuller

Neural network models that are not conditioned on class identities were shown to facilitate knowledge transfer between classes and to be well-suited for one-shot learning tasks.

One-Shot Learning • Transfer Learning

Calibrated Prediction Intervals for Neural Network Regressors

1 code implementation • 26 Mar 2018 • Gil Keren, Nicholas Cummins, Björn Schuller

Despite their aforementioned accuracy advantage, contemporary neural networks are generally poorly calibrated and as such do not produce reliable output probability estimates.

Prediction Intervals
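The snippet above motivates calibrated prediction intervals for regressors. A generic post-hoc calibration sketch of the idea (not the paper's exact algorithm): use a held-out calibration set to pick an interval half-width that empirically achieves the target coverage:

```python
import numpy as np

def calibrated_interval(preds_cal, y_cal, preds_test, coverage=0.9):
    """Generic calibration sketch (illustrative, not the paper's method):
    take absolute residuals on a held-out calibration set, pick their
    empirical `coverage` quantile q, and report [pred - q, pred + q]
    as the prediction interval for new points."""
    residuals = np.abs(preds_cal - y_cal)   # observed model errors
    q = np.quantile(residuals, coverage)    # half-width hitting target coverage
    return preds_test - q, preds_test + q

rng = np.random.default_rng(0)
y_cal = rng.normal(size=1000)
preds_cal = y_cal + rng.normal(scale=0.5, size=1000)  # imperfect regressor
lo, hi = calibrated_interval(preds_cal, y_cal, np.array([0.0]), coverage=0.9)
# lo and hi bracket the prediction with ~90% empirical coverage
```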

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

no code implementations • 26 Oct 2018 • Gil Keren, Jing Han, Björn Schuller

We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations.

Speech Enhancement • Speech Recognition +1

Single-Channel Speech Separation with Auxiliary Speaker Embeddings

no code implementations • 24 Jun 2019 • Shuo Liu, Gil Keren, Björn Schuller

We present a novel source separation model to decompose a single-channel speech signal into two speech segments belonging to two different speakers.

Speech Separation

N-HANS: Introducing the Augsburg Neuro-Holistic Audio-eNhancement System

1 code implementation • 16 Nov 2019 • Shuo Liu, Gil Keren, Björn Schuller

N-HANS is a Python toolkit for in-the-wild audio enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression.

Sound • Audio and Speech Processing

Contextual RNN-T For Open Domain ASR

no code implementations • 4 Jun 2020 • Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf

By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Alignment Restricted Streaming Recurrent Neural Network Transducer

no code implementations • 5 Nov 2020 • Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer

There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Deep Shallow Fusion for RNN-T Personalization

no code implementations • 16 Nov 2020 • Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Scaling ASR Improves Zero and Few Shot Learning

no code implementations • 10 Nov 2021 • Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

no code implementations • 15 Dec 2022 • Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le

Experiments on Librispeech and in-house data show relative WER reductions (WERRs) from 3% to 5% with a slight increase in model size and negligible extra token emission latency compared with fast-slow encoder based transducer.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

A Token-Wise Beam Search Algorithm for RNN-T

no code implementations • 28 Feb 2023 • Gil Keren

Standard Recurrent Neural Network Transducer (RNN-T) decoding algorithms for speech recognition iterate over the time axis, decoding one time step fully before moving on to the next.

Speech Recognition
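The time-synchronous decoding loop described in the snippet above can be sketched as follows. The joiner is stubbed out as a `logits_fn(t, prefix)` callable, which is an assumption of this sketch rather than any real RNN-T API:

```python
import numpy as np

BLANK = 0

def greedy_rnnt_decode(logits_fn, T, max_symbols=10):
    """Sketch of standard time-synchronous RNN-T greedy decoding:
    the outer loop walks the time axis, and at each frame we keep
    emitting non-blank symbols until the model predicts blank,
    then advance to the next frame."""
    hyp = []
    for t in range(T):                       # decode one time step...
        for _ in range(max_symbols):         # ...before moving to the next
            token = int(np.argmax(logits_fn(t, hyp)))
            if token == BLANK:
                break                        # blank -> advance time
            hyp.append(token)
    return hyp

# toy stand-in joiner: emit symbol (t + 1) once per frame, then blank
def toy_logits(t, prefix):
    scores = np.zeros(5)
    scores[t + 1 if prefix.count(t + 1) == 0 else BLANK] = 1.0
    return scores

print(greedy_rnnt_decode(toy_logits, T=3))   # [1, 2, 3]
```

The paper's token-wise algorithm reorders this search so that iteration is driven by output tokens rather than frames; the sketch only shows the conventional time-axis baseline it departs from.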

Towards Selection of Text-to-speech Data to Augment ASR Training

no code implementations • 30 May 2023 • Shuo Liu, Leda Sari, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1
