Search Results for author: Sameer Khurana

Found 21 papers, 4 papers with code

Multi-view Dimensionality Reduction for Dialect Identification of Arabic Broadcast Speech

no code implementations • 19 Sep 2016 • Sameer Khurana, Ahmed Ali, Steve Renals

In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification.

Dialect Identification Dimensionality Reduction

DARTS: Dialectal Arabic Transcription System

no code implementations • 26 Sep 2019 • Sameer Khurana, Ahmed Ali, James Glass

We analyze the following: transfer learning from the high-resource broadcast domain to the low-resource dialectal domain, and semi-supervised learning, where we use in-domain unlabeled audio data collected from YouTube.

Language Modelling Transfer Learning

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

no code implementations • 3 Jun 2020 • Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.

Representation Learning Self-Supervised Learning +1

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux

The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

no code implementations • 7 Oct 2021 • Sameer Khurana, Antoine Laurent, James Glass

We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

no code implementations • 17 May 2022 • Sameer Khurana, Antoine Laurent, James Glass

We combine the state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder, SAMU-XLSR.

Retrieval Sentence +5

On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration

no code implementations • 14 Nov 2022 • Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass

Dropout-based Uncertainty-driven Self-Training (DUST) proceeds by first training a teacher model on source domain labeled data.

Pseudo Label Pseudo Label Filtering +1

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

Direct Text to Speech Translation System using Acoustic Units

no code implementations • 14 Sep 2023 • Victoria Mingote, Pablo Gimeno, Luis Vicente, Sameer Khurana, Antoine Laurent, Jarod Duret

This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language.

Speech-to-Speech Translation text-to-speech translation +1

Generation or Replication: Auscultating Audio Latent Diffusion Models

no code implementations • 16 Oct 2023 • Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

The introduction of audio latent diffusion models possessing the ability to generate realistic sound clips on demand from a text description has the potential to revolutionize how we work with audio.

AudioCaps Memorization +1

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

no code implementations • 30 Oct 2023 • Zexu Pan, Gordon Wichern, Yoshiki Masuyama, Francois G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers.

Speaker Separation Speech Enhancement +1

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

no code implementations • 12 Dec 2023 • Zexu Pan, Gordon Wichern, Francois G. Germain, Sameer Khurana, Jonathan Le Roux

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity.

EEG

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

1 code implementation • 27 Feb 2024 • Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Existing neural field (NF)-based methods focused on estimating the magnitude of the HRTF from a given sound source direction; the magnitude is then converted to a finite impulse response (FIR) filter.

Spatial Interpolation
