no code implementations • 1 Jun 2023 • Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
Having a single model that supports multiple translation tasks is desirable.
no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass
Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.
no code implementations • 14 Nov 2022 • Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass
Dropout-based Uncertainty-driven Self-Training (DUST) proceeds by first training a teacher model on source domain labeled data.
no code implementations • 17 May 2022 • Sameer Khurana, Antoine Laurent, James Glass
We combine the state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder, SAMU-XLSR.
2 code implementations • 13 Mar 2022 • Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James Glass
Audio classification is an active research area with a wide range of applications.
no code implementations • 7 Oct 2021 • Sameer Khurana, Antoine Laurent, James Glass
We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.
Automatic Speech Recognition (ASR)
no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass
We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.
Automatic Speech Recognition (ASR)
no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux
The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.
Automatic Speech Recognition (ASR)
no code implementations • 4 Jun 2020 • Sameer Khurana, Antoine Laurent, James Glass
The audio encoder is trained to perform a speech-translation retrieval task in a contrastive learning framework.
no code implementations • 3 Jun 2020 • Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
1 code implementation • 18 May 2020 • Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent
We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs.
no code implementations • 26 Sep 2019 • Sameer Khurana, Ahmed Ali, James Glass
We analyze the following: transfer learning from a high-resource broadcast domain to a low-resource dialectal domain, and semi-supervised learning where we use in-domain unlabeled audio data collected from YouTube.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
Automatic Speech Recognition (ASR)
no code implementations • EACL 2017 • Fahim Dalvi, Yifan Zhang, Sameer Khurana, Nadir Durrani, Hassan Sajjad, Ahmed Abdelali, Hamdy Mubarak, Ahmed Ali, Stephan Vogel
This paper presents QCRI's Arabic-to-English live speech translation system.
no code implementations • 19 Sep 2016 • Sameer Khurana, Ahmed Ali, Steve Renals
In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.