Search Results for author: Abdelrahman Mohamed

Found 14 papers, 6 papers with code

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

1 code implementation • 5 Jan 2022 • Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed

The lip-reading WER is further reduced to 26.9% when using all 433 hours of labeled data from LRS3 and combined with self-training.

 Ranked #1 on Lipreading on LRS3-TED (using extra training data)

Lipreading • Lip Reading • +2
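The objective named in the title above can be illustrated with a short sketch: fuse per-frame audio and lip-video features, mask spans of the fused sequence, and train an encoder to predict offline cluster assignments only at the masked positions. This is a minimal, hypothetical rendering with assumed module names and dimensions, not the released AV-HuBERT code.

```python
# Minimal sketch of masked multimodal cluster prediction; all names,
# dimensions, and the zero-masking shortcut are illustrative assumptions.
import torch
import torch.nn as nn

class MaskedClusterPredictor(nn.Module):
    def __init__(self, audio_dim=80, video_dim=512, hidden=768, n_clusters=500):
        super().__init__()
        self.fuse = nn.Linear(audio_dim + video_dim, hidden)   # early A/V fusion
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.head = nn.Linear(hidden, n_clusters)              # cluster-ID logits

    def forward(self, audio, video, mask, targets):
        # audio: (B, T, audio_dim); video: (B, T, video_dim)
        # mask: (B, T) bool, True at masked frames
        # targets: (B, T) long, offline cluster assignment per frame
        x = self.fuse(torch.cat([audio, video], dim=-1))
        x = x.masked_fill(mask.unsqueeze(-1), 0.0)  # simplification: zeros, not a learned mask token
        logits = self.head(self.encoder(x))
        # the prediction loss is computed only where frames were masked
        return nn.functional.cross_entropy(logits[mask], targets[mask])
```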

Robust Self-Supervised Audio-Visual Speech Recognition

1 code implementation • 5 Jan 2022 • Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe.

Audio-Visual Speech Recognition • Lipreading • +2

Textless Speech Emotion Conversion using Decomposed and Discrete Representations

no code implementations • 14 Nov 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
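The decomposition above lends itself to a simple data-structure view: each utterance becomes a bundle of content units, a pitch contour, a speaker vector, and an emotion factor, and conversion swaps only the emotion factor. The field names, shapes, and helper below are assumptions for illustration, not the paper's interface.

```python
# Illustrative container for the decomposed representation; all field
# names, shapes, and the conversion helper are hypothetical.
from dataclasses import dataclass, replace
import torch

@dataclass
class DecomposedSpeech:
    content_units: torch.Tensor   # (T,) long: discrete phone-like units
    f0: torch.Tensor              # (T,) float: pitch contour
    speaker: torch.Tensor         # (D,) float: speaker embedding
    emotion: int                  # emotion class ID

def convert_emotion(rep: DecomposedSpeech, new_emotion: int) -> DecomposedSpeech:
    # Conversion keeps content and speaker identity fixed and swaps the
    # emotion factor; the paper additionally translates the unit and F0
    # streams with learned models before resynthesis.
    return replace(rep, emotion=new_emotion)
```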

Scaling ASR Improves Zero and Few Shot Learning

no code implementations • 10 Nov 2021 • Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.

Few-Shot Learning • Speech Recognition

Text-Free Prosody-Aware Generative Spoken Language Modeling

no code implementations • 7 Sep 2021 • Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling
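The unit substitution that the abstract above describes can be sketched in a few lines: quantize frames from a pretrained speech encoder with k-means, deduplicate consecutive repeats, and hand the resulting ID sequence to an ordinary autoregressive language model. The feature source and cluster count below are placeholders, not the paper's settings.

```python
# Sketch of "textless" unit extraction for language modeling; the feature
# source and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def speech_to_units(features: np.ndarray, kmeans: KMeans) -> list:
    # features: (T, D) frames from a pretrained speech encoder
    units = kmeans.predict(features)  # (T,) cluster IDs, one per frame
    # collapse consecutive repeats to get a shorter pseudo-text sequence
    return [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]

# The resulting unit sequences are consumed by an autoregressive LM
# exactly as token IDs from text would be.
```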

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

no code implementations • 14 Jun 2021 • Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

In this paper, we introduce the Kaizen framework, which uses a continuously improving teacher to generate pseudo-labels for semi-supervised automatic speech recognition (ASR) training.

Speech Recognition
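The continuously improving teacher is, at its core, an exponential moving average (EMA) of the student's weights. A minimal sketch of that update follows; the decay constant is illustrative, not the paper's setting.

```python
# Minimal EMA teacher update; the decay constant is illustrative.
import copy
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.999) -> None:
    # teacher <- decay * teacher + (1 - decay) * student, parameter-wise
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

# Usage: after each student optimization step, refresh the teacher and
# decode unlabeled speech with it to produce the next pseudo-labels.
student = torch.nn.Linear(4, 2)
teacher = copy.deepcopy(student)
ema_update(teacher, student)
```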

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

2 code implementations • 14 Jun 2021 • Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.

Language Modelling • Representation Learning
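The three problems listed above map onto the method as follows: offline clustering supplies a pseudo-lexicon of units (problem 2), frame-level cluster assignments sidestep segmentation and variable unit lengths (problems 1 and 3), and the prediction loss applies only over masked frames. A hedged sketch of the target-generation step, with assumed features and cluster count:

```python
# Sketch of building masked-prediction targets without a lexicon via
# offline clustering; the raw features and 100 units are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def make_unit_targets(frames: np.ndarray, n_units: int = 100) -> np.ndarray:
    # frames: (N, D) acoustic features pooled over the unlabeled corpus
    km = KMeans(n_clusters=n_units, n_init=10).fit(frames)
    return km.labels_  # one discrete "hidden unit" ID per frame

# Training masks spans of input frames and predicts these IDs at the
# masked positions; re-clustering the learned representations then yields
# refined targets for a further iteration.
```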

Contrastive Semi-supervised Learning for ASR

no code implementations • 9 Mar 2021 • Alex Xiao, Christian Fuegen, Abdelrahman Mohamed

Pseudo-labeling is the most widely adopted method for pre-training automatic speech recognition (ASR) models.

Representation Learning • Speech Recognition
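For contrast with the paper's contrastive objective, vanilla pseudo-labeling is easy to sketch: decode unlabeled audio with the current model, then retrain on the resulting pairs as if they were supervised. The toy linear model and greedy argmax decode below are stand-ins, not the paper's setup.

```python
# Toy pseudo-labeling round; the linear "acoustic model" and argmax
# decode are illustrative stand-ins.
import torch
import torch.nn as nn

model = nn.Linear(16, 28)                            # stand-in acoustic model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
unlabeled = [torch.randn(10, 16) for _ in range(4)]  # fake utterance features

# 1) transcribe unlabeled audio with the current model
with torch.no_grad():
    pseudo = [(x, model(x).argmax(-1)) for x in unlabeled]

# 2) treat the (audio, pseudo-label) pairs as ordinary supervised data
for x, y in pseudo:
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```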

The Obata first eigenvalue theorems on a seven dimensional quaternionic contact manifold

no code implementations • 31 Dec 2020 • Abdelrahman Mohamed, Dimiter Vassilev

We show that a compact quaternionic contact manifold of dimension seven that satisfies a Lichnerowicz-type lower Ricci-type bound, and whose $P$-function is non-negative for any eigenfunction of the sub-Laplacian, achieves its smallest possible eigenvalue only if the structure is qc-Einstein.

Differential Geometry • Analysis of PDEs
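For orientation, the classical Riemannian statement this result parallels is the Lichnerowicz-Obata theorem, recalled below; the paper's quaternionic-contact counterpart is not reproduced here.

```latex
% Classical Riemannian analogue (Lichnerowicz 1958, Obata 1962), stated
% for orientation only.
If $(M^n, g)$ is a compact Riemannian manifold with
$\operatorname{Ric} \ge (n-1)k\, g$ for some $k > 0$, then the first
nonzero eigenvalue of the Laplacian satisfies
\[
  \lambda_1 \ge n k,
\]
with equality if and only if $(M, g)$ is isometric to the round sphere of
constant curvature $k$.
```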

Transformers with convolutional context for ASR

3 code implementations • 26 Apr 2019 • Abdelrahman Mohamed, Dmytro Okhonko, Luke Zettlemoyer

The recent success of transformer networks for neural machine translation and other NLP tasks has led to a surge in research work trying to apply them to speech recognition.

Machine Translation • Speech Recognition • +1
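The title's idea can be sketched directly: a small convolutional frontend subsamples the spectrogram and injects positional information through its receptive field, so the transformer encoder can do without explicit positional embeddings. Layer sizes and depths below are illustrative, not the paper's configuration.

```python
# Sketch of a convolutional-context frontend ahead of a transformer
# encoder; all sizes and depths are illustrative assumptions.
import torch
import torch.nn as nn

class ConvContextASR(nn.Module):
    def __init__(self, n_mels=80, d_model=512):
        super().__init__()
        self.frontend = nn.Sequential(  # 4x subsampling in time and frequency
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(32 * (n_mels // 4), d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6,
        )

    def forward(self, spec):                     # spec: (B, T, n_mels)
        x = self.frontend(spec.unsqueeze(1))     # (B, 32, T/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        return self.encoder(self.proj(x))        # (B, T/4, d_model)

out = ConvContextASR()(torch.randn(2, 100, 80))  # -> (2, 25, 512)
```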
