Search Results for author: Abdelrahman Mohamed

Found 23 papers, 10 papers with code

Self-Supervised Speech Representation Learning: A Review

no code implementations · 21 May 2022 · Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.

Automatic Speech Recognition, Representation Learning

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT

no code implementations · 15 May 2022 · Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu

This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker's mouth area is used alongside speech as inputs.

Representation Learning, Speaker Verification

Federated Learning with Partial Model Personalization

no code implementations · 8 Apr 2022 · Krishna Pillutla, Kshitiz Malik, Abdelrahman Mohamed, Michael Rabbat, Maziar Sanjabi, Lin Xiao

We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices.

Federated Learning
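The difference between the two update schemes in the abstract above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' algorithm or code: it assumes a hypothetical one-feature linear model y = w·x + b in which the slope w is the shared parameter (averaged by the server) and the offset b is each device's personal parameter (kept on the device). In "simultaneous" mode both parameters take a gradient step from the same point; in "alternating" mode, as arranged in this sketch, the personal parameter is trained first with w frozen, then w is trained with b frozen. The names Client, grads, local_update, and federated_round, and the learning rate and step counts, are illustrative choices only.

```python
from dataclasses import dataclass


@dataclass
class Client:
    xs: list[float]
    ys: list[float]
    b: float = 0.0  # personal parameter: stays on the device, never averaged


def grads(w: float, b: float, xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Gradients of the mean squared error of y_hat = w * x + b w.r.t. (w, b)."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    gw = sum(2 * r * x for r, x in zip(residuals, xs)) / n
    gb = sum(2 * r for r in residuals) / n
    return gw, gb


def local_update(w: float, client: Client, mode: str, lr: float = 0.1, steps: int = 10) -> float:
    """Run local training on one device; returns the device's new copy of w."""
    if mode == "simultaneous":
        for _ in range(steps):
            # Shared and personal parameters step together from the same point
            gw, gb = grads(w, client.b, client.xs, client.ys)
            w, client.b = w - lr * gw, client.b - lr * gb
    elif mode == "alternating":
        for _ in range(steps):
            # Personal phase: shared w is frozen
            _, gb = grads(w, client.b, client.xs, client.ys)
            client.b -= lr * gb
        for _ in range(steps):
            # Shared phase: personal b is frozen
            gw, _ = grads(w, client.b, client.xs, client.ys)
            w -= lr * gw
    else:
        raise ValueError(f"unknown mode: {mode}")
    return w


def federated_round(w: float, clients: list[Client], mode: str) -> float:
    """One communication round: the server averages only the shared parameter."""
    return sum(local_update(w, c, mode) for c in clients) / len(clients)
```

With two clients whose data share a slope of 2 but have offsets of +1 and -1, repeated rounds of either scheme drive the shared w toward 2 while each client's b settles near its own offset, something a single fully shared (w, b) pair could not fit for both clients at once.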

textless-lib: a Library for Textless Spoken Language Processing

1 code implementation · 15 Feb 2022 · Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources.

Resynthesis

Object Detection in Aerial Images: What Improves the Accuracy?

no code implementations · 21 Jan 2022 · Hashmat Shadab Malik, Ikboljon Sobirov, Abdelrahman Mohamed

In this work, we investigate the impact of Faster R-CNN for aerial object detection and explore numerous strategies to improve its performance for aerial images.

Object Detection In Aerial Images

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

1 code implementation · ICLR 2022 · Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed

The lip-reading WER is further reduced to 26.9% when using all 433 hours of labeled data from LRS3 and combined with self-training.

Ranked #1 on Lipreading on LRS3-TED (using extra training data)

Automatic Speech Recognition, Lipreading

Robust Self-Supervised Audio-Visual Speech Recognition

1 code implementation · 5 Jan 2022 · Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe.

Audio-Visual Speech Recognition, Automatic Speech Recognition

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

no code implementations · 14 Nov 2021 · Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.

Scaling ASR Improves Zero and Few Shot Learning

no code implementations · 10 Nov 2021 · Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition.

Automatic Speech Recognition, Few-Shot Learning

Text-Free Prosody-Aware Generative Spoken Language Modeling

1 code implementation · ACL 2022 · Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training; it replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

no code implementations · 14 Jun 2021 · Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised speech recognition (ASR).

Frame, Speech Recognition
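The "continuously improving teacher" in the Kaizen abstract above is an exponential moving average (EMA) of the student's weights. The sketch below is a hedged illustration of that EMA-teacher loop, not the Kaizen codebase: parameters are stored in a plain dict of floats, and train_step is a hypothetical caller-supplied function that updates the student on pseudo-labels produced by the teacher. The function names and the default decay of 0.999 are assumptions chosen for illustration. Each EMA step moves the teacher a fraction (1 - decay) of the way toward the current student.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def ema_update(teacher: Params, student: Params, decay: float) -> Params:
    """Return decay * teacher + (1 - decay) * student, parameter by parameter."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def train_with_ema_teacher(
    student: Params,
    train_step: Callable[[Params, Params], Params],  # hypothetical: (student, teacher) -> updated student
    steps: int,
    decay: float = 0.999,
) -> tuple[Params, Params]:
    """Alternate student training on teacher pseudo-labels with EMA teacher updates."""
    teacher = dict(student)  # the teacher starts as a copy of the student
    for _ in range(steps):
        # Student trains on pseudo-labels generated by the current teacher
        student = train_step(student, teacher)
        # Teacher slowly tracks the improving student
        teacher = ema_update(teacher, student, decay)
    return teacher, student
```

Because the teacher averages many recent student snapshots, its pseudo-labels change smoothly rather than jumping with every student update. For example, with decay = 0.9 and a student held fixed at 1.0, a teacher starting at 0.0 reaches 1 - 0.9**n after n updates.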

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

4 code implementations · 14 Jun 2021 · Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.

Ranked #3 on Speech Recognition on LibriSpeech test-other (using extra training data)

Representation Learning, Speech Recognition

Contrastive Semi-supervised Learning for ASR

no code implementations · 9 Mar 2021 · Alex Xiao, Christian Fuegen, Abdelrahman Mohamed

Pseudo-labeling is the most widely adopted method for pre-training automatic speech recognition (ASR) models.

Automatic Speech Recognition, Representation Learning

The Obata first eigenvalue theorems on a seven dimensional quaternionic contact manifold

no code implementations · 31 Dec 2020 · Abdelrahman Mohamed, Dimiter Vassilev

We show that a compact quaternionic contact manifold of dimension seven that satisfies a Lichnerowicz-type lower Ricci-type bound and has the $P$-function of any eigenfunction of the sub-Laplacian non-negative achieves its smallest possible eigenvalue only if the structure is qc-Einstein.

Differential Geometry, Analysis of PDEs

Transformers with convolutional context for ASR

3 code implementations · 26 Apr 2019 · Abdelrahman Mohamed, Dmytro Okhonko, Luke Zettlemoyer

The recent success of transformer networks for neural machine translation and other NLP tasks has led to a surge in research work trying to apply them to speech recognition.

Machine Translation, Speech Recognition
