Search Results for author: Abdel-rahman Mohamed

Found 22 papers, 11 papers with code

Unsupervised Cross-lingual Representation Learning for Speech Recognition

4 code implementations24 Jun 2020 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization Representation Learning +1

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

17 code implementations NeurIPS 2020 Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

 Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Quantization Self-Supervised Learning +1

Large scale weakly and semi-supervised learning for low-resource video ASR

no code implementations16 May 2020 Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed

Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.

Speech Recognition

Libri-Light: A Benchmark for ASR with Limited or No Supervision

1 code implementation17 Dec 2019 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).

 Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-across metric)

Speech Recognition

Effectiveness of self-supervised pre-training for speech recognition

2 code implementations10 Nov 2019 Alexei Baevski, Michael Auli, Abdel-rahman Mohamed

We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization.

Quantization Representation Learning +1

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models

no code implementations9 Nov 2019 Siddharth Dalmia, Abdel-rahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer

Inspired by modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in seq2seq models without sacrificing the overall model quality or its full differentiability.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

31 code implementations ACL 2020 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.

Abstractive Text Summarization Denoising +5

Differentiable Greedy Networks

no code implementations30 Oct 2018 Thomas Powers, Rasool Fakoor, Siamak Shakeri, Abhinav Sethy, Amanjit Kainth, Abdel-rahman Mohamed, Ruhi Sarikaya

Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization.

Combinatorial Optimization Informativeness

Mean Actor Critic

2 code implementations1 Sep 2017 Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning.

Atari Games reinforcement-learning

Deep API Programmer: Learning to Program with APIs

no code implementations14 Apr 2017 Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli

We then present a novel neural synthesis algorithm to search for programs in the DSL that are consistent with a given set of examples.

RobustFill: Neural Program Learning under Noisy I/O

3 code implementations ICML 2017 Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli

Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation.

Program induction Program Synthesis

Sequence Modeling via Segmentations

2 code implementations ICML 2017 Chong Wang, Yining Wang, Po-Sen Huang, Abdel-rahman Mohamed, Dengyong Zhou, Li Deng

The probability of a segmented sequence is calculated as the product of the probabilities of all its segments, where each segment is modeled using existing tools such as recurrent neural networks.

Speech Recognition Text Segmentation

Memory-augmented Attention Modelling for Videos

1 code implementation7 Nov 2016 Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli

We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts.

Video Description

Neuro-Symbolic Program Synthesis

no code implementations6 Nov 2016 Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli

While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network).

Program induction Program Synthesis

Blending LSTMs into CNNs

no code implementations19 Nov 2015 Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, Charles Sutton

We consider whether deep convolutional networks (CNNs) can represent decision functions with similar accuracy as recurrent networks such as LSTMs.

Automatic Speech Recognition Inductive Bias +1

Improvements to deep convolutional neural networks for LVCSR

no code implementations5 Sep 2013 Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline.

Speech Recognition

Deep Neural Networks for Acoustic Modeling in Speech Recognition

no code implementations Signal Processing Magazine 2012 Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.

Speech Recognition

Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

no code implementations NeurIPS 2010 George Dahl, Marc'Aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton

Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task.

Cannot find the paper you are looking for? You can Submit a new open access paper.