6 code implementations • 24 Jun 2020 • Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli
This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
23 code implementations • NeurIPS 2020 • Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Ranked #1 on Speech Recognition on TIMIT (using extra training data)
no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.
2 code implementations • 17 Dec 2019 • Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux
Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).
Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-within metric)
2 code implementations • 10 Nov 2019 • Alexei Baevski, Michael Auli, Abdel-rahman Mohamed
We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization.
no code implementations • 9 Nov 2019 • Siddharth Dalmia, Abdel-rahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer
Inspired by modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in seq2seq models without sacrificing the overall model quality or its full differentiability.
44 code implementations • ACL 2020 • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Ranked #3 on Open-Domain Question Answering on ELI5
no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.
Ranked #25 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • 30 Oct 2018 • Thomas Powers, Rasool Fakoor, Siamak Shakeri, Abhinav Sethy, Amanjit Kainth, Abdel-rahman Mohamed, Ruhi Sarikaya
Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization.
2 code implementations • 1 Sep 2017 • Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning.
Ranked #1 on Continuous Control on Cart Pole (OpenAI Gym)
no code implementations • 14 Apr 2017 • Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
We then present a novel neural synthesis algorithm to search for programs in the DSL that are consistent with a given set of examples.
3 code implementations • ICML 2017 • Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation.
2 code implementations • ICML 2017 • Chong Wang, Yining Wang, Po-Sen Huang, Abdel-rahman Mohamed, Dengyong Zhou, Li Deng
The probability of a segmented sequence is calculated as the product of the probabilities of all its segments, where each segment is modeled using existing tools such as recurrent neural networks.
1 code implementation • 7 Nov 2016 • Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts.
no code implementations • 6 Nov 2016 • Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli
While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network).
no code implementations • 19 Nov 2015 • Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, Charles Sutton
We consider whether deep convolutional networks (CNNs) can represent decision functions with similar accuracy as recurrent networks such as LSTMs.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 5 Sep 2013 • Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran
We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline.
5 code implementations • 22 Mar 2013 • Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton
Recurrent neural networks (RNNs) are a powerful model for sequential data.
Ranked #18 on Speech Recognition on TIMIT
no code implementations • Signal Processing Magazine 2012 • Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.
no code implementations • NeurIPS 2010 • George Dahl, Marc'Aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton
Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task.