no code implementations • ICLR 2022 • Ankesh Anand, Jacob Walker, Yazhe Li, Eszter Vértes, Julian Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick
One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks.
Ranked #1 on Meta-Learning on ML10 (Meta-test success rate (zero-shot) metric)
no code implementations • ICLR 2022 • Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K Hubert, David Silver
However, previous instantiations of this approach were limited to the use of deterministic models.
1 code implementation • ICML Workshop URL 2021 • Mina Khan, P Srivatsa, Advait Rane, Shriram Chenniappa, Rishabh Anand, Sherjil Ozair, Pattie Maes
Data-efficiency and generalization are key challenges in deep learning and deep reinforcement learning as many models are trained on large-scale, domain-specific, and expensive-to-label datasets.
no code implementations • 8 Jun 2021 • Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
Our key insight is to use discrete autoencoders to capture the multiple possible effects of an action in a stochastic environment.
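One plausible reading of this "discrete autoencoder" is a VQ-VAE-style bottleneck: the encoder summarizes a transition's stochastic outcome as one of a small number of codes, and the dynamics model then predicts a distribution over those codes. A minimal sketch of such a quantization layer, assuming a PyTorch setup; the class name, codebook size, and loss weighting are illustrative assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Discrete bottleneck in the VQ-VAE style (sizes and weights are assumptions)."""
    def __init__(self, num_codes=64, code_dim=32, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.beta = beta

    def forward(self, z):
        # z: [batch, code_dim] continuous encoder output
        dists = torch.cdist(z, self.codebook.weight)   # [batch, num_codes]
        idx = dists.argmin(dim=-1)                     # discrete code id per example
        z_q = self.codebook(idx)                       # quantized latent
        # codebook + commitment losses, straight-through gradient to the encoder
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss
```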
1 code implementation • 25 Dec 2019 • Alex Lamb, Sherjil Ozair, Vikas Verma, David Ha
In this work we focus on their ability to remain invariant to the presence or absence of details.
3 code implementations • NeurIPS 2019 • Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R. Devon Hjelm
State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks.
no code implementations • 22 May 2019 • Jonathan Binas, Sherjil Ozair, Yoshua Bengio
Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments.
3 code implementations • 16 May 2019 • Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging.
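One of the tractable lower bounds examined in this line of work is the InfoNCE estimator, which turns MI estimation into a within-batch classification problem. A minimal NumPy sketch, assuming a precomputed critic matrix of scores f(x_i, y_j); the critic itself and the batch construction are assumptions:

```python
import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE lower bound on I(X; Y).

    scores[i, j] = f(x_i, y_j) for a batch of K pairs; diagonal entries come
    from jointly sampled (positive) pairs. The bound is the mean row-wise
    log-softmax probability of the positive plus log K, and saturates at log K.
    """
    K = scores.shape[0]
    row_max = scores.max(axis=1, keepdims=True)   # subtract max for stability
    log_norm = row_max + np.log(np.exp(scores - row_max).sum(axis=1, keepdims=True))
    log_probs = scores - log_norm                 # row-wise log-softmax
    return np.diag(log_probs).mean() + np.log(K)
```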
no code implementations • NeurIPS 2019 • Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet
Mutual information maximization has emerged as a powerful learning objective for unsupervised representation learning, obtaining state-of-the-art performance in applications such as object recognition, speech recognition, and reinforcement learning.
2 code implementations • 24 Jan 2019 • Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio
Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient.
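The source of the intractability is the partition function: for an energy-based model $p_\theta(x) = \exp(-E_\theta(x))/Z(\theta)$, the log-likelihood gradient splits into a data term and a model-expectation term,

$$\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta}\big[\nabla_\theta E_\theta(x')\big],$$

and the second expectation requires samples from the model itself, which are generally expensive to obtain.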
no code implementations • ICML 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.
18 code implementations • 12 Jan 2018 • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R. Devon Hjelm
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks.
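A minimal sketch of the resulting estimator (MINE), assuming a PyTorch setup: a small statistics network $T_\theta$ is trained by gradient ascent on the Donsker-Varadhan lower bound, with marginal samples approximated by shuffling $y$ within the batch. The architecture and the omission of MINE's moving-average gradient correction are simplifications:

```python
import math
import torch
import torch.nn as nn

class MINECritic(nn.Module):
    """Statistics network T_theta(x, y); this architecture is an assumption."""
    def __init__(self, x_dim, y_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def mine_lower_bound(critic, x, y):
    """Donsker-Varadhan bound: E_{p(x,y)}[T] - log E_{p(x)p(y)}[exp(T)].

    Marginal samples are approximated by shuffling y within the batch;
    maximizing this quantity w.r.t. the critic parameters tightens the bound.
    """
    joint = critic(x, y).mean()
    y_shuffled = y[torch.randperm(y.shape[0])]
    marginal = torch.logsumexp(critic(x, y_shuffled), dim=0) - math.log(x.shape[0])
    return joint - marginal
```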
no code implementations • ICLR 2018 • Brady Neal, Alex Lamb, Sherjil Ozair, Devon Hjelm, Aaron Courville, Yoshua Bengio, Ioannis Mitliagkas
One of the most successful techniques in generative models has been decomposing a complicated generation task into a series of simpler generation tasks.
29 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.
Ranked #1 on Noisy Speech Recognition on CHiME clean
1 code implementation • NeurIPS 2014 • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.
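The two-player game described here corresponds to the minimax objective

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$$

where in practice $G$ is often trained to maximize $\log D(G(z))$ instead, which provides stronger gradients early in training.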
no code implementations • 2 Oct 2014 • Sherjil Ozair, Yoshua Bengio
The objective is to learn an encoder $f(\cdot)$ that maps $X$ to $H = f(X)$, whose distribution, estimated by $P(H)$, is much simpler than that of $X$ itself.
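Assuming the deterministic-encoder condition that the abstract alludes to, i.e. $P(X = x \mid H = h) = 0$ whenever $h \neq f(x)$, the likelihood decomposes exactly as

$$P(X = x) = P\big(X = x \mid H = f(x)\big)\, P\big(H = f(x)\big),$$

so modeling the simpler code distribution $P(H)$ together with the decoder yields the data likelihood.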
no code implementations • 2 Sep 2014 • Li Yao, Sherjil Ozair, Kyunghyun Cho, Yoshua Bengio
Orderless NADEs are trained based on a criterion that stochastically maximizes $P(\mathbf{x})$ with all possible orders of factorizations.
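In symbols, this order-agnostic criterion is (roughly) the expected log-likelihood under a uniformly sampled ordering $o$ of the $D$ variables,

$$\mathcal{L}(\theta) = \mathbb{E}_{\mathbf{x} \sim \mathcal{D}}\, \mathbb{E}_{o}\left[\sum_{d=1}^{D} \log p_\theta\big(x_{o_d} \mid \mathbf{x}_{o_{<d}}\big)\right],$$

estimated in practice by sampling an ordering (or a prefix of one) per update.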
175 code implementations • Proceedings of the 27th International Conference on Neural Information Processing Systems 2014 • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake.
no code implementations • 19 Dec 2013 • Sherjil Ozair, Li Yao, Yoshua Bengio
Generative Stochastic Networks (GSNs) have been recently introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data generating distribution.
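Sampling from such a model amounts to running the learned chain: repeatedly corrupt the current sample and apply the trained transition operator. A minimal sketch, assuming `corrupt` and `denoise` callables (e.g. additive-noise corruption and a trained denoising reconstruction function); both names and the step count are illustrative assumptions:

```python
import numpy as np

def gsn_sample(x0, corrupt, denoise, n_steps=100, rng=None):
    """Run a GSN's Markov chain; after burn-in, visited states approximate
    samples from the model's estimate of the data-generating distribution."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x_tilde = corrupt(x, rng)   # stochastic corruption C(x_tilde | x)
        x = denoise(x_tilde)        # learned reconstruction P_theta(x | x_tilde)
        samples.append(x.copy())
    return samples
```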