Search Results for author: Ahmed Touati

Found 16 papers, 7 papers with code

Learning One Representation to Optimize All Rewards

1 code implementation NeurIPS 2021 Ahmed Touati, Yann Ollivier

In the test phase, a reward representation is estimated either from observations or an explicit reward description (e. g., a target state).

Efficient Learning in Non-Stationary Linear Markov Decision Processes

no code implementations24 Oct 2020 Ahmed Touati, Pascal Vincent

We study episodic reinforcement learning in non-stationary linear (a. k. a.

Maximum Reward Formulation In Reinforcement Learning

1 code implementation8 Oct 2020 Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, Sarath Chandar

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon).

Drug Discovery reinforcement-learning

Sharp Analysis of Smoothed Bellman Error Embedding

no code implementations7 Jul 2020 Ahmed Touati, Pascal Vincent

The \textit{Smoothed Bellman Error Embedding} algorithm~\citep{dai2018sbeed}, known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation.


TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

no code implementations6 Jul 2020 Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.

Stable Policy Optimization via Off-Policy Divergence Regularization

1 code implementation9 Mar 2020 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).


Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

no code implementations9 Mar 2020 Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces.


Stochastic Neural Network with Kronecker Flow

no code implementations10 Jun 2019 Chin-wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks.

Multi-Armed Bandits Variational Inference

SVRG for Policy Evaluation with Fewer Gradient Evaluations

1 code implementation9 Jun 2019 Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup

SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy.

Separating value functions across time-scales

1 code implementation5 Feb 2019 Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance leading to difficulties in learning.

Randomized Value Functions via Multiplicative Normalizing Flows

1 code implementation6 Jun 2018 Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent

In particular, we augment DQN and DDPG with multiplicative normalizing flows in order to track a rich approximate posterior distribution over the parameters of the value function.

Efficient Exploration

Learnable Explicit Density for Continuous Latent Space and Variational Inference

no code implementations6 Oct 2017 Chin-wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior.

Density Estimation Variational Inference

Parametric Adversarial Divergences are Good Losses for Generative Modeling

no code implementations ICLR 2018 Gabriel Huang, Hugo Berard, Ahmed Touati, Gauthier Gidel, Pascal Vincent, Simon Lacoste-Julien

Parametric adversarial divergences, which are a generalization of the losses used to train generative adversarial networks (GANs), have often been described as being approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be derived under the so-called optimal discriminator assumption.

Structured Prediction

Convergent Tree Backup and Retrace with Function Approximation

no code implementations ICML 2018 Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning as it allows to learn about a target policy from the experience generated by a different behavior policy.

Cannot find the paper you are looking for? You can Submit a new open access paper.