Search Results for author: Ahmed Touati

Found 20 papers, 7 papers with code

Convergent Tree Backup and Retrace with Function Approximation

no code implementations ICML 2018 Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from the experience generated by a different behavior policy.

Parametric Adversarial Divergences are Good Losses for Generative Modeling

no code implementations ICLR 2018 Gabriel Huang, Hugo Berard, Ahmed Touati, Gauthier Gidel, Pascal Vincent, Simon Lacoste-Julien

Parametric adversarial divergences, which are a generalization of the losses used to train generative adversarial networks (GANs), have often been described as being approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be derived under the so-called optimal discriminator assumption.

Structured Prediction

Learnable Explicit Density for Continuous Latent Space and Variational Inference

no code implementations 6 Oct 2017 Chin-wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior.

Density Estimation Variational Inference

Randomized Value Functions via Multiplicative Normalizing Flows

1 code implementation 6 Jun 2018 Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent

In particular, we augment DQN and DDPG with multiplicative normalizing flows in order to track a rich approximate posterior distribution over the parameters of the value function.

Efficient Exploration Thompson Sampling

Separating value functions across time-scales

1 code implementation 5 Feb 2019 Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

In settings where this bias is unacceptable, where the system must optimize for longer horizons at higher discounts, the target of the value function approximator may increase in variance, leading to difficulties in learning.

Reinforcement Learning (RL)

SVRG for Policy Evaluation with Fewer Gradient Evaluations

1 code implementation 9 Jun 2019 Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup

SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy.

Reinforcement Learning (RL)

Stochastic Neural Network with Kronecker Flow

no code implementations 10 Jun 2019 Chin-wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks.

Multi-Armed Bandits Thompson Sampling +1

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

no code implementations 9 Mar 2020 Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces.

reinforcement-learning Reinforcement Learning (RL)

Stable Policy Optimization via Off-Policy Divergence Regularization

1 code implementation 9 Mar 2020 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).

Reinforcement Learning (RL)

TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

no code implementations 6 Jul 2020 Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.

Sharp Analysis of Smoothed Bellman Error Embedding

no code implementations 7 Jul 2020 Ahmed Touati, Pascal Vincent

The Smoothed Bellman Error Embedding algorithm (Dai et al., 2018), known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation.

reinforcement-learning Reinforcement Learning (RL)

Maximum Reward Formulation In Reinforcement Learning

1 code implementation 8 Oct 2020 Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, Sarath Chandar

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon).

Drug Discovery reinforcement-learning +1

Efficient Learning in Non-Stationary Linear Markov Decision Processes

no code implementations 24 Oct 2020 Ahmed Touati, Pascal Vincent

We study episodic reinforcement learning in non-stationary linear (a.k.a.

Learning One Representation to Optimize All Rewards

2 code implementations NeurIPS 2021 Ahmed Touati, Yann Ollivier

In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state).

Does Zero-Shot Reinforcement Learning Exist?

1 code implementation 29 Sep 2022 Ahmed Touati, Jérémy Rapin, Yann Ollivier

A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase.

Contrastive Learning reinforcement-learning +2

SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

no code implementations 3 Nov 2023 Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.

Contrastive Learning reinforcement-learning +1

Simple Ingredients for Offline Reinforcement Learning

no code implementations 19 Mar 2024 Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.

D4RL reinforcement-learning
