no code implementations • 19 Mar 2024 • Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.

no code implementations • 3 Nov 2023 • Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.

no code implementations • 24 Oct 2022 • Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta

We study the problem of representation learning in stochastic contextual linear bandits.

1 code implementation • 29 Sep 2022 • Ahmed Touati, Jérémy Rapin, Yann Ollivier

A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase.

2 code implementations • NeurIPS 2021 • Ahmed Touati, Yann Ollivier

In the test phase, a reward representation is estimated either from observations or an explicit reward description (e. g., a target state).

no code implementations • 24 Oct 2020 • Ahmed Touati, Pascal Vincent

We study episodic reinforcement learning in non-stationary linear (a. k. a.

1 code implementation • 8 Oct 2020 • Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, Sarath Chandar

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon).

no code implementations • 7 Jul 2020 • Ahmed Touati, Pascal Vincent

The \textit{Smoothed Bellman Error Embedding} algorithm~\citep{dai2018sbeed}, known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation.

no code implementations • 6 Jul 2020 • Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.

1 code implementation • 9 Mar 2020 • Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).

no code implementations • 9 Mar 2020 • Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces.

no code implementations • 10 Jun 2019 • Chin-wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks.

1 code implementation • 9 Jun 2019 • Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup

SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy.

no code implementations • ICLR 2019 • Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra

This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration.

no code implementations • 5 Mar 2019 • Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra

This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration.

1 code implementation • 5 Feb 2019 • Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance leading to difficulties in learning.

1 code implementation • 6 Jun 2018 • Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent

In particular, we augment DQN and DDPG with multiplicative normalizing flows in order to track a rich approximate posterior distribution over the parameters of the value function.

no code implementations • 6 Oct 2017 • Chin-wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior.

no code implementations • ICLR 2018 • Gabriel Huang, Hugo Berard, Ahmed Touati, Gauthier Gidel, Pascal Vincent, Simon Lacoste-Julien

Parametric adversarial divergences, which are a generalization of the losses used to train generative adversarial networks (GANs), have often been described as being approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be derived under the so-called optimal discriminator assumption.

no code implementations • ICML 2018 • Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning as it allows to learn about a target policy from the experience generated by a different behavior policy.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.