no code implementations • 12 Nov 2017 • Bogdan Mazoure, Riashat Islam
We investigate the use of alternative divergences to Kullback-Leibler (KL) in variational inference(VI), based on the Variational Dropout \cite{kingma2015}.
1 code implementation • 13 May 2018 • Thang Doan, Bogdan Mazoure, Clare Lyle
Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation.
3 code implementations • 31 Jul 2018 • Thang Doan, Joao Monteiro, Isabela Albuquerque, Bogdan Mazoure, Audrey Durand, Joelle Pineau, R. Devon Hjelm
We argue that less expressive discriminators are smoother and have a general coarse grained view of the modes map, which enforces the generator to cover a wide portion of the data distribution support.
no code implementations • WS 2018 • Bogdan Mazoure, Thang Doan, Saibal Ray
Generative models have recently experienced a surge in popularity due to the development of more efficient training algorithms and increasing computational power.
no code implementations • 1 Feb 2019 • Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett
This would require the VA to have the capability to detect the speech that is being directed at it and respond accordingly.
1 code implementation • 16 May 2019 • Bogdan Mazoure, Thang Doan, Audrey Durand, R. Devon Hjelm, Joelle Pineau
The ability to discover approximately optimal policies in domains with sparse rewards is crucial to applying reinforcement learning (RL) in many real-world scenarios.
1 code implementation • 6 Jun 2019 • Cody Mazza-Anthony, Bogdan Mazoure, Mark Coates
We propose two novel estimators based on the Ordered Weighted $\ell_1$ (OWL) norm: 1) The Graphical OWL (GOWL) is a penalized likelihood method that applies the OWL norm to the lower triangle components of the precision matrix.
no code implementations • 17 Sep 2019 • Thang Doan, Bogdan Mazoure, Moloud Abdar, Audrey Durand, Joelle Pineau, R. Devon Hjelm
Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions.
no code implementations • 12 Nov 2019 • Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau
Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning.
no code implementations • 7 Feb 2020 • Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle Pineau, Doina Precup, Guillaume Rabusseau
We propose a general framework for policy representation for reinforcement learning tasks.
1 code implementation • NeurIPS 2020 • Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R. Devon Hjelm
We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems.
2 code implementations • 7 Oct 2020 • Thang Doan, Mehdi Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier
Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime.
no code implementations • 1 Jun 2021 • Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan
Targeting immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric.
1 code implementation • ICLR 2022 • Bogdan Mazoure, Ahmed M. Ahmed, Patrick MacAlpine, R Devon Hjelm, Andrey Kolobov
A highly desirable property of a reinforcement learning (RL) agent -- and a major difficulty for deep RL approaches -- is the ability to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
no code implementations • 29 Nov 2021 • Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
1 code implementation • 19 Mar 2022 • R Devon Hjelm, Bogdan Mazoure, Florian Golemo, Felipe Frujeri, Mihai Jalobeanu, Andrey Kolobov
A broad challenge of research on generalization for sequential decision-making tasks in interactive environments is designing benchmarks that clearly landmark progress.
no code implementations • 8 Jun 2022 • Tianyu Li, Bogdan Mazoure, Guillaume Rabusseau
Although WFAs have been extended to deal with continuous input data, namely continuous WFAs (CWFAs), it is still unclear how to approximate density functions over sequences of continuous random variables using WFA-based models, due to the limitation on the expressiveness of the model as well as the tractability of approximating density functions via CWFAs.
no code implementations • 3 Nov 2022 • Bogdan Mazoure, Benjamin Eysenbach, Ofir Nachum, Jonathan Tompson
In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.
no code implementations • 31 Mar 2023 • Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand
In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
no code implementations • 9 Jun 2023 • Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind
A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complimentary scaling in amount of training data.
no code implementations • 26 Oct 2023 • Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks.