no code implementations • 24 Jul 2018 • Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Prav Srinivasan
Predicting delayed outcomes is an important problem in recommender systems (e.g., whether customers will finish reading an ebook).
no code implementations • 11 Mar 2018 • Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
It learns an optimal policy with respect to a distribution over an uncertainty set, staying robust to model uncertainty while avoiding the conservativeness of worst-case robust strategies.
1 code implementation • ICLR 2019 • Ray Jiang, Sven Gowal, Timothy A. Mann, Danilo J. Rezende
The conventional solution to the recommendation problem greedily ranks individual document candidates by prediction scores.
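As an illustrative sketch (not the paper's proposed method), the conventional greedy baseline described above simply sorts candidate documents by their individually predicted scores and keeps the top k; the function and variable names below are hypothetical:

```python
def greedy_rank(candidates, scores, k):
    """Return the top-k candidates ordered by descending predicted score."""
    order = sorted(range(len(candidates)),
                   key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]

# Toy usage: four documents with model scores, keep the best two.
top2 = greedy_rank(["a", "b", "c", "d"], [0.2, 0.9, 0.5, 0.7], k=2)
# -> ["b", "d"]
```

The paper's point is that this per-document greedy ranking ignores interactions within the ranked slate, which is what motivates moving beyond it.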
no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor
We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.
no code implementations • 30 Dec 2016 • Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester
Temporal Difference learning or TD($\lambda$) is a fundamental algorithm in the field of reinforcement learning.
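For context, a minimal tabular TD($\lambda$) sketch with accumulating eligibility traces, run on a toy two-step chain (the episode format and all names here are illustrative assumptions, not code from the paper):

```python
def td_lambda_episode(V, trajectory, alpha=0.1, gamma=1.0, lam=0.9):
    """Update value table V in place from one episode.

    trajectory is a list of (state, reward, next_state) tuples,
    with next_state = None at the terminal transition.
    """
    e = [0.0] * len(V)  # eligibility traces
    for s, r, s_next in trajectory:
        v_next = V[s_next] if s_next is not None else 0.0
        delta = r + gamma * v_next - V[s]  # TD error
        e[s] += 1.0  # accumulating trace
        for i in range(len(V)):
            V[i] += alpha * delta * e[i]
            e[i] *= gamma * lam  # decay all traces
    return V

# Toy 3-state chain 0 -> 1 -> terminal, reward 1 on the final step.
V = [0.0, 0.0, 0.0]
td_lambda_episode(V, [(0, 0.0, 1), (1, 1.0, None)])
# -> V = [0.09, 0.1, 0.0]
```

The trace decay `gamma * lam` is what lets the final reward propagate back to earlier states within a single episode.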
no code implementations • NeurIPS 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.
no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.
no code implementations • 11 Jun 2015 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.
no code implementations • 16 Apr 2015 • Nir Levine, Timothy A. Mann, Shie Mannor
Twitter, a popular social network, presents great opportunities for on-line machine learning research.
no code implementations • NeurIPS 2014 • Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.
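To make the sample-cost concrete, estimating the transition kernel $p(s' \mid s, a)$ from data is typically done by counting observed transitions per state-action pair; this is a generic maximum-likelihood sketch with hypothetical names, not the algorithm from the paper:

```python
from collections import Counter, defaultdict

def estimate_kernel(samples):
    """ML estimate of p(s' | s, a) from (s, a, s') transition samples."""
    counts = defaultdict(Counter)
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1
    return {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
            for sa, c in counts.items()}

# Toy data: three samples of action "a" from state 0, one of action "b".
samples = [(0, "a", 1), (0, "a", 1), (0, "a", 2), (0, "b", 0)]
p = estimate_kernel(samples)
# p[(0, "a")] -> {1: 2/3, 2: 1/3}
```

Because every state-action pair needs enough samples for its own count-based estimate, the total sample requirement grows with the size of the state-action space, which is the cost the paper targets.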