Search Results for author: Kamil Ciosek

Found 25 papers, 7 papers with code

On the Importance of Uncertainty in Decision-Making with Large Language Models

no code implementations • 3 Apr 2024 • Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty into a Thompson Sampling policy.

Decision Making • Multi-Armed Bandits • +1
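
The Thompson Sampling policy referred to above can be illustrated with a minimal, generic sketch for a Bernoulli bandit; the arm probabilities and Beta prior below are illustrative assumptions, and this is not the paper's LLM-based setup.

```python
# Minimal, generic Thompson Sampling for a Bernoulli bandit (illustration only).
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7]          # hypothetical arm reward probabilities
alpha = np.ones(len(true_means))      # Beta posterior successes + 1
beta = np.ones(len(true_means))       # Beta posterior failures + 1

for t in range(1000):
    samples = rng.beta(alpha, beta)   # one draw per arm from its posterior
    arm = int(np.argmax(samples))     # act greedily w.r.t. the sampled values
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward              # Bayesian update of the chosen arm
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```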

Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

no code implementations • 13 Oct 2023 • Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai

In this paper, we present a reinforcement learning framework that addresses these limitations by directly optimizing user satisfaction metrics through the use of a simulated playlist-generation environment.

Collaborative Filtering • reinforcement-learning

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

1 code implementation • 19 Jul 2023 • Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards.

Recommendation Systems

A Strong Baseline for Batch Imitation Learning

no code implementations • 6 Feb 2023 • Matthew Smith, Lucas Maystre, Zhenwen Dai, Kamil Ciosek

Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making.

Continuous Control • Imitation Learning • +3

Estimating $α$-Rank by Maximizing Information Gain

1 code implementation • 22 Jan 2021 • Tabish Rashid, Cheng Zhang, Kamil Ciosek

We show the benefits of using information gain compared to the confidence interval criterion of ResponseGraphUCB (Rowland et al., 2019), and provide theoretical results justifying our method.

Regularized Policies are Reward Robust

no code implementations • 18 Jan 2021 • Hisham Husain, Kamil Ciosek, Ryota Tomioka

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a locally optimal policy.

reinforcement-learning • Reinforcement Learning (RL)
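
For reference, entropic regularization typically augments the expected return with a policy-entropy bonus weighted by a temperature α; the form below is the generic textbook objective with assumed notation, not necessarily the exact formulation analysed in the paper.

```latex
J_{\alpha}(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\left(r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s).
```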

Evaluating the Robustness of Collaborative Agents

no code implementations • 14 Jan 2021 • Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

We apply this methodology to build a suite of unit tests for the Overcooked-AI environment, and use this test suite to evaluate three proposals for improving robustness.

Discount Factor as a Regularizer in Reinforcement Learning

1 code implementation • ICML 2020 • Ron Amit, Ron Meir, Kamil Ciosek

Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor.

reinforcement-learning • Reinforcement Learning (RL)
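
For context, the discount factor γ both weights future rewards and sets an effective planning horizon of roughly 1/(1 − γ), which is why shrinking γ acts like a regularizer; this is the standard textbook relationship, with notation assumed here rather than taken from the paper.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1,
\qquad \text{effective horizon} \approx \frac{1}{1-\gamma}.
```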

AMRL: Aggregated Memory For Reinforcement Learning

no code implementations • ICLR 2020 • Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann

In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy.

reinforcement-learning • Reinforcement Learning (RL)

Better Exploration with Optimistic Actor Critic

1 code implementation • NeurIPS 2019 • Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

To address both of these phenomena, we introduce a new algorithm, Optimistic Actor Critic, which approximates a lower and upper confidence bound on the state-action value function.

Continuous Control • Efficient Exploration
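
The lower and upper confidence bounds mentioned above can be sketched, under the assumption of a small critic ensemble, as mean ± a multiple of the spread of the critics' estimates; the function name and bound coefficients below are illustrative, not the exact Optimistic Actor Critic implementation.

```python
# Sketch: pessimistic/optimistic Q estimates from two critic outputs (illustrative only).
import numpy as np

def q_bounds(q1, q2, beta_lb=1.0, beta_ub=2.0):
    q_mean = 0.5 * (q1 + q2)
    q_std = 0.5 * np.abs(q1 - q2)          # spread of the two estimates as a crude uncertainty
    q_lower = q_mean - beta_lb * q_std     # pessimistic estimate (e.g. for value targets)
    q_upper = q_mean + beta_ub * q_std     # optimistic estimate (to drive exploration)
    return q_lower, q_upper

q1 = np.array([1.0, 2.0, 0.5])
q2 = np.array([1.2, 1.5, 0.9])
print(q_bounds(q1, q2))
```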

Pre-training as Batch Meta Reinforcement Learning with tiMe

no code implementations • 25 Sep 2019 • Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Hao Su, Henrik Iskov Christensen

Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns distilled value functions and MDP embeddings from existing data only.

Meta Reinforcement Learning • reinforcement-learning • +1

Fourier Policy Gradients

no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson

We propose a new way of deriving policy gradient updates for reinforcement learning.

Reinforcement Learning (RL)

Expected Policy Gradients for Reinforcement Learning

no code implementations • 10 Jan 2018 • Kamil Ciosek, Shimon Whiteson

For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.

Policy Gradient Methods • reinforcement-learning • +1
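
A minimal sketch of the exploration covariance described in the snippet above, i.e. a covariance proportional to the matrix exponential of a scaled Hessian of the critic with respect to the actions; the scaling constant c and the example Hessian are assumptions for illustration only.

```python
# Sketch: exploration covariance Sigma proportional to expm(c * H) (illustrative values).
import numpy as np
from scipy.linalg import expm

def exploration_covariance(hessian, c=1.0):
    """Return a covariance matrix proportional to the matrix exponential of c * H."""
    return expm(c * hessian)

H = np.array([[-1.0, 0.2],
              [ 0.2, -0.5]])            # hypothetical critic Hessian w.r.t. the actions
Sigma = exploration_covariance(H, c=1.0)
noisy_action = np.random.default_rng(0).multivariate_normal(np.zeros(2), Sigma)
print(Sigma, noisy_action)
```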

Expected Policy Gradients

no code implementations • 15 Jun 2017 • Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.
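
For reference, the snippet above can be read against the policy gradient written with the integral over actions made explicit (standard notation assumed here); EPG's contribution concerns how this inner integral is evaluated, which is what unifies the stochastic and deterministic cases.

```latex
\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim \rho^{\pi}}\left[\int_{\mathcal{A}} \nabla_{\theta}\, \pi_{\theta}(a \mid s)\, \hat{Q}(s, a)\, \mathrm{d}a\right].
```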

Alternating Optimisation and Quadrature for Robust Control

no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

Bayesian Optimisation

Value Iteration with Options and State Aggregation

no code implementations • 16 Jan 2015 • Kamil Ciosek, David Silver

This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction.
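
For context, the sketch below shows plain tabular value iteration on a tiny hypothetical MDP; the paper's combination of options (temporal abstraction) and state aggregation is not reproduced here.

```python
# Generic tabular value iteration on a small synthetic MDP (illustration only).
import numpy as np

n_states, gamma = 3, 0.9
P = np.zeros((2, n_states, n_states))             # P[a, s, s'] transition probabilities
P[0] = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
P[1] = [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0]]
R = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # R[a, s] expected immediate reward

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * (P @ V)        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
    V_new = Q.max(axis=0)          # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print(V)
```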

Properties of the Least Squares Temporal Difference learning algorithm

no code implementations • 22 Jan 2013 • Kamil Ciosek

This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each leading to different insights: the operator-theoretic approach via the Galerkin method, the statistical approach via instrumental variables, the linear dynamical system view, and the view of LSTD as the limit of the TD iteration.
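
The LSTD solution itself is compact and can be sketched as follows; the feature map and sampled transitions below are synthetic placeholders, not data from the paper.

```python
# Minimal LSTD(0) sketch for linear value-function approximation (synthetic data).
import numpy as np

def lstd(features, rewards, next_features, gamma=0.9, reg=1e-6):
    """Solve A theta = b with A = sum phi (phi - gamma phi')^T and b = sum phi r."""
    A = features.T @ (features - gamma * next_features)
    b = features.T @ rewards
    return np.linalg.solve(A + reg * np.eye(A.shape[1]), b)

rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 4))          # phi(s_t) for 100 sampled transitions
phi_next = rng.normal(size=(100, 4))     # phi(s_{t+1})
r = rng.normal(size=100)                 # observed rewards
theta = lstd(phi, r, phi_next)
print(theta)
```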
