Search Results for author: Kamil Ciosek

Found 25 papers, 7 papers with code

On the Importance of Uncertainty in Decision-Making with Large Language Models

no code implementations • 3 Apr 2024 • Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty into a Thompson Sampling policy.

Decision Making • Multi-Armed Bandits • +1
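
The Thompson Sampling policy referred to above can be illustrated with a minimal, generic sketch for a Bernoulli bandit; the arm probabilities and Beta prior below are illustrative assumptions, and this is not the paper's LLM-based setup.

```python
# Minimal, generic Thompson Sampling for a Bernoulli bandit (illustration only).
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7]          # hypothetical arm reward probabilities
alpha = np.ones(len(true_means))      # Beta posterior successes + 1
beta = np.ones(len(true_means))       # Beta posterior failures + 1

for t in range(1000):
    samples = rng.beta(alpha, beta)   # one draw per arm from its posterior
    arm = int(np.argmax(samples))     # act greedily w.r.t. the sampled values
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward              # Bayesian update of the chosen arm
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```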

Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

no code implementations • 13 Oct 2023 • Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai

In this paper, we present a reinforcement learning framework that addresses these limitations by directly optimizing user satisfaction metrics through the use of a simulated playlist-generation environment.

Collaborative Filtering • reinforcement-learning

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

1 code implementation • 19 Jul 2023 • Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards.

Recommendation Systems

A Strong Baseline for Batch Imitation Learning

no code implementations • 6 Feb 2023 • Matthew Smith, Lucas Maystre, Zhenwen Dai, Kamil Ciosek

Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making.

Continuous Control • Imitation Learning • +3

Estimating $α$-Rank by Maximizing Information Gain

1 code implementation • 22 Jan 2021 • Tabish Rashid, Cheng Zhang, Kamil Ciosek

We show the benefits of using information gain compared to the confidence interval criterion of ResponseGraphUCB (Rowland et al., 2019), and provide theoretical results justifying our method.

Regularized Policies are Reward Robust

no code implementations • 18 Jan 2021 • Hisham Husain, Kamil Ciosek, Ryota Tomioka

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a locally optimal policy.

reinforcement-learning • Reinforcement Learning (RL)
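
For reference, entropic regularization typically augments the expected return with a policy-entropy bonus weighted by a temperature α; the form below is the generic textbook objective with assumed notation, not necessarily the exact formulation analysed in the paper.

```latex
J_{\alpha}(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\left(r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s)\,\log \pi(a \mid s).
```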

Evaluating the Robustness of Collaborative Agents

no code implementations • 14 Jan 2021 • Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

We apply this methodology to build a suite of unit tests for the Overcooked-AI environment, and use this test suite to evaluate three proposals for improving robustness.

Discount Factor as a Regularizer in Reinforcement Learning

1 code implementation • ICML 2020 • Ron Amit, Ron Meir, Kamil Ciosek

Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor.

reinforcement-learning • Reinforcement Learning (RL)
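
For context, the discount factor γ both weights future rewards and sets an effective planning horizon of roughly 1/(1 − γ), which is why shrinking γ acts like a regularizer; this is the standard textbook relationship, with notation assumed here rather than taken from the paper.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1,
\qquad \text{effective horizon} \approx \frac{1}{1-\gamma}.
```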

AMRL: Aggregated Memory For Reinforcement Learning

no code implementations • ICLR 2020 • Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann

In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy.

reinforcement-learning • Reinforcement Learning (RL)

Better Exploration with Optimistic Actor Critic

1 code implementation • NeurIPS 2019 • Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

To address both of these phenomena, we introduce a new algorithm, Optimistic Actor Critic, which approximates a lower and upper confidence bound on the state-action value function.

Continuous Control • Efficient Exploration
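
The lower and upper confidence bounds mentioned above can be sketched, under the assumption of a small critic ensemble, as mean ± a multiple of the spread of the critics' estimates; the function name and bound coefficients below are illustrative, not the exact Optimistic Actor Critic implementation.

```python
# Sketch: pessimistic/optimistic Q estimates from two critic outputs (illustrative only).
import numpy as np

def q_bounds(q1, q2, beta_lb=1.0, beta_ub=2.0):
    q_mean = 0.5 * (q1 + q2)
    q_std = 0.5 * np.abs(q1 - q2)          # spread of the two estimates as a crude uncertainty
    q_lower = q_mean - beta_lb * q_std     # pessimistic estimate (e.g. for value targets)
    q_upper = q_mean + beta_ub * q_std     # optimistic estimate (to drive exploration)
    return q_lower, q_upper

q1 = np.array([1.0, 2.0, 0.5])
q2 = np.array([1.2, 1.5, 0.9])
print(q_bounds(q1, q2))
```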

Pre-training as Batch Meta Reinforcement Learning with tiMe

no code implementations • 25 Sep 2019 • Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Hao Su, Henrik Iskov Christensen

Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns distilled value functions and MDP embeddings from existing data only.

Meta Reinforcement Learning • reinforcement-learning • +1

Fourier Policy Gradients

no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson

We propose a new way of deriving policy gradient updates for reinforcement learning.

Reinforcement Learning (RL)

Expected Policy Gradients for Reinforcement Learning

no code implementations • 10 Jan 2018 • Kamil Ciosek, Shimon Whiteson

For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.

Policy Gradient Methods • reinforcement-learning • +1
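
A minimal sketch of the exploration covariance described in the snippet above, i.e. a covariance proportional to the matrix exponential of a scaled Hessian of the critic with respect to the actions; the scaling constant c and the example Hessian are assumptions for illustration only.

```python
# Sketch: exploration covariance Sigma proportional to expm(c * H) (illustrative values).
import numpy as np
from scipy.linalg import expm

def exploration_covariance(hessian, c=1.0):
    """Return a covariance matrix proportional to the matrix exponential of c * H."""
    return expm(c * hessian)

H = np.array([[-1.0, 0.2],
              [ 0.2, -0.5]])            # hypothetical critic Hessian w.r.t. the actions
Sigma = exploration_covariance(H, c=1.0)
noisy_action = np.random.default_rng(0).multivariate_normal(np.zeros(2), Sigma)
print(Sigma, noisy_action)
```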

Expected Policy Gradients

no code implementations • 15 Jun 2017 • Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.
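
For reference, the snippet above can be read against the policy gradient written with the integral over actions made explicit (standard notation assumed here); EPG's contribution concerns how this inner integral is evaluated, which is what unifies the stochastic and deterministic cases.

```latex
\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim \rho^{\pi}}\left[\int_{\mathcal{A}} \nabla_{\theta}\, \pi_{\theta}(a \mid s)\, \hat{Q}(s, a)\, \mathrm{d}a\right].
```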

Alternating Optimisation and Quadrature for Robust Control

no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

Bayesian Optimisation

Value Iteration with Options and State Aggregation

no code implementations • 16 Jan 2015 • Kamil Ciosek, David Silver

This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction.
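
For context, the sketch below shows plain tabular value iteration on a tiny hypothetical MDP; the paper's combination of options (temporal abstraction) and state aggregation is not reproduced here.

```python
# Generic tabular value iteration on a small synthetic MDP (illustration only).
import numpy as np

n_states, gamma = 3, 0.9
P = np.zeros((2, n_states, n_states))             # P[a, s, s'] transition probabilities
P[0] = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
P[1] = [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0]]
R = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # R[a, s] expected immediate reward

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * (P @ V)        # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
    V_new = Q.max(axis=0)          # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print(V)
```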

Properties of the Least Squares Temporal Difference learning algorithm

no code implementations • 22 Jan 2013 • Kamil Ciosek

This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each leading to different insights: the operator-theoretic approach via the Galerkin method, the statistical approach via instrumental variables, the linear dynamical system view, and the view of LSTD as the limit of the TD iteration.
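
The LSTD solution itself is compact and can be sketched as follows; the feature map and sampled transitions below are synthetic placeholders, not data from the paper.

```python
# Minimal LSTD(0) sketch for linear value-function approximation (synthetic data).
import numpy as np

def lstd(features, rewards, next_features, gamma=0.9, reg=1e-6):
    """Solve A theta = b with A = sum phi (phi - gamma phi')^T and b = sum phi r."""
    A = features.T @ (features - gamma * next_features)
    b = features.T @ rewards
    return np.linalg.solve(A + reg * np.eye(A.shape[1]), b)

rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 4))          # phi(s_t) for 100 sampled transitions
phi_next = rng.normal(size=(100, 4))     # phi(s_{t+1})
r = rng.normal(size=100)                 # observed rewards
theta = lstd(phi, r, phi_next)
print(theta)
```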
