no code implementations • 6 Feb 2023 • Matthew Smith, Lucas Maystre, Zhenwen Dai, Kamil Ciosek

Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making.

1 code implementation • ICLR 2022 • Kamil Ciosek

Imitation learning algorithms learn a policy from demonstrations of expert behavior.

1 code implementation • NeurIPS 2021 • David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause

For many reinforcement learning (RL) applications, specifying a reward is difficult.

1 code implementation • 22 Jan 2021 • Tabish Rashid, Cheng Zhang, Kamil Ciosek

We show the benefits of using information gain as compared to the confidence interval criterion of ResponseGraphUCB (Rowland et al. 2019), and provide theoretical results justifying our method.

no code implementations • 18 Jan 2021 • Hisham Husain, Kamil Ciosek, Ryota Tomioka

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy.

no code implementations • 14 Jan 2021 • Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

We apply this methodology to build a suite of unit tests for the Overcooked-AI environment, and use this test suite to evaluate three proposals for improving robustness.

no code implementations • 11 Jan 2021 • Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann

The optimal adaptive behaviour under uncertainty over the other agents' strategies w.r.t.

no code implementations • 16 Jul 2020 • Luke Harries, Rebekah Storan Clarke, Timothy Chapman, Swamy V. P. L. N. Nallamalli, Levent Ozgur, Shuktika Jain, Alex Leung, Steve Lim, Aaron Dietrich, José Miguel Hernández-Lobato, Tom Ellis, Cheng Zhang, Kamil Ciosek

Efficient software testing is essential for productive software development and reliable user experiences.

1 code implementation • ICML 2020 • Ron Amit, Ron Meir, Kamil Ciosek

Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor.
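As a rough illustration of how a discount factor encodes a planning horizon, the sketch below computes a discounted return and the common rule-of-thumb effective horizon 1 / (1 - gamma). The function names are hypothetical and are not taken from the paper.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb horizon 1 / (1 - gamma); assumes 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)
```

Under this rule of thumb, a smaller discount factor corresponds to a shorter horizon: rewards far in the future contribute little to the return.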

no code implementations • ICLR 2020 • Kamil Ciosek, Vincent Fortuin, Ryota Tomioka, Katja Hofmann, Richard Turner

Obtaining high-quality uncertainty estimates is essential for many applications of deep neural networks.

no code implementations • ICLR 2020 • Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann

In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy.

1 code implementation • NeurIPS 2019 • Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

To address both of these phenomena, we introduce a new algorithm, Optimistic Actor Critic, which approximates a lower and upper confidence bound on the state-action value function.
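A minimal sketch of one way to form such bounds, assuming two critic estimates whose mean and half-difference stand in for the value estimate and its uncertainty. The helper name and the `beta_lb` and `beta_ub` parameters are hypothetical; this illustrates the idea and is not the authors' implementation.

```python
def confidence_bounds(q1, q2, beta_lb=-1.0, beta_ub=1.0):
    """Return (lower, upper) bounds on a state-action value.

    q1, q2: two critic estimates for the same state-action pair (assumed).
    beta_lb, beta_ub: hypothetical multipliers controlling pessimism and optimism.
    """
    mean = 0.5 * (q1 + q2)
    spread = 0.5 * abs(q1 - q2)  # crude stand-in for epistemic uncertainty
    lower = mean + beta_lb * spread
    upper = mean + beta_ub * spread
    return lower, upper
```

With `beta_lb = -1` the lower bound reduces to the minimum of the two estimates, and with `beta_ub = 1` the upper bound reduces to their maximum.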

no code implementations • 28 Oct 2019 • Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

To address both of these phenomena, we introduce a new algorithm, Optimistic Actor Critic, which approximates a lower and upper confidence bound on the state-action value function.

1 code implementation • NeurIPS 2019 • Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann

We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL.

no code implementations • NeurIPS 2020 • Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Keith Ross, Henrik Iskov Christensen, Hao Su

To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards.

no code implementations • 25 Sep 2019 • Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Hao Su, Henrik Iskov Christensen

Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns distillation of multiple value functions and MDP embeddings from only existing data.

no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson

We propose a new way of deriving policy gradient updates for reinforcement learning.

no code implementations • 10 Jan 2018 • Kamil Ciosek, Shimon Whiteson

For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.

no code implementations • 15 Jun 2017 • Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.

no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

no code implementations • 16 Jan 2015 • Kamil Ciosek, David Silver

This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction.

no code implementations • 22 Jan 2013 • Kamil Ciosek

This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each of them leading to different insights: the operator-theory approach via the Galerkin method, the statistical approach via instrumental variables, the linear dynamical system view as well as the limit of the TD iteration.
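For orientation, the sketch below shows the standard linear-system form of LSTD: accumulate A = sum of phi (phi - gamma * phi')^T and b = sum of phi * r over sampled transitions, then solve A w = b. The function names, the tiny solver, and the two-state example are assumptions made for illustration; they are not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular LSTD matrix; more data is needed")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for k in range(col, n + 1):
                    m[row][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def lstd(transitions, gamma):
    """Return weights w solving A w = b for (phi, reward, phi_next) samples."""
    n = len(transitions[0][0])
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, reward, phi_next in transitions:
        for i in range(n):
            b[i] += phi[i] * reward
            for j in range(n):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(a, b)


# Hypothetical two-state chain with one-hot features:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
transitions = [([1.0, 0.0], 1.0, [0.0, 1.0]), ([0.0, 1.0], 0.0, [1.0, 0.0])]
weights = lstd(transitions, gamma=0.5)
```

With one-hot features the weights are the state values themselves, so in this example they satisfy V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0).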

Papers With Code is a free resource with all data licensed under CC-BY-SA.