Search Results for author: Dmitrii Krasheninnikov

Found 5 papers, 2 papers with code

Meta- (out-of-context) learning in neural networks

1 code implementation • 23 Oct 2023 • Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger

Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs).

In-Context Learning

Defining and Characterizing Reward Hacking

no code implementations • 27 Sep 2022 • Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger

We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$.
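
As a rough formal sketch of this failure mode (assuming $\tilde{J}(\pi)$ and $J(\pi)$ denote the expected return of a policy $\pi$ under $\tilde{\mathcal{R}}$ and $\mathcal{R}$; the paper's exact definition may differ in its details), a proxy is hackable relative to a policy set $\Pi$ when improving the proxy can strictly worsen the true objective:

```latex
% Sketch of hackability: some pair of policies is ranked oppositely
% by the proxy return \tilde{J} and the true return J.
\exists\, \pi, \pi' \in \Pi : \quad
\tilde{J}(\pi) < \tilde{J}(\pi')
\quad \text{and} \quad
J(\pi) > J(\pi').
```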

Combining Reward Information from Multiple Sources

no code implementations • 22 Mar 2021 • Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof

We study this problem in the setting with two conflicting reward functions learned from different sources.

Informativeness
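
One natural baseline for this setting, sketched below purely as an illustration (the hypothesis space, likelihoods, and independence assumption are placeholders, not the paper's method), is to combine the evidence from both sources in a Bayesian posterior over candidate reward functions:

```python
import numpy as np

# Illustrative only: Bayesian combination of two noisy evidence sources
# over a discrete set of candidate reward functions. The Dirichlet draws
# stand in for likelihoods that would in practice come from, e.g.,
# demonstrations and preference comparisons.
rng = np.random.default_rng(0)

num_hypotheses = 5
prior = np.full(num_hypotheses, 1.0 / num_hypotheses)
lik_source_a = rng.dirichlet(np.ones(num_hypotheses))  # P(data_a | theta)
lik_source_b = rng.dirichlet(np.ones(num_hypotheses))  # P(data_b | theta)

# Assuming the sources are conditionally independent given the true
# reward, their likelihoods multiply in the posterior.
posterior = prior * lik_source_a * lik_source_b
posterior /= posterior.sum()

print("combined posterior over reward hypotheses:", posterior)
print("MAP hypothesis:", int(posterior.argmax()))
```

When the two sources conflict, a product of likelihoods like this concentrates on hypotheses that explain both sources reasonably well, rather than either one perfectly.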

Benefits of Assistance over Reward Learning

no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.
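
A minimal sketch of this advantage, under assumptions made here rather than the paper's formalism (the payoff matrix, query cost, and two-hypothesis belief are all invented): an assistive agent treats the unknown objective as hidden state, so it can compute the value of information of querying the human, something a two-phase "learn reward, then act" pipeline never considers.

```python
import numpy as np

# Hypothetical two-hypothesis assistance problem: the agent is unsure
# which of two objectives the human has (belief 50/50), and each plan
# is good under one objective and bad under the other.
belief = np.array([0.5, 0.5])        # P(theta = 0), P(theta = 1)
payoff = np.array([[1.0, -1.0],      # payoff[a, theta]: return of
                   [-1.0, 1.0]])     # committing to plan a given theta
ask_cost = 0.1                       # cost of querying the human

# Two-phase agent: commits now to the plan best under its current belief.
act_now = max(belief @ payoff[a] for a in range(2))

# Assistive agent: may first ask, which (in this toy) reveals theta,
# letting it pick the best plan for the true objective afterwards.
ask_first = sum(belief[t] * payoff[:, t].max() for t in range(2)) - ask_cost

print(f"expected return, commit immediately: {act_now:.2f}")    # 0.00
print(f"expected return, query then commit:  {ask_first:.2f}")  # 0.90
```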

Preferences Implicit in the State of the World

1 code implementation • ICLR 2019 • Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

We find that information from the initial state can be used to infer both side effects that should be avoided and preferences for how the environment should be organized.

Reinforcement Learning (RL)
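
A toy version of that inference, with invented numbers (the paper's actual approach infers the reward by reasoning about the human behavior that would have produced the observed initial state, not by hard-coding likelihoods): finding a fragile vase intact is Bayesian evidence that the human values it, so the robot should both avoid breaking it and treat it as part of the preferred arrangement.

```python
# Toy illustration: an intact vase in the initial state as evidence of
# a preference. The likelihoods below are invented placeholders for the
# probability that the vase survives the human's past behavior.
p_intact_if_cares = 0.99        # a careful human rarely breaks the vase
p_intact_if_indifferent = 0.30  # an indifferent human likely breaks it

prior_cares = 0.5
posterior_cares = (prior_cares * p_intact_if_cares) / (
    prior_cares * p_intact_if_cares
    + (1 - prior_cares) * p_intact_if_indifferent
)
print(f"P(human values the vase | vase intact) = {posterior_cares:.2f}")
# ~0.77: a preference inferred from the initial state alone, with no
# explicit reward signal or demonstration.
```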
