Search Results for author: Joshua Achiam

Found 9 papers, 2 papers with code

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

no code implementations ICML 2020 Adam Stooke, Joshua Achiam, Pieter Abbeel

This intuition leads to our introduction of PID control for the Lagrange multiplier in constrained RL, which we cast as a dynamical system.
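Treating constraint violation as the error signal of a controller, the multiplier update can be sketched as below. The class name, gains (`kp`, `ki`, `kd`), and update rule are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch: PID control of a Lagrange multiplier in constrained RL.
# Constraint violation plays the role of the PID error signal.

class PIDLagrangeMultiplier:
    def __init__(self, kp=0.1, ki=0.01, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_violation = 0.0

    def update(self, constraint_cost, cost_limit):
        # How far the measured constraint cost exceeds its limit.
        violation = constraint_cost - cost_limit
        # Integral term accumulates persistent violation (clipped at zero).
        self.integral = max(0.0, self.integral + violation)
        # Derivative term reacts to the rate of change of the violation.
        derivative = violation - self.prev_violation
        self.prev_violation = violation
        # The multiplier must stay non-negative to act as a valid penalty.
        return max(0.0, self.kp * violation
                        + self.ki * self.integral
                        + self.kd * derivative)
```

The proportional and derivative terms are what distinguish this from the standard integral-only dual ascent, which is the source of the oscillation and overshoot the paper targets.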

reinforcement-learning Reinforcement Learning (RL) +1

A Hazard Analysis Framework for Code Synthesis Large Language Models

no code implementations 25 Jul 2022 Heidy Khlaaf, Pamela Mishkin, Joshua Achiam, Gretchen Krueger, Miles Brundage

Codex, a large language model (LLM) trained on a variety of codebases, exceeds the previous state of the art in its capacity to synthesize and generate code.

Code Generation Language Modelling +1

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

no code implementations 8 Jul 2020 Adam Stooke, Joshua Achiam, Pieter Abbeel

Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior during agent training.

reinforcement-learning Reinforcement Learning (RL) +1

Towards Characterizing Divergence in Deep Q-Learning

no code implementations 21 Mar 2019 Joshua Achiam, Ethan Knight, Pieter Abbeel

Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the `deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation.
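All three ingredients of the deadly triad can appear in a single semi-gradient update; a minimal sketch with linear function approximation (the features, learning rate, and setup here are illustrative, not the paper's analysis):

```python
# Hedged sketch: one DQL-style update exhibiting the 'deadly triad'.
import numpy as np

def dql_update(w, phi_sa, phi_next, reward, gamma=0.99, lr=0.1):
    # Function approximation: Q(s, a) = w . phi(s, a).
    q = w @ phi_sa
    # Bootstrapping: the target reuses the current estimate at the next state.
    # Off-policy: phi_next may come from a greedy action the behavior
    # policy never took.
    target = reward + gamma * (w @ phi_next)
    # Semi-gradient TD update: no gradient flows through the target.
    return w + lr * (target - q) * phi_sa
```

The paper's question is when repeated application of updates like this one diverges rather than converges.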

Continuous Control OpenAI Gym +1

Variational Option Discovery Algorithms

no code implementations 26 Jul 2018 Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel

We explore methods for option discovery based on variational inference and make two algorithmic contributions.

Variational Inference

On First-Order Meta-Learning Algorithms

13 code implementations 8 Mar 2018 Alex Nichol, Joshua Achiam, John Schulman

This paper considers meta-learning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution.
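A first-order meta-update in the spirit of this paper's Reptile algorithm can be sketched as follows; the gradient function, step counts, and learning rates are illustrative assumptions:

```python
# Hedged sketch of a Reptile-style first-order meta-learning step:
# adapt to one sampled task with plain SGD, then move the shared
# initialization toward the adapted parameters.
import numpy as np

def reptile_step(theta, task_grad_fn, inner_lr=0.01, inner_steps=5,
                 meta_lr=0.1):
    # Inner loop: ordinary SGD on the sampled task, starting from theta.
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= inner_lr * task_grad_fn(phi)
    # First-order meta-update: move theta toward phi.
    # No second derivatives are required, unlike full MAML.
    return theta + meta_lr * (phi - theta)
```

Repeating this step over many sampled tasks nudges the initialization toward parameters from which each task is quickly learnable.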

Few-Shot Image Classification Few-Shot Learning

Constrained Policy Optimization

9 code implementations ICML 2017 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function.
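The constrained formulation the snippet describes is the standard constrained-MDP objective; symbols below follow common usage and are not quoted from the paper:

```latex
\max_{\pi} \; J_R(\pi)
\quad \text{s.t.} \quad J_{C_i}(\pi) \le d_i, \;\; i = 1, \dots, m
```

Here $J_R(\pi)$ is the expected return under the reward function, each $J_{C_i}(\pi)$ is an expected cumulative cost under an auxiliary cost function, and $d_i$ is its allowed limit. Constrained Policy Optimization performs trust-region policy updates that approximately enforce these constraints at every iteration.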

Reinforcement Learning (RL) Safe Reinforcement Learning

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

no code implementations 6 Mar 2017 Joshua Achiam, Shankar Sastry

Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards.
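A surprise-style intrinsic reward augments the sparse environment reward with the improbability of observed transitions under a learned dynamics model. The Gaussian model and function below are an illustrative assumption, not the paper's exact formulation:

```python
# Hedged sketch: intrinsic reward as the surprisal (negative log-likelihood)
# of a transition under a learned Gaussian dynamics model.
import numpy as np

def surprise_bonus(next_state, predicted_mean, sigma=1.0):
    # Prediction error of the dynamics model on the observed transition.
    err = np.asarray(next_state, dtype=float) - np.asarray(predicted_mean,
                                                           dtype=float)
    d = err.size
    # Negative log-density of an isotropic Gaussian: transitions the model
    # finds improbable (surprising) yield larger exploration bonuses.
    return 0.5 * (err @ err) / sigma**2 + 0.5 * d * np.log(
        2.0 * np.pi * sigma**2)
```

The bonus is added to the environment reward, so the agent is drawn toward regions its model predicts poorly, which is exactly where sparse rewards are most likely to remain undiscovered.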

Continuous Control reinforcement-learning +1

Easy Monotonic Policy Iteration

no code implementations 29 Feb 2016 Joshua Achiam

A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or $Q$-function may fail to improve performance, or worse, actually cause the policy performance to degrade.
