Search Results for author: Brendan O'Donoghue

Found 22 papers, 4 papers with code

Combining policy gradient and Q-learning

no code implementations • 5 Nov 2016 • Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting.

Atari Games • Q-Learning
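
For intuition, here is a minimal tabular sketch of the two ingredients being combined: a REINFORCE-style policy-gradient step and a Q-learning step applied to the same experience. The toy setup and all sizes are placeholders of my own; the paper's PGQL algorithm merges the two into a single regularized update rather than running them side by side as shown here.

```python
import numpy as np

# Tabular sketch: a policy-gradient step and a Q-learning step on shared
# experience. All dimensions and hyperparameters are illustrative.
S, A, gamma, alpha = 5, 3, 0.99, 0.1
logits = np.zeros((S, A))                    # softmax policy parameters
Q = np.zeros((S, A))                         # Q-learning estimates

def pg_step(s, a, ret):
    pi = np.exp(logits[s] - logits[s].max()); pi /= pi.sum()
    score = -pi; score[a] += 1.0             # d log pi(a|s) / d logits[s]
    logits[s] += alpha * ret * score         # REINFORCE-style update

def q_step(s, a, r, s_next):
    td = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td                    # standard Q-learning update
```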

The Uncertainty Bellman Equation and Exploration

1 code implementation • ICML 2018 • Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

In this paper we consider a similar *uncertainty* Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.
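
A minimal tabular sketch of the recursion, assuming a known transition model and a random stand-in for per-step local uncertainty: the fixed point propagates uncertainty with a gamma-squared discount, and its square root can serve as an exploration bonus.

```python
import numpy as np

# Fixed-point iteration for an uncertainty Bellman recursion:
# u = local_u + gamma^2 * E_pi[u'], with sqrt(u) as an exploration bonus.
# Transition model and local uncertainties are random placeholders.
S, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a next-state dist.
pi = np.full((S, A), 1.0 / A)                # uniform evaluation policy
local_u = rng.uniform(size=(S, A))           # stand-in per-step uncertainty

u = np.zeros((S, A))
for _ in range(200):
    next_u = (pi * u).sum(axis=1)            # E_pi[u(s', .)] for each s'
    u = local_u + gamma**2 * (P @ next_u)    # note gamma^2, as in the UBE
bonus = np.sqrt(u)                           # optimism-style bonus
```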

Training verified learners with learned verifiers

no code implementations • 25 May 2018 • Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli

This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties.
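
A toy sketch of the predictor-verifier idea under strong simplifying assumptions of my own (a linear predictor, a linear "verifier", a hinge loss, and an L-infinity ball): the verifier is trained so its output dominates the worst-case loss, and the predictor is trained against the verifier's bound.

```python
import numpy as np

# Toy predictor-verifier sketch. theta parameterizes a linear predictor,
# phi a linear "verifier" whose scalar output should upper-bound the
# worst-case hinge loss over an L-infinity ball of radius eps around x.
# Everything here (shapes, losses, the ball) is illustrative.
rng = np.random.default_rng(0)
theta, phi = rng.normal(size=3), rng.normal(size=3)
eps = 0.1

def predictor_loss(x, y):
    return max(0.0, 1.0 - y * (theta @ x))

def verifier_bound(x):
    return float(phi @ x)

def worst_case_loss(x, y):
    # For a linear predictor the eps-ball worst case has a closed form.
    return max(0.0, 1.0 - y * (theta @ x) + eps * np.abs(theta).sum())

def joint_loss(x, y, lam=1.0, mu=10.0):
    # Predictor minimizes task loss plus the verified bound; the penalty
    # pushes the verifier's bound to dominate the true worst case.
    b = verifier_bound(x)
    return predictor_loss(x, y) + lam * b + mu * max(0.0, worst_case_loss(x, y) - b)
```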

Variational Bayesian Reinforcement Learning with Regret Bounds

no code implementations • NeurIPS 2021 • Brendan O'Donoghue

We show deep connections of this approach to the soft-max and maximum-entropy strands of research in reinforcement learning.

Q-Learning • reinforcement-learning +1
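
The soft-max connection can be seen in a log-sum-exp Bellman backup; this sketch shows only that operator on an arbitrary random MDP, not the paper's variational Bayesian (K-learning) algorithm itself.

```python
import numpy as np

# Log-sum-exp ("soft max") Bellman backup at temperature tau on a random
# MDP; as tau -> 0 this recovers standard value iteration's hard max.
S, A, gamma, tau = 4, 3, 0.9, 0.5
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition probabilities
R = rng.uniform(size=(S, A))                 # rewards

V = np.zeros(S)
for _ in range(500):
    Q = R + gamma * (P @ V)
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))
pi = np.exp((Q - V[:, None]) / tau)          # induced softmax policy
```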

Hamiltonian Descent Methods

4 code implementations • 13 Sep 2018 • Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

Yet, crucially, the kinetic gradient map can be designed to incorporate information about the convex conjugate in a fashion that allows for linear convergence on convex functions that may be non-smooth or non-strongly convex.
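
A minimal sketch of a Hamiltonian-descent-style iteration under assumptions of my own choosing: a quartic potential paired with a conjugate-like kinetic map, plus dissipation. With the usual quadratic kinetic energy k(p) = p^2/2 this would reduce to classical momentum; the non-quadratic kinetic map is the point of the method.

```python
import numpy as np

# Hamiltonian-descent-style iteration on f(x) = x^4 / 4 (convex but not
# strongly convex), using a conjugate-like kinetic map and dissipation.
def grad_f(x):
    return x**3

def grad_k(p):
    return np.sign(p) * np.abs(p) ** (1.0 / 3.0)

x, p = np.array([2.0]), np.array([0.0])
eps, gam = 0.1, 0.5                          # step size, dissipation
for _ in range(500):
    p = (1.0 - gam * eps) * p - eps * grad_f(x)  # damped momentum update
    x = x + eps * grad_k(p)                      # position update via kinetic map
print(x)                                     # heads toward the minimizer 0
```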

Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

no code implementations • ICLR 2019 • Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient way of spending a given parameter budget than simply using a larger single model.

Self-Driving Cars
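
A toy sketch of "adversarially training the ensemble as a single model": the ensemble output is the mean of member outputs, and a one-step FGSM-style attack is computed against that combined output. Linear members and the hinge loss are simplifications of my own, not the paper's models.

```python
import numpy as np

# Treat the ensemble as one model by averaging member scores, then
# adversarially perturb the input against that combined output.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))            # 3 ensemble members, 4 input features

def ensemble_score(x):
    return float((W @ x).mean())       # "ensemble as a single model"

def attack(x, y, eps=0.1):
    # Gradient of the ensemble hinge loss w.r.t. x is -y * mean weight
    # vector (when the hinge is active), so the one-step attack is:
    grad_x = -y * W.mean(axis=0)
    return x + eps * np.sign(grad_x)

x, y = rng.normal(size=4), 1.0
x_adv = attack(x, y)                   # all members then train on (x_adv, y)
```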

Verification of Non-Linear Specifications for Neural Networks

no code implementations • ICLR 2019 • Chongli Qin, Krishnamurthy Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

We show that a number of important properties of interest can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system; semantic consistency of a classifier's output labels under adversarial perturbations; and bounding errors in a system that predicts the summation of handwritten digits.
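
For concreteness, specifications of this kind can be written as scalar functions F(inputs, outputs) <= 0. The two stand-ins below (a placeholder energy function and a digit-sum check) are illustrative, not the paper's exact formulations.

```python
import numpy as np

# Specs as scalar functions F(inputs, outputs) <= 0; both illustrative.
def energy_conservation_spec(state, next_state, tol=1e-2):
    # A learned dynamics model should (approximately) conserve energy.
    energy = lambda s: 0.5 * float(np.sum(s**2))   # placeholder energy
    return abs(energy(next_state) - energy(state)) - tol

def digit_sum_spec(predicted_sum, digit_logits, slack=1.0):
    # A system predicting the sum of handwritten digits should agree with
    # the per-digit classifier's argmax labels, up to some slack.
    digits = np.argmax(digit_logits, axis=1)
    return abs(predicted_sum - int(digits.sum())) - slack
```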

Making Sense of Reinforcement Learning and Probabilistic Inference

no code implementations • ICLR 2020 • Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience.

reinforcement-learning • Reinforcement Learning (RL) +1

Matrix games with bandit feedback

no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players observe only each other's actions and a noisy payoff.
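
A minimal self-play sketch of this setting, assuming Exp3-style exponential-weights players with importance-weighted payoff estimates; this toy version omits the variance-control details a careful algorithm (and the paper's analysis) requires.

```python
import numpy as np

# Two exponential-weights players in a zero-sum matrix game, each seeing
# only its own noisy realized payoff (bandit feedback), never the matrix.
rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(3, 3))          # unknown payoff matrix
w_row, w_col = np.zeros(3), np.zeros(3)      # players' log-weights
eta = 0.05

for _ in range(5000):
    p = np.exp(w_row - w_row.max()); p /= p.sum()
    q = np.exp(w_col - w_col.max()); q /= q.sum()
    i, j = rng.choice(3, p=p), rng.choice(3, p=q)
    payoff = A[i, j] + 0.1 * rng.normal()    # noisy bandit feedback
    w_row[i] += eta * payoff / p[i]          # row player maximizes
    w_col[j] -= eta * payoff / q[j]          # column player minimizes
```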

Sample Efficient Reinforcement Learning with REINFORCE

no code implementations • 22 Oct 2020 • Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd

Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory.

Policy Gradient Methods • reinforcement-learning +1
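
A minimal REINFORCE sketch on a two-armed bandit treated as a one-step problem: the gradient estimate is the sampled return times the score function. The setup is illustrative, not the paper's experimental domain.

```python
import numpy as np

# REINFORCE on a two-armed bandit (one-step episodes).
rng = np.random.default_rng(0)
theta = np.zeros(2)                          # softmax policy logits
true_means, lr = np.array([0.2, 0.8]), 0.1

for _ in range(2000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    a = rng.choice(2, p=pi)
    G = rng.normal(true_means[a], 0.1)       # sampled return
    score = -pi; score[a] += 1.0             # d log pi(a) / d theta
    theta += lr * G * score                  # REINFORCE update

print(np.exp(theta) / np.exp(theta).sum())   # mass concentrates on arm 1
```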

Discovering a set of policies for the worst case reward

no code implementations • ICLR 2021 • Tom Zahavy, Andre Barreto, Daniel J. Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting set-max policy (SMP) on the set of tasks.
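
A sketch of the set-max scoring rule on precomputed per-(policy, task) values, with a greedy selection loop standing in for the paper's policy-iteration algorithm; the value table here is a random placeholder.

```python
import numpy as np

# SMP runs the best policy in the set for each task; we grow the set
# greedily to raise the worst-case (min over tasks) score.
rng = np.random.default_rng(0)
V = rng.uniform(size=(10, 5))                # 10 candidate policies, 5 tasks

def smp_worst_case(policy_set):
    return V[policy_set].max(axis=0).min()   # best-per-task, then worst task

chosen = []
for _ in range(3):                           # build a set of 3 policies
    best = max((k for k in range(10) if k not in chosen),
               key=lambda k: smp_worst_case(chosen + [k]))
    chosen.append(best)
```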

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ensuring that they are near-optimal.
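
A fragment sketching the two ingredients with made-up shapes: a diversity bonus measured in successor-feature space and a near-optimality constraint on the candidate policy's value.

```python
import numpy as np

# Diversity in successor-feature (SF) space plus a near-optimality check.
# SF vectors and the tolerance are placeholders.
rng = np.random.default_rng(0)
discovered_sfs = [rng.normal(size=4) for _ in range(3)]

def diversity_bonus(sf_candidate):
    # Distance to the nearest already-discovered policy's SF vector.
    return min(float(np.linalg.norm(sf_candidate - s)) for s in discovered_sfs)

def nearly_optimal(value, v_star, eps=0.05):
    return value >= (1.0 - eps) * v_star
```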

Reward is enough for convex MDPs

no code implementations • NeurIPS 2021 • Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).

Reinforcement Learning (RL)
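
One way to see the reduction (a Frank-Wolfe-style sketch, with a hypothetical `solve_mdp` callback of my own): linearize the convex objective at the current occupancy measure and hand the agent the resulting stationary Markov reward.

```python
import numpy as np

# Convex-MDP sketch: at each iteration, solve a standard MDP whose reward
# is the negative gradient of f at the current average occupancy measure.
def grad_f(d):
    # Example convex objective: f(d) = sum d log d (negative entropy),
    # so minimizing f maximizes state coverage; its gradient is log d + 1.
    return np.log(d + 1e-8) + 1.0

def convex_mdp_iteration(d_avg, solve_mdp):
    reward = -grad_f(d_avg)          # linearized, stationary Markov reward
    return solve_mdp(reward)         # any standard planner / RL solver
```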

Variational Bayesian Optimistic Sampling

no code implementations • NeurIPS 2021 • Brendan O'Donoghue, Tor Lattimore

We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy.

Thompson Sampling
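
For reference, the Thompson sampling special case for a Bernoulli bandit: sample each arm's mean from its Beta posterior and play the argmax. The arm probabilities below are made up.

```python
import numpy as np

# Thompson sampling with Beta posteriors on Bernoulli arms.
rng = np.random.default_rng(0)
alpha, beta = np.ones(3), np.ones(3)         # Beta(1, 1) priors
true_p = np.array([0.3, 0.5, 0.7])

for _ in range(1000):
    sampled_means = rng.beta(alpha, beta)    # one posterior sample per arm
    a = int(np.argmax(sampled_means))
    reward = float(rng.random() < true_p[a])
    alpha[a] += reward
    beta[a] += 1.0 - reward
```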

On the connection between Bregman divergence and value in regularized Markov decision processes

no code implementations • 21 Oct 2022 • Brendan O'Donoghue

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process.

reinforcement-learning • Reinforcement Learning (RL)
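
When the regularizer is negative entropy, the Bregman divergence between policies is the KL divergence; a minimal computation of that divergence is below. The note's exact value relationship is not reproduced here, and the policies are made up.

```python
import numpy as np

# With mirror map h(p) = sum_i p_i log p_i (negative entropy), the Bregman
# divergence D_h(pi, pi*) is exactly KL(pi || pi*), computed per state.
def kl(p, q):
    return float(np.sum(p * (np.log(p) - np.log(q))))

pi = np.array([0.7, 0.2, 0.1])       # current policy at some state
pi_star = np.array([0.5, 0.3, 0.2])  # optimal regularized policy (made up)
print(kl(pi, pi_star))
```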

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations • 30 Dec 2022 • Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning • Reinforcement Learning (RL)

Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization

no code implementations • 18 Feb 2023 • Brendan O'Donoghue

Optimism in the face of uncertainty is a well-known heuristic with theoretical guarantees in the tabular setting, but how best to translate the principle to deep reinforcement learning, which involves online stochastic gradients and deep network function approximators, is not fully understood.

Efficient Exploration • reinforcement-learning +1
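
One simple way to encode epistemic risk-seeking, shown as a sketch rather than the paper's objective: an exponential (log-sum-exp) utility over an ensemble's Q-estimates, which up-weights optimistic members. The temperature tau and the ensemble values are illustrative.

```python
import numpy as np

# Risk-seeking aggregation: tau * log E[exp(Q / tau)] over epistemic
# samples (here, an ensemble), computed in a numerically stable form.
def risk_seeking_value(q_ensemble, tau=1.0):
    q = np.asarray(q_ensemble, dtype=float)
    m = q.max()                              # stabilize the exponentials
    return m + tau * np.log(np.mean(np.exp((q - m) / tau)))

print(risk_seeking_value([1.0, 1.5, 3.0]))   # exceeds the plain mean (~1.83)
```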
