Search Results for author: Brendan O'Donoghue

Found 22 papers, 4 papers with code

Combining policy gradient and Q-learning

no code implementations • 5 Nov 2016 • Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting.

Atari Games • Q-Learning
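
For intuition, here is a minimal tabular sketch of the two ingredients being combined: a REINFORCE-style policy-gradient step and a Q-learning step applied to the same experience. The toy setup and all sizes are placeholders of my own; the paper's PGQL algorithm merges the two into a single regularized update rather than running them side by side as shown here.

```python
import numpy as np

# Tabular sketch: a policy-gradient step and a Q-learning step on shared
# experience. All dimensions and hyperparameters are illustrative.
S, A, gamma, alpha = 5, 3, 0.99, 0.1
logits = np.zeros((S, A))                    # softmax policy parameters
Q = np.zeros((S, A))                         # Q-learning estimates

def pg_step(s, a, ret):
    pi = np.exp(logits[s] - logits[s].max()); pi /= pi.sum()
    score = -pi; score[a] += 1.0             # d log pi(a|s) / d logits[s]
    logits[s] += alpha * ret * score         # REINFORCE-style update

def q_step(s, a, r, s_next):
    td = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td                    # standard Q-learning update
```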

The Uncertainty Bellman Equation and Exploration

1 code implementation • ICML 2018 • Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

In this paper we consider a similar *uncertainty* Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.
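
A minimal tabular sketch of the recursion, assuming a known transition model and a random stand-in for per-step local uncertainty: the fixed point propagates uncertainty with a gamma-squared discount, and its square root can serve as an exploration bonus.

```python
import numpy as np

# Fixed-point iteration for an uncertainty Bellman recursion:
# u = local_u + gamma^2 * E_pi[u'], with sqrt(u) as an exploration bonus.
# Transition model and local uncertainties are random placeholders.
S, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a next-state dist.
pi = np.full((S, A), 1.0 / A)                # uniform evaluation policy
local_u = rng.uniform(size=(S, A))           # stand-in per-step uncertainty

u = np.zeros((S, A))
for _ in range(200):
    next_u = (pi * u).sum(axis=1)            # E_pi[u(s', .)] for each s'
    u = local_u + gamma**2 * (P @ next_u)    # note gamma^2, as in the UBE
bonus = np.sqrt(u)                           # optimism-style bonus
```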

Training verified learners with learned verifiers

no code implementations • 25 May 2018 • Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli

This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties.
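
A toy sketch of the predictor-verifier idea under strong simplifying assumptions of my own (a linear predictor, a linear "verifier", a hinge loss, and an L-infinity ball): the verifier is trained so its output dominates the worst-case loss, and the predictor is trained against the verifier's bound.

```python
import numpy as np

# Toy predictor-verifier sketch. theta parameterizes a linear predictor,
# phi a linear "verifier" whose scalar output should upper-bound the
# worst-case hinge loss over an L-infinity ball of radius eps around x.
# Everything here (shapes, losses, the ball) is illustrative.
rng = np.random.default_rng(0)
theta, phi = rng.normal(size=3), rng.normal(size=3)
eps = 0.1

def predictor_loss(x, y):
    return max(0.0, 1.0 - y * (theta @ x))

def verifier_bound(x):
    return float(phi @ x)

def worst_case_loss(x, y):
    # For a linear predictor the eps-ball worst case has a closed form.
    return max(0.0, 1.0 - y * (theta @ x) + eps * np.abs(theta).sum())

def joint_loss(x, y, lam=1.0, mu=10.0):
    # Predictor minimizes task loss plus the verified bound; the penalty
    # pushes the verifier's bound to dominate the true worst case.
    b = verifier_bound(x)
    return predictor_loss(x, y) + lam * b + mu * max(0.0, worst_case_loss(x, y) - b)
```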

Variational Bayesian Reinforcement Learning with Regret Bounds

no code implementations • NeurIPS 2021 • Brendan O'Donoghue

We show deep connections of this approach to the soft-max and maximum-entropy strands of research in reinforcement learning.

Q-Learning • reinforcement-learning +1
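
The soft-max connection can be seen in a log-sum-exp Bellman backup; this sketch shows only that operator on an arbitrary random MDP, not the paper's variational Bayesian (K-learning) algorithm itself.

```python
import numpy as np

# Log-sum-exp ("soft max") Bellman backup at temperature tau on a random
# MDP; as tau -> 0 this recovers standard value iteration's hard max.
S, A, gamma, tau = 4, 3, 0.9, 0.5
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition probabilities
R = rng.uniform(size=(S, A))                 # rewards

V = np.zeros(S)
for _ in range(500):
    Q = R + gamma * (P @ V)
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))
pi = np.exp((Q - V[:, None]) / tau)          # induced softmax policy
```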

Hamiltonian Descent Methods

4 code implementations • 13 Sep 2018 • Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

Yet, crucially, the kinetic gradient map can be designed to incorporate information about the convex conjugate in a fashion that allows for linear convergence on convex functions that may be non-smooth or non-strongly convex.
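
A minimal sketch of a Hamiltonian-descent-style iteration under assumptions of my own choosing: a quartic potential paired with a conjugate-like kinetic map, plus dissipation. With the usual quadratic kinetic energy k(p) = p^2/2 this would reduce to classical momentum; the non-quadratic kinetic map is the point of the method.

```python
import numpy as np

# Hamiltonian-descent-style iteration on f(x) = x^4 / 4 (convex but not
# strongly convex), using a conjugate-like kinetic map and dissipation.
def grad_f(x):
    return x**3

def grad_k(p):
    return np.sign(p) * np.abs(p) ** (1.0 / 3.0)

x, p = np.array([2.0]), np.array([0.0])
eps, gam = 0.1, 0.5                          # step size, dissipation
for _ in range(500):
    p = (1.0 - gam * eps) * p - eps * grad_f(x)  # damped momentum update
    x = x + eps * grad_k(p)                      # position update via kinetic map
print(x)                                     # heads toward the minimizer 0
```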

Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

no code implementations • ICLR 2019 • Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient way of spending a given parameter budget than simply using a larger single model.

Self-Driving Cars
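
A toy sketch of "adversarially training the ensemble as a single model": the ensemble output is the mean of member outputs, and a one-step FGSM-style attack is computed against that combined output. Linear members and the hinge loss are simplifications of my own, not the paper's models.

```python
import numpy as np

# Treat the ensemble as one model by averaging member scores, then
# adversarially perturb the input against that combined output.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))            # 3 ensemble members, 4 input features

def ensemble_score(x):
    return float((W @ x).mean())       # "ensemble as a single model"

def attack(x, y, eps=0.1):
    # Gradient of the ensemble hinge loss w.r.t. x is -y * mean weight
    # vector (when the hinge is active), so the one-step attack is:
    grad_x = -y * W.mean(axis=0)
    return x + eps * np.sign(grad_x)

x, y = rng.normal(size=4), 1.0
x_adv = attack(x, y)                   # all members then train on (x_adv, y)
```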

Verification of Non-Linear Specifications for Neural Networks

no code implementations • ICLR 2019 • Chongli Qin, Krishnamurthy Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

We show that a number of important properties of interest can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system; semantic consistency of a classifier's output labels under adversarial perturbations; and bounding errors in a system that predicts the summation of handwritten digits.
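
For concreteness, specifications of this kind can be written as scalar functions F(inputs, outputs) <= 0. The two stand-ins below (a placeholder energy function and a digit-sum check) are illustrative, not the paper's exact formulations.

```python
import numpy as np

# Specs as scalar functions F(inputs, outputs) <= 0; both illustrative.
def energy_conservation_spec(state, next_state, tol=1e-2):
    # A learned dynamics model should (approximately) conserve energy.
    energy = lambda s: 0.5 * float(np.sum(s**2))   # placeholder energy
    return abs(energy(next_state) - energy(state)) - tol

def digit_sum_spec(predicted_sum, digit_logits, slack=1.0):
    # A system predicting the sum of handwritten digits should agree with
    # the per-digit classifier's argmax labels, up to some slack.
    digits = np.argmax(digit_logits, axis=1)
    return abs(predicted_sum - int(digits.sum())) - slack
```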

Making Sense of Reinforcement Learning and Probabilistic Inference

no code implementations • ICLR 2020 • Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience.

reinforcement-learning • Reinforcement Learning (RL) +1

Matrix games with bandit feedback

no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players observe only each other's actions and a noisy payoff.
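
A minimal self-play sketch of this setting, assuming Exp3-style exponential-weights players with importance-weighted payoff estimates; this toy version omits the variance-control details a careful algorithm (and the paper's analysis) requires.

```python
import numpy as np

# Two exponential-weights players in a zero-sum matrix game, each seeing
# only its own noisy realized payoff (bandit feedback), never the matrix.
rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(3, 3))          # unknown payoff matrix
w_row, w_col = np.zeros(3), np.zeros(3)      # players' log-weights
eta = 0.05

for _ in range(5000):
    p = np.exp(w_row - w_row.max()); p /= p.sum()
    q = np.exp(w_col - w_col.max()); q /= q.sum()
    i, j = rng.choice(3, p=p), rng.choice(3, p=q)
    payoff = A[i, j] + 0.1 * rng.normal()    # noisy bandit feedback
    w_row[i] += eta * payoff / p[i]          # row player maximizes
    w_col[j] -= eta * payoff / q[j]          # column player minimizes
```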

Sample Efficient Reinforcement Learning with REINFORCE

no code implementations • 22 Oct 2020 • Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd

Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory.

Policy Gradient Methods • reinforcement-learning +1
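
A minimal REINFORCE sketch on a two-armed bandit treated as a one-step problem: the gradient estimate is the sampled return times the score function. The setup is illustrative, not the paper's experimental domain.

```python
import numpy as np

# REINFORCE on a two-armed bandit (one-step episodes).
rng = np.random.default_rng(0)
theta = np.zeros(2)                          # softmax policy logits
true_means, lr = np.array([0.2, 0.8]), 0.1

for _ in range(2000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    a = rng.choice(2, p=pi)
    G = rng.normal(true_means[a], 0.1)       # sampled return
    score = -pi; score[a] += 1.0             # d log pi(a) / d theta
    theta += lr * G * score                  # REINFORCE update

print(np.exp(theta) / np.exp(theta).sum())   # mass concentrates on arm 1
```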

Discovering a set of policies for the worst case reward

no code implementations • ICLR 2021 • Tom Zahavy, Andre Barreto, Daniel J. Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting set-max policy (SMP) on the set of tasks.
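
A sketch of the set-max scoring rule on precomputed per-(policy, task) values, with a greedy selection loop standing in for the paper's policy-iteration algorithm; the value table here is a random placeholder.

```python
import numpy as np

# SMP runs the best policy in the set for each task; we grow the set
# greedily to raise the worst-case (min over tasks) score.
rng = np.random.default_rng(0)
V = rng.uniform(size=(10, 5))                # 10 candidate policies, 5 tasks

def smp_worst_case(policy_set):
    return V[policy_set].max(axis=0).min()   # best-per-task, then worst task

chosen = []
for _ in range(3):                           # build a set of 3 policies
    best = max((k for k in range(10) if k not in chosen),
               key=lambda k: smp_worst_case(chosen + [k]))
    chosen.append(best)
```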

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ensuring that they are near-optimal.
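
A fragment sketching the two ingredients with made-up shapes: a diversity bonus measured in successor-feature space and a near-optimality constraint on the candidate policy's value.

```python
import numpy as np

# Diversity in successor-feature (SF) space plus a near-optimality check.
# SF vectors and the tolerance are placeholders.
rng = np.random.default_rng(0)
discovered_sfs = [rng.normal(size=4) for _ in range(3)]

def diversity_bonus(sf_candidate):
    # Distance to the nearest already-discovered policy's SF vector.
    return min(float(np.linalg.norm(sf_candidate - s)) for s in discovered_sfs)

def nearly_optimal(value, v_star, eps=0.05):
    return value >= (1.0 - eps) * v_star
```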

Reward is enough for convex MDPs

no code implementations • NeurIPS 2021 • Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).

Reinforcement Learning (RL)
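
One way to see the reduction (a Frank-Wolfe-style sketch, with a hypothetical `solve_mdp` callback of my own): linearize the convex objective at the current occupancy measure and hand the agent the resulting stationary Markov reward.

```python
import numpy as np

# Convex-MDP sketch: at each iteration, solve a standard MDP whose reward
# is the negative gradient of f at the current average occupancy measure.
def grad_f(d):
    # Example convex objective: f(d) = sum d log d (negative entropy),
    # so minimizing f maximizes state coverage; its gradient is log d + 1.
    return np.log(d + 1e-8) + 1.0

def convex_mdp_iteration(d_avg, solve_mdp):
    reward = -grad_f(d_avg)          # linearized, stationary Markov reward
    return solve_mdp(reward)         # any standard planner / RL solver
```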

Variational Bayesian Optimistic Sampling

no code implementations • NeurIPS 2021 • Brendan O'Donoghue, Tor Lattimore

We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy.

Thompson Sampling
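
For reference, the Thompson sampling special case for a Bernoulli bandit: sample each arm's mean from its Beta posterior and play the argmax. The arm probabilities below are made up.

```python
import numpy as np

# Thompson sampling with Beta posteriors on Bernoulli arms.
rng = np.random.default_rng(0)
alpha, beta = np.ones(3), np.ones(3)         # Beta(1, 1) priors
true_p = np.array([0.3, 0.5, 0.7])

for _ in range(1000):
    sampled_means = rng.beta(alpha, beta)    # one posterior sample per arm
    a = int(np.argmax(sampled_means))
    reward = float(rng.random() < true_p[a])
    alpha[a] += reward
    beta[a] += 1.0 - reward
```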

On the connection between Bregman divergence and value in regularized Markov decision processes

no code implementations • 21 Oct 2022 • Brendan O'Donoghue

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process.

reinforcement-learning • Reinforcement Learning (RL)
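
When the regularizer is negative entropy, the Bregman divergence between policies is the KL divergence; a minimal computation of that divergence is below. The note's exact value relationship is not reproduced here, and the policies are made up.

```python
import numpy as np

# With mirror map h(p) = sum_i p_i log p_i (negative entropy), the Bregman
# divergence D_h(pi, pi*) is exactly KL(pi || pi*), computed per state.
def kl(p, q):
    return float(np.sum(p * (np.log(p) - np.log(q))))

pi = np.array([0.7, 0.2, 0.1])       # current policy at some state
pi_star = np.array([0.5, 0.3, 0.2])  # optimal regularized policy (made up)
print(kl(pi, pi_star))
```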

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations • 30 Dec 2022 • Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning • Reinforcement Learning (RL)

Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization

no code implementations • 18 Feb 2023 • Brendan O'Donoghue

Optimism in the face of uncertainty is a well-known heuristic with theoretical guarantees in the tabular setting, but how best to translate the principle to deep reinforcement learning, which involves online stochastic gradients and deep network function approximators, is not fully understood.

Efficient Exploration • reinforcement-learning +1
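
One simple way to encode epistemic risk-seeking, shown as a sketch rather than the paper's objective: an exponential (log-sum-exp) utility over an ensemble's Q-estimates, which up-weights optimistic members. The temperature tau and the ensemble values are illustrative.

```python
import numpy as np

# Risk-seeking aggregation: tau * log E[exp(Q / tau)] over epistemic
# samples (here, an ensemble), computed in a numerically stable form.
def risk_seeking_value(q_ensemble, tau=1.0):
    q = np.asarray(q_ensemble, dtype=float)
    m = q.max()                              # stabilize the exponentials
    return m + tau * np.log(np.mean(np.exp((q - m) / tau)))

print(risk_seeking_value([1.0, 1.5, 3.0]))   # exceeds the plain mean (~1.83)
```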
