1 code implementation • 1 Sep 2023 • Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
In this work, we focus on the image input to a vision-language model (VLM).
no code implementations • 15 Jun 2023 • Niklas Lauffer, Ameesh Shah, Micah Carroll, Michael Dennis, Stuart Russell
We apply this algorithm to analyze the strategically relevant information for tasks in both a standard and a partially observable version of the Overcooked environment.
no code implementations • 12 Jun 2023 • Andrew Critch, Stuart Russell
While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an {\em exhaustive taxonomy} of such risks.
1 code implementation • 19 Apr 2023 • Cassidy Laidlaw, Stuart Russell, Anca Dragan
We find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does.
no code implementations • 2 Mar 2023 • Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell
Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.
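As a generic illustration of the reward-learning setup this entry describes (not the paper's specific algorithm), the following sketch fits a linear reward from pairwise human preferences under a Bradley-Terry-style comparison model; the feature vectors, weights, and the simulated "human" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: trajectories are summarized by feature vectors, and
# the true (hidden) reward is linear in those features.
true_w = np.array([1.0, -2.0])

def true_reward(features):
    return features @ true_w

# Simulated human feedback: the human prefers the trajectory with higher
# true reward (a noiseless pairwise comparison).
def human_prefers_first(f_a, f_b):
    return true_reward(f_a) > true_reward(f_b)

# Infer a reward estimate w by logistic regression on the comparisons
# (Bradley-Terry model: P(a > b) = sigmoid(w . (f_a - f_b))).
w = np.zeros(2)
lr = 0.5
for _ in range(2000):
    f_a, f_b = rng.normal(size=(2, 2))
    pref = 1.0 if human_prefers_first(f_a, f_b) else 0.0
    p = 1.0 / (1.0 + np.exp(-(w @ (f_a - f_b))))
    w += lr * (pref - p) * (f_a - f_b)   # gradient ascent on log-likelihood

# The learned reward ranks trajectories like the true one (up to scaling),
# and could then be used to train a policy.
print(w, true_w)
```

The learned weight direction, not its magnitude, is what matters: any positive rescaling of a reward function induces the same preference ordering.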
2 code implementations • 22 Nov 2022 • Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell
imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch.
no code implementations • 1 Nov 2022 • Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao
Offline reinforcement learning (RL), which refers to decision-making from a previously collected dataset of interactions, has received significant attention in recent years.
2 code implementations • 1 Nov 2022 • Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack.
1 code implementation • 7 Jul 2022 • Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell
In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.
2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.
no code implementations • 25 Apr 2022 • Micah Carroll, Anca Dragan, Stuart Russell, Dylan Hadfield-Menell
These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed; we do this by using historical user interaction data to train a predictive user model that implicitly captures their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted; for this we use the notion of "safe shifts", which define a trust region within which behavior is safe. For instance, the natural way in which users would shift without interference from the system could be deemed "safe".
no code implementations • 14 Mar 2022 • Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave
It is often very challenging to manually design reward functions for complex, real-world tasks.
3 code implementations • 13 Oct 2021 • Shlomi Hod, Daniel Filan, Stephen Casper, Andrew Critch, Stuart Russell
These results suggest that graph-based partitioning can reveal local specialization and that statistical methods can be used to automatically screen for sets of neurons that can be understood abstractly.
1 code implementation • ICLR 2022 • Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos
Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology.
1 code implementation • NeurIPS 2021 • Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker.
no code implementations • 29 Sep 2021 • Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell
These results suggest that graph-based partitioning can reveal modularity and help us understand how deep neural networks function.
1 code implementation • ICML Workshop URL 2021 • Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine
Unsupervised reinforcement learning (RL) studies how to leverage environment statistics to learn useful behaviors without the cost of reward engineering.
no code implementations • 5 Jul 2021 • Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Rather than training AI systems with a predefined reward function, or with a labeled dataset drawn from a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback. This signal can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.
no code implementations • NeurIPS 2021 • Cassidy Laidlaw, Stuart Russell
We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity -- the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision.
1 code implementation • NeurIPS 2021 • Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell
As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.
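To make concrete the "count-only exploration" baseline this entry contrasts against (the paper's own intrinsic reward goes beyond it), here is a minimal sketch of a visit-count bonus; the state names and scale factor are hypothetical.

```python
import math
from collections import defaultdict

# Count-only exploration baseline: an intrinsic bonus that decays with the
# number of times a (tabular) state has been visited, so novel states are
# rewarded more than familiar ones.
counts = defaultdict(int)

def count_bonus(state, scale=1.0):
    counts[state] += 1
    return scale / math.sqrt(counts[state])

first = count_bonus("s0")                       # first visit: full bonus
later = [count_bonus("s0") for _ in range(98)]  # repeated visits: decaying bonus
print(first, later[-1])
```

In practice this bonus would be added to the environment reward when updating the agent's value estimates, steering it toward rarely visited states.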
no code implementations • NeurIPS 2021 • Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
Based on the composition of the offline dataset, two main categories of methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires datasets with uniform coverage.
2 code implementations • 4 Mar 2021 • Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell
We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy.
no code implementations • 25 Jan 2021 • Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell
Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully promote cooperation at the cost of becoming more vulnerable to exploitation by malicious actors.
no code implementations • 1 Jan 2021 • Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel
For decades, discrete representations have been key to enabling robots to plan at more abstract levels and solve temporally extended tasks more efficiently.
no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.
no code implementations • 1 Jan 2021 • Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell
We apply these methods on partitionings generated by a spectral clustering algorithm which uses a graph representation of the network's neurons and weights.
no code implementations • 29 Dec 2020 • Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell
We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism.
1 code implementation • 10 Dec 2020 • Eric J. Michaud, Adam Gleave, Stuart Russell
However, current techniques for reward learning may fail to produce reward functions which accurately reflect user preferences.
4 code implementations • NeurIPS 2020 • Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
2 code implementations • 2 Dec 2020 • Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell
We evaluate a range of common reward and imitation learning algorithms on our tasks.
no code implementations • AABI Symposium 2021 • George Matheos, Alexander K. Lew, Matin Ghavamizadeh, Stuart Russell, Marco Cusumano-Towner, Vikash Mansinghka
Open-universe probabilistic models enable Bayesian inference about how many objects underlie data, and how they are related.
1 code implementation • NeurIPS 2020 • Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell
This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings.
1 code implementation • NeurIPS 2020 • Paria Rashidinejad, Jiantao Jiao, Stuart Russell
Our theoretical and experimental results shed light on the conditions required for efficient probably approximately correct (PAC) learning of the Kalman filter from partially observed data.
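For context on the object being learned here, a minimal sketch of the standard scalar Kalman filter recursion (the filter itself, not the paper's method for learning it from partially observed data); the dynamics and noise parameters are illustrative.

```python
import numpy as np

# Scalar linear-Gaussian system: x_t = a*x_{t-1} + process noise,
# observation y_t = x_t + measurement noise.
a, q, r = 0.9, 0.1, 0.5   # dynamics coefficient, process var, measurement var

def kalman_step(mean, var, y):
    # Predict step: propagate the posterior through the dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update step: fold in the new observation y via the Kalman gain.
    k = var_pred / (var_pred + r)
    mean_new = mean_pred + k * (y - mean_pred)
    var_new = (1 - k) * var_pred
    return mean_new, var_new

rng = np.random.default_rng(1)
x, mean, var = 0.0, 0.0, 1.0
errs = []
for _ in range(500):
    x = a * x + rng.normal(scale=np.sqrt(q))   # latent state
    y = x + rng.normal(scale=np.sqrt(r))       # partial/noisy observation
    mean, var = kalman_step(mean, var, y)
    errs.append((mean - x) ** 2)

print(np.mean(errs), var)
```

The filtered estimate should track the latent state with lower mean squared error than the raw observations, whose error equals the measurement variance.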
no code implementations • 19 Jul 2020 • Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell
Assistance games (also known as cooperative inverse reinforcement learning games) have been proposed as a model for beneficial AI, wherein a robotic agent must act on behalf of a human principal but is initially uncertain about the human's payoff function.
1 code implementation • ICLR 2021 • Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike
However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.
1 code implementation • 10 Mar 2020 • Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell
To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images.
1 code implementation • ICCV 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.
2 code implementations • ICLR 2020 • Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers.
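To illustrate the kind of observation-space vulnerability this sentence refers to (not this paper's adversarial-policy attack), here is a toy FGSM-style perturbation of a linear policy's input; the weights and observation are hypothetical.

```python
import numpy as np

# Toy linear "policy": maps a 2-D observation to two action logits.
W = np.array([[1.0, -1.0], [0.5, 2.0]])

def action(obs):
    return int(np.argmax(W @ obs))

obs = np.array([1.0, 0.2])
orig = action(obs)

# Gradient of the competing action's margin w.r.t. the observation; a small
# signed step (L-infinity budget eps) tries to flip the chosen action,
# analogous to FGSM for classifiers.
other = 1 - orig
grad = W[other] - W[orig]
eps = 0.5
adv_obs = obs + eps * np.sign(grad)

print(orig, action(adv_obs))
```

For deep RL policies the gradient would come from backpropagation through the network, but the attack structure is the same: a bounded perturbation of the observation that changes the selected action.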
1 code implementation • 24 Oct 2018 • Aaron Tucker, Adam Gleave, Stuart Russell
Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function.
no code implementations • ICLR 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI.
1 code implementation • NeurIPS 2018 • Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel
Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations.
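The encode-plan-decode recipe described above can be sketched schematically; here toy invertible linear maps stand in for the learned projection and generative model, and the "planner" is a trivial linear interpolation in the planning space.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 0.5]])

def encode(obs):            # observation -> planning state (stand-in model)
    return A @ obs

def decode(state):          # planning state -> observation (stand-in generator)
    return np.linalg.inv(A) @ state

def plan(z_start, z_goal, steps):
    # Trivial planner: linearly interpolate between planning states.
    return [z_start + (z_goal - z_start) * t / steps for t in range(steps + 1)]

current_obs = np.array([0.0, 0.0])
goal_obs = np.array([1.0, 1.0])

# Project current and goal observations, plan a trajectory, then map the
# trajectory back to a sequence of observations.
traj = plan(encode(current_obs), encode(goal_obs), steps=4)
obs_plan = [decode(z) for z in traj]

print(obs_plan[0], obs_plan[-1])
```

In the actual setting both maps are learned from data and the planner operates on learned causal states, but the dataflow is the same.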
no code implementations • ICML 2018 • Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan
We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human.
no code implementations • ICML 2018 • Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, Stuart Russell
Despite the recent successes of probabilistic programming languages (PPLs) in AI applications, PPLs offer only limited support for random variables whose distributions combine discrete and continuous elements.
no code implementations • NeurIPS 2017 • Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan
When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios.
no code implementations • 31 Oct 2017 • Andrew Critch, Stuart Russell
It is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a {\em Pareto-optimal} policy, i.e., a policy that cannot be improved upon for one agent without making sacrifices for another.
no code implementations • EMNLP 2017 • Yi Wu, David Bamman, Stuart Russell
Adversarial training is a means of regularizing classification algorithms by adding adversarial noise to the training data.
1 code implementation • 28 May 2017 • Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell
We show that when a human is not perfectly rational, a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order.
no code implementations • 24 Nov 2016 • Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch.
no code implementations • 30 Jun 2016 • Yi Wu, Lei Li, Stuart Russell, Rastislav Bodik
A probabilistic program defines a probability measure over its semantic structures.
2 code implementations • NeurIPS 2016 • Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans.
no code implementations • 29 Mar 2016 • Yusuf Bugra Erol, Yi Wu, Lei Li, Stuart Russell
Joint state and parameter estimation is a core problem for dynamic Bayesian networks.
no code implementations • 10 Feb 2016 • Stuart Russell, Daniel Dewey, Max Tegmark
Success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to investigate how to maximize these benefits while avoiding potential pitfalls.
no code implementations • 24 Dec 2015 • Hugh Chen, Yusuf Erol, Eric Shen, Stuart Russell
Perhaps unexpectedly, one of the biggest flaws in the medical system is the patient alarm system.
no code implementations • 9 Aug 2014 • Nicholas Hay, Stuart Russell, David Tolpin, Solomon Eyal Shimony
Sequential decision problems are often approximately solvable by simulating possible future action sequences.
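The idea of approximately solving a decision problem by simulating future action sequences can be sketched in a few lines; the toy 1-D walk, goal, and rollout counts below are all hypothetical stand-ins for a real problem.

```python
import random

GOAL, HORIZON, ROLLOUTS = 3, 6, 200

def simulate(state, first_action, rng):
    # Apply the candidate first action, then act randomly to the horizon.
    state += first_action
    for _ in range(HORIZON - 1):
        state += rng.choice((-1, 1))
    return -abs(state - GOAL)             # reward: closeness to the goal

def best_action(state, rng):
    # Estimate each action's value by averaging simulated rollouts,
    # then commit to the action with the highest estimate.
    def value(a):
        return sum(simulate(state, a, rng) for _ in range(ROLLOUTS)) / ROLLOUTS
    return max((-1, 1), key=value)

rng = random.Random(0)
print(best_action(0, rng))
```

The paper's concern is the metalevel question this sketch ignores: which simulations are worth running, and when to stop simulating and act.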
no code implementations • 27 Mar 2013 • Stuart Russell
This formula is used to estimate the value of expanding further successors, using a general formula for the value of a computation in game-playing developed in earlier work.
no code implementations • 27 Mar 2013 • Sampath Srinivas, Stuart Russell, Alice M. Agogino
An algorithm for automated construction of a sparse Bayesian network given an unstructured probabilistic model and causal domain information from an expert has been developed and implemented.
no code implementations • 10 Jan 2013 • Nando de Freitas, Pedro Hojen-Sorensen, Michael. I. Jordan, Stuart Russell
One of these algorithms is a mixture of two MCMC kernels: a random-walk Metropolis kernel and a block Metropolis-Hastings (MH) kernel with a variational approximation as proposal distribution.
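A minimal sketch of the kernel-mixture idea: each step flips a coin between a random-walk Metropolis kernel and an independence MH kernel whose proposal is a fixed Gaussian standing in for a variational approximation. The target here is a standard normal, purely for illustration, not the paper's model.

```python
import math
import random

def log_target(x):
    return -0.5 * x * x                 # unnormalized log density, N(0, 1)

def log_q(x, mu=0.3, sigma=1.2):        # "variational" proposal (hypothetical fit)
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma)

rng = random.Random(42)
x, samples = 0.0, []
for _ in range(20000):
    if rng.random() < 0.5:
        # Random-walk Metropolis: symmetric proposal, simple MH ratio.
        prop = x + rng.gauss(0.0, 0.5)
        log_alpha = log_target(prop) - log_target(x)
    else:
        # Independence MH: propose from q, correct with the full MH ratio.
        prop = rng.gauss(0.3, 1.2)
        log_alpha = (log_target(prop) - log_target(x)
                     + log_q(x) - log_q(prop))
    if math.log(rng.random() + 1e-300) < log_alpha:
        x = prop
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

Because each component kernel leaves the target invariant, so does the mixture; the variational proposal contributes large, global moves while the random walk refines locally.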
no code implementations • IEEE Transactions on Automatic Control 2009 • Songhwai Oh, Stuart Russell, Shankar Sastry
This paper presents Markov chain Monte Carlo data association (MCMCDA) for solving data association problems arising in multiple-target tracking in a cluttered environment.