Search Results for author: Stuart Russell

Found 67 papers, 28 papers with code

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

no code implementations · 27 Feb 2024 · Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.

Avoiding Catastrophe in Continuous Spaces by Asking for Help

no code implementations · 12 Feb 2024 · Benjamin Plaut, Hanlin Zhu, Stuart Russell

Specifically, we assume that the payoff each round represents the chance of avoiding catastrophe that round, and try to maximize the product of payoffs (the overall chance of avoiding catastrophe).
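
To make the objective concrete, here is a tiny numeric sketch (our illustration with made-up payoff values, not code from the paper) of why maximizing the product of per-round payoffs is the same as maximizing the overall chance of avoiding catastrophe:

```python
import numpy as np

# Hypothetical per-round chances of avoiding catastrophe.
payoffs = np.array([0.99, 0.95, 0.999, 0.98])

# Overall chance of avoiding catastrophe across all rounds.
overall = np.prod(payoffs)

# Maximizing the product is equivalent to maximizing the sum of logs,
# which is the form an optimizer would typically work with.
log_objective = np.sum(np.log(payoffs))
print(overall, np.exp(log_objective))  # equal up to floating-point error
```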

ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation · 20 Dec 2023 · Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.

Language Modelling

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

1 code implementation · 13 Dec 2023 · Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan

Furthermore, SQIRL explains why random exploration works well in practice: we show that many environments can be solved by estimating the random policy's Q-function and then applying zero or a few steps of value iteration.

Reinforcement Learning (RL)
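
A rough tabular illustration of that mechanism (a hedged sketch, not the authors' SQIRL implementation; the gym-style `env` interface and all hyperparameters are our assumptions):

```python
import numpy as np

def random_policy_q_greedy(env, n_states, n_actions, gamma=0.99,
                           n_rollouts=1000, horizon=100):
    """Estimate the random policy's Q-function by Monte Carlo, then act
    greedily with respect to it ("zero steps of value iteration")."""
    q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    for _ in range(n_rollouts):
        s = env.reset()
        trajectory = []
        for _ in range(horizon):
            a = np.random.randint(n_actions)  # uniformly random exploration
            s2, r, done, _ = env.step(a)
            trajectory.append((s, a, r))
            s = s2
            if done:
                break
        g = 0.0
        for s, a, r in reversed(trajectory):  # discounted return-to-go
            g = r + gamma * g
            counts[s, a] += 1
            q[s, a] += (g - q[s, a]) / counts[s, a]  # running mean
    # A few value-iteration backups through an estimated model would go
    # here; with zero steps we simply return the greedy policy.
    return q.argmax(axis=1)
```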

Active teacher selection for reinforcement learning from human feedback

no code implementations · 23 Oct 2023 · Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell

The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.

Recommendation Systems, reinforcement-learning

On Representation Complexity of Model-based and Model-free Reinforcement Learning

no code implementations · 3 Oct 2023 · Hanlin Zhu, Baihe Huang, Stuart Russell

To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.

reinforcement-learning, Reinforcement Learning (RL)

Who Needs to Know? Minimal Knowledge for Optimal Coordination

no code implementations · 15 Jun 2023 · Niklas Lauffer, Ameesh Shah, Micah Carroll, Michael Dennis, Stuart Russell

We apply this algorithm to analyze the strategically relevant information for tasks in both a standard and a partially observable version of the Overcooked environment.

TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI

no code implementations · 12 Jun 2023 · Andrew Critch, Stuart Russell

While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks.

Bridging RL Theory and Practice with the Effective Horizon

1 code implementation · NeurIPS 2023 · Cassidy Laidlaw, Stuart Russell, Anca Dragan

Using BRIDGE, we find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does.

Reinforcement Learning (RL)

Active Reward Learning from Multiple Teachers

no code implementations · 2 Mar 2023 · Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell

Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.

Adversarial Policies Beat Superhuman Go AIs

2 code implementations · 1 Nov 2022 · Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack.

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

no code implementations · 1 Nov 2022 · Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

Offline reinforcement learning (RL), which refers to decision-making from a previously collected dataset of interactions, has received significant attention in recent years.

Decision Making, Offline RL (+2 more)

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

1 code implementation · 7 Jul 2022 · Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.
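
A quick numeric check of the statement in a symmetric common-payoff 2x2 game (our toy example, not from the paper):

```python
import numpy as np

# Shared payoff matrix of a symmetric coordination game: entry A[i, j]
# is the common payoff when one player plays i and the other plays j.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

x = np.array([1.0, 0.0])  # symmetric profile: both coordinate on action 0

payoff = x @ A @ x              # value of the (locally optimal) profile
best_deviation = (A @ x).max()  # best payoff from a unilateral deviation
print(best_deviation <= payoff + 1e-9)  # True: the profile is a Nash equilibrium
```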

An Empirical Investigation of Representation Learning for Imitation

2 code implementations · 16 May 2022 · Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H. Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification, Imitation Learning (+1 more)

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

no code implementations · 25 Apr 2022 · Micah Carroll, Anca Dragan, Stuart Russell, Dylan Hadfield-Menell

These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed; we do this by training a predictive user model on historical user interaction data, which implicitly captures their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted; we use the notion of "safe shifts", which define a trust region within which behavior is safe. For instance, the natural way users would shift without interference from the system could be deemed "safe".

Recommendation Systems
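
A minimal sketch of the penalty idea, assuming hypothetical `user_model` and `natural_model` callables (these names are ours, not the paper's API):

```python
import numpy as np

def shift_penalty(prefs, recs, user_model, natural_model, radius=0.1):
    """Penalize recommendations whose predicted preference shift leaves a
    trust region around the "natural" shift the user would undergo
    without the system. Illustrative only."""
    induced = user_model(prefs, recs)   # predicted shift under the system
    natural = natural_model(prefs)      # predicted shift without it
    deviation = np.linalg.norm(induced - natural)
    return max(0.0, deviation - radius)  # zero inside the safe region
```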

Quantifying Local Specialization in Deep Neural Networks

3 code implementations · 13 Oct 2021 · Shlomi Hod, Daniel Filan, Stephen Casper, Andrew Critch, Stuart Russell

These results suggest that graph-based partitioning can reveal local specialization and that statistical methods can be used to automatically screen for sets of neurons that can be understood abstractly.

Cross-Domain Imitation Learning via Optimal Transport

1 code implementation · ICLR 2022 · Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos

Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology.

Continuous Control, Imitation Learning

Detecting Modularity in Deep Neural Networks

no code implementations · 29 Sep 2021 · Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell

These results suggest that graph-based partitioning can reveal modularity and help us understand how deep neural networks function.

The MineRL BASALT Competition on Learning from Human Feedback

no code implementations · 5 Jul 2021 · Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.

Imitation Learning

Uncertain Decisions Facilitate Better Preference Learning

no code implementations · NeurIPS 2021 · Cassidy Laidlaw, Stuart Russell

We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity -- the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision.

MADE: Exploration via Maximizing Deviation from Explored Regions

1 code implementation · NeurIPS 2021 · Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell

As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.

Efficient Exploration, Reinforcement Learning (RL)
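
For context, the count-only baseline mentioned above can be written as a simple 1/sqrt(N) bonus; MADE's intrinsic reward additionally depends on the deviation of the visitation distribution from explored regions, which this sketch deliberately omits:

```python
import numpy as np
from collections import Counter

class CountBonus:
    """Count-based exploration bonus: scale / sqrt(N(s, a))."""
    def __init__(self, scale=1.0):
        self.counts = Counter()
        self.scale = scale

    def bonus(self, state, action):
        self.counts[(state, action)] += 1
        return self.scale / np.sqrt(self.counts[(state, action)])
```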

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

no code implementations · NeurIPS 2021 · Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Based on the composition of the offline dataset, two main categories of methods are used: imitation learning which is suitable for expert datasets and vanilla offline RL which often requires uniform coverage datasets.

Imitation Learning, Multi-Armed Bandits (+3 more)

Clusterability in Neural Networks

2 code implementations · 4 Mar 2021 · Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy.

Accumulating Risk Capital Through Investing in Cooperation

no code implementations · 25 Jan 2021 · Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell

Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully promote cooperation at the cost of becoming more vulnerable to exploitation by malicious actors.

Importance and Coherence: Methods for Evaluating Modularity in Neural Networks

no code implementations · 1 Jan 2021 · Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell

We apply these methods on partitionings generated by a spectral clustering algorithm which uses a graph representation of the network's neurons and weights.

Clustering
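
A hedged sketch of that partitioning step, treating neurons as graph nodes and absolute weights as edge strengths (the paper's preprocessing and clustering details differ):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_neurons(weight_matrices, n_clusters=4):
    """weight_matrices: list of layer weights, each of shape (n_in, n_out).
    Builds a symmetric neuron adjacency matrix and spectrally clusters it."""
    sizes = [weight_matrices[0].shape[0]] + [w.shape[1] for w in weight_matrices]
    n = sum(sizes)
    adj = np.zeros((n, n))
    offset = 0
    for w, n_in in zip(weight_matrices, sizes[:-1]):
        rows = slice(offset, offset + n_in)
        cols = slice(offset + n_in, offset + n_in + w.shape[1])
        adj[rows, cols] = np.abs(w)   # edges between consecutive layers
        adj[cols, rows] = np.abs(w).T
        offset += n_in
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(adj)
```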

Benefits of Assistance over Reward Learning

no code implementations · 1 Jan 2021 · Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D. Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.

Discrete Predictive Representation for Long-horizon Planning

no code implementations · 1 Jan 2021 · Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel

For decades, discrete representations have been key in enabling robots to plan at more abstract levels and solve temporally extended tasks more efficiently.

Object, Reinforcement Learning (RL)

Multi-Principal Assistance Games: Definition and Collegial Mechanisms

no code implementations · 29 Dec 2020 · Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell

We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism.

Understanding Learned Reward Functions

1 code implementation · 10 Dec 2020 · Eric J. Michaud, Adam Gleave, Stuart Russell

However, current techniques for reward learning may fail to produce reward functions which accurately reflect user preferences.

DERAIL: Diagnostic Environments for Reward And Imitation Learning

2 code implementations · 2 Dec 2020 · Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell

We evaluate a range of common reward and imitation learning algorithms on our tasks.

Imitation Learning

The MAGICAL Benchmark for Robust Imitation

1 code implementation · NeurIPS 2020 · Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell

This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings.

Imitation Learning

SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory

1 code implementation · NeurIPS 2020 · Paria Rashidinejad, Jiantao Jiao, Stuart Russell

Our theoretical and experimental results shed light on the conditions required for efficient probably approximately correct (PAC) learning of the Kalman filter from partially observed data.

PAC learning
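
For reference, the classical Kalman filter step that SLIP learns to match without being given the system matrices (standard textbook equations, not the paper's algorithm):

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One predict/update cycle for x' = Ax + w, y = Cx + v."""
    x_pred = A @ x                        # predict the next latent state
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```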

Multi-Principal Assistance Games

no code implementations · 19 Jul 2020 · Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell

Assistance games (also known as cooperative inverse reinforcement learning games) have been proposed as a model for beneficial AI, wherein a robotic agent must act on behalf of a human principal but is initially uncertain about the human's payoff function.

Quantifying Differences in Reward Functions

1 code implementation · ICLR 2021 · Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.

Pruned Neural Networks are Surprisingly Modular

1 code implementation · 10 Mar 2020 · Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images.

Clustering, Graph Clustering

Bayesian Relational Memory for Semantic Visual Navigation

1 code implementation · ICCV 2019 · Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.

Navigate, Visual Navigation

Adversarial Policies: Attacking Deep Reinforcement Learning

2 code implementations · ICLR 2020 · Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell

Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers.

reinforcement-learning, Reinforcement Learning (RL)

Inverse reinforcement learning for video games

1 code implementation · 24 Oct 2018 · Aaron Tucker, Adam Gleave, Stuart Russell

Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function.

Continuous Control, reinforcement-learning (+1 more)

Learning and Planning with a Semantic Model

no code implementations · ICLR 2019 · Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI.

Visual Navigation

Learning Plannable Representations with Causal InfoGAN

1 code implementation · NeurIPS 2018 · Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel

Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations.

Representation Learning
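
An illustrative outline of that plan-then-decode loop; `encode`, `plan_in_latent_space`, and `decode` are hypothetical stand-ins for the learned components, not the paper's API:

```python
def visual_plan(obs_current, obs_goal, encode, plan_in_latent_space, decode):
    """Project observations into the planning model, plan there, and let
    the generative model turn the latent trajectory back into images."""
    z_start = encode(obs_current)
    z_goal = encode(obs_goal)
    latent_trajectory = plan_in_latent_space(z_start, z_goal)
    return [decode(z) for z in latent_trajectory]
```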

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

no code implementations · ICML 2018 · Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan

We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human.

reinforcement-learning, Reinforcement Learning (RL)

Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms

no code implementations · ICML 2018 · Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, Stuart Russell

Despite the recent successes of probabilistic programming languages (PPLs) in AI applications, PPLs offer only limited support for random variables whose distributions combine discrete and continuous elements.

Probabilistic Programming
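
A toy example of such a discrete-continuous mixture: a random variable that equals exactly zero with some probability and is otherwise Gaussian, the kind of distribution standard PPL semantics handle poorly (our illustration):

```python
import numpy as np

def sample_mixed(p_zero=0.3, mu=0.0, sigma=1.0, rng=None):
    """With probability p_zero return the atom at 0 (discrete part);
    otherwise draw from a Gaussian (continuous part)."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p_zero:
        return 0.0
    return rng.normal(mu, sigma)
```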

Inverse Reward Design

1 code implementation · NeurIPS 2017 · Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios.

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

no code implementations · 31 Oct 2017 · Andrew Critch, Stuart Russell

It is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto-optimal policy, i.e., a policy that cannot be improved upon for one agent without making sacrifices for another.

Decision Making
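
One classical way to realize a Pareto-optimal choice is to maximize a nonnegative weighted sum of the principals' expected utilities; the paper studies the sequential case, where the effective weights shift over time toward the better predictor. A one-shot toy example (our illustration with made-up numbers):

```python
import numpy as np

# Rows are actions; columns are the two principals' expected utilities.
utilities = np.array([[3.0, 1.0],
                      [2.0, 2.5],
                      [0.5, 4.0]])
weights = np.array([0.7, 0.3])  # nonnegative Pareto weights

best_action = int((utilities @ weights).argmax())
print(best_action)  # action 0 under these weights
```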

Adversarial Training for Relation Extraction

no code implementations · EMNLP 2017 · Yi Wu, David Bamman, Stuart Russell

Adversarial training is a means of regularizing classification algorithms by adding adversarial noise to the training data.

General Classification, Image Classification (+4 more)

Should Robots be Obedient?

1 code implementation · 28 May 2017 · Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell

We show that when a human is not perfectly rational then a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order.

The Off-Switch Game

no code implementations · 24 Nov 2016 · Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch.
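
A back-of-the-envelope version of the game's key comparison, with made-up numbers (our illustration): the robot is uncertain about the utility U of its proposed action, and a rational human presses the off switch exactly when U < 0.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(loc=0.2, scale=1.0, size=100_000)  # robot's belief over U

act_directly = U.mean()           # disable the switch and just act
switch_off = 0.0                  # shut itself down
defer = np.maximum(U, 0).mean()   # let the rational human decide

# E[max(U, 0)] >= max(E[U], 0), so deferring dominates: the robot
# has an incentive to keep its off switch enabled.
print(act_directly, switch_off, defer)
```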

Cooperative Inverse Reinforcement Learning

2 code implementations · NeurIPS 2016 · Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans.

Active Learning, reinforcement-learning (+1 more)

Towards Practical Bayesian Parameter and State Estimation

no code implementations · 29 Mar 2016 · Yusuf Bugra Erol, Yi Wu, Lei Li, Stuart Russell

Joint state and parameter estimation is a core problem for dynamic Bayesian networks.

Research Priorities for Robust and Beneficial Artificial Intelligence

no code implementations · 10 Feb 2016 · Stuart Russell, Daniel Dewey, Max Tegmark

Success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to investigate how to maximize these benefits while avoiding potential pitfalls.

Probabilistic Model-Based Approach for Heart Beat Detection

no code implementations · 24 Dec 2015 · Hugh Chen, Yusuf Erol, Eric Shen, Stuart Russell

One of the biggest flaws in the medical system is perhaps an unexpected one: the patient alarm system.

Bayesian Inference

Selecting Computations: Theory and Applications

no code implementations · 9 Aug 2014 · Nicholas Hay, Stuart Russell, David Tolpin, Solomon Eyal Shimony

Sequential decision problems are often approximately solvable by simulating possible future action sequences.

Game of Go

Automated Construction of Sparse Bayesian Networks from Unstructured Probabilistic Models and Domain Information

no code implementations · 27 Mar 2013 · Sampath Srinivas, Stuart Russell, Alice M. Agogino

An algorithm for automated construction of a sparse Bayesian network given an unstructured probabilistic model and causal domain information from an expert has been developed and implemented.

Fine-Grained Decision-Theoretic Search Control

no code implementations · 27 Mar 2013 · Stuart Russell

This formula is used to estimate the value of expanding further successors, using a general formula for the value of a computation in game-playing developed in earlier work.

Variational MCMC

no code implementations · 10 Jan 2013 · Nando de Freitas, Pedro Hojen-Sorensen, Michael I. Jordan, Stuart Russell

One of these algorithms is a mixture of two MCMC kernels: a random walk Metropolis kernel and a block Metropolis-Hastings (MH) kernel with a variational approximation as the proposal distribution.
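
A minimal sketch of such a mixture kernel, assuming user-supplied `log_target`, `q_sample`, and `q_logpdf` callables (our illustration, not the authors' code):

```python
import numpy as np

def mixture_mh(log_target, q_sample, q_logpdf, x0, n_steps=5000,
               mix_prob=0.5, step_size=0.5, seed=0):
    """Alternate at random between a random-walk Metropolis proposal and
    an independence MH proposal drawn from a variational approximation q."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        if rng.random() < mix_prob:
            # Random-walk Metropolis: symmetric proposal, target ratio only.
            prop = x + step_size * rng.standard_normal(x.shape)
            log_alpha = log_target(prop) - log_target(x)
        else:
            # Independence MH with q as the proposal distribution.
            prop = q_sample(rng)
            log_alpha = (log_target(prop) - log_target(x)
                         + q_logpdf(x) - q_logpdf(prop))
        if np.log(rng.random()) < log_alpha:
            x = np.asarray(prop, dtype=float)
        chain.append(x.copy())
    return np.array(chain)
```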

Markov Chain Monte Carlo Data Association for Multiple-Target Tracking

no code implementations · IEEE Transactions on Automatic Control 2009 · Songhwai Oh, Stuart Russell, Shankar Sastry

This paper presents Markov chain Monte Carlo data association (MCMCDA) for solving data association problems arising in multiple-target tracking in a cluttered environment.
