Search Results for author: Cassidy Laidlaw

Found 12 papers, 8 papers with code

Preventing Reward Hacking with Occupancy Measure Regularization

1 code implementation • 5 Mar 2024 • Cassidy Laidlaw, Shivam Singhal, Anca Dragan

Thus, we propose regularizing based on the occupancy measure (OM) divergence between policies, rather than the action distribution (AD) divergence, to prevent reward hacking.
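The occupancy measure of a policy is the distribution over state-action pairs it visits. The contrast the entry draws can be sketched on a hypothetical three-state chain MDP (not the paper's setup): two policies that differ only at a rarely visited state have a large per-state action-distribution divergence but a much smaller occupancy-measure divergence.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, HORIZON, N_EPISODES = 3, 5, 2000

def step(s, a):
    # toy chain: action 1 moves right (capped at the last state), action 0 stays
    return min(s + a, N_STATES - 1)

def occupancy(policy):
    """Empirical state-action occupancy measure estimated from rollouts."""
    counts = np.zeros((N_STATES, 2))
    for _ in range(N_EPISODES):
        s = 0
        for _ in range(HORIZON):
            a = rng.choice(2, p=policy[s])
            counts[s, a] += 1
            s = step(s, a)
    return counts / counts.sum()

def tv(p, q):
    # total variation distance between two distributions
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

pi1 = np.array([[0.5, 0.5]] * N_STATES)               # reference policy
pi2 = np.array([[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]])  # differs only at state 2

ad_div = max(tv(pi1[s], pi2[s]) for s in range(N_STATES))    # action-distribution view
om_div = tv(occupancy(pi1).ravel(), occupancy(pi2).ravel())  # occupancy-measure view
print(f"AD divergence {ad_div:.2f}, OM divergence {om_div:.2f}")
```

Because state 2 is only reached part of the time, the OM divergence discounts the behavioral change by how often it actually matters, while the AD divergence treats every state equally.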

Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping

no code implementations • 15 Dec 2023 • Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez

Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems.
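The claim that IRL repeatedly solves RL sub-problems can be illustrated with a toy feature-matching IRL loop (a hypothetical sketch, not the paper's method), in which every outer iteration calls value iteration as the inner RL solver:

```python
import numpy as np

# toy 1-D gridworld with 4 states and left/right actions (hypothetical example)
N, GAMMA, H = 4, 0.9, 50

def nxt(s, a):
    return max(0, min(N - 1, s + (1 if a else -1)))

def solve_rl(reward):
    """Inner RL sub-problem: value iteration, then greedy policy extraction."""
    V = np.zeros(N)
    for _ in range(H):
        V = np.array([max(reward[nxt(s, a)] + GAMMA * V[nxt(s, a)] for a in (0, 1))
                      for s in range(N)])
    return np.array([max((0, 1), key=lambda a: reward[nxt(s, a)] + GAMMA * V[nxt(s, a)])
                     for s in range(N)])

def visit_freq(policy, steps=20):
    """State-visitation frequencies of a deterministic policy from state 0."""
    f, s = np.zeros(N), 0
    for _ in range(steps):
        f[s] += 1
        s = nxt(s, policy[s])
    return f / steps

expert = visit_freq(np.ones(N, dtype=int))  # expert demonstrations: always move right
reward = np.zeros(N)
for _ in range(30):                         # each outer step re-solves an RL sub-problem
    learner = visit_freq(solve_rl(reward))
    reward += 0.5 * (expert - learner)      # match expert visitation (states as features)
print(solve_rl(reward))
```

The inner `solve_rl` call inside the loop is exactly the repeated RL sub-problem the abstract refers to, and it dominates the cost even in this tiny example.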

Reinforcement Learning (RL)

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

1 code implementation • 13 Dec 2023 • Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan

Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.
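One way to picture random exploration succeeding, in the spirit of the effective-horizon intuition, is a hypothetical sketch (not the paper's algorithm or analysis): the agent acts greedily on Monte-Carlo Q-value estimates of the uniformly random policy, and in benign environments one step of such lookahead already finds good actions.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical deterministic chain: reward only at the rightmost state
N, H = 4, 6

def step(s, a):
    return max(0, min(N - 1, s + (1 if a else -1)))

def reward(s):
    return 1.0 if s == N - 1 else 0.0

def q_random(s, a, n_rollouts=200):
    """Monte-Carlo Q-value of the uniformly random policy after taking action a."""
    total = 0.0
    for _ in range(n_rollouts):
        st = step(s, a)
        ret = reward(st)
        for _ in range(H - 1):
            st = step(st, rng.integers(2))
            ret += reward(st)
        total += ret
    return total / n_rollouts

# one step of greedy lookahead on random-policy Q-values, repeated along the trajectory
s = 0
for _ in range(H):
    s = step(s, int(q_random(s, 1) >= q_random(s, 0)))
print("final state:", s)
```

Purely random rollouts suffice here because random exploration already reveals which action is better at every state; the point of the sketch is that no expressive function class is needed when the environment is this forgiving.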

Reinforcement Learning (RL)

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

1 code implementation • 13 Dec 2023 • Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count.
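The Borda-count aggregation can be sketched on a toy example with two hypothetical hidden contexts (e.g., annotator types) that rank three options differently; pooling their pairwise comparisons yields utilities ordered by Borda score rather than by either context's own ranking:

```python
import numpy as np

# hypothetical example: two hidden annotator contexts rank three options differently
rankings = {
    "context_A": [2, 1, 0],   # best -> worst, by option index
    "context_B": [1, 0, 2],
}
weights = {"context_A": 0.5, "context_B": 0.5}  # how often each context occurs

def borda(rankings, weights, n=3):
    """Borda score of each option, aggregated over hidden contexts."""
    score = np.zeros(n)
    for ctx, order in rankings.items():
        for pos, opt in enumerate(order):
            score[opt] += weights[ctx] * (n - 1 - pos)
    return score

def pref_prob(rankings, weights, i, j):
    """P(option i preferred to option j), marginalizing over the hidden context."""
    return sum(w for ctx, w in weights.items()
               if rankings[ctx].index(i) < rankings[ctx].index(j))

score = borda(rankings, weights)
print("Borda scores:", score)
print("aggregate order:", np.argsort(-score))
```

Here option 1 wins the aggregate ordering even though context A considers option 2 strictly best, which is the kind of hidden-context effect the entry describes.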

Bridging RL Theory and Practice with the Effective Horizon

1 code implementation • NeurIPS 2023 • Cassidy Laidlaw, Stuart Russell, Anca Dragan

Using BRIDGE, we find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does.

Reinforcement Learning (RL)

The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models

1 code implementation • ICLR 2022 • Cassidy Laidlaw, Anca Dragan

However, these models fail when humans exhibit systematic suboptimality, i.e., when their deviations from optimal behavior are not independent but instead consistent over time.

Bayesian Inference • Imitation Learning

Uncertain Decisions Facilitate Better Preference Learning

no code implementations • NeurIPS 2021 • Cassidy Laidlaw, Stuart Russell

We give the first statistical analysis of inverse decision theory (IDT), providing conditions necessary to identify these preferences and characterizing the sample complexity: the number of decisions that must be observed to learn, to a desired precision, the tradeoff the human is making.
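As a toy illustration of recovering a hidden tradeoff from observed decisions (a hypothetical threshold model, far simpler than the paper's setting), decisions made near the threshold, where the human is uncertain, are exactly the ones that pin down its location:

```python
import numpy as np

rng = np.random.default_rng(0)
THETA = 0.6  # the human's hidden tradeoff (hypothetical threshold model)

def human_decision(x, noise=0.1):
    # uncertain decision: the human perceives x with noise, then thresholds
    return float(x + rng.normal(0.0, noise) > THETA)

def estimate_threshold(n):
    """Recover the tradeoff from n observed (input, decision) pairs."""
    xs = rng.uniform(0.0, 1.0, n)
    ys = np.array([human_decision(x) for x in xs])
    grid = np.linspace(0.0, 1.0, 201)
    # pick the threshold that best explains the observed decisions
    errors = [(np.sum((xs > t) != ys), t) for t in grid]
    return min(errors)[1]

for n in (50, 5000):
    print(n, "decisions ->", round(estimate_threshold(n), 3))
```

Running the estimator at increasing n shows the estimate tightening around the true threshold, which is the sample-complexity question in miniature.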

Perceptual Adversarial Robustness: Generalizable Defenses Against Unforeseen Threat Models

no code implementations • ICLR 2021 • Cassidy Laidlaw, Sahil Singla, Soheil Feizi

We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images.
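The bounded-neural-perceptual-distance check can be sketched with a fixed random linear map standing in for a pretrained network (all dimensions and the bound are made-up values; a real NPTM uses learned features such as LPIPS-style activations):

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, BOUND = 64, 32, 0.5  # "image" dim, feature dim, perceptual bound (toy values)

# stand-in for a pretrained network: a fixed random linear map plus ReLU
W = rng.normal(0.0, 1.0 / np.sqrt(D), (F, D))

def features(x):
    return np.maximum(W @ x, 0.0)

def neural_perceptual_distance(x, x_adv):
    """Toy neural perceptual distance: L2 gap between network features."""
    return float(np.linalg.norm(features(x) - features(x_adv)))

def in_nptm(x, x_adv, bound=BOUND):
    # candidates inside the threat model must stay within the perceptual bound
    return neural_perceptual_distance(x, x_adv) <= bound

x = rng.uniform(0.0, 1.0, D)
mild = x + rng.normal(0.0, 0.01, D)   # small perturbation
harsh = x + rng.normal(0.0, 0.5, D)   # drastic perturbation
print(in_nptm(x, mild), in_nptm(x, harsh))
```

The mild perturbation stays inside the bound while the drastic one falls outside it, which is the role the threat model plays: admitting perceptually small changes regardless of which pixel-space norm they use.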

Adversarial Defense • Adversarial Robustness +1

Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

2 code implementations • 22 Jun 2020 • Cassidy Laidlaw, Sahil Singla, Soheil Feizi

We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images.

Adversarial Defense • Adversarial Robustness +1

Playing it Safe: Adversarial Robustness with an Abstain Option

no code implementations • 25 Nov 2019 • Cassidy Laidlaw, Soheil Feizi

We explore adversarial robustness in the setting in which it is acceptable for a classifier to abstain (that is, output no class) on adversarial examples.
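A minimal sketch of an abstain option, using a softmax-confidence threshold (a common heuristic, used here as a stand-in; the paper's scheme may differ):

```python
import numpy as np

def classify_with_abstain(logits, threshold=0.9):
    """Return the predicted class, or None (abstain) when the top softmax
    probability falls below the confidence threshold."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())   # numerically stable softmax
    p /= p.sum()
    return int(p.argmax()) if p.max() >= threshold else None

print(classify_with_abstain([5.0, 0.0]))   # confident prediction
print(classify_with_abstain([1.0, 0.9]))   # near the decision boundary -> abstain
```

Abstaining near the decision boundary is attractive against adversarial examples because that is precisely where small perturbations can flip the predicted class.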

Adversarial Robustness

Functional Adversarial Attacks

1 code implementation • NeurIPS 2019 • Cassidy Laidlaw, Soheil Feizi

For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments.
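The idea of a functional attack, applying one function to every pixel rather than perturbing pixels independently, can be sketched with a toy uniform color-shift search against a hypothetical linear classifier (not the paper's ReColorAdv parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(0.4, 0.6, (8, 8, 3))  # toy RGB "image" in [0, 1]
w = np.array([1.0, -1.0, 0.0])          # hypothetical classifier: red-vs-green balance

def classify(x):
    return int(x.mean(axis=(0, 1)) @ w > 0)

def functional_attack(x, eps=0.2, steps=9):
    """Functional attack: search for ONE color map f(c) = c + delta applied
    uniformly to every pixel, instead of independent per-pixel noise."""
    target = 1 - classify(x)
    for dr in np.linspace(-eps, eps, steps):
        for dg in np.linspace(-eps, eps, steps):
            cand = np.clip(x + np.array([dr, dg, 0.0]), 0.0, 1.0)
            if classify(cand) == target:
                return cand
    return None

adv = functional_attack(img)
print("label flipped:", classify(img), "->", classify(adv))
```

Because the whole image shifts by a single color map, the result stays visually coherent (no per-pixel speckle), which is the appeal of functional attacks over unconstrained perturbations.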

Adversarial Attack

Capture, Learning, and Synthesis of 3D Speaking Styles

1 code implementation • CVPR 2019 • Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black

To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers.

3D Face Animation • Talking Face Generation
