Search Results for author: Sam Toyer

Found 11 papers, 8 papers with code

A StrongREJECT for Empty Jailbreaks

1 code implementation • 15 Feb 2024 • Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks.

An Empirical Investigation of Representation Learning for Imitation

2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification, Imitation Learning, +1

A Primer on Maximum Causal Entropy Inverse Reinforcement Learning

no code implementations • 22 Mar 2022 • Adam Gleave, Sam Toyer

Inverse Reinforcement Learning (IRL) algorithms infer a reward function that explains demonstrations provided by an expert acting in the environment.

Reinforcement Learning (RL)

DERAIL: Diagnostic Environments for Reward And Imitation Learning

2 code implementations • 2 Dec 2020 • Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell

We evaluate a range of common reward and imitation learning algorithms on our tasks.

Imitation Learning

The MAGICAL Benchmark for Robust Imitation

1 code implementation • NeurIPS 2020 • Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell

This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings.

Imitation Learning

ASNets: Deep Learning for Generalised Planning

1 code implementation • 4 Aug 2019 • Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie

In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets).

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

5 code implementations • ICLR 2019 • Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine

By enforcing a constraint on the mutual information between the observations and the discriminator's internal representation, we can effectively modulate the discriminator's accuracy and maintain useful and informative gradients.

Continuous Control, Image Generation, +1

Action Schema Networks: Generalised Policies with Deep Learning

1 code implementation • 13 Sep 2017 • Sam Toyer, Felipe Trevizan, Sylvie Thiébaux, Lexing Xie

In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems.

Human Pose Forecasting via Deep Markov Models

no code implementations • 24 Jul 2017 • Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould

Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.

Autonomous Driving, Human Pose Forecasting
