Search Results for author: Justin Svegliato

Found 8 papers, 2 papers with code

A StrongREJECT for Empty Jailbreaks

1 code implementation • 15 Feb 2024 • Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks.

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

no code implementations • 2 Nov 2023 • Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset.

Instruction Following

Active teacher selection for reinforcement learning from human feedback

no code implementations • 23 Oct 2023 • Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell

The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.

Recommendation Systems, Reinforcement Learning

Active Reward Learning from Multiple Teachers

no code implementations • 2 Mar 2023 • Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell

Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.

Fairness and Sequential Decision Making: Limits, Lessons, and Opportunities

no code implementations • 13 Jan 2023 • Samer B. Nashed, Justin Svegliato, Su Lin Blodgett

As automated decision making and decision assistance systems become common in everyday life, research on the prevention or mitigation of potential harms that arise from decisions made by these systems has proliferated.

Decision Making, Fairness

Agent-aware State Estimation in Autonomous Vehicles

1 code implementation • 1 Aug 2021 • Shane Parr, Ishan Khatri, Justin Svegliato, Shlomo Zilberstein

Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state.

Autonomous Vehicles

Improving Competence for Reliable Autonomy

no code implementations • 23 Jul 2020 • Connor Basich, Justin Svegliato, Kyle Hollins Wray, Stefan J. Witwicki, Shlomo Zilberstein

Given the complexity of real-world, unstructured domains, it is often impossible or impractical to design models that include every feature needed to handle all possible scenarios that an autonomous system may encounter.

Learning to Optimize Autonomy in Competence-Aware Systems

no code implementations • 17 Mar 2020 • Connor Basich, Justin Svegliato, Kyle Hollins Wray, Stefan Witwicki, Joydeep Biswas, Shlomo Zilberstein

Interest in semi-autonomous systems (SAS) is growing rapidly as a paradigm to deploy autonomous systems in domains that require occasional reliance on humans.

Autonomous Driving
