2 code implementations • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work (Zou et al., 2023; Mazeika et al., 2023, 2024) -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs.

no code implementations • 28 Mar 2024 • Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts.

no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.

1 code implementation • 25 Feb 2024 • Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

Aligned large language models (LLMs) are vulnerable to jailbreaking attacks, which bypass the safeguards of targeted LLMs and fool them into generating objectionable content.

no code implementations • 11 Dec 2023 • Thomas Waite, Alexander Robey, Hamed Hassani, George J. Pappas, Radoslav Ivanov

This paper addresses the problem of data-driven modeling and verification of perception-based autonomous systems.

1 code implementation • 12 Oct 2023 • Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention.

1 code implementation • 5 Oct 2023 • Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Despite efforts to align large language models (LLMs) with human intentions, widely-used LLMs such as GPT, Llama, and Claude are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content.

no code implementations • 19 Jun 2023 • Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher

One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data.
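
For reference, the two-player zero-sum paradigm mentioned above is usually written as a min-max problem. The form below is the standard textbook formulation, not taken from this listing; the symbols ($\theta$ for predictor parameters, $\Delta$ for the allowed perturbation set, $\ell$ for the loss, $\mathcal{D}$ for the data distribution) are notational assumptions.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\delta \in \Delta} \ell\big(f_{\theta}(x+\delta),\, y\big) \Big]
```

The inner maximization is the adversary choosing a perturbation $\delta$; the outer minimization is the predictor being trained against that choice, and both players share the same objective $\ell$.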

2 code implementations • 20 Jul 2022 • Cian Eastwood, Alexander Robey, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf

By minimizing the $\alpha$-quantile of a predictor's risk distribution over domains, QRM seeks predictors that perform well with probability $\alpha$.
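
To make the quantile objective concrete, here is a minimal sketch of comparing two predictors by the empirical $\alpha$-quantile of their per-domain risks. It only illustrates the quantity being minimized and is not the QRM training procedure; the function name `empirical_quantile` and the risk values are hypothetical.

```python
import math


def empirical_quantile(risks, alpha):
    """Return the smallest observed risk r such that at least an
    alpha fraction of the domains have risk <= r."""
    if not risks:
        raise ValueError("risks must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(risks)
    k = math.ceil(alpha * len(ordered))  # number of domains to cover
    return ordered[k - 1]


# Hypothetical per-domain risks for two predictors (made-up numbers).
risks_a = [0.05, 0.05, 0.6, 0.7]  # low on average, poor on some domains
risks_b = [0.4, 0.4, 0.4, 0.4]    # uniformly moderate

alpha = 0.75
q_a = empirical_quantile(risks_a, alpha)
q_b = empirical_quantile(risks_b, alpha)

# Pick the predictor with the smaller alpha-quantile of risk.
preferred = "A" if q_a < q_b else "B"
```

In this toy comparison, predictor A has the lower mean risk, but predictor B has the lower 0.75-quantile, so a quantile-based criterion prefers B.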

1 code implementation • 8 Jun 2022 • Haoze Wu, Teruhiro Tagomori, Alexander Robey, Fengjun Yang, Nikolai Matni, George Pappas, Hamed Hassani, Corina Pasareanu, Clark Barrett

We consider the problem of certifying the robustness of deep neural networks against real-world distribution shifts.

1 code implementation • 2 Apr 2022 • Anton Xue, Lars Lindemann, Alexander Robey, Hamed Hassani, George J. Pappas, Rajeev Alur

Lipschitz constants of neural networks allow for guarantees of robustness in image classification, safety in controller design, and generalizability beyond the training data.

1 code implementation • ICLR 2022 • Allan Zhou, Fahim Tajwar, Alexander Robey, Tom Knowles, George J. Pappas, Hamed Hassani, Chelsea Finn

Based on this analysis, we show how a generative approach for learning the nuisance transformations can help transfer invariances across classes and improve performance on a set of imbalanced image classification benchmarks.

Ranked #22 on Long-tail Learning on CIFAR-10-LT (ρ=100)

1 code implementation • 2 Feb 2022 • Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani

From a theoretical point of view, this framework overcomes the trade-offs between the performance and the sample-complexity of worst-case and average-case learning.

1 code implementation • 18 Nov 2021 • Lars Lindemann, Alexander Robey, Lejun Jiang, Satyajeet Das, Stephen Tu, Nikolai Matni

Along with the optimization problem, we provide verifiable conditions in terms of the density of the data, smoothness of the system model and state estimator, and the size of the error bounds that guarantee validity of the obtained ROCBF.

no code implementations • NeurIPS 2021 • Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

In particular, we leverage semi-infinite optimization and non-convex duality theory to show that adversarial training is equivalent to a statistical problem over perturbation distributions, which we characterize completely.

1 code implementation • NeurIPS 2021 • Alexander Robey, George J. Pappas, Hamed Hassani

Despite remarkable success in a variety of applications, it is well-known that deep learning can fail catastrophically when presented with out-of-distribution data.

no code implementations • 18 Feb 2021 • Stephen Tu, Alexander Robey, Tingnan Zhang, Nikolai Matni

We study the following question in the context of imitation learning for continuous control: how are the underlying stability properties of an expert policy reflected in the sample-complexity of an imitation learning task?

1 code implementation • 16 Jan 2021 • Alexander Robey, Lars Lindemann, Stephen Tu, Nikolai Matni

We identify sufficient conditions on the data such that feasibility of the optimization problem ensures correctness of the learned robust hybrid control barrier functions.

no code implementations • 8 Nov 2020 • Lars Lindemann, Haimin Hu, Alexander Robey, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data.

no code implementations • 9 Jun 2020 • Edgar Dobriban, Hamed Hassani, David Hong, Alexander Robey

It is well known that machine learning methods can be vulnerable to adversarially-chosen perturbations of their inputs.

1 code implementation • 20 May 2020 • Alexander Robey, Hamed Hassani, George J. Pappas

Indeed, natural variation such as lighting or weather conditions can significantly degrade the accuracy of trained neural networks, proving that such natural variation presents a significant challenge for deep learning.

1 code implementation • 7 Apr 2020 • Alexander Robey, Haimin Hu, Lars Lindemann, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

Furthermore, if the CBF parameterization is convex, then under mild assumptions, so is our learning process.

no code implementations • 30 Sep 2019 • Alexander Robey, Arman Adibi, Brent Schlotfeldt, George J. Pappas, Hamed Hassani

Given this distributed setting, we develop Constraint-Distributed Continuous Greedy (CDCG), a message passing algorithm that converges to the tight $(1-1/e)$ approximation factor of the optimum global solution using only local computation and communication.
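
For context on the $(1-1/e)$ factor: the classic centralized greedy algorithm for maximizing a monotone submodular function under a cardinality constraint attains the same factor. The sketch below shows that textbook greedy on a toy set-coverage objective. It is not CDCG, which is a distributed, continuous method; the function names and data are hypothetical.

```python
def coverage(chosen, sets):
    """Number of distinct elements covered by the chosen set indices."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy_max_coverage(sets, k):
    """Pick up to k set indices, each time adding the set with the
    largest marginal gain in coverage (textbook greedy, not CDCG)."""
    chosen = []
    for _ in range(min(k, len(sets))):
        base = coverage(chosen, sets)
        best, best_gain = None, 0
        for i in range(len(sets)):
            if i in chosen:
                continue
            gain = coverage(chosen + [i], sets) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining set adds new elements
            break
        chosen.append(best)
    return chosen


# Toy instance (made-up data): choose 2 of 4 sets to cover the most elements.
toy_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
picked = greedy_max_coverage(toy_sets, 2)
covered_count = coverage(picked, toy_sets)
```

Coverage is monotone and submodular, which is the property the $(1-1/e)$ guarantee for greedy relies on.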

1 code implementation • NeurIPS 2019 • Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas

The resulting SDP can be adapted to increase either the estimation accuracy (by capturing the interaction between activation functions of different layers) or scalability (by decomposition and parallel implementation).
