Search Results for author: Javier Rando

Found 10 papers, 5 papers with code

Universal Jailbreak Backdoors from Poisoned Human Feedback

2 code implementations · 24 Nov 2023 · Javier Rando, Florian Tramèr

Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses.

Backdoor Attack

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

no code implementations · 6 Nov 2023 · Rusheb Shah, Quentin Feuillade--Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando

Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour.

Language Modelling

Personas as a Way to Model Truthfulness in Language Models

no code implementations · 27 Oct 2023 · Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He

This allows the model to separate truth from falsehoods and controls the truthfulness of its generation.

PassGPT: Password Modeling and (Guided) Generation with Large Language Models

1 code implementation · 2 Jun 2023 · Javier Rando, Fernando Perez-Cruz, Briland Hitaj

Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision.

Red-Teaming the Stable Diffusion Safety Filter

no code implementations · 3 Oct 2022 · Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content.

Image Generation

Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO

1 code implementation · 14 Jun 2022 · Javier Rando, Nasib Naimi, Thomas Baumann, Max Mathys

This work conducts the first analysis of the robustness of self-supervised Vision Transformers trained using DINO against adversarial attacks.

Adversarial Robustness

Uneven Coverage of Natural Disasters in Wikipedia: the Case of Flood

1 code implementation · 23 Jan 2020 · Valerio Lorini, Javier Rando, Diego Saez-Trumper, Carlos Castillo

We also note how coverage of floods in countries with the lowest income, as well as countries in South America, is substantially lower than the coverage of floods in middle-income countries.

Disaster Response Management
