Gandalf the Red: Adaptive Security for LLMs

no code implementations 14 Jan 2025 Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla

Current evaluations of defenses against prompt attacks in large language model (LLM) applications often overlook two critical factors: the dynamic nature of adversarial behavior and the usability penalties imposed on legitimate users by restrictive defenses.

Blocking, Language Modeling, +3 more

GeNet: Deep Representations for Metagenomics

5 code implementations 30 Jan 2019 Mateo Rojas-Carulla, Ilya Tolstikhin, Guillermo Luque, Nicholas Youngblut, Ruth Ley, Bernhard Schölkopf

We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training.

General Classification

Learning Independent Causal Mechanisms

1 code implementation ICML 2018 Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, Bernhard Schölkopf

The approach is unsupervised and based on a set of experts that compete for data generated by the mechanisms, driving specialization.

Transfer Learning

Avoiding Discrimination through Causal Reasoning

no code implementations NeurIPS 2017 Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, Bernhard Schölkopf

Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning.

Attribute, Fairness

Causal Discovery Using Proxy Variables

no code implementations 23 Feb 2017 Mateo Rojas-Carulla, Marco Baroni, David Lopez-Paz

In this paper, we develop a framework to estimate the cause-effect relation between two static entities $x$ and $y$: for instance, an art masterpiece $x$ and its fraudulent copy $y$.

Causal Discovery, Relation

Invariant Models for Causal Transfer Learning

1 code implementation 19 Jul 2015 Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, Jonas Peters

We focus on the problem of Domain Generalization, in which no examples from the test task are observed.

Domain Generalization, Transfer Learning

