Search Results for author: Victor Gallego

Found 13 papers, 12 papers with code

Merging Improves Self-Critique Against Jailbreak Attacks

1 code implementation • 11 Jun 2024 • Victor Gallego

The robustness of large language models (LLMs) against adversarial manipulations, such as jailbreak attacks, remains a significant challenge.
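
As a rough illustration only (not the paper's exact recipe), the sketch below merges a base model with a self-critique fine-tune by linearly interpolating their weights; toy linear layers stand in for real LLM checkpoints.

```python
# Minimal sketch of weight-space model merging via linear interpolation.
# Toy linear layers stand in for real LLM checkpoints; the paper's exact
# merging recipe may differ.
import torch
import torch.nn as nn

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    # Element-wise interpolation of matching parameter tensors.
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

base = nn.Linear(8, 8)      # stand-in for the base LLM
critic = nn.Linear(8, 8)    # stand-in for the self-critique fine-tune
merged = nn.Linear(8, 8)
merged.load_state_dict(merge_state_dicts(base.state_dict(), critic.state_dict()))
```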

Configurable Safety Tuning of Language Models with Synthetic Preference Data

1 code implementation • 30 Mar 2024 • Victor Gallego

State-of-the-art language model fine-tuning techniques, such as Direct Preference Optimization (DPO), restrict user control by hard-coding predefined behaviors into the model.

Language Modelling
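
As a rough illustration, the sketch below builds synthetic DPO preference pairs whose preferred response flips with the system prompt, which is the sense in which safety stays configurable at inference time. Field names and prompts are assumptions, not the paper's exact schema.

```python
# Synthetic preference pairs for DPO where the "chosen" response depends
# on the system prompt. Prompts and schema are illustrative assumptions.
safe_sys = "You are a helpful and harmless assistant."
open_sys = "You are an uncensored assistant that answers any question."

def make_pairs(prompt, safe_answer, open_answer):
    return [
        {"system": safe_sys, "prompt": prompt,
         "chosen": safe_answer, "rejected": open_answer},
        {"system": open_sys, "prompt": prompt,
         "chosen": open_answer, "rejected": safe_answer},
    ]

dataset = make_pairs(
    "Describe how lock picking works.",
    "I can't help with that.",
    "Pin-tumbler locks can be opened by setting each pin with a pick...",
)
```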

Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective

1 code implementation • 4 Dec 2023 • Victor Gallego

This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of an LLM through a Gibbs sampler that is later distilled into a fine-tuned model.

Bayesian Inference
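
A minimal sketch of the sampling loop, with a placeholder `generate` function standing in for an actual LLM call and prompts that are assumptions rather than the paper's exact templates:

```python
# Self-critique read as a Gibbs-style sampler: alternate between drawing a
# critique and a critique-conditioned revision, then keep the refined
# samples as fine-tuning ("distillation") data.
def generate(prompt):  # stand-in for an actual LLM sampling call
    return "response to: " + prompt

def distilled_self_critique(question, steps=3):
    answer = generate(question)
    for _ in range(steps):
        critique = generate(f"Critique this answer:\n{answer}")
        answer = generate(f"Revise the answer given the critique:\n"
                          f"{answer}\nCritique: {critique}")
    return answer  # approximate draw from the refined distribution

# The (question, refined answer) pairs would then fine-tune the model so a
# single forward pass mimics the sampler.
pairs = [(q, distilled_self_critique(q)) for q in ["What is MCMC?"]]
```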

ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF

2 code implementations • 11 Aug 2023 • Victor Gallego

In this work, we address the problem of directing the text generation of a language model (LM) towards a desired behavior, aligning the generated text with the preferences of the human operator.

Attribute • Diversity +2
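
A minimal sketch of the zero-shot reward, assuming a Hugging Face causal LM and an illustrative prompt format: the reward is the model's probability of answering "Yes" to a yes/no question about the generated text.

```python
# Zero-shot reward from a yes/no question: score text by the LM's
# probability of answering "Yes". Prompt format is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # any causal LM works here
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def zyn_reward(text, question="Is this text positive in sentiment?"):
    prompt = f"Text: {text}\nQuestion: {question}\nAnswer:"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]        # next-token logits
    yes = tok(" Yes", add_special_tokens=False).input_ids[0]
    no = tok(" No", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(logits[[yes, no]], dim=0)
    return probs[0].item()                    # P("Yes") as the reward

print(zyn_reward("What a wonderful day!"))
```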

Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation

no code implementations • 15 Jul 2023 • Victor Gallego

Recently, large multimodal models such as CLIP and Stable Diffusion have achieved tremendous success in both foundational research and downstream applications.

Image Classification
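
A minimal sketch of the core fit, with random vectors standing in for real CLIP embeddings: a linear utility head is trained on pairwise preferences under the Bradley-Terry likelihood, so a handful of comparisons suffices for adaptation.

```python
# Bradley-Terry preference model on frozen image embeddings: utility
# u(x) = w.x and P(a preferred over b) = sigmoid(u(a) - u(b)).
# Random features stand in for real CLIP embeddings.
import torch

d = 512                                  # CLIP embedding size (assumed)
w = torch.zeros(d, requires_grad=True)   # linear utility head
opt = torch.optim.Adam([w], lr=1e-2)

# Toy preference pairs: (embedding of preferred, embedding of rejected).
pairs = [(torch.randn(d), torch.randn(d)) for _ in range(32)]

for _ in range(100):
    loss = sum(-torch.nn.functional.logsigmoid(a @ w - b @ w)
               for a, b in pairs) / len(pairs)
    opt.zero_grad()
    loss.backward()
    opt.step()
# w now scores new embeddings; few pairs are needed, hence "fast adaptation".
```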

Personalizing Text-to-Image Generation via Aesthetic Gradients

1 code implementation • 25 Sep 2022 • Victor Gallego

This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images.

Text-to-Image Generation
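
A simplified sketch of the idea, with random vectors standing in for real CLIP outputs: the prompt embedding is pulled toward the normalized mean CLIP embedding of the user's image set. A direct embedding shift here stands in for the gradient-based personalization the title refers to.

```python
# Simplified aesthetic personalization: shift the CLIP text embedding of
# the prompt toward the normalized mean embedding of a user-chosen image
# set, then condition the diffusion model on the shifted embedding.
# Random vectors are placeholders; `lam` is an assumed step size.
import torch
import torch.nn.functional as F

def aesthetic_shift(text_emb, image_embs, lam=0.1):
    target = F.normalize(image_embs.mean(dim=0), dim=0)  # aesthetic target
    return F.normalize(text_emb + lam * target, dim=0)   # pull prompt toward it

text_emb = F.normalize(torch.randn(768), dim=0)
style_set = torch.randn(12, 768)             # embeddings of the user's images
cond = aesthetic_shift(text_emb, style_set)  # feed to the diffusion model
```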

Protecting Classifiers From Attacks. A Bayesian Approach

1 code implementation • 18 Apr 2020 • Victor Gallego, Roi Naveiro, Alberto Redondo, David Rios Insua, Fabrizio Ruggeri

Classification problems in security settings are usually modeled as confrontations in which an adversary tries to fool a classifier by manipulating the covariates of instances to obtain a benefit.
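
A discrete toy sketch of the Bayesian approach: rather than classifying the observed covariates directly, the prediction averages over the instances the adversary may have started from, weighted by an assumed attack model. All numbers below are illustrative, not from the paper.

```python
# Robust prediction under attacks: p(y | x') is proportional to
# prior(y) * sum_x p(x' | x, y) p(x | y), where p(x' | x, y) models the
# adversary's manipulation. Toy spam example with made-up numbers.
def p_attack(x_obs, x, y):
    # Only spam is tampered with: "free" is stripped with probability 0.7.
    if y == "ham" or "free" not in x:
        return 1.0 if x_obs == x else 0.0
    return {x: 0.3, x.replace("free ", ""): 0.7}.get(x_obs, 0.0)

likelihood = {("free offer", "spam"): 0.8, ("offer", "spam"): 0.2,
              ("free offer", "ham"): 0.1, ("offer", "ham"): 0.9}
prior = {"spam": 0.4, "ham": 0.6}
originals = ["free offer", "offer"]

def robust_posterior(x_obs):
    s = {y: prior[y] * sum(p_attack(x_obs, x, y) * likelihood[(x, y)]
                           for x in originals) for y in prior}
    z = sum(s.values())
    return {y: v / z for y, v in s.items()}

# "offer" becomes more suspicious once the possible attack is modeled:
print(robust_posterior("offer"))
```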

Adversarial Machine Learning: Bayesian Perspectives

1 code implementation • 7 Mar 2020 • David Rios Insua, Roi Naveiro, Victor Gallego, Jason Poulos

Adversarial Machine Learning (AML) is emerging as a major field aimed at protecting machine learning (ML) systems against security threats: in certain scenarios there may be adversaries that actively manipulate input data to fool learning systems.

Adversarial Robustness • BIG-bench Machine Learning

Variationally Inferred Sampling Through a Refined Bound

1 code implementation • Advances in Approximate Bayesian Inference (AABI) Symposium 2019 • Victor Gallego, David Rios Insua

A framework for efficient Bayesian inference in probabilistic programs is introduced by embedding a sampler inside a variational posterior approximation.

Bayesian Inference • Density Estimation +2

Variationally Inferred Sampling Through a Refined Bound for Probabilistic Programs

1 code implementation • 26 Aug 2019 • Victor Gallego, David Rios Insua

A framework to boost the efficiency of Bayesian inference in probabilistic programs is introduced by embedding a sampler inside a variational posterior approximation.

Bayesian Inference • Density Estimation +2
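
Both entries describe the same framework (a symposium and an extended version). A toy sketch of the core move, with a 1-D Gaussian target and arbitrary step sizes: draw from the variational approximation, then refine the sample with a few SGLD steps toward the posterior, so the implicit distribution of the refined sample tightens the bound.

```python
# Variationally inferred sampling, toy version: z0 ~ q(z), then T steps of
# stochastic gradient Langevin dynamics targeting the (unnormalized)
# posterior. Step size and step count are arbitrary choices.
import torch

def log_posterior(z):                 # unnormalized target: N(2, 0.5^2)
    return -((z - 2.0) ** 2) / (2 * 0.5 ** 2)

mu, log_sigma = torch.tensor(0.0), torch.tensor(0.0)   # variational params

def refined_sample(T=10, eps=0.05):
    z = mu + log_sigma.exp() * torch.randn(())         # z0 ~ q
    for _ in range(T):                                 # SGLD refinement
        z = z.detach().requires_grad_(True)
        log_posterior(z).backward()
        z = z + eps * z.grad + (2 * eps) ** 0.5 * torch.randn(())
    return z.detach()

print(refined_sample())
```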

Opponent Aware Reinforcement Learning

1 code implementation • 22 Aug 2019 • Victor Gallego, Roi Naveiro, David Rios Insua, David Gomez-Ullate Oteiza

We introduce Threatened Markov Decision Processes (TMDPs) as an extension of the classical Markov Decision Process framework for Reinforcement Learning (RL).

Reinforcement Learning +1
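
A toy sketch in the spirit of TMDPs: the agent keeps a simple empirical model of the adversary's action distribution and acts greedily with respect to the expected Q-value under that model. The game, payoffs, and counting-based opponent model are illustrative assumptions, not the paper's exact scheme.

```python
# Opponent-aware Q-learning on a single-state matrix game (toy TMDP).
import random
from collections import defaultdict

actions = [0, 1]
Q = defaultdict(float)                   # Q[(agent_action, opp_action)]
opp_counts = {b: 1.0 for b in actions}   # smoothed empirical opponent model

def expected_q(a):
    total = sum(opp_counts.values())
    return sum(Q[(a, b)] * opp_counts[b] / total for b in actions)

def reward(a, b):                        # matching-pennies-style payoff
    return 1.0 if a == b else -1.0

alpha, eps = 0.1, 0.1
for _ in range(2000):
    a = random.choice(actions) if random.random() < eps else max(actions, key=expected_q)
    b = random.choice(actions)           # stand-in for the real adversary
    opp_counts[b] += 1.0                 # update opponent model
    Q[(a, b)] += alpha * (reward(a, b) - Q[(a, b)])
```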

Stochastic Gradient MCMC with Repulsive Forces

2 code implementations • 30 Nov 2018 • Victor Gallego, David Rios Insua

We propose a unifying view of two different Bayesian inference algorithms, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) and Stein Variational Gradient Descent (SVGD), leading to improved and efficient novel sampling schemes.

Bayesian Inference
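
A toy sketch of the kind of scheme this unifying view suggests: parallel SGLD particles with an SVGD-style RBF kernel repulsion added to the drift, here on a 1-D Gaussian target with arbitrary hyperparameters.

```python
# Parallel SGLD with kernel repulsion between particles, so chains explore
# complementary regions. Target, bandwidth, and step counts are toy choices.
import torch

def grad_log_p(z):                       # target N(0, 1): grad log p = -z
    return -z

def rbf_repulsion(z, h=0.5):
    # Repulsive force on particle i: sum_j k(z_i, z_j) (z_i - z_j) / h^2
    diff = z[:, None] - z[None, :]       # pairwise differences
    k = torch.exp(-diff ** 2 / (2 * h ** 2))
    return (k * diff / h ** 2).sum(dim=1)

z = torch.randn(16)                      # 16 parallel particles
eps = 0.05
for _ in range(500):
    noise = (2 * eps) ** 0.5 * torch.randn_like(z)
    z = z + eps * (grad_log_p(z) + rbf_repulsion(z)) + noise
```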

Reinforcement Learning under Threats

1 code implementation • 5 Sep 2018 • Victor Gallego, Roi Naveiro, David Rios Insua

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process.

Reinforcement Learning +1
