Search Results

IPO: Interpretable Prompt Optimization for Vision-Language Models

1 code implementation · 20 Oct 2024

Pre-trained vision-language models like CLIP have remarkably adapted to various downstream tasks.

Prompt Learning, Specificity

IPO: Your Language Model is Secretly a Preference Classifier

1 code implementation · 22 Feb 2025

Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences.

Language Modeling, Language Modelling

SCQPTH: an efficient differentiable splitting method for convex quadratic programming

2 code implementations · 16 Aug 2023

We present SCQPTH: a differentiable first-order splitting method for convex quadratic programs.

Computational Efficiency

Self-Play Preference Optimization for Language Model Alignment

1 code implementation · 1 May 2024

In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy.

Language Modeling, Language Modelling (+1 more)

Sample-Efficient Alignment for LLMs

1 code implementation · 3 Nov 2024

The results demonstrate that SEA achieves highly sample-efficient alignment with the oracle's preferences, outperforming recent active exploration methods for LLMs.

Thompson Sampling

Evidence of Crowding on Russell 3000 Reconstitution Events

1 code implementation · 12 Jun 2020

We develop a methodology that replicates with great accuracy the FTSE Russell index reconstitutions, including the quarterly rebalancings due to new initial public offerings (IPOs).

Efficient Adversarial Training in LLMs with Continuous Attacks

1 code implementation · 24 May 2024

We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data.

An Empirical Study of Capital Asset Pricing Model based on Chinese A-share Trading Data

1 code implementation · 1 Apr 2023

This paper presents an empirical analysis of the capital asset pricing model using trading data for the Chinese A-share market from 2000 to 2019.

regression

Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning

1 code implementation · 1 Jun 2020

A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training.

Policy Gradient Methods, reinforcement-learning (+2 more)

Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization

1 code implementation · 26 May 2024

However, while RL-free methods deliver satisfactory performance, they require significant data to develop a robust Supervised Fine-Tuned (SFT) model and an additional step to fine-tune this model on a preference dataset, which constrains their utility and scalability.

Reinforcement Learning (RL)