Search Results for author: Alexander Lyzhov

Found 5 papers, 3 papers with code

Steering Without Side Effects: Improving Post-Deployment Control of Language Models

1 code implementation • 21 Jun 2024 • Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman

To demonstrate the generality and transferability of our method beyond jailbreaks, we show that our KTS model can be steered to reduce bias towards user-suggested answers on TruthfulQA.

Red Teaming, TruthfulQA

Normative Disagreement as a Challenge for Cooperative AI

no code implementations • 27 Nov 2021 • Julian Stastny, Maxime Riché, Alexander Lyzhov, Johannes Treutlein, Allan Dafoe, Jesse Clifton

However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree.

Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation

1 code implementation • 21 Feb 2020 • Dmitry Molchanov, Alexander Lyzhov, Yuliya Molchanova, Arsenii Ashukha, Dmitry Vetrov

Test-time data augmentation (averaging the predictions of a machine learning model across multiple augmented samples of data) is a widely used technique that improves the predictive performance.

Data Augmentation, Image Classification
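
The averaging step described in the excerpt above can be illustrated with a minimal sketch. It shows plain test-time augmentation with a fixed, hand-picked set of augmentations, not the learned policy search the paper proposes. All names here (`predict_with_tta`, `toy_predict_proba`, `horizontal_flip`, `identity`) are hypothetical, and the toy model is a stand-in assumed only so the example runs on its own.

```python
import numpy as np


def identity(batch):
    """Return the batch unchanged (the 'no augmentation' view)."""
    return batch


def horizontal_flip(batch):
    """Flip images left to right; batch has shape (N, H, W)."""
    return batch[:, :, ::-1]


def predict_with_tta(predict_proba, batch, augmentations):
    """Average class probabilities over augmented copies of the batch."""
    probs = [predict_proba(aug(batch)) for aug in augmentations]
    return np.mean(probs, axis=0)


def toy_predict_proba(batch):
    """Hypothetical stand-in model: softmax over left/right half brightness."""
    half = batch.shape[2] // 2
    left = batch[:, :, :half].mean(axis=(1, 2))
    right = batch[:, :, half:].mean(axis=(1, 2))
    logits = np.stack([left, right], axis=1)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


# Assumed toy data: four random 8x8 grayscale images.
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))

tta_probs = predict_with_tta(
    toy_predict_proba, images, [identity, horizontal_flip]
)
```

In practice `predict_proba` would wrap a trained classifier, and the choice of augmentations is exactly what a learned test-time augmentation policy would select rather than fix by hand.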
