1 code implementation • 21 Jun 2024 • Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman
To demonstrate the generality and transferability of our method beyond jailbreaks, we show that our KTS model can be steered to reduce bias towards user-suggested answers on TruthfulQA.
no code implementations • 15 Jun 2023 • Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, Samuel R. Bowman, Ethan Perez
Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e. g., due to flaws in the training objective and data.
no code implementations • 27 Nov 2021 • Julian Stastny, Maxime Riché, Alexander Lyzhov, Johannes Treutlein, Allan Dafoe, Jesse Clifton
However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree.
1 code implementation • 21 Feb 2020 • Dmitry Molchanov, Alexander Lyzhov, Yuliya Molchanova, Arsenii Ashukha, Dmitry Vetrov
Test-time data augmentation$-$averaging the predictions of a machine learning model across multiple augmented samples of data$-$is a widely used technique that improves the predictive performance.
2 code implementations • ICLR 2020 • Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov
Uncertainty estimation and ensembling methods go hand-in-hand.