Search Results for author: Tomasz Korbak

Found 22 papers, 13 papers with code

Towards Understanding Sycophancy in Language Models

1 code implementation • 20 Oct 2023 • Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

Text Generation

Compositional preference models for aligning LMs

no code implementations • 17 Oct 2023 • Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman

As language models (LMs) become more capable, it is increasingly important to align them with human preferences.
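The excerpt stops at the motivation, but the title suggests the mechanism: decompose a preference judgment into interpretable features and aggregate their scores. A minimal sketch of that idea follows; the feature names, toy scorers, and weights are our illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of a compositional preference model (CPM): score a
# response along several interpretable features, then combine the scores
# into one scalar preference. In practice each feature would be scored by
# a prompted LM; the toy scorer below ignores the feature name entirely.

FEATURE_WEIGHTS = {"helpfulness": 0.5, "factuality": 0.3, "clarity": 0.2}  # assumed

def score_feature(response: str, feature: str) -> float:
    # Toy stand-in for an LM judge: rewards longer, punctuated answers.
    base = min(len(response.split()) / 50.0, 1.0)
    bonus = 0.1 if response.strip().endswith(".") else 0.0
    return min(base + bonus, 1.0)

def compositional_preference(response: str) -> float:
    # Weighted sum of per-feature scores yields the overall preference.
    return sum(w * score_feature(response, f)
               for f, w in FEATURE_WEIGHTS.items())

if __name__ == "__main__":
    print(compositional_preference("Paris is the capital of France."))
```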

Improving Code Generation by Training with Natural Language Feedback

1 code implementation • 28 Mar 2023 • Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development.

Code Generation • Imitation Learning • +1
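The excerpt only gestures at the method. A hedged sketch of the general recipe appears below; the loop structure, helper names, and test-based filtering are assumptions based on the abstract, not the paper's exact algorithm.

```python
# Sketch of learning from natural language feedback on code: sample a
# program, ask the model to refine it given human feedback, and keep only
# refinements that pass the tests as fine-tuning data.
from typing import Callable, List, Tuple

def refine_with_feedback(
    generate: Callable[[str], str],          # hypothetical LLM call
    passes_tests: Callable[[str], bool],     # hypothetical test runner
    tasks_with_feedback: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    finetuning_data = []
    for task, feedback in tasks_with_feedback:
        draft = generate(task)
        if passes_tests(draft):
            continue  # nothing to fix
        # Condition the model on its own draft plus the human feedback.
        prompt = f"{task}\n# Draft:\n{draft}\n# Feedback: {feedback}\n# Fixed:"
        refinement = generate(prompt)
        if passes_tests(refinement):
            finetuning_data.append((task, refinement))  # keep verified fixes
    return finetuning_data

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_generate = lambda p: ("def add(a, b):\n    return a + b"
                              if "Feedback" in p else
                              "def add(a, b):\n    return a - b")
    toy_tests = lambda code: "a + b" in code  # stand-in for unit tests
    print(refine_with_feedback(toy_generate, toy_tests,
                               [("Write add(a, b).", "You subtracted instead of adding.")]))
```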

Aligning Language Models with Preferences through f-divergence Minimization

1 code implementation • 16 Feb 2023 • Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

We show that Jensen-Shannon divergence strikes a good balance between these objectives, and frequently outperforms forward KL divergence by a wide margin, leading to significant improvements over prior work.
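For readers who want the two divergences named in the excerpt made concrete, here is a toy numerical comparison between a target distribution p and a model q (generic numpy, not the paper's code):

```python
# Forward KL and Jensen-Shannon divergence, the two objectives compared in
# the abstract; both are special cases of f-divergences. Toy distributions.
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    # Forward KL(p || q): heavily penalizes q missing mass where p has it.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p: np.ndarray, q: np.ndarray) -> float:
    # Jensen-Shannon: symmetric, bounded average of two KL terms to the mixture.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.4, 0.1, 0.0])      # target (aligned) distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # untuned model
print(f"KL(p||q) = {kl(p, q):.3f}, JS(p, q) = {js(p, q):.3f}")
```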

Pretraining Language Models with Human Preferences

1 code implementation • 16 Feb 2023 • Tomasz Korbak, Kejian Shi, Angelica Chen, Rasika Bhalerao, Christopher L. Buckley, Jason Phang, Samuel R. Bowman, Ethan Perez

Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more.

Imitation Learning • Language Modelling
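The excerpt states the problem rather than the method. One objective studied in this line of work, conditional training, tags pretraining documents with control tokens derived from a preference score; a toy sketch follows, where the token names, threshold, and scorer are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch of conditional training: prepend a control token reflecting each
# document's preference score, train a standard LM on the tagged text, then
# condition on the "good" token at inference time.

GOOD, BAD = "<|good|>", "<|bad|>"  # assumed token names

def reward(document: str) -> float:
    # Toy stand-in for a preference scorer (e.g., a toxicity classifier).
    return 0.0 if "offensive" in document else 1.0

def tag_corpus(documents):
    # Prepend a control token so the LM learns reward-conditional text.
    return [(GOOD if reward(d) >= 0.5 else BAD) + " " + d for d in documents]

if __name__ == "__main__":
    corpus = ["a helpful tutorial", "an offensive comment"]
    print(tag_corpus(corpus))  # training data for the usual LM objective
```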

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

2 code implementations • 1 Jun 2022 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

Here we explore the theoretical connections between the two paradigms, reward maximization (RM) and distribution matching (DM), and show that methods such as KL-control, developed for RM, can also be construed as belonging to DM.

Language Modelling • Reinforcement Learning (RL) • +1
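The bridge the excerpt alludes to can be stated in one standard formula (notation ours, not copied from the paper): the KL-control objective for reward maximization has a closed-form optimum that is itself a distribution-matching target.

```latex
% KL-control fine-tuning of a pretrained LM a(x) with reward r(x) and
% penalty strength \beta maximizes
\[
  J(\pi) \;=\; \mathbb{E}_{x \sim \pi}\bigl[r(x)\bigr] \;-\; \beta\,\mathrm{KL}\bigl(\pi \,\Vert\, a\bigr),
\]
% and its maximizer is the exponentially tilted distribution
\[
  \pi^{*}(x) \;=\; \frac{1}{Z}\, a(x)\, e^{\,r(x)/\beta},
  \qquad
  Z \;=\; \sum_{x} a(x)\, e^{\,r(x)/\beta},
\]
% so maximizing the RM objective J is equivalent to matching the
% distribution \pi^{*} -- a DM problem.
```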

RL with KL penalties is better viewed as Bayesian inference

no code implementations • 23 May 2022 • Tomasz Korbak, Ethan Perez, Christopher L. Buckley

We show that KL-regularised RL is equivalent to variational inference: approximating a Bayesian posterior which specifies how to update a prior LM to conform with evidence provided by the reward function.

Bayesian Inference • Language Modelling • +2
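The equivalence asserted in the excerpt can be written out compactly (standard derivation; notation ours):

```latex
% With prior \pi_0 (the pretrained LM) and evidence e^{r(x)/\beta},
% define the Bayesian posterior
\[
  p^{*}(x) \;=\; \frac{\pi_0(x)\, e^{\,r(x)/\beta}}{Z}.
\]
% Expanding the KL divergence from a policy \pi to this posterior gives
\[
  \mathrm{KL}\bigl(\pi \,\Vert\, p^{*}\bigr)
  \;=\; \log Z \;-\; \tfrac{1}{\beta}\Bigl(\mathbb{E}_{x\sim\pi}\bigl[r(x)\bigr]
  \;-\; \beta\,\mathrm{KL}\bigl(\pi \,\Vert\, \pi_0\bigr)\Bigr),
\]
% so maximizing the KL-regularised RL objective is exactly variational
% inference: minimizing the KL divergence to the posterior p^{*}.
```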

A continuity of Markov blanket interpretations under the Free Energy Principle

no code implementations • 18 Jan 2022 • Anil Seth, Tomasz Korbak, Alexander Tschantz

Bruineberg and colleagues helpfully distinguish between instrumental and ontological interpretations of Markov blankets, exposing the dangers of using the former to make claims about the latter.

Controlling Conditional Language Models without Catastrophic Forgetting

1 code implementation • 1 Dec 2021 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks.

Abstractive Text Summarization • Code Generation

On Reward Maximization and Distribution Matching for Fine-Tuning Language Models

no code implementations • 29 Sep 2021 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a "training from scratch" to a "fine-tuning" paradigm.

Language Modelling • Reinforcement Learning (RL) • +1

Energy-Based Models for Code Generation under Compilability Constraints

1 code implementation • 9 Jun 2021 • Tomasz Korbak, Hady Elsahar, Marc Dymetman, Germán Kruszewski

Neural language models can be successfully trained on source code, leading to applications such as code completion.

Code Completion • Code Generation
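As an illustration of the constraint itself, the sketch below gates language-model probability mass on whether a sample compiles, using Python's builtin compile() as a stand-in for a compiler. This shows only the shape of the energy-based model; the abstract excerpt does not describe the paper's training procedure, so none is attempted here.

```python
# Toy energy-based model for code: assign each program x the unnormalized
# score a(x) * b(x), where a(x) is the LM probability (here supplied as a
# log-probability) and b(x) = 1 iff x compiles.
import math

def compiles(source: str) -> bool:
    try:
        compile(source, "<sample>", "exec")
        return True
    except SyntaxError:
        return False

def ebm_score(source: str, lm_logprob: float) -> float:
    # Unnormalized EBM probability: LM mass gated by the hard constraint.
    return math.exp(lm_logprob) if compiles(source) else 0.0

if __name__ == "__main__":
    samples = [("x = 1\nprint(x)", -3.2),   # compiles
               ("def f(:\n  pass", -2.9)]   # syntax error
    for src, lp in samples:
        print(repr(src[:12]), "->", ebm_score(src, lp))
```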

Measuring non-trivial compositionality in emergent communication

1 code implementation • 28 Oct 2020 • Tomasz Korbak, Julian Zubek, Joanna Rączaszek-Leonardi

Compositionality is an important explanatory target in emergent communication and language evolution.
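A standard baseline metric in this literature, topographic similarity, is the correlation between pairwise distances in meaning space and in message space; the paper examines metrics beyond it. A runnable toy sketch (Hamming distance on both sides is an illustrative choice):

```python
# Topographic similarity: Spearman correlation between pairwise meaning
# distances and pairwise message distances. Toy data below is a perfectly
# compositional code, so the metric returns 1.0.
from itertools import combinations
from scipy.stats import spearmanr

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def topographic_similarity(meanings, messages):
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    return spearmanr(d_meaning, d_message).correlation

if __name__ == "__main__":
    meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
    messages = ["aa", "ab", "ba", "bb"]  # each symbol encodes one attribute
    print(topographic_similarity(meanings, messages))  # -> 1.0
```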

Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank. Submission to PolEval task 2

1 code implementation • 3 Nov 2017 • Tomasz Korbak, Paulina Żak

We describe a variant of the Child-Sum Tree-LSTM deep neural network (Tai et al., 2015) fine-tuned for working with dependency trees and morphologically rich languages, using Polish as an example.

General Classification • Sentiment Analysis • +3
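For context, the Child-Sum Tree-LSTM cell from Tai et al. (2015) that the paper adapts can be sketched compactly; the dimensions below are illustrative and the Polish-specific fine-tuning details are omitted.

```python
# Compact sketch of a Child-Sum Tree-LSTM cell (Tai et al., 2015): the
# children's hidden states are summed for the input/output/update gates,
# while each child gets its own forget gate over its memory cell.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, x_dim: int, h_dim: int):
        super().__init__()
        self.iou = nn.Linear(x_dim + h_dim, 3 * h_dim)  # input, output, update
        self.f_x = nn.Linear(x_dim, h_dim)              # forget gate (input part)
        self.f_h = nn.Linear(h_dim, h_dim)              # forget gate (per child)

    def forward(self, x, child_h, child_c):
        # x: (x_dim,); child_h, child_c: (num_children, h_dim)
        h_tilde = child_h.sum(dim=0)                    # summed children states
        i, o, u = self.iou(torch.cat([x, h_tilde])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))  # one gate per child
        c = i * u + (f * child_c).sum(dim=0)
        return o * torch.tanh(c), c

if __name__ == "__main__":
    cell = ChildSumTreeLSTMCell(x_dim=8, h_dim=16)
    h, c = cell(torch.randn(8), torch.randn(2, 16), torch.randn(2, 16))
    print(h.shape, c.shape)  # torch.Size([16]) torch.Size([16])
```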
