Search Results for author: Suraj Anand

Found 3 papers, 0 papers with code

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

no code implementations • 28 May 2024 • Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

Hence, we study structural in-context learning, which we define as the ability of a model to execute in-context learning on arbitrary tokens -- so called because the model must generalize on the basis of, e.g., sentence structure or task structure, rather than semantic content encoded in token embeddings.

In-Context Learning

Are PPO-ed Language Models Hackable?

no code implementations • 28 May 2024 • Suraj Anand, David Getzen

Numerous algorithms have been proposed to align language models to remove undesirable behaviors.

Text Generation

Suppressing Pink Elephants with Direct Principle Feedback

no code implementations • 12 Feb 2024 • Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, Stella Biderman

Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model.

Language Modeling
