Search Results for author: Faeze Brahman

Found 48 papers, 25 papers with code

“Let Your Characters Tell Their Story”: A Dataset for Character-Centric Narrative Understanding

no code implementations Findings (EMNLP) 2021 Faeze Brahman, Meng Huang, Oyvind Tafjord, Chao Zhao, Mrinmaya Sachan, Snigdha Chaturvedi

When reading a literary piece, readers often make inferences about various characters’ roles, personalities, relationships, intents, actions, etc.

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences

no code implementations30 May 2025 Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap

Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually had harmful intents, causing a tradeoff between safety and user experience.

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

no code implementations20 Apr 2025 Tong Chen, Faeze Brahman, Jiacheng Liu, Niloofar Mireshghallah, Weijia Shi, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi

When applied to the instruction-tuned Tulu3-8B model, ParaPO with system prompting successfully preserves famous quotation recall while reducing unintentional regurgitation (from 8.7 to 6.3 in creative writing) when prompted not to regurgitate.
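
Regurgitation numbers like those above come from quantifying verbatim overlap between generations and pre-training text. A minimal sketch of one way such overlap can be measured (the helper names and the 50-character threshold are illustrative assumptions, not the paper's exact protocol):

from difflib import SequenceMatcher

def longest_verbatim_span(generation: str, source: str) -> int:
    # Length, in characters, of the longest substring shared verbatim.
    m = SequenceMatcher(None, generation, source)
    match = m.find_longest_match(0, len(generation), 0, len(source))
    return match.size

def is_regurgitated(generation: str, corpus_snippets: list[str],
                    min_chars: int = 50) -> bool:
    # Flag outputs that reproduce a long span from any known snippet.
    return any(longest_verbatim_span(generation, s) >= min_chars
               for s in corpus_snippets)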

Large-Scale Data Selection for Instruction Tuning

1 code implementation3 Mar 2025 Hamish Ivison, Muru Zhang, Faeze Brahman, Pang Wei Koh, Pradeep Dasigi

However, popular deployed instruction-tuned models often train on hundreds of thousands to millions of samples, subsampled from even larger data pools.

Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

1 code implementation20 Feb 2025 Shuyue Stella Li, Jimin Mun, Faeze Brahman, Jonathan S. Ilgen, Yulia Tsvetkov, Maarten Sap

Large language models (LLMs) often fail to ask effective questions under uncertainty, making them unreliable in domains where proactive information-gathering is essential for decision-making.

Attribute · Diagnostic

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

no code implementations12 Feb 2025 Paul Röttger, Musashi Hinck, Valentin Hofmann, Kobi Hackenburg, Valentina Pyatkin, Faeze Brahman, Dirk Hovy

Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives.

Multi-Attribute Constraint Satisfaction via Language Model Rewriting

no code implementations26 Dec 2024 Ashutosh Baheti, Debanjana Chakraborty, Faeze Brahman, Ronan Le Bras, Ximing Lu, Nouha Dziri, Yejin Choi, Mark Riedl, Maarten Sap

Thus, we create Multi-Attribute Constraint Satisfaction (MACS), a generalized method capable of finetuning language models on any sequential domain to satisfy user-specified constraints on multiple external real-valued attributes.

Attribute · Language Modeling · +5
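
As a rough illustration of the problem setup (not MACS's fine-tuning procedure itself), checking whether a candidate generation satisfies user-specified bounds on several external real-valued attributes might look like the sketch below; the attribute scorers are hypothetical stand-ins for external regressors or classifiers.

from typing import Callable

Scorer = Callable[[str], float]  # maps text to a real-valued attribute score

def satisfies_constraints(text: str,
                          scorers: dict[str, Scorer],
                          bounds: dict[str, tuple[float, float]]) -> bool:
    # A generation counts as satisfying the constraints only if every
    # attribute score falls inside its user-specified [lo, hi] interval.
    for name, scorer in scorers.items():
        lo, hi = bounds[name]
        if not lo <= scorer(text) <= hi:
            return False
    return True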

RESTOR: Knowledge Recovery through Machine Unlearning

1 code implementation31 Oct 2024 Keivan Rezaei, Khyathi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander

Large language models trained on web-scale corpora can memorize undesirable datapoints such as incorrect facts, copyrighted content or sensitive data.

Machine Unlearning

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

1 code implementation24 Oct 2024 Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi

We analyze features from the routing model to identify characteristics of instances that can benefit from human feedback, e.g., prompts with a moderate safety concern or moderate intent complexity.
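
A minimal sketch of the routing idea, assuming a trained model that predicts the benefit of human over AI annotation for each instance (the gain predictor and budget interface are illustrative assumptions, not the paper's exact formulation):

def route_instances(instances, predict_gain, human_budget: int):
    # Rank instances by the predicted gain from human (vs. AI) feedback
    # and spend the limited human-annotation budget on the top of the list.
    ranked = sorted(instances, key=predict_gain, reverse=True)
    return ranked[:human_budget], ranked[human_budget:]  # (to_humans, to_ai)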

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

no code implementations13 Sep 2024 Zhe Su, Xuhui Zhou, Sanketh Rangreji, Anubha Kabra, Julia Mendelsohn, Faeze Brahman, Maarten Sap

We design a set of realistic scenarios where language agents are instructed to achieve goals that are in conflict with being truthful during a multi-turn conversation with simulated human agents.

AI Agent · Navigate

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

no code implementations25 Jul 2024 JaeHun Jung, Faeze Brahman, Yejin Choi

We present a principled approach to provide LLM-based evaluation with a rigorous guarantee of human agreement.

Chatbot
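
In outline, such a scheme accepts an LLM judge's verdict only when its estimated confidence clears a threshold calibrated to the target human-agreement rate, and defers to humans otherwise. A minimal sketch, with the confidence estimator and calibrated threshold assumed rather than taken from the paper:

def judge_or_escalate(example, llm_judge, confidence, threshold):
    # Selective evaluation: trust the LLM verdict only above a confidence
    # threshold calibrated so that accepted verdicts meet a target agreement
    # rate with human raters; escalate (abstain) otherwise.
    verdict = llm_judge(example)
    if confidence(example, verdict) >= threshold:
        return verdict
    return "ESCALATE_TO_HUMAN"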

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models

no code implementations29 Jun 2024 Jaeyoung Lee, Ximing Lu, Jack Hessel, Faeze Brahman, Youngjae Yu, Yonatan Bisk, Yejin Choi, Saadia Gabriel

Given the growing influx of misinformation across news and social media, there is a critical need for systems that can provide effective real-time verification of news claims.

Fact Checking · Misinformation · +2

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

3 code implementations26 Jun 2024 Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

As WildJailbreak considerably upgrades the quality and scale of existing safety resources, it uniquely enables us to examine the scaling effects of data and the interplay of data properties and model capabilities during safety training.

Chatbot · Red Teaming

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

1 code implementation7 Jun 2024 Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs such as GPT-4-turbo.

Benchmarking · Chatbot
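
A hedged sketch of computing a WB-Score-style rating with an LLM judge; the rubric prompt, the 1-10 scale, and the single-call setup are assumptions for illustration, not WildBench's actual checklist template or aggregation.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_score(task: str, response: str, model: str = "gpt-4-turbo") -> int:
    # Ask a strong LLM to grade one response; real judging uses richer
    # task-specific checklists and more careful score parsing than this.
    prompt = ("Rate how well the response completes the task, "
              "from 1 (very poor) to 10 (excellent). Reply with one integer.\n"
              f"Task: {task}\nResponse: {response}")
    out = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return int(out.choices[0].message.content.strip())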

Information-Theoretic Distillation for Reference-less Summarization

no code implementations20 Mar 2024 JaeHun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi

The current winning recipe for automatic summarization is using proprietary large-scale language models (LLMs) such as ChatGPT as is, or imitation learning from them as teacher models.

Imitation Learning

Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization

no code implementations16 Nov 2023 Yash Kumar Lal, Li Zhang, Faeze Brahman, Bodhisattwa Prasad Majumder, Peter Clark, Niket Tandon

Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need.

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

no code implementations14 Nov 2023 Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning.

Diversity · Imitation Learning · +1

STEER: Unified Style Transfer with Expert Reinforcement

1 code implementation13 Nov 2023 Skyler Hallinan, Faeze Brahman, Ximing Lu, JaeHun Jung, Sean Welleck, Yejin Choi

We propose STEER: Unified Style Transfer with Expert Reinforcement, a unified framework developed to overcome the challenge of limited parallel data for style transfer.

Style Transfer · Text Style Transfer

In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

1 code implementation13 Nov 2023 Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren

To effectively use large language models (LLMs) for real-world queries, it is imperative that they generalize to the long-tail distribution, i.e., rare examples where models exhibit low confidence.

Language Modelling · Natural Language Inference · +1

Agent Lumos: Unified and Modular Training for Open-Source Language Agents

2 code implementations9 Nov 2023 Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin

To foster generalizable agent learning, we collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales across various complex interactive tasks.

Math · Question Answering

The Generative AI Paradox: "What It Can Create, It May Not Understand"

no code implementations31 Oct 2023 Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi

Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon -- and can therefore exceed -- their ability to understand those same types of outputs.

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

no code implementations24 Oct 2023 Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

From this model we distill a high-quality dataset, δ-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time.

Diversity · Imitation Learning

Affective and Dynamic Beam Search for Story Generation

1 code implementation23 Oct 2023 Tenghao Huang, Ehsan Qasemi, Bangzheng Li, He Wang, Faeze Brahman, Muhao Chen, Snigdha Chaturvedi

Storytelling's captivating potential makes it a fascinating research area, with implications for entertainment, education, therapy, and cognitive studies.

Reranking · Sentence · +1

Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers

no code implementations22 Sep 2023 Tuhin Chakrabarty, Vishakh Padmakumar, Faeze Brahman, Smaranda Muresan

The development of large language models (LLMs) capable of following instructions and engaging in conversational interactions sparked increased interest in their utilization across various support tools.

Survey

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

2 code implementations NeurIPS 2023 Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, Xiang Ren

The Swift module is a small encoder-decoder LM fine-tuned on the oracle agent's action trajectories, while the Sage module employs LLMs such as GPT-4 for subgoal planning and grounding.

Decoder
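
Schematically, the fast/slow division of labor can be wired as below; the module interfaces and the confidence gate are assumptions for illustration, not the released implementation.

def act(observation, history, swift, sage, conf_threshold: float = 0.8):
    # Fast path: the small fine-tuned Swift policy proposes an action.
    action, confidence = swift.propose(observation, history)
    if confidence >= conf_threshold:
        return action
    # Slow path: the LLM-based Sage module plans subgoals and grounds
    # them into an executable action when Swift is uncertain.
    subgoals = sage.plan(observation, history)
    return sage.ground(subgoals, observation)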

Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing

no code implementations26 May 2023 JaeHun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi

We present Impossible Distillation, a novel framework for paraphrasing and sentence summarization, that distills a high-quality dataset and model from a low-quality teacher that itself cannot perform these tasks.

Diversity · model · +3

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

1 code implementation24 May 2023 Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi

While extreme-scale language models have demonstrated exceptional performance on a variety of language tasks, the degree of control over these language models through pure prompting can often be limited.

Language Modeling · Language Modelling · +2

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

1 code implementation24 May 2023 Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark Riedl

However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for finetuning.

Language Modelling · Offline RL · +3

Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

no code implementations4 Dec 2022 Faeze Brahman, Baolin Peng, Michel Galley, Sudha Rao, Bill Dolan, Snigdha Chaturvedi, Jianfeng Gao

We propose a new grounded keys-to-text generation task: the task is to generate a factual description about an entity given a set of guiding keys and grounding passages.

Data-to-Text Generation

NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization

1 code implementation2 Dec 2022 Chao Zhao, Faeze Brahman, Kaiqiang Song, Wenlin Yao, Dian Yu, Snigdha Chaturvedi

To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset.

Natural Language Understanding

Towards Inter-character Relationship-driven Story Generation

no code implementations1 Nov 2022 Anvesh Rao Vijjini, Faeze Brahman, Snigdha Chaturvedi

In this paper, we introduce the task of modeling interpersonal relationships for story generation.

Sentence · Story Generation

Generating Sequences by Learning to Self-Correct

no code implementations31 Oct 2022 Sean Welleck, Ximing Lu, Peter West, Faeze Brahman, Tianxiao Shen, Daniel Khashabi, Yejin Choi

Sequence generation applications require satisfying semantic constraints, such as ensuring that programs are correct, using certain keywords, or avoiding undesirable content.

Language Modeling · Language Modelling · +1

REV: Information-Theoretic Evaluation of Free-Text Rationales

1 code implementation10 Oct 2022 Hanjie Chen, Faeze Brahman, Xiang Ren, Yangfeng Ji, Yejin Choi, Swabha Swayamdipta

More concretely, we propose a metric called REV (Rationale Evaluation with conditional V-information) to quantify the amount of new, label-relevant information in a rationale beyond the information already available in the input or the label.
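
REV builds on conditional V-information. As a sketch of that underlying quantity (REV's exact formulation, which evaluates rationales against a vacuous baseline, refines this), the information a rationale R adds about label Y beyond input X under a model family \mathcal{V} is:

I_{\mathcal{V}}(R \to Y \mid X) = H_{\mathcal{V}}(Y \mid X) - H_{\mathcal{V}}(Y \mid X, R)

where H_{\mathcal{V}} denotes conditional V-entropy, i.e., the best achievable predictive log-loss within the family \mathcal{V}.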

Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach

1 code implementation Findings (NAACL) 2022 Chao Zhao, Faeze Brahman, Tenghao Huang, Snigdha Chaturvedi

In particular, we hypothesize that the order of the input concepts can affect the PTM's ability to utilize its commonsense knowledge.

Sentence · Text Generation

Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations

no code implementations24 May 2022 JaeHun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi

Despite their impressive capabilities, large pre-trained language models (LMs) struggle with consistent reasoning; recently, prompting LMs to generate explanations that self-guide the inference has emerged as a promising direction to amend this.

Uncovering Implicit Gender Bias in Narratives through Commonsense Inference

1 code implementation Findings (EMNLP) 2021 Tenghao Huang, Faeze Brahman, Vered Shwartz, Snigdha Chaturvedi

Pre-trained language models learn socially harmful biases from their training corpora, and may repeat these biases when used for generation.

"Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding

no code implementations12 Sep 2021 Faeze Brahman, Meng Huang, Oyvind Tafjord, Chao Zhao, Mrinmaya Sachan, Snigdha Chaturvedi

When reading a literary piece, readers often make inferences about various characters' roles, personalities, relationships, intents, actions, etc.

Is Everything in Order? A Simple Way to Order Sentences

1 code implementation EMNLP 2021 Somnath Basu Roy Chowdhury, Faeze Brahman, Snigdha Chaturvedi

We perform evaluations in a zero-shot setting, showcasing that our model is able to generalize well across other datasets.

Conditional Text Generation · Sentence · +1

Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision

no code implementations14 Dec 2020 Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi

In this paper, we investigate the extent to which neural models can reason about natural language rationales that explain model predictions, relying only on distant supervision with no additional annotation cost for human-written rationales.

Cue Me In: Content-Inducing Approaches to Interactive Story Generation

no code implementations AACL 2020 Faeze Brahman, Alexandru Petrusca, Snigdha Chaturvedi

Previous approaches in this domain have focused largely on one-shot generation, where a language model outputs a complete story based on limited initial input from a user.

Language Modeling · Language Modelling · +2
