Search Results for author: Liwei Jiang

Found 36 papers, 18 papers with code

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

no code implementations 24 Jul 2024 Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about.

Chatbot Hallucination +1

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

1 code implementation 26 Jun 2024 Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri

We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate.

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

1 code implementation 26 Jun 2024 Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

As WildJailbreak considerably upgrades the quality and scale of existing safety resources, it uniquely enables us to examine the scaling effects of data and the interplay of data properties and model capabilities during safety training.

Chatbot

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

1 code implementation 16 Apr 2024 Huihan Li, Liwei Jiang, Jena D. Hwang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Yejin Choi

As the utilization of large language models (LLMs) has proliferated world-wide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures.

Diversity Fairness

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

no code implementations 10 Apr 2024 Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating, in a gamified manner, cultural questions that modern LLMs fail at.

Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits

1 code implementation 21 Mar 2024 Jimin Mun, Liwei Jiang, Jenny Liang, Inyoung Cheong, Nicole DeCario, Yejin Choi, Tadayoshi Kohno, Maarten Sap

General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power.

Information-Theoretic Distillation for Reference-less Summarization

no code implementations 20 Mar 2024 JaeHun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi

The current winning recipe for automatic summarization is using proprietary large-scale language models (LLMs) such as ChatGPT as is, or imitation learning from them as teacher models.

Imitation Learning

JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models

1 code implementation 13 Feb 2024 Jillian Fisher, Ximing Lu, JaeHun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi

The permanence of online content combined with the enhanced authorship identification techniques calls for stronger computational methods to protect the identity and privacy of online authorship when needed, e.g., blind reviews for scientific papers, anonymous online reviews, or anonymous interactions in the mental health forums.

A Roadmap to Pluralistic Alignment

1 code implementation 7 Feb 2024 Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi

We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution.

The Generative AI Paradox: "What It Can Create, It May Not Understand"

no code implementations 31 Oct 2023 Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi

Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon -- and can therefore exceed -- their ability to understand those same types of outputs.

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

no code implementations 24 Oct 2023 Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

From this model we distill a high-quality dataset, δ-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time.

Diversity Imitation Learning

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

1 code implementation 16 Oct 2023 Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment?

2k

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

1 code implementation 12 Oct 2023 Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren

The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence.

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

1 code implementation 2 Sep 2023 Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi

To improve AI systems to better reflect value pluralism, the first-order challenge is to explore the extent to which AI systems can model pluralistic human values, rights, and duties as well as their interaction.

Decision Making

Faith and Fate: Limits of Transformers on Compositionality

1 code implementation NeurIPS 2023 Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures.

Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing

no code implementations 26 May 2023 JaeHun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi

We present Impossible Distillation, a novel framework for paraphrasing and sentence summarization, that distills a high-quality dataset and model from a low-quality teacher that itself cannot perform these tasks.

Diversity Paraphrase Generation +2

BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases

no code implementations 23 May 2023 Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap

Toxicity annotators and content moderators often default to mental shortcuts when making decisions.

Asymptotic normality and optimality in nonsmooth stochastic approximation

no code implementations 16 Jan 2023 Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

In their seminal work, Polyak and Juditsky showed that stochastic approximation algorithms for solving smooth equations enjoy a central limit theorem.

Open-Ended Question Answering

A Validation Approach to Over-parameterized Matrix and Image Recovery

no code implementations 21 Sep 2022 Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements.

Image Restoration

ProsocialDialog: A Prosocial Backbone for Conversational Agents

1 code implementation 25 May 2022 Hyunwoo Kim, Youngjae Yu, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap

With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost.

Dialogue Generation Dialogue Safety Prediction +2

Aligning to Social Norms and Values in Interactive Narratives

no code implementations NAACL 2022 Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi

We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games -- environments wherein an agent perceives and interacts with a world through natural language.

text-based games

Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization

no code implementations 6 Mar 2022 Liwei Jiang, Yudong Chen, Lijun Ding

We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization.

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

1 code implementation NAACL 2022 Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi

To enable constrained generation, we build on NeuroLogic decoding (Lu et al., 2021), combining its flexibility in incorporating logical constraints with A*esque estimates of future constraint satisfaction.

Machine Translation Table-to-Text Generation

Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery

no code implementations NeurIPS 2021 Lijun Ding, Liwei Jiang, Yudong Chen, Qing Qu, Zhihui Zhu

We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank.

Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization

no code implementations 26 Aug 2021 Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

We show that the subgradient method converges only to local minimizers when applied to generic Lipschitz continuous and subdifferentially regular functions that are definable in an o-minimal structure.

"I'm Not Mad": Commonsense Implications of Negation and Contradiction

no code implementations NAACL 2021 Liwei Jiang, Antoine Bosselut, Chandra Bhagavatula, Yejin Choi

In this paper, we present the first comprehensive study focusing on commonsense implications of negated statements and contradictions.

Natural Language Inference Negation

"I'm Not Mad": Commonsense Implications of Negation and Contradiction

no code implementations 13 Apr 2021 Liwei Jiang, Antoine Bosselut, Chandra Bhagavatula, Yejin Choi

In this paper, we present the first comprehensive study focusing on commonsense implications of negated statements and contradictions.

Natural Language Inference Negation

AUL is a better optimization metric in PU learning

no code implementations 1 Jan 2021 Shangchuan Huang, Songtao Wang, Dan Li, Liwei Jiang

Recent works try to recover the unbiased result by estimating the proportion of positive samples with mixture proportion estimation (MPE) algorithms, but the model performance is still limited and heavy computational cost is introduced (particularly for big datasets).

Binary Classification

Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets

no code implementations 21 Apr 2020 Liwei Jiang, Dan Li, Qisheng Wang, Shuai Wang, Songtao Wang

Secondly, we propose ProbTagging, a new training method for extremely imbalanced data sets, where the number of unlabeled samples is hundreds or thousands of times that of positive samples.
