Search Results for author: Xuhui Zhou

Found 30 papers, 14 papers with code

Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective

no code implementations 15 Apr 2025 Qiaosi Wang, Xuhui Zhou, Maarten Sap, Jodi Forlizzi, Hong Shen

The last couple of years have witnessed emerging research that appropriates Theory-of-Mind (ToM) tasks designed for humans to benchmark LLMs' ToM capabilities as an indication of LLMs' social intelligence.

Interactive Agents to Overcome Ambiguity in Software Engineering

1 code implementation 18 Feb 2025 Sanidhya Vijayvargiya, Xuhui Zhou, Akhila Yerukola, Maarten Sap, Graham Neubig

AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.

Code Generation

AutoPresent: Designing Structured Visuals from Scratch

1 code implementation 1 Jan 2025 Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell

We benchmark end-to-end image generation and program generation methods with a variety of models, and find that programmatic methods produce higher-quality slides in user-interactable formats.

Image Generation

BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data

no code implementations 21 Oct 2024 Wenkai Li, Jiarui Liu, Andy Liu, Xuhui Zhou, Mona Diab, Maarten Sap

In this work, we tackle the challenge of embedding realistic human personality traits into LLMs.

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

no code implementations 13 Sep 2024 Zhe Su, Xuhui Zhou, Sanketh Rangreji, Anubha Kabra, Julia Mendelsohn, Faeze Brahman, Maarten Sap

We design a set of realistic scenarios where language agents are instructed to achieve goals that are in conflict with being truthful during a multi-turn conversation with simulated human agents.

AI Agent, Navigate

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

1 code implementation 2 Aug 2024 Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Michael R. Lyu, Maarten Sap

Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain.

Code Generation, Large Language Model

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

2 code implementations 15 May 2024 Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap

Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations.

Benchmarking

Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

no code implementations 8 Mar 2024 Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap

Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena.

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

no code implementations 24 Oct 2023 Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity.

Question Answering

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

2 code implementations 18 Oct 2023 Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap

We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence.

WebArena: A Realistic Web Environment for Building Autonomous Agents

1 code implementation 25 Jul 2023 Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig

Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.

COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

no code implementations 3 Jun 2023 Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap

To study the contextual dynamics of offensiveness, we train models to generate COBRA explanations, with and without access to the context.

Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting

no code implementations 24 May 2023 Akhila Yerukola, Xuhui Zhou, Elizabeth Clark, Maarten Sap

Most existing stylistic text rewriting methods and evaluation metrics operate on a sentence level, but ignoring the broader context of the text can lead to preferring generic, ambiguous, and incoherent rewrites.

Sentence

Learning to translate by learning to communicate

1 code implementation 14 Jul 2022 C. M. Downey, Xuhui Zhou, Leo Z. Liu, Shane Steinert-Threlkeld

We formulate and test a technique to use Emergent Communication (EC) with a pre-trained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages.

Natural Language Understanding, NMT

Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

no code implementations NAACL 2022 Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith

The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases.

Extracting and Inferring Personal Attributes from Dialogue

1 code implementation NLP4ConvAI (ACL) 2022 Zhilin Wang, Xuhui Zhou, Rik Koncel-Kedziorski, Alex Marin, Fei Xia

Personal attributes represent structured information about a person, such as their hobbies, pets, family, likes and dislikes.

Attribute, Language Modeling

Challenges in Automated Debiasing for Toxic Language Detection

2 code implementations EACL 2021 Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.

Fairness, text-classification

Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets

no code implementations EMNLP (BlackboxNLP) 2020 Chuanrong Li, Lin Shengshuo, Leo Z. Liu, Xinyi Wu, Xuhui Zhou, Shane Steinert-Threlkeld

Although large-scale pretrained language models, such as BERT and RoBERTa, have achieved superhuman performance on in-distribution test sets, their performance suffers on out-of-distribution test sets (e.g., on contrast sets).

RPD: A Distance Function Between Word Embeddings

no code implementations ACL 2020 Xuhui Zhou, Zaixiang Zheng, Shu-Jian Huang

Based on the properties of RPD, we study the relations of word embeddings of different algorithms systematically and investigate the influence of different training processes and corpora.

Word Embeddings

Evaluating Commonsense in Pre-trained Language Models

1 code implementation 27 Nov 2019 Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang

However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension.

Language Modeling, Language Modelling

Parallel Distributed Logistic Regression for Vertical Federated Learning without Third-Party Coordinator

no code implementations 22 Nov 2019 Shengwen Yang, Bing Ren, Xuhui Zhou, Li-Ping Liu

The system is built on the parameter server architecture and aims to speed up model training by using a cluster of servers when the volume of training data is large.

regression, Transfer Learning
