Search Results for author: Diyi Yang

Found 125 papers, 65 papers with code

Learning with Limited Text Data

no code implementations ACL 2022 Diyi Yang, Ankur Parikh, Colin Raffel

Natural Language Processing (NLP) has achieved great progress in the past decade on the basis of neural models, which often rely on large amounts of labeled data to reach state-of-the-art performance.

Data Augmentation Structured Prediction +2

“This is a Problem, Don’t You Agree?” Framing and Bias in Human Evaluation for Natural Language Generation

no code implementations ACL (EvalNLGEval, INLG) 2020 Stephanie Schoch, Diyi Yang, Yangfeng Ji

Despite recent efforts to review human evaluation practices in natural language generation (NLG) research, unreported question wording and the potential for framing effects or cognitive biases to influence results have been widely overlooked.

Text Generation

Explaining Toxic Text via Knowledge Enhanced Text Generation

no code implementations NAACL 2022 Rohit Sridhar, Diyi Yang

Warning: This paper contains content that is offensive and may be upsetting. Biased or toxic speech can be harmful to various demographic groups.

Text Generation

HypMix: Hyperbolic Interpolative Data Augmentation

1 code implementation EMNLP 2021 Ramit Sawhney, Megh Thakkar, Shivam Agarwal, Di Jin, Diyi Yang, Lucie Flek

Interpolation-based regularisation methods for data augmentation have proven to be effective for various tasks and modalities.

Adversarial Robustness Data Augmentation
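
HypMix moves the interpolation step of mixup into hyperbolic space. As a rough illustration only, not the paper's exact gyrovector formulation, the sketch below mixes two points of the Poincaré ball by interpolating their images under the logarithmic map at the origin and mapping back with the exponential map; the Beta-distributed coefficient follows standard mixup practice.

```python
import numpy as np

def log0(x, eps=1e-9):
    """Logarithmic map at the origin of the Poincare ball (curvature -1)."""
    n = np.linalg.norm(x) + eps
    return np.arctanh(min(n, 1.0 - 1e-7)) * x / n

def exp0(v, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -1)."""
    n = np.linalg.norm(v) + eps
    return np.tanh(n) * v / n

def hyp_mix(x, y, alpha=0.2, seed=0):
    """Mix two ball points by interpolating in the tangent space at 0.

    Simplification: HypMix proper uses gyrovector operations on the
    manifold; tangent-space mixing at the origin only approximates it.
    """
    lam = np.random.default_rng(seed).beta(alpha, alpha)
    return exp0(lam * log0(x) + (1 - lam) * log0(y))

x, y = np.array([0.1, 0.2]), np.array([-0.3, 0.05])
print(hyp_mix(x, y))  # stays inside the unit ball since tanh(.) < 1
```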

SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues

no code implementations LREC 2022 Hanfei Sun, Ziyuan Cao, Diyi Yang

We propose SPORTSINTERVIEW, a novel knowledge-grounded dialogue (interview) dataset set in the domain of sports interviews.

Focus on the Action: Learning to Highlight and Summarize Jointly for Email To-Do Items Summarization

no code implementations Findings (ACL) 2022 Kexun Zhang, Jiaao Chen, Diyi Yang

Automatic email to-do item generation is the task of generating to-do items from a given email, helping people get an overview of their emails and schedule daily work.

DMix: Adaptive Distance-aware Interpolative Mixup

1 code implementation ACL 2022 Ramit Sawhney, Megh Thakkar, Shrey Pandit, Ritesh Soun, Di Jin, Diyi Yang, Lucie Flek

Interpolation-based regularisation methods such as Mixup, which generate virtual training samples, have proven to be effective for various tasks and modalities. We extend Mixup and propose DMix, an adaptive distance-aware interpolative Mixup that selects samples based on their diversity in the embedding space.

Data Augmentation Sentence +1
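
The DMix abstract says samples are selected "based on their diversity in the embedding space." A minimal sketch of that idea, assuming diversity means pairing each sample with its farthest batch-mate (the paper's actual selection rule may differ):

```python
import torch

def dmix_batch(embs, labels_onehot, alpha=0.2):
    """Distance-aware mixup: pair each sample with its farthest batch-mate.

    embs: (B, D) sample embeddings; labels_onehot: (B, C).
    Assumption: "diversity" = maximum pairwise distance; DMix's real
    selection criterion may be more nuanced.
    """
    dist = torch.cdist(embs, embs)                # (B, B) pairwise distances
    partner = dist.argmax(dim=1)                  # farthest partner per sample
    lam = torch.distributions.Beta(alpha, alpha).sample((embs.size(0), 1))
    mixed_x = lam * embs + (1 - lam) * embs[partner]
    mixed_y = lam * labels_onehot + (1 - lam) * labels_onehot[partner]
    return mixed_x, mixed_y

x = torch.randn(8, 16)
y = torch.eye(4)[torch.randint(0, 4, (8,))]
mx, my = dmix_batch(x, y)
print(mx.shape, my.shape)  # torch.Size([8, 16]) torch.Size([8, 4])
```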

Simple Conversational Data Augmentation for Semi-supervised Abstractive Dialogue Summarization

1 code implementation EMNLP 2021 Jiaao Chen, Diyi Yang

Abstractive conversation summarization has received growing attention while most current state-of-the-art summarization models heavily rely on human-annotated summaries.

Abstractive Dialogue Summarization Data Augmentation

WIKIBIAS: Detecting Multi-Span Subjective Biases in Language

1 code implementation Findings (EMNLP) 2021 Yang Zhong, Jingfeng Yang, Wei Xu, Diyi Yang

Biases continue to be prevalent in modern text and media, especially subjective bias – a special type of bias that introduces improper attitudes or presents a statement with the presupposition of truth.

Sentence

Best Practices and Lessons Learned on Synthetic Data for Language Models

no code implementations 11 Apr 2024 Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs.

Social Skill Training with Large Language Models

no code implementations 5 Apr 2024 Diyi Yang, Caleb Ziems, William Held, Omar Shaikh, Michael S. Bernstein, John Mitchell

People rely on social skills like conflict resolution to communicate effectively and to thrive in both work and personal life.

Mapping the Increasing Use of LLMs in Scientific Papers

no code implementations 1 Apr 2024 Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou

To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time.
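
The "population-level statistical framework" can be pictured as estimating a mixture weight between a human-written and an LLM-written word-frequency profile. The toy below fits that weight by maximum likelihood over token counts in two vocabulary buckets; the reference profiles are invented, and the paper's estimator is considerably more elaborate.

```python
import numpy as np

# Hypothetical per-token probabilities for two vocabulary buckets
# (e.g. "pivotal"/"realm"-style words vs. everything else).
p_human = np.array([0.02, 0.98])
p_llm   = np.array([0.10, 0.90])

def estimate_alpha(counts, grid=np.linspace(0, 1, 1001)):
    """MLE of the LLM-modified fraction alpha under a two-component
    multinomial mixture: p(word) = alpha*p_llm + (1-alpha)*p_human."""
    mix = grid[:, None] * p_llm + (1 - grid[:, None]) * p_human
    loglik = (counts * np.log(mix)).sum(axis=1)
    return grid[np.argmax(loglik)]

observed = np.array([450, 9550])   # token counts in a corpus of abstracts
print(estimate_alpha(observed))    # ~0.31 under these toy profiles
```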

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

no code implementations 1 Apr 2024 Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs?

Image Generation
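
The contrast the title hints at, replacing training data with model outputs versus accumulating real and synthetic data together, can be reproduced in a one-dimensional toy: repeatedly fit a Gaussian and sample from the fit. This is an illustrative recursion in the spirit of the paper's question, not its experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 50)

def run(mode, n_gen=500, n_samples=50):
    """Fit a Gaussian, sample from it, repeat across generations."""
    data = real.copy()
    for _ in range(n_gen):
        synthetic = rng.normal(data.mean(), data.std(), n_samples)
        # "replace": each generation sees only the latest model's samples;
        # "accumulate": real data plus every generation's samples are kept.
        data = synthetic if mode == "replace" else np.concatenate([data, synthetic])
    return data.std()

print("replace:   ", run("replace"))     # collapses toward 0
print("accumulate:", run("accumulate"))  # stays near 1
```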

Multi-Level Feedback Generation with Large Language Models for Empowering Novice Peer Counselors

no code implementations 21 Mar 2024 Alicja Chaszczewicz, Raj Sanjay Shah, Ryan Louie, Bruce A Arnow, Robert Kraut, Diyi Yang

We further design a self-improvement method on top of large language models to enhance the automatic generation of feedback.

Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future

no code implementations 28 Feb 2024 Minzhi Li, Weiyan Shi, Caleb Ziems, Diyi Yang

As Natural Language Processing (NLP) systems become increasingly integrated into human social life, these technologies will need to increasingly rely on social intelligence.

Unintended Impacts of LLM Alignment on Global Representation

no code implementations 22 Feb 2024 Michael J. Ryan, William Held, Diyi Yang

Before being deployed for user-facing applications, developers align Large Language Models (LLMs) to user preferences through a variety of procedures, such as Reinforcement Learning From Human Feedback (RLHF) and Direct Preference Optimization (DPO).

Instruction Following

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

1 code implementation 12 Jan 2024 Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi

This paper introduces a new perspective on jailbreaking LLMs by treating them as human-like communicators, exploring this overlooked intersection between everyday language interaction and AI safety.

Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach

no code implementations 16 Nov 2023 Yanchen Liu, Mingyu Derek Ma, Wenna Qin, Azure Zhou, Jiaao Chen, Weiyan Shi, Wei Wang, Diyi Yang

Using COVID-19 as a testbed domain, our experiments demonstrate a significant alignment between the susceptibility scores estimated by our computational modeling and human judgments, confirming the effectiveness of this latent modeling approach.

Misinformation

Grounding Gaps in Language Model Generations

no code implementations 15 Nov 2023 Omar Shaikh, Kristina Gligorić, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, Dan Jurafsky

To understand the roots of the identified grounding gap, we examine the role of instruction tuning and preference optimization, finding that training on contemporary preference data leads to a reduction in generated grounding acts.

Language Modelling

A Material Lens on Coloniality in NLP

no code implementations 14 Nov 2023 William Held, Camille Harris, Michael Best, Diyi Yang

Coloniality, the continuation of colonial harms beyond "official" colonization, has pervasive effects across society and scientific fields.

Task-Agnostic Low-Rank Adapters for Unseen English Dialects

1 code implementation 2 Nov 2023 Zedian Xiao, William Held, Yanchen Liu, Diyi Yang

Large Language Models (LLMs) are trained on corpora disproportionately weighted in favor of Standard American English.

Unlearn What You Want to Forget: Efficient Unlearning for LLMs

1 code implementation 31 Oct 2023 Jiaao Chen, Diyi Yang

Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data; however, this process might suffer from privacy issues and violations of data protection regulations.

Impressions: Understanding Visual Semiotics and Aesthetic Impact

no code implementations 27 Oct 2023 Julia Kruk, Caleb Ziems, Diyi Yang

We present Impressions, a novel dataset through which to investigate the semiotics of images, and how specific visual features and design choices can elicit specific emotions, thoughts and beliefs.

Image Captioning

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

1 code implementation 24 Oct 2023 Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang

Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance.

text annotation
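
The allocation mechanism named in the CoAnnotating title, uncertainty-guided work allocation, can be sketched by sampling an LLM several times per instance and routing high-entropy items to humans. Everything below is hypothetical scaffolding: query_llm stands in for the real annotation call, and the 0.8-bit threshold is arbitrary.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of an LLM's repeated labels for one instance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def allocate(instances, query_llm, n_samples=5, threshold=0.8):
    """Route each instance to the LLM or to a human by annotation entropy.

    query_llm(text) -> label is a hypothetical stand-in for the real
    annotation call; the threshold is an arbitrary choice.
    """
    to_llm, to_human = [], []
    for text in instances:
        labels = [query_llm(text) for _ in range(n_samples)]
        if label_entropy(labels) <= threshold:
            to_llm.append((text, Counter(labels).most_common(1)[0][0]))
        else:
            to_human.append(text)        # too uncertain: a human annotates
    return to_llm, to_human

fake_llm = lambda text: "positive" if "good" in text else "negative"
auto, manual = allocate(["a good movie", "meh"], fake_llm)
print(auto, manual)  # deterministic labels -> zero entropy -> both auto-labeled
```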

CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations

1 code implementation 17 Oct 2023 Myra Cheng, Tiziano Piccardi, Diyi Yang

Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys.

Caricature

"Mistakes Help Us Grow": Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms

no code implementations 16 Oct 2023 Kunal Handa, Margaret Clapper, Jessica Boyle, Rose E Wang, Diyi Yang, David S Yeager, Dorottya Demszky

Teachers' growth mindset supportive language (GMSL), rhetoric emphasizing that one's skills can be improved over time, has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes.

Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency

no code implementations 10 Oct 2023 Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber

Developing an educational test can be expensive and time-consuming, as each item must be written by experts and then evaluated by collecting hundreds of student responses.

Language Modelling Sentence

MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

no code implementations 4 Oct 2023 Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles, accompanied by comprehensive insights, including user/community susceptibility levels as well as the events and popular opinions raised by the crowd while propagating the information.

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization

1 code implementation 3 Oct 2023 Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang

We further design an automatic agent team optimization algorithm based on an unsupervised metric termed "Agent Importance Score", enabling the selection of the best agents based on the contribution each agent makes.

Code Generation Language Modelling +2
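
The abstract names an unsupervised Agent Importance Score without defining it here, so the sketch below substitutes a simple proxy, how often each agent's answer agrees with the batch majority, purely to illustrate the score-then-select pattern of agent team optimization.

```python
from collections import Counter

def importance_scores(answers_per_task):
    """answers_per_task: list of {agent_name: answer} dicts, one per task.

    Proxy score: fraction of tasks on which the agent matched the majority
    answer. The paper's actual Agent Importance Score is defined differently.
    """
    scores = Counter()
    for answers in answers_per_task:
        majority, _ = Counter(answers.values()).most_common(1)[0]
        for agent, ans in answers.items():
            scores[agent] += (ans == majority) / len(answers_per_task)
    return scores

def select_team(answers_per_task, k=2):
    """Keep the k highest-scoring agents."""
    return [agent for agent, _ in importance_scores(answers_per_task).most_common(k)]

probes = [{"a": "4", "b": "4", "c": "5"}, {"a": "7", "b": "8", "c": "7"}]
print(select_team(probes, k=2))  # ['a', ...]: 'a' matches the majority on both probes
```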

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

1 code implementation 29 Sep 2023 Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie

Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks.

Logical Reasoning

Rehearsal: Simulating Conflict to Teach Conflict Resolution

no code implementations 21 Sep 2023 Omar Shaikh, Valentino Chai, Michele J. Gelfand, Diyi Yang, Michael S. Bernstein

Compared to a control group with lecture material covering the same IRP theory, participants with simulated training from Rehearsal significantly improved their performance in the unaided conflict: they reduced their use of escalating competitive strategies by an average of 67%, while doubling their use of cooperative strategies.

counterfactual Language Modelling +1

Anchor Points: Benchmarking Models with Much Fewer Examples

1 code implementation 14 Sep 2023 Rajan Vivek, Kawin Ethayarajh, Diyi Yang, Douwe Kiela

Moreover, just a few anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error, sufficient for gauging where the model is likely to fail.

Benchmarking Language Modelling
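
One reading of the Anchor Points abstract: model confidences are correlated across examples, so a few representative "anchor" examples can stand in for the rest. The sketch below clusters per-example confidence vectors (across several models) with k-means and propagates each anchor's value to its cluster; this is one plausible instantiation rather than necessarily the paper's procedure, and the confidence matrix is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# conf[i, j]: model j's confidence in the correct class on example i
conf = rng.random((500, 6))

k = 5
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(conf)

# Anchor = the example nearest each cluster centroid; evaluating a new
# model on just these k anchors approximates its behavior everywhere.
anchors = [int(np.argmin(np.linalg.norm(conf - c, axis=1)))
           for c in km.cluster_centers_]

def estimate_confidences(new_model_conf_on_anchors):
    """Propagate a new model's anchor confidences to all 500 examples."""
    est = np.empty(len(conf))
    for cluster_id, anchor_conf in enumerate(new_model_conf_on_anchors):
        est[km.labels_ == cluster_id] = anchor_conf
    return est

new_conf = rng.random(k)  # pretend: the new model, evaluated on anchors only
print(anchors, estimate_confidences(new_conf).shape)
```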

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

1 code implementation 29 Jun 2023 Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun

Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans.

16k Image Captioning +3

Modeling Cross-Cultural Pragmatic Inference with Codenames Duet

1 code implementation 4 Jun 2023 Omar Shaikh, Caleb Ziems, William Held, Aryan J. Pariani, Fred Morstatter, Diyi Yang

Prior work uses simple reference games to test models of pragmatic reasoning, often with unidentified speakers and listeners.

Forgotten Knowledge: Examining the Citational Amnesia in NLP

no code implementations 29 May 2023 Janvijay Singh, Mukund Rungta, Diyi Yang, Saif M. Mohammad

Citing papers is the primary method through which modern scientific writing discusses and builds on past work.

Training Socially Aligned Language Models on Simulated Social Interactions

1 code implementation 26 May 2023 Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi

Social alignment in AI systems aims to ensure that these models behave according to established societal values.

TADA: Task-Agnostic Dialect Adapters for English

1 code implementation 26 May 2023 Will Held, Caleb Ziems, Diyi Yang

Large Language Models, the dominant starting point for Natural Language Processing (NLP) applications, fail at a higher rate for speakers of English dialects other than Standard American English (SAE).

Data Augmentation

Benchmarking LLM-based Machine Translation on Cultural Awareness

no code implementations 23 May 2023 Binwei Yao, Ming Jiang, Diyi Yang, Junjie Hu

Furthermore, we devise a novel evaluation metric to assess the understandability of translations in a reference-free manner using GPT-4.

Benchmarking In-Context Learning +3

DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules

1 code implementation 22 May 2023 Yanchen Liu, William Held, Diyi Yang

We show that DADA is effective for both single task and instruction finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects.

Dialect Identification

Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback

no code implementations 15 May 2023 Shang-Ling Hsu, Raj Sanjay Shah, Prathik Senthil, Zahra Ashktorab, Casey Dugan, Werner Geyer, Diyi Yang

Millions of users come to online peer counseling platforms to seek support on diverse topics ranging from relationship stress to anxiety.

Text Generation

Can Large Language Models Transform Computational Social Science?

1 code implementation 12 Apr 2023 Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang

We conclude that the performance of today's LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text).

Persuasiveness

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

1 code implementation 10 Apr 2023 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation.

Denoising Image Generation +1

Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

1 code implementation 8 Feb 2023 Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang

Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot, i.e., without adaptation on downstream data.

Arithmetic Reasoning Zero-Shot Learning

Auditing Gender Presentation Differences in Text-to-Image Models

1 code implementation 7 Feb 2023 Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang

Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools.

Parameter-Efficient Fine-Tuning Design Spaces

no code implementations 4 Jan 2023 Jiaao Chen, Aston Zhang, Xingjian Shi, Mu Li, Alex Smola, Diyi Yang

We discover the following design patterns: (i) group layers in a spindle pattern; (ii) allocate the number of trainable parameters to layers uniformly; (iii) tune all the groups; (iv) assign proper tuning strategies to different groups.

Human-in-the-loop Abstractive Dialogue Summarization

no code implementations 19 Dec 2022 Jiaao Chen, Mohan Dodda, Diyi Yang

Specifically, we ask humans to highlight the salient information to be included in summaries to provide the local feedback, and to make overall comparisons among summaries in terms of coherence, accuracy, coverage, conciseness, and overall quality as the global feedback.

Abstractive Dialogue Summarization

Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games

no code implementations 16 Dec 2022 Bolin Lai, Hongxin Zhang, Miao Liu, Aryan Pariani, Fiona Ryan, Wenqi Jia, Shirley Anugrah Hayati, James M. Rehg, Diyi Yang

We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes.

Persuasion Strategies

DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue

1 code implementation 15 Dec 2022 William Held, Christopher Hidey, Fei Liu, Eric Zhu, Rahul Goel, Diyi Yang, Rushin Shah

Modern virtual assistants use internal semantic parsing engines to convert user utterances to actionable commands.

Semantic Parsing XLM-R

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

1 code implementation 15 Dec 2022 Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, Diyi Yang

Generating a Chain of Thought (CoT) has been shown to consistently improve large language model (LLM) performance on a wide range of NLP tasks.

Instruction Following Language Modelling +2

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

1 code implementation CVPR 2023 James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting.

Modeling Motivational Interviewing Strategies On An Online Peer-to-Peer Counseling Platform

no code implementations 9 Nov 2022 Raj Sanjay Shah, Faye Holt, Shirley Anugrah Hayati, Aastha Agarwal, Yi-Chia Wang, Robert E. Kraut, Diyi Yang

This work provides a deeper understanding of the use of motivational interviewing techniques on peer-to-peer counselor platforms and sheds light on how to build better training programs for volunteer counselors on online platforms.

WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain

no code implementations 31 Oct 2022 Raj Sanjay Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, Diyi Yang

To this end, we contribute the Financial Language Understanding Evaluation (FLUE), an open-source comprehensive suite of benchmarks for the financial domain.

FLUE Language Modelling

Geographic Citation Gaps in NLP Research

1 code implementation 26 Oct 2022 Mukund Rungta, Janvijay Singh, Saif M. Mohammad, Diyi Yang

Similar disparities are also believed to exist for paper citation counts.

Robustness of Demonstration-based Learning Under Limited Data Scenario

1 code implementation 19 Oct 2022 Hongxin Zhang, Yanzhe Zhang, Ruiyi Zhang, Diyi Yang

Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenarios.

Few-shot NER

Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers

no code implementations 11 Oct 2022 William Held, Diyi Yang

However, as a fixed-size model acquires more languages, its performance across all languages degrades, a phenomenon termed interference.

Sentence Sentence Classification
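
The title points at Shapley values computed over attention heads. Below is a generic Monte Carlo Shapley estimator for per-head contribution; score_with_heads is a hypothetical callback that evaluates the model with a given subset of heads active, and nothing here is specific to the paper's multilingual setup.

```python
import random

def shapley_head_values(heads, score_with_heads, n_permutations=200, seed=0):
    """Monte Carlo estimate of each attention head's Shapley value.

    score_with_heads(subset) -> float is a hypothetical callback that
    runs evaluation with only `subset` of heads enabled.
    """
    rng = random.Random(seed)
    values = {h: 0.0 for h in heads}
    for _ in range(n_permutations):
        order = heads[:]
        rng.shuffle(order)
        active, prev = [], score_with_heads([])
        for h in order:
            active.append(h)
            cur = score_with_heads(active)
            values[h] += (cur - prev) / n_permutations  # marginal contribution
            prev = cur
    return values

# Toy check: head "h2" hurts the score, so its Shapley value comes out negative.
toy = lambda subset: len([h for h in subset if h != "h2"]) - 2.0 * ("h2" in subset)
print(shapley_head_values(["h0", "h1", "h2"], toy))  # ~{'h0': 1, 'h1': 1, 'h2': -2}
```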

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

1 code implementation COLING 2022 Hui Chen, Wei Han, Diyi Yang, Soujanya Poria

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification.

Sentence Text Augmentation +2

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

no code implementations 5 Aug 2022 Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays

We empirically demonstrate that using an input sketch (even a poorly drawn one) in addition to text considerably increases retrieval recall compared to traditional text-based image retrieval.

Image Retrieval Retrieval

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

Inducing Positive Perspectives with Text Reframing

1 code implementation ACL 2022 Caleb Ziems, Minzhi Li, Anthony Zhang, Diyi Yang

Sentiment transfer is one popular example of a text style transfer task, where the goal is to reverse the sentiment polarity of a text.

Sentence Style Transfer +1

The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

2 code implementations ACL 2022 Caleb Ziems, Jane A. Yu, Yi-Chia Wang, Alon Halevy, Diyi Yang

In this work, we introduce a new resource, not to authoritatively resolve moral ambiguities, but instead to facilitate systematic understanding of the intuitions, values and moral judgments reflected in the utterances of dialogue systems.

Attribute Benchmarking

VALUE: Understanding Dialect Disparity in NLU

1 code implementation ACL 2022 Caleb Ziems, Jiaao Chen, Camille Harris, Jessica Anderson, Diyi Yang

To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules.

Linguistic Acceptability Natural Language Understanding

Continual Sequence Generation with Adaptive Compositional Modules

2 code implementations ACL 2022 Yanzhe Zhang, Xuezhi Wang, Diyi Yang

Continual learning is essential for real-world deployment when there is a need to quickly adapt the model to new tasks without forgetting knowledge of old tasks.

Continual Learning Transfer Learning

Measure and Improve Robustness in NLP Models: A Survey

no code implementations NAACL 2022 Xuezhi Wang, Haohan Wang, Diyi Yang

Despite robustness being an increasingly studied topic, it has been separately explored in applications like vision and NLP, with various definitions, evaluation and mitigation strategies in multiple lines of research.

Interpreting Deep Learning Models in Natural Language Processing: A Review

no code implementations 20 Oct 2021 Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard Hovy, Jiwei Li

Neural network models have achieved state-of-the-art performances in a wide range of natural language processing (NLP) tasks.

Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models

1 code implementation Findings (NAACL) 2022 Tianlu Wang, Rohit Sridhar, Diyi Yang, Xuezhi Wang

Recently, NLP models have achieved remarkable progress across a variety of tasks; however, they have also been criticized for not being robust.

GNN is a Counter? Revisiting GNN for Question Answering

no code implementations ICLR 2022 Kuan Wang, Yuyu Zhang, Diyi Yang, Le Song, Tao Qin

To open the black box of GNN and investigate these problems, we dissect state-of-the-art GNN modules for QA and analyze their reasoning capability.

Knowledge Graphs Question Answering

Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework

1 code implementation 27 Sep 2021 Matan Halevy, Camille Harris, Amy Bruckman, Diyi Yang, Ayanna Howard

While previous work has focused on a single fairness criteria, we propose to use additional descriptive fairness metrics to better understand the source of these biases.

Descriptive Fairness

Semantic Categorization of Social Knowledge for Commonsense Question Answering

1 code implementation EMNLP (sustainlp) 2021 Gengyu Wang, Xiaochen Hou, Diyi Yang, Kathleen McKeown, Jing Huang

Large pre-trained language models (PLMs) have led to great success on various commonsense question answering (QA) tasks in an end-to-end fashion.

Question Answering

A Search Engine for Discovery of Scientific Challenges and Directions

1 code implementation NeurIPS Workshop AI4Science 2021 Dan Lahav, Jon Saad Falcon, Bailey Kuehl, Sophie Johnson, Sravanthi Parasa, Noam Shomron, Duen Horng Chau, Diyi Yang, Eric Horvitz, Daniel S. Weld, Tom Hope

To address this problem, we present a novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge discovery.

Linguistic Characterization of Divisive Topics Online: Case Studies on Contentiousness in Abortion, Climate Change, and Gun Control

no code implementations 30 Aug 2021 Jacob Beel, Tong Xiang, Sandeep Soni, Diyi Yang

As public discourse continues to move and grow online, conversations about divisive topics on social media platforms have also increased.

Quantifying the Impact of Human Capital, Job History, and Language Factors on Job Seniority with a Large-scale Analysis of Resumes

no code implementations 15 Jun 2021 Austin P Wright, Caleb Ziems, Haekyu Park, Jon Saad-Falcon, Duen Horng Chau, Diyi Yang, Maria Tomprou

As job markets worldwide have become more competitive, applicant selection criteria more opaque, and the information and advice available to job seekers more varied (and sometimes contradictory), it has never been more difficult to determine which factors in a résumé most effectively help career progression.

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

no code implementations 14 Jun 2021 Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, Diyi Yang

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets.

Data Augmentation News Classification +1

The Importance of Modeling Social Factors of Language: Theory and Practice

no code implementations NAACL 2021 Dirk Hovy, Diyi Yang

We show that current NLP systems systematically break down when faced with interpreting the social factors of language.

Personalized Response Generation via Generative Split Memory Network

1 code implementation NAACL 2021 Yuwei Wu, Xuezhe Ma, Diyi Yang

Despite the impressive successes of generation and dialogue systems, how to endow a text generation system with particular personality traits to deliver more personalized responses remains under-investigated.

Response Generation Text Generation

RECAST: Enabling User Recourse and Interpretability of Toxicity Detection Models with Interactive Visualization

no code implementations 8 Feb 2021 Austin P Wright, Omar Shaikh, Haekyu Park, Will Epperson, Muhammed Ahmed, Stephane Pinel, Duen Horng Chau, Diyi Yang

With the widespread use of toxic language online, platforms are increasingly using automated systems that leverage advances in natural language processing to automatically flag and remove toxic comments.

Tuiteamos o pongamos un tuit? Investigating the Social Constraints of Loanword Integration in Spanish Social Media

no code implementations SCiL 2021 Ian Stewart, Diyi Yang, Jacob Eisenstein

In social media, we find that speaker background and expectations of formality explain loanword and native word integration, such that authors who use more Spanish and who write to a wider audience tend to use integrated verb forms more often.

Semi-supervised Formality Style Transfer using Language Model Discriminator and Mutual Information Maximization

1 code implementation Findings of the Association for Computational Linguistics 2020 Kunal Chawla, Diyi Yang

Formality style transfer is the task of converting informal sentences to grammatically correct formal sentences, which can be used to improve performance of many downstream NLP tasks.

Formality Style Transfer Language Modelling +4

Examining the Ordering of Rhetorical Strategies in Persuasive Requests

1 code implementation Findings of the Association for Computational Linguistics 2020 Omar Shaikh, Jiaao Chen, Jon Saad-Falcon, Duen Horng Chau, Diyi Yang

We find that specific (orderings of) strategies interact uniquely with a request's content to impact success rate, and thus the persuasiveness of a request.

Persuasiveness

Local Additivity Based Data Augmentation for Semi-supervised NER

1 code implementation EMNLP 2020 Jiaao Chen, Zhenghui Wang, Ran Tian, Zichao Yang, Diyi Yang

Named Entity Recognition (NER) is one of the first stages in deep language understanding yet current NER models heavily rely on human-annotated data.

Data Augmentation named-entity-recognition +3

Evaluating Graph Vulnerability and Robustness using TIGER

1 code implementation 10 Jun 2020 Scott Freitas, Diyi Yang, Srijan Kumar, Hanghang Tong, Duen Horng Chau

By democratizing the tools required to study network robustness, our goal is to assist researchers and practitioners in analyzing their own networks; and facilitate the development of new research in the field.

Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis

1 code implementation 25 May 2020 Bing He, Caleb Ziems, Sandeep Soni, Naren Ramakrishnan, Diyi Yang, Srijan Kumar

The spread of COVID-19 has sparked racism and hate on social media targeted towards Asian communities.

ToTTo: A Controlled Table-To-Text Generation Dataset

1 code implementation EMNLP 2020 Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

Conditional Text Generation Data-to-Text Generation +2

MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

2 code implementations ACL 2020 Jiaao Chen, Zichao Yang, Diyi Yang

This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix.

Data Augmentation General Classification +1
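
TMix, the augmentation method named in the MixText abstract, interpolates hidden states at an intermediate encoder layer and mixes the labels with the same coefficient. A minimal PyTorch sketch, assuming a plain list of encoder blocks; the choice of mixing layer and the Beta parameter follow common mixup practice and may differ from the paper's settings.

```python
import torch
import torch.nn as nn

def tmix_forward(layers, x_a, x_b, mix_layer, alpha=0.75):
    """Run two inputs through encoder `layers`, interpolating their hidden
    states (and, via the returned lambda, their labels) at `mix_layer`."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    h_a, h_b = x_a, x_b
    for i, layer in enumerate(layers):
        if i == mix_layer:
            h_a = lam * h_a + (1 - lam) * h_b   # mix once, then continue one stream
        h_a = layer(h_a)
        if i < mix_layer:
            h_b = layer(h_b)                    # second stream only until the mix
    return h_a, lam                              # mixed output and label weight

layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
out, lam = tmix_forward(layers, torch.randn(2, 16), torch.randn(2, 16), mix_layer=2)
print(out.shape, round(lam, 3))
```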

Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses

1 code implementation 23 Apr 2020 Jiaao Chen, Yuwei Wu, Diyi Yang

We present semi-supervised models with data augmentation (SMDA), a semi-supervised text classification system to classify interactive affective responses.

Data Augmentation Semi-Supervised Text Classification +2

RECAST: Interactive Auditing of Automatic Toxicity Detection Models

no code implementations 7 Jan 2020 Austin P. Wright, Omar Shaikh, Haekyu Park, Will Epperson, Muhammed Ahmed, Stephane Pinel, Diyi Yang, Duen Horng Chau

As toxic language becomes nearly pervasive online, there has been increasing interest in leveraging advancements in natural language processing (NLP), including very large transformer models, to automatically detect and remove toxic comments.

Adversarial Robustness Fairness

Automatically Neutralizing Subjective Bias in Text

1 code implementation 21 Nov 2019 Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, Diyi Yang

To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view ("neutralizing" biased text).

Sentence Text Generation

Characterizing Collective Attention via Descriptor Context: A Case Study of Public Discussions of Crisis Events

1 code implementation 19 Sep 2019 Ian Stewart, Diyi Yang, Jacob Eisenstein

But according to rationalist models of natural language communication, the collective salience of each entity will be expressed not only in how often it is mentioned, but in the form that those mentions take.

Let's Make Your Request More Persuasive: Modeling Persuasive Strategies via Semi-Supervised Neural Nets on Crowdfunding Platforms

no code implementations NAACL 2019 Diyi Yang, Jiaao Chen, Zichao Yang, Dan Jurafsky, Eduard Hovy

Modeling what makes a request persuasive - eliciting the desired response from a reader - is critical to the study of propaganda, behavioral economics, and advertising.

Persuasiveness Sentence

Identifying Semantic Edit Intentions from Revisions in Wikipedia

no code implementations EMNLP 2017 Diyi Yang, Aaron Halfaker, Robert Kraut, Eduard Hovy

Most studies on human editing focus merely on syntactic revision operations, failing to capture the intentions behind revision changes, which are essential for facilitating the single and collaborative writing process.

Information Retrieval Lexical Simplification +2
