Search Results for author: Yulia Tsvetkov

Found 128 papers, 67 papers with code

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

no code implementations10 Apr 2024 Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating, in a gamified manner, cultural questions that modern LLMs fail at.

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

1 code implementation5 Mar 2024 Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana

In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent than is revealed by prompting the target model directly with its training data, the dominant approach to quantifying memorization in LLMs.

Memorization

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers

1 code implementation27 Feb 2024 Roy Xie, Orevaoghene Ahia, Yulia Tsvetkov, Antonios Anastasopoulos

Identifying linguistic differences between dialects of a language often requires expert knowledge and meticulous human analysis.

Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks

1 code implementation18 Feb 2024 Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia Tsvetkov, Tianxing He

Our experiments reveal that almost none of the existing detectors remain robust under all the attacks, and all detectors exhibit different loopholes.

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection

no code implementations16 Feb 2024 Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, Minnan Luo

Challenges in factuality and hallucination prevent large language models from being directly employed off-the-shelf to judge the veracity of news articles, where factual accuracy is paramount.

Misinformation

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration

no code implementations1 Feb 2024 Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Vidhisha Balachandran, Yulia Tsvetkov

Despite efforts to expand the knowledge of large language models (LLMs), knowledge gaps -- missing or outdated information in LLMs -- might always persist given the evolving nature of knowledge.

Retrieval

What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

no code implementations1 Feb 2024 Shangbin Feng, Herun Wan, Ningnan Wang, Zhaoxuan Tan, Minnan Luo, Yulia Tsvetkov

Social media bot detection has always been an arms race between advancements in machine learning bot detectors and adversarial bot strategies to evade detection.

Tuning Language Models by Proxy

1 code implementation16 Jan 2024 Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith

Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.

Domain Adaptation Math +1
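The abstract above refers to proxy tuning, a decoding-time adaptation scheme: the published recipe offsets a large base model's next-token logits by the difference between a small tuned "expert" and its untuned "anti-expert". A minimal numpy sketch of that logit arithmetic, with toy logits and hypothetical variable names:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def proxy_tuned_logits(base_logits, expert_logits, antiexpert_logits):
    # Steer the large base model by the (expert - anti-expert) offset,
    # i.e., what the small model learned from tuning.
    return base_logits + (expert_logits - antiexpert_logits)

# Toy 4-token vocabulary: the tuned expert raises token 2; the untuned
# anti-expert is flat, so the full offset transfers to the base model.
base = np.array([1.0, 0.5, 0.2, -0.3])
expert = np.array([0.0, 0.0, 2.0, 0.0])
anti = np.zeros(4)

probs = softmax(proxy_tuned_logits(base, expert, anti))
```

With real models, `base`, `expert`, and `anti` would be the three models' next-token logits over a shared vocabulary at each decoding step.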

Fine-grained Hallucination Detection and Editing for Language Models

no code implementations12 Jan 2024 Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, Hannaneh Hajishirzi

On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and edits suggested by FAVA improve the factuality of LM-generated text.

Hallucination Retrieval

P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models

no code implementations16 Nov 2023 Yuhan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, Yulia Tsvetkov

In this work, we take a first step towards designing summarization systems that are faithful to the author's intent, not only the semantic content of the article.

News Summarization

Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions

no code implementations13 Nov 2023 Sachin Kumar, Chan Young Park, Yulia Tsvetkov

GEN-Z is generative, as it measures the LM likelihood of input text, conditioned on natural language descriptions of labels.

Language Modelling text-classification +3

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

1 code implementation15 Oct 2023 Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, Yulia Tsvetkov

To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains.

Multiple-choice World Knowledge

MatFormer: Nested Transformer for Elastic Inference

2 code implementations11 Oct 2023 Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain

Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.

Language Modelling

On the Zero-Shot Generalization of Machine-Generated Text Detectors

no code implementations8 Oct 2023 Xiao Pu, Jingyu Zhang, Xiaochuang Han, Yulia Tsvetkov, Tianxing He

The rampant proliferation of large language models, fluent enough to generate text indistinguishable from human-written language, gives unprecedented importance to the detection of machine-generated text.

Zero-shot Generalization

Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models

1 code implementation2 Oct 2023 Wenxuan Ding, Shangbin Feng, Yuhan Liu, Zhaoxuan Tan, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

We additionally propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' ability to backtrack and verify structured constraints.

Resolving Knowledge Conflicts in Large Language Models

1 code implementation2 Oct 2023 Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

To this end, we introduce KNOWLEDGE CONFLICT, an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals.

LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud

no code implementations29 Sep 2023 Mengke Zhang, Tianxing He, Tianle Wang, Lu Mi, Fatemehsadat Mireshghallah, Binyi Chen, Hao Wang, Yulia Tsvetkov

In the current user-server interaction paradigm of prompted generation with large language models (LLM) on cloud, the server fully controls the generation process, which leaves zero options for users who want to keep the generated text to themselves.

Understanding In-Context Learning via Supportive Pretraining Data

no code implementations26 Jun 2023 Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang

We observe that a continued pretraining on this small subset significantly improves the model's ICL ability, by up to 18%.

In-Context Learning

Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

no code implementations1 Jun 2023 Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov

We present SymbolicToM, a plug-and-play approach to reason about the belief states of multiple characters in reading comprehension tasks via explicit symbolic representation.

Reading Comprehension

Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

no code implementations24 May 2023 Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih

Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations.
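Context-aware decoding, as described above, counters this by amplifying the shift that the input context induces in the output distribution: the adjusted logits are (1 + α) · logits-with-context − α · logits-without-context. A toy numpy sketch of that contrast (variable names hypothetical):

```python
import numpy as np

def context_aware_logits(l_ctx, l_noctx, alpha=1.0):
    # Contrast the distribution conditioned on the context against the
    # context-free (parametric-prior) distribution.
    return (1 + alpha) * l_ctx - alpha * l_noctx

# Toy 3-token vocab: the parametric prior (no context) strongly favors
# token 0, while the provided context mildly favors token 1.
l_noctx = np.array([2.0, 0.0, 0.0])
l_ctx = np.array([1.3, 1.2, 0.0])

plain = int(l_ctx.argmax())                                 # prior still wins
cad = int(context_aware_logits(l_ctx, l_noctx).argmax())    # context wins
```

Larger `alpha` trusts the context more aggressively; `alpha=0` recovers ordinary decoding.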

David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs

no code implementations24 May 2023 Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad

Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced controllability at inference time.

TalkUp: Paving the Way for Understanding Empowering Language

no code implementations23 May 2023 Lucille Njoo, Chan Young Park, Octavia Stappart, Marvin Thielk, Yi Chu, Yulia Tsvetkov

Empowering language is important in many real-world contexts, from education to workplace dynamics to healthcare.

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

no code implementations23 May 2023 Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products.

Fairness Language Modelling

Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models

2 code implementations17 May 2023 Shangbin Feng, Weijia Shi, Yuyang Bai, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

Ultimately, the Knowledge Card framework enables dynamic synthesis and updates of knowledge from diverse domains.

Retrieval

Can Language Models Solve Graph Problems in Natural Language?

2 code implementations NeurIPS 2023 Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, Yulia Tsvetkov

We then propose Build-a-Graph Prompting and Algorithmic Prompting, two instruction-based approaches to enhance LLMs in solving natural language graph problems.

In-Context Learning Knowledge Probing +2

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models

1 code implementation15 May 2023 Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov

We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks.

Fairness Misinformation

FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge

1 code implementation14 May 2023 Shangbin Feng, Vidhisha Balachandran, Yuyang Bai, Yulia Tsvetkov

We propose FactKB, a simple new approach to factuality evaluation that is generalizable across domains, in particular with respect to entities and relations.

News Summarization

Assessing Language Model Deployment with Risk Cards

2 code implementations31 Mar 2023 Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad

However, there is no risk-centric framework for documenting the complexity of a landscape in which some risks are shared across models and contexts, while others are specific, and where certain conditions may be required for risks to manifest as harms.

Language Modelling Text Generation

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation

1 code implementation20 Dec 2022 Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov

In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data.

Text Generation

Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?

no code implementations20 Dec 2022 Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer

Large language models can perform new tasks in a zero-shot fashion, given natural language prompts that specify the desired behavior.

SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control

1 code implementation31 Oct 2022 Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov

Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models.

Language Modelling Text Generation

Gendered Mental Health Stigma in Masked Language Models

no code implementations27 Oct 2022 Inna Wanyin Lin, Lucille Njoo, Anjalie Field, Ashish Sharma, Katharina Reinecke, Tim Althoff, Yulia Tsvetkov

Mental health stigma prevents many individuals from receiving the appropriate care, and social psychology studies have shown that mental health tends to be overlooked in men.

Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation

no code implementations25 Oct 2022 Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov, Yejin Choi

Moreover, we uniquely propose iterative distillation of knowledge, where student models from the previous iteration of distillation serve as teacher models in the next iteration.

Knowledge Distillation Sentence +1

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey

no code implementations14 Oct 2022 Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov

Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in user-facing settings.

Language Modelling Text Generation

KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding

1 code implementation8 Oct 2022 Shangbin Feng, Zhaoxuan Tan, Wenqian Zhang, Zhenyu Lei, Yulia Tsvetkov

With the advent of pretrained language models (LMs), increasing research efforts have been focusing on infusing commonsense and domain-specific knowledge to prepare LMs for downstream tasks.

document understanding Knowledge Graphs +3

Gradient-Based Constrained Sampling from Language Models

no code implementations25 May 2022 Sachin Kumar, Biswajit Paria, Yulia Tsvetkov

Large pretrained language models generate fluent text but are notoriously hard to controllably sample from.

Language Modelling Text Generation

Challenges and Opportunities in Information Manipulation Detection: An Examination of Wartime Russian Media

no code implementations24 May 2022 Chan Young Park, Julia Mendelsohn, Anjalie Field, Yulia Tsvetkov

NLP research on public opinion manipulation campaigns has primarily focused on detecting overt strategies such as fake news and disinformation.

Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching

1 code implementation ACL 2022 Alissa Ostapenko, Shuly Wintner, Melinda Fricke, Yulia Tsvetkov

Natural language processing (NLP) models trained on people-generated data can be unreliable because, without any constraints, they can learn from spurious correlations that are not relevant to the task.

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

1 code implementation15 Mar 2022 Rishabh Joshi, Vidhisha Balachandran, Emily Saldanha, Maria Glenski, Svitlana Volkova, Yulia Tsvetkov

Keyphrase extraction aims at automatically extracting a list of "important" phrases representing the key concepts in a document.

Keyphrase Extraction Topic Classification

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

1 code implementation Findings (EMNLP) 2021 Xiaochuang Han, Yulia Tsvetkov

Among the most critical limitations of deep learning NLP models are their lack of interpretability, and their reliance on spurious correlations.

Improving Span Representation for Domain-adapted Coreference Resolution

1 code implementation CRAC (ACL) 2021 Nupoor Gandhi, Anjalie Field, Yulia Tsvetkov

Recent work has shown fine-tuning neural coreference models can produce strong performance when adapting to different domains.

coreference-resolution Domain Adaptation

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

2 code implementations ICLR 2022 Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.

Image Captioning Language Modelling +2

Controlled Text Generation as Continuous Optimization with Multiple Constraints

1 code implementation NeurIPS 2021 Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

As large-scale language model pretraining pushes the state-of-the-art in text generation, recent work has turned to controlling attributes of the text such models generate.

Language Modelling Machine Translation +4

A Survey of Race, Racism, and Anti-Racism in NLP

no code implementations ACL 2021 Anjalie Field, Su Lin Blodgett, Zeerak Waseem, Yulia Tsvetkov

Despite inextricable ties between race and language, little work has considered race in NLP research and development.

Machine Translation into Low-resource Language Varieties

no code implementations ACL 2021 Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language.

Machine Translation Translation

Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation

1 code implementation Findings (ACL) 2021 Prakhar Gupta, Yulia Tsvetkov, Jeffrey P. Bigham

Experiments on classification, ranking and evaluation tasks across multiple datasets demonstrate that our approaches outperform strong baselines in providing informative negative examples for training dialogue systems.

Binary Classification Dialogue Evaluation

DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues

2 code implementations ICLR 2021 Rishabh Joshi, Vidhisha Balachandran, Shikhar Vashishth, Alan Black, Yulia Tsvetkov

To successfully negotiate a deal, it is not enough to communicate fluently: pragmatic planning of persuasive negotiation strategies is essential.

Response Generation

Simple and Efficient ways to Improve REALM

no code implementations EMNLP (MRQA) 2021 Vidhisha Balachandran, Ashish Vaswani, Yulia Tsvetkov, Niki Parmar

Dense retrieval has been shown to be effective for retrieving relevant documents for Open Domain QA, surpassing popular sparse retrieval methods like BM25.

Retrieval

An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation

no code implementations31 Mar 2021 Lidia Kidane, Sachin Kumar, Yulia Tsvetkov

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, often requiring large amounts of auxiliary data to achieve competitive results.

Data Augmentation Machine Translation +2

Controlled Analyses of Social Biases in Wikipedia Bios

1 code implementation31 Dec 2020 Anjalie Field, Chan Young Park, Kevin Z. Lin, Yulia Tsvetkov

In this work, we present a methodology for analyzing Wikipedia pages about people that isolates dimensions of interest (e.g., gender) from other attributes (e.g., occupation).

Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia

no code implementations21 Oct 2020 Chan Young Park, Xinru Yan, Anjalie Field, Yulia Tsvetkov

Specific lexical choices in narrative text reflect both the writer's attitudes towards people in the narrative and influence the audience's reactions.

End-to-End Differentiable GANs for Text Generation

no code implementations NeurIPS Workshop ICBINB 2020 Sachin Kumar, Yulia Tsvetkov

We posit that this gap is due to autoregressive nature and architectural requirements for text generation as well as a fundamental difference between the definition of Wasserstein distance in image and text domains.

Text Generation

Fortifying Toxic Speech Detectors Against Veiled Toxicity

1 code implementation EMNLP 2020 Xiaochuang Han, Yulia Tsvetkov

Modern toxic speech detectors are incompetent in recognizing disguised offensive language, such as adversarial attacks that deliberately avoid known toxic lexicons, or manifestations of implicit bias.

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

1 code implementation EMNLP 2020 Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Modern multilingual models are trained on concatenated text from multiple languages in hopes of conferring benefits to each (positive transfer), with the most pronounced benefits accruing to low-resource languages.

Meta-Learning

Automatic Extraction of Rules Governing Morphological Agreement

1 code implementation EMNLP 2020 Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig

Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data.

Cross-Lingual Transfer Descriptive

Controlling Dialogue Generation with Semantic Exemplars

1 code implementation NAACL 2021 Prakhar Gupta, Jeffrey P. Bigham, Yulia Tsvetkov, Amy Pavel

Dialogue systems pretrained with large language models generate locally coherent responses, but lack the fine-grained control over responses necessary to achieve specific goals.

Dialogue Generation Response Generation

A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards

1 code implementation WS 2020 Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov

The model uses reinforcement learning to directly optimize a bilingual semantic similarity metric between the summaries generated in a target language and gold summaries in a source language.

Machine Translation reinforcement-learning +5

Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks

2 code implementations EACL 2021 Jimin Sun, Hwijeen Ahn, Chan Young Park, Yulia Tsvetkov, David R. Mortensen

Much work in cross-lingual transfer learning explored how to select better transfer languages for multilingual tasks, primarily focusing on typological and genealogical similarities between languages.

Cross-Lingual Transfer Dependency Parsing +2

Demoting Racial Bias in Hate Speech Detection

no code implementations WS 2020 Mengzhou Xia, Anjalie Field, Yulia Tsvetkov

In current hate speech datasets, there exists a high correlation between annotators' perceptions of toxicity and signals of African American English (AAE).

Hate Speech Detection

A Computational Analysis of Polarization on Indian and Pakistani Social Media

1 code implementation20 May 2020 Aman Tyagi, Anjalie Field, Priyank Lathwal, Yulia Tsvetkov, Kathleen M. Carley

Between February 14, 2019 and March 4, 2019, a terrorist attack in Pulwama, Kashmir followed by retaliatory airstrikes led to rising tensions between India and Pakistan, two nuclear-armed countries.

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions

1 code implementation ACL 2020 Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov

In this work, we investigate the use of influence functions for NLP, providing an alternative approach to interpreting neural text classifiers.

Natural Language Inference

Unsupervised Discovery of Implicit Gender Bias

1 code implementation EMNLP 2020 Anjalie Field, Yulia Tsvetkov

Despite their prevalence in society, social biases are difficult to identify, primarily because human judgements in this domain can be unreliable.

Balancing Training for Multilingual Neural Machine Translation

2 code implementations ACL 2020 Xinyi Wang, Yulia Tsvetkov, Graham Neubig

When training multilingual machine translation (MT) models that can translate to/from multiple languages, we are faced with imbalanced training sets: some languages have much more training data than others.

Machine Translation Translation

A Framework for the Computational Linguistic Analysis of Dehumanization

no code implementations6 Mar 2020 Julia Mendelsohn, Yulia Tsvetkov, Dan Jurafsky

Dehumanization is a pernicious psychological process that often leads to extreme intergroup bias, hate speech, and violence aimed at targeted social groups.

Abusive Language

StructSum: Summarization via Structured Representations

1 code implementation EACL 2021 Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov

To this end, we propose incorporating latent and explicit dependencies across sentences in the source document into end-to-end single-document summarization models.

Abstractive Text Summarization Document Summarization +1

Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods

1 code implementation SCiL 2020 Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David R. Mortensen, Yulia Tsvetkov

Besides presenting a new linguistic application of distributional semantics, this study tackles the linguistic question of the role of language-internal factors (in our case, sparsity) in language change motivated by language-external factors (reflected in frequency growth).

A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation

no code implementations WS 2019 Gayatri Bhat, Sachin Kumar, Yulia Tsvetkov

Neural models that eliminate the softmax bottleneck by generating word embeddings (rather than multinomial distributions over a vocabulary) attain faster training with fewer learnable parameters.

Machine Translation Translation +1

Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation

no code implementations WS 2019 Chan Young Park, Yulia Tsvetkov

In this paper, we introduce a phrase-based NMT model built upon continuous-output NMT, in which the decoder generates embeddings of words or phrases.

Machine Translation NMT +1

A Dynamic Strategy Coach for Effective Negotiation

no code implementations WS 2019 Yiheng Zhou, He He, Alan W. Black, Yulia Tsvetkov

We consider a bargaining scenario where a seller and a buyer negotiate the price of an item for sale through a text-based dialog.

Decision Making Text Generation

Topics to Avoid: Demoting Latent Confounds in Text Classification

1 code implementation IJCNLP 2019 Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov

Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well.

General Classification Native Language Identification +2

Measuring Bias in Contextualized Word Representations

1 code implementation WS 2019 Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W. Black, Yulia Tsvetkov

Contextual word embeddings such as BERT have achieved state of the art performance in numerous NLP tasks.

Word Embeddings

Entity-Centric Contextual Affective Analysis

no code implementations ACL 2019 Anjalie Field, Yulia Tsvetkov

While contextualized word representations have improved state-of-the-art benchmarks in many NLP tasks, their potential usefulness for social-oriented tasks remains largely unexplored.

Word Embeddings

Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories

2 code implementations8 Apr 2019 Anjalie Field, Gayatri Bhat, Yulia Tsvetkov

We show that while these articles are sympathetic towards women who have experienced sexual harassment, they consistently present men as most powerful, even after sexual assault allegations.

Social and Information Networks

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

1 code implementation ICLR 2019 Sachin Kumar, Yulia Tsvetkov

The Softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation.

Machine Translation Text Generation +2
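The continuous-output idea above replaces the softmax with direct prediction of a target word embedding, trained with a von Mises-Fisher negative log-likelihood that rewards directional alignment. A simplified cosine-distance stand-in for that loss (a sketch, not the paper's exact NLLvMF term):

```python
import numpy as np

def vmf_style_loss(pred, target_embedding):
    # Directional loss: 0 when the predicted vector points exactly at the
    # target embedding, 1 when orthogonal. The full vMF NLL also involves
    # a normalizer depending on the predicted vector's magnitude.
    p = pred / np.linalg.norm(pred)
    t = target_embedding / np.linalg.norm(target_embedding)
    return 1.0 - float(p @ t)
```

At decoding time, the predicted vector is mapped back to a word by nearest-neighbor search in the embedding table, avoiding the softmax over the full vocabulary.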

Style Transfer Through Multilingual and Feedback-Based Back-Translation

no code implementations17 Sep 2018 Shrimai Prabhumoye, Yulia Tsvetkov, Alan W. Black, Ruslan Salakhutdinov

Style transfer is the task of transferring an attribute of a sentence (e.g., formality) while maintaining its semantic content.

Attribute Sentence +2

Socially Responsible NLP

no code implementations NAACL 2018 Yulia Tsvetkov, Vinodkumar Prabhakaran, Rob Voigt

As language technologies have become increasingly prevalent, there is a growing awareness that decisions we make about our data, methods, and tools are often tied up with their impact on people and societies.

Decision Making Ethics

Native Language Cognate Effects on Second Language Lexical Choice

1 code implementation TACL 2018 Ella Rabinovich, Yulia Tsvetkov, Shuly Wintner

We present a computational analysis of cognate effects on the spontaneous linguistic productions of advanced non-native speakers.

Style Transfer Through Back-Translation

3 code implementations ACL 2018 Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W. Black

We first learn a latent representation of the input sentence which is grounded in a language translation model in order to better preserve the meaning of the sentence while reducing stylistic properties.

Sentence Style Transfer +2

Correlation-based Intrinsic Evaluation of Word Vector Representations

no code implementations WS 2016 Yulia Tsvetkov, Manaal Faruqui, Chris Dyer

We introduce QVEC-CCA--an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources.

Word Similarity

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

no code implementations NAACL 2016 Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.

Representation Learning

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning

no code implementations ACL 2016 Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Brian MacWhinney, Chris Dyer

We use Bayesian optimization to learn curricula for word representation learning, optimizing performance on downstream tasks that depend on the learned representations as features.

Bayesian Optimization Representation Learning

Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

1 code implementation WS 2016 Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer

Our study suggests that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods.

Semantic Similarity Semantic Textual Similarity +2

Massively Multilingual Word Embeddings

1 code implementation5 Feb 2016 Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith

We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space.

Multilingual Word Embeddings Text Categorization

Morphological Inflection Generation Using Character Sequence to Sequence Learning

1 code implementation NAACL 2016 Manaal Faruqui, Yulia Tsvetkov, Graham Neubig, Chris Dyer

Morphological inflection generation is the task of generating the inflected form of a given lemma corresponding to a particular linguistic transformation.

LEMMA Morphological Inflection

Sparse Overcomplete Word Vector Representations

3 code implementations IJCNLP 2015 Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith

Current distributed representations of words show little resemblance to theories of lexical semantics.

A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness

no code implementations LREC 2014 Archna Bhatia, Mandy Simons, Lori Levin, Yulia Tsvetkov, Chris Dyer, Jordan Bender

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information, which we call communicative functions, associated with linguistic descriptions such as "a story about my speech", "the story", "every time I give it", "this slideshow".

Machine Translation Specificity

Augmenting English Adjective Senses with Supersenses

1 code implementation LREC 2014 Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, Chris Dyer

We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification.

Classification General Classification
