HONEST: Measuring Hurtful Sentence Completion in Language Models

NAACL 2021  ·  Debora Nozza, Federico Bianchi, Dirk Hovy ·

Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially in text generation. Our results show that 4.3{\%} of the time, language models complete a sentence with a hurtful word. These cases are not random, but follow language and gender-specific patterns. We propose a score to measure hurtful sentence completions in language models (HONEST). It uses a systematic template- and lexicon-based bias evaluation methodology for six languages. Our findings suggest that these models replicate and amplify deep-seated societal stereotypes about gender roles. Sentence completions refer to sexual promiscuity when the target is female in 9{\%} of the time, and in 4{\%} to homosexuality when the target is male. The results raise questions about the use of these models in production settings.

PDF Abstract

Datasets


Introduced in the Paper:

HONEST

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Hurtful Sentence Completion HONEST BERT-base HONEST 1.19 # 1
Hurtful Sentence Completion HONEST DistilBERT-base HONEST 1.90 # 2
Hurtful Sentence Completion HONEST RoBERTa-large HONEST 2.62 # 4
Hurtful Sentence Completion HONEST RoBERTa-base HONEST 2.38 # 3
Hurtful Sentence Completion HONEST BERT-large HONEST 3.33 # 5

Methods