Search Results for author: Hannah Rose Kirk

Found 23 papers, 12 papers with code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

no code implementations • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0. 5 benchmark.

Paper
Add Code

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

1 code implementation • 26 Feb 2024 • Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations.

Multiple-choice

Paper
Code

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

no code implementations • 14 Feb 2024 • Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin Van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo

By focusing on ``implicitly adversarial'' prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover.

Text-to-Image Generation

Paper
Add Code

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

no code implementations • 14 Nov 2023 • Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger

While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, with over 50% unsafe responses in the extreme.

Paper
Add Code

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

no code implementations • 11 Oct 2023 • Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale

Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs).

Paper
Add Code

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

no code implementations • 3 Oct 2023 • Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers.

Paper
Add Code

Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West

no code implementations • 15 Sep 2023 • Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

In this paper, we quantify stereotypical bias in popular LLMs according to an Indian-centric frame and compare bias levels between the Indian and Western contexts.

Language Modelling Large Language Model

Paper
Add Code

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

1 code implementation • 2 Aug 2023 • Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a systematic way.

Language Modelling

Paper
Code

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

1 code implementation • 31 Jul 2023 • Hannah Rose Kirk, Angus R. Williams, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale

We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.

text-classification Text Classification

Paper
Code

VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

1 code implementation • NeurIPS 2023 • Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk

We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models.

Benchmarking Retrieval

Paper
Code

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets

1 code implementation • 24 May 2023 • Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

To address this issue, we propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets, where only the gender of the subject is edited and the background is fixed.

Paper
Code

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models

no code implementations • 22 May 2023 • Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, Rafael Mosquera, Addison Howard, Will Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo

To address this need, we introduce the Adversarial Nibbler challenge.

Paper
Add Code

Assessing Language Model Deployment with Risk Cards

2 code implementations • 31 Mar 2023 • Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad

However, there is no risk-centric framework for documenting the complexity of a landscape in which some risks are shared across models and contexts, while others are specific, and where certain conditions may be required for risks to manifest as harms.

Language Modelling Text Generation

805

Paper
Code

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

no code implementations • 9 Mar 2023 • Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing.

Paper
Add Code

SemEval-2023 Task 10: Explainable Detection of Online Sexism

1 code implementation • 7 Mar 2023 • Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger

Online sexism is a widespread and harmful phenomenon.

Paper
Code

Auditing large language models: a three-layered approach

no code implementations • 16 Feb 2023 • Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi

However, existing auditing procedures fail to address the governance challenges posed by LLMs, which display emergent capabilities and are adaptable to a wide range of downstream tasks.

Paper
Add Code

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

1 code implementation • TRAC (COLING) 2022 • Hannah Rose Kirk, Bertie Vidgen, Scott A. Hale

Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm.

Abusive Language Active Learning

Paper
Code

DataPerf: Benchmarks for Data-Centric AI Development

1 code implementation • NeurIPS 2023 • Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Lilith Bat-Leah, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems.

Paper
Code

Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements

1 code implementation • NAACL (GeBNLP) 2022 • Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Rose Kirk

The growing capability and availability of generative language models has enabled a wide range of new downstream tasks.

Language Modelling Prompt Engineering

Paper
Code

Handling and Presenting Harmful Text in NLP Research

no code implementations • 29 Apr 2022 • Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski

Text data can pose a risk of harm.

Misinformation Part-Of-Speech Tagging +1

Paper
Add Code

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

1 code implementation • 22 Mar 2022 • Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

Vision-language models can encode societal biases and stereotypes, but there are challenges to measuring and mitigating these multimodal harms due to lacking measurement robustness and feature degradation.

Paper
Code

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate

1 code implementation • NAACL 2022 • Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale

Using the test suite, we expose weaknesses in existing hate detection models.

Benchmarking

Paper
Code

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

no code implementations • ACL (WOAH) 2021 • Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano

In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset.

Optical Character Recognition (OCR)

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.