Search Results for author: Bertie Vidgen

Found 38 papers, 17 papers with code

Online Abuse and Human Rights: WOAH Satellite Session at RightsCon 2020

no code implementations EMNLP (ALW) 2020 Vinodkumar Prabhakaran, Zeerak Waseem, Seyi Akiwowo, Bertie Vidgen

In 2020, the Workshop on Online Abuse and Harms (WOAH) held a satellite panel at RightsCon 2020, an international human rights conference.

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

no code implementations 19 Feb 2025 Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade-Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben Salem, Rajat Sahay, Sujata Goswami, Usman Gohar, Ben Huang, Supheakmungkol Sarin, Elie Alhajjar, Canyu Chen, Roman Eng, Kashyap Ramanandula Manjusha, Virendra Mehta, Eileen Long, Murali Emani, Natan Vidra, Benjamin Rukundo, Abolfazl Shahbazi, Kongtao Chen, Rajat Ghosh, Vithursan Thangarasa, Pierre Peigné, Abhinav Singh, Max Bartolo, Satyapriya Krishna, Mubashara Akhtar, Rafael Gold, Cody Coleman, Luis Oala, Vassil Tashev, Joseph Marvin Imperial, Amy Russ, Sasidhar Kunapuli, Nicolas Miailhe, Julien Delaunay, Bhaktipriya Radharapu, Rajat Shinde, Tuesday, Debojyoti Dutta, Declan Grabb, Ananya Gangavarapu, Saurav Sahay, Agasthya Gangavarapu, Patrick Schramowski, Stephen Singam, Tom David, Xudong Han, Priyanka Mary Mammen, Tarunima Prabhakar, Venelin Kovatchev, Ahmed Ahmed, Kelvin N. Manyeki, Sandeep Madireddy, Foutse Khomh, Fedor Zhdanov, Joachim Baumann, Nina Vasan, Xianjun Yang, Carlos Mougan, Jibin Rajan Varghese, Hussain Chinoy, Seshakrishna Jitendar, Manil Maskey, Claire V. Hardgrove, TianHao Li, Aakash Gupta, Emil Joswin, Yifan Mai, Shachi H Kumar, Cigdem Patlak, Kevin Lu, Vincent Alessi, Sree Bhargavi Balija, Chenhe Gu, Robert Sullivan, James Gealy, Matt Lavrisa, James Goel, Peter Mattson, Percy Liang, Joaquin Vanschoren

This work represents a crucial step toward establishing global standards for AI risk and reliability evaluation while acknowledging the need for continued development in areas such as multiturn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories.

Why human-AI relationships need socioaffective alignment

no code implementations 4 Feb 2025 Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen, Scott A. Hale

Humans strive to design safe AI systems that align with our goals and remain under our control.

LMUnit: Fine-grained Evaluation with Natural Language Unit Tests

no code implementations 17 Dec 2024 Jon Saad-Falcon, Rajan Vivek, William Berrios, Nandita Shankar Naik, Matija Franklin, Bertie Vidgen, Amanpreet Singh, Douwe Kiela, Shikib Mehri

As language models become integral to critical workflows, assessing their behavior remains a fundamental challenge -- human evaluation is costly and noisy, while automated metrics provide only coarse, difficult-to-interpret signals.

Language Modeling
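
To make the unit-test idea concrete, here is a minimal sketch of how natural-language unit tests might be applied to a model response; the criteria, the judge() callable, and the per-criterion scoring are illustrative assumptions, not the paper's implementation:

    # Sketch: score one response against natural-language unit tests.
    # judge() is a hypothetical rater (e.g. a human or an LLM-as-judge).
    unit_tests = [
        "Is the response factually consistent with the provided context?",
        "Does the response avoid unsupported claims?",
    ]

    def score_response(response, tests, judge):
        # One fine-grained score per criterion, rather than a single coarse metric.
        return {test: judge(response, test) for test in tests}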

WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting

1 code implementation 1 May 2024 Olly Styles, Sam Miller, Patricio Cerda-Mardini, Tanaya Guha, Victor Sanchez, Bertie Vidgen

We evaluate five existing ReAct agents on WorkBench, finding that they successfully complete as few as 3% of tasks (Llama2-70B), while even the best-performing agent (GPT-4) completes just 43%.

Scheduling
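
As a rough illustration of outcome-based agent scoring of this kind, the sketch below counts a task as complete only when the sandbox ends in the expected state; run_task() and the expected_state field are hypothetical stand-ins:

    # Sketch: outcome-based success rate over a batch of workplace tasks.
    def success_rate(tasks, run_task):
        # run_task(task) is assumed to execute the agent in a sandbox and
        # return the resulting state (e.g. calendar and email databases).
        passed = sum(run_task(t) == t["expected_state"] for t in tasks)
        return 100 * passed / len(tasks)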

Introducing v0.5 of the AI Safety Benchmark from MLCommons

1 code implementation 18 Apr 2024 Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bommasani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.
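
A minimal sketch of how per-category results might be tallied, assuming each test item is labelled with one hazard category from the taxonomy and a hypothetical is_unsafe() grader:

    from collections import defaultdict

    # Sketch: unsafe-response rate per hazard category.
    def rates_by_hazard(items, respond, is_unsafe):
        unsafe, total = defaultdict(int), defaultdict(int)
        for item in items:
            total[item["hazard"]] += 1
            unsafe[item["hazard"]] += is_unsafe(respond(item["prompt"]))
        return {h: 100 * unsafe[h] / total[h] for h in total}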

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

1 code implementation 8 Apr 2024 Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety.

Language Modeling +1

FinanceBench: A New Benchmark for Financial Question Answering

2 code implementations 20 Nov 2023 Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

We test 16 state-of-the-art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400).

Question Answering +2
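
For orientation, a bare-bones sketch of such an evaluation loop; ask() stands in for any of the model configurations, and the field names are assumptions:

    # Sketch: collect model answers over sampled FinanceBench cases for manual review.
    def run_eval(cases, ask):
        results = []
        for case in cases:
            # ask() might wrap GPT-4-Turbo with a vector store, a long-context
            # prompt, etc.; answers are later reviewed by hand against gold labels.
            results.append({"id": case["id"], "answer": ask(case["question"])})
        return results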

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

no code implementations 14 Nov 2023 Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger

While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, with over 50% unsafe responses in the extreme.
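
As a minimal sketch, the headline numbers reduce to a simple rate computation; classify() is a hypothetical stand-in for the safety annotation used to label each response:

    # Sketch: percentage of unsafe responses for one model.
    def unsafe_rate(responses, classify):
        labels = [classify(r) for r in responses]
        return 100 * labels.count("unsafe") / len(labels)

    # e.g. a model just over the 20% threshold reported above:
    print(unsafe_rate(["unsafe"] * 21 + ["safe"] * 79, lambda r: r))  # 21.0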

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

no code implementations 3 Oct 2023 Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers.

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

no code implementations 9 Mar 2023 Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing.

Red Teaming

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

1 code implementation NAACL (WOAH) 2022 Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen

To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models.

Diagnostic, Hate Speech Detection
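
The sketch below shows the shape of a functional-test evaluation of this kind: accuracy is broken down per functionality rather than aggregated, so specific failure modes stay visible. The field names and model_fn are illustrative assumptions:

    from collections import defaultdict

    # Sketch: per-functionality accuracy for a hate speech classifier.
    cases = [
        {"functionality": "slur_usage", "text": "<hateful slur example>", "gold": "hateful"},
        {"functionality": "counter_speech", "text": "<quoted slur, condemned>", "gold": "non-hateful"},
    ]

    def accuracy_by_functionality(cases, model_fn):
        hits, totals = defaultdict(int), defaultdict(int)
        for c in cases:
            totals[c["functionality"]] += 1
            hits[c["functionality"]] += model_fn(c["text"]) == c["gold"]
        return {f: hits[f] / totals[f] for f in totals}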

An influencer-based approach to understanding radical right viral tweets

no code implementations 15 Sep 2021 Laila Sprejer, Helen Margetts, Kleber Oliveira, David O'Sullivan, Bertie Vidgen

We show that it is crucial to account for the influencer-level structure, and find evidence of the importance of both influencer- and content-level factors, including the number of followers each influencer has, the type of content (original posts, quotes and replies), the length and toxicity of content, and whether influencers request retweets.
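
To illustrate what accounting for the influencer-level structure can look like in practice, here is a toy multilevel model with a random intercept per influencer, using statsmodels; the data and column names are invented for the example, not the paper's variables:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Sketch: each influencer gets its own random intercept, so content-level
    # effects are estimated within influencers. Toy data this small may trigger
    # convergence warnings; real data would be far larger.
    df = pd.DataFrame({
        "retweets":      [5, 80, 3, 200, 12, 150, 7, 90, 2],
        "toxicity":      [0.1, 0.7, 0.2, 0.9, 0.3, 0.6, 0.2, 0.8, 0.1],
        "influencer_id": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    })
    fit = smf.mixedlm("retweets ~ toxicity", df, groups=df["influencer_id"]).fit()
    print(fit.summary())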

Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate

no code implementations Findings (ACL) 2021 Austin Botelho, Bertie Vidgen, Scott A. Hale

We show that both textual and visual enrichment improve model performance, with the multimodal model's F1 score (0.771) outperforming those of the other models (0.544, 0.737, and 0.754).
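
For reference, the reported numbers are F1 scores, which can be computed for any classifier with scikit-learn; the labels below are placeholders, not the paper's data:

    from sklearn.metrics import f1_score

    # Sketch: F1 for binary hate/not-hate predictions against gold labels.
    gold = [1, 0, 1, 1, 0, 1]
    pred = [1, 0, 0, 1, 0, 1]
    print(f1_score(gold, pred))  # models are compared by this score, as in the paper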

Introducing CAD: the Contextual Abuse Dataset

1 code implementation NAACL 2021 Bertie Vidgen, Dong Nguyen, Helen Margetts, Patricia Rossini, Rebekah Tromble

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic.

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

2 code implementations ACL 2021 Bertie Vidgen, Tristan Thrush, Zeerak Waseem, Douwe Kiela

We provide a new dataset of ~40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation.

Hate Speech Detection
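
A rough sketch of one round of this human-and-model-in-the-loop process; craft_examples() and retrain() are hypothetical stand-ins for annotator work and model updates:

    # Sketch: one round of dynamic adversarial data creation.
    def dynamic_round(model, craft_examples, retrain):
        batch = craft_examples(model)  # annotators write entries aimed at fooling the model
        fooled = [ex for ex in batch if model(ex["text"]) != ex["label"]]
        new_model = retrain(model, batch)  # all entries join the training data
        return new_model, len(fooled) / len(batch)  # model error rate this round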

Detecting East Asian Prejudice on Social Media

4 code implementations EMNLP (ALW) 2020 Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, Scott Hale

The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic.

Directions in Abusive Language Training Data: Garbage In, Garbage Out

no code implementations 3 Apr 2020 Bertie Vidgen, Leon Derczynski

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies.

Abusive Language

Islamophobes are not all the same! A study of far right actors on Twitter

no code implementations 13 Oct 2019 Bertie Vidgen, Taha Yasseri, Helen Margetts

Far-right actors are often purveyors of Islamophobic hate speech online, using social media to spread divisive and prejudiced messages which can stir up intergroup tensions and conflict.

Social and Information Networks, Computers and Society, Physics and Society, Applications

Detecting weak and strong Islamophobic hate speech on social media

no code implementations 12 Dec 2018 Bertie Vidgen, Taha Yasseri

Islamophobic hate speech on social media inflicts considerable harm on both targeted individuals and wider society, and also risks reputational damage for the host platforms.

Word Embeddings
