Search Results for author: Nithum Thain

Found 22 papers, 7 papers with code

Can We Improve Model Robustness through Secondary Attribute Counterfactuals?

no code implementations EMNLP 2021 Ananth Balashankar, Xuezhi Wang, Ben Packer, Nithum Thain, Ed Chi, Alex Beutel

By implementing RDI in the context of toxicity detection, we find that accounting for secondary attributes can significantly improve robustness, with improvements in sliced accuracy on the original dataset up to 7% compared to existing robustness methods.

Attribute coreference-resolution +3

Gemma: Open Models Based on Gemini Research and Technology

no code implementations 13 Mar 2024 Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross Mcilroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.

Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics

no code implementations 13 Mar 2024 Tyler A. Chang, Katrin Tomanek, Jessica Hoffmann, Nithum Thain, Erin Van Liemt, Kathleen Meier-Hellstern, Lucas Dixon

We explore a strategy to handle controversial topics in LLM-based chatbots based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives.

Hallucination Retrieval +1

ConstitutionalExperts: Training a Mixture of Principle-based Prompts

no code implementations 7 Mar 2024 Savvas Petridis, Ben Wedin, Ann Yuan, James Wexler, Nithum Thain

We also show that we can improve overall performance by learning unique prompts for different semantic regions of the training data and using a mixture-of-experts (MoE) architecture to route inputs at inference time.

Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

no code implementations 22 May 2023 Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel

We demonstrate that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels, that provides an 18-20% improvement in robustness and a 14-21% reduction in errors on 6 out-of-domain datasets, comparable to that of a fully human-annotated counterfactual dataset for both sentiment classification and question paraphrase tasks.

counterfactual Data Augmentation +2

Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning

no code implementations 13 Feb 2023 Maximilian Mozes, Tolga Bolukbasi, Ann Yuan, Frederick Liu, Nithum Thain, Lucas Dixon

In this paper, we explore the use of TracIn to improve model performance in the parameter-efficient tuning (PET) setting.

Decision Making Transfer Learning

Towards Agile Text Classifiers for Everyone

no code implementations 13 Feb 2023 Maximilian Mozes, Jessica Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, Ann Yuan, Tolga Bolukbasi, Lucas Dixon

Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots.

Language Modelling text-classification +1

Measuring Recommender System Effects with Simulated Users

no code implementations 12 Jan 2021 Sirui Yao, Yoni Halpern, Nithum Thain, Xuezhi Wang, Kang Lee, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel

Using this simulation framework, we can (a) isolate the effect of the recommender system from the user preferences, and (b) examine how the system performs not just on average for an "average user" but also the extreme experiences under atypical user behavior.

Collaborative Filtering Recommendation Systems

Fairness without Demographics through Adversarially Reweighted Learning

5 code implementations NeurIPS 2020 Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, Ed H. Chi

Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns.

Fairness

Classifying Constructive Comments

2 code implementations 11 Apr 2020 Varada Kolhatkar, Nithum Thain, Jeffrey Sorensen, Lucas Dixon, Maite Taboada

The quality of the annotation scheme and the resulting dataset is evaluated using measurements of inter-annotator agreement, expert assessment of a sample, and by the constructiveness sub-characteristics, which we show provide a proxy for the general constructiveness concept.

Domain Adaptation

Practical Compositional Fairness: Understanding Fairness in Multi-Component Recommender Systems

no code implementations 5 Nov 2019 Xuezhi Wang, Nithum Thain, Anu Sinha, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel

In addition to the theoretical results, we find on multiple datasets -- including a large-scale real-world recommender system -- that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components.

Fairness Recommendation Systems

Debiasing Embeddings for Reduced Gender Bias in Text Classification

no code implementations WS 2019 Flavien Prost, Nithum Thain, Tolga Bolukbasi

Bolukbasi et al. (2016) demonstrated that pretrained word embeddings can inherit gender bias from the data they were trained on.

General Classification text-classification +2

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT

no code implementations SEMEVAL 2019 John Pavlopoulos, Nithum Thain, Lucas Dixon, Ion Androutsopoulos

This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media.

Language Identification

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

4 code implementations 11 Mar 2019 Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman

Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large.

BIG-bench Machine Learning Fairness +2

WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

no code implementations EMNLP 2018 Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities.

Conversations Gone Awry: Detecting Early Signs of Conversational Failure

no code implementations ACL 2018 Justine Zhang, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Nithum Thain, Dario Taraborelli

One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks.

Ex Machina: Personal Attacks Seen at Scale

3 code implementations 27 Oct 2016 Ellery Wulczyn, Nithum Thain, Lucas Dixon

The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon.
