Search Results for author: Sara Hooker

Found 29 papers, 14 papers with code

The Grand Illusion: The Myth of Software Portability and Implications for ML Progress

1 code implementation • 12 Sep 2023 • Fraser Mince, Dzung Dinh, Jonas Kgomo, Neil Thompson, Sara Hooker

Collectively, our results reveal how costly straying from a narrow set of hardware-software combinations can be - and suggest that specialization of hardware impedes innovation in machine learning research.

Friction

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

1 code implementation • 11 Sep 2023 • Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker

The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost.
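
A minimal sketch of the standard top-k MoE layer this description refers to, assuming a simple token-level router; the layer sizes, expert count, and class name are illustrative, and the paper's extremely parameter-efficient variant swaps the full feed-forward experts for much lighter modules:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Toy MoE layer: a router picks k experts per token and mixes their
        outputs with renormalized gate probabilities."""
        def __init__(self, d_model=64, d_hidden=128, n_experts=4, k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts))
            self.k = k

        def forward(self, x):                             # x: (tokens, d_model)
            gate = F.softmax(self.router(x), dim=-1)      # (tokens, n_experts)
            topv, topi = gate.topk(self.k, dim=-1)        # keep k experts per token
            topv = topv / topv.sum(dim=-1, keepdim=True)  # renormalize kept weights
            out = torch.zeros_like(x)
            for slot in range(self.k):
                idx, w = topi[:, slot], topv[:, slot:slot + 1]
                for e, expert in enumerate(self.experts):
                    mask = idx == e
                    if mask.any():
                        out[mask] += w[mask] * expert(x[mask])
            return out

    tokens = torch.randn(8, 64)
    print(TopKMoE()(tokens).shape)                        # torch.Size([8, 64])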

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

no code implementations • 8 Sep 2023 • Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

In this work, we take a wider view and explore scalable estimates of data quality that can be used to systematically measure the quality of pretraining data.

Memorization
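
One scalable estimate of data quality studied in this line of work is perplexity under a reference language model: documents the model finds highly surprising are pruned. A hedged sketch, assuming the Hugging Face transformers library; the choice of GPT-2 as the reference model and the keep-ratio are illustrative only:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def perplexity(text: str) -> float:
        # Mean token negative log-likelihood under the reference model,
        # exponentiated to give perplexity.
        ids = tok(text, return_tensors="pt", truncation=True, max_length=512).input_ids
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss
        return float(torch.exp(loss))

    docs = ["A clean, well-formed paragraph about machine learning.",
            "asdf qwerty 0xDEADBEEF ~~ lorem;; ipsum###"]
    scored = sorted(docs, key=perplexity)          # lowest perplexity first
    keep = scored[: int(0.5 * len(scored))]        # prune the noisiest half
    print(keep)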

Frontier AI Regulation: Managing Emerging Risks to Public Safety

no code implementations • 6 Jul 2023 • Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf

To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models.

Evaluating the Social Impact of Generative AI Systems in Systems and Society

no code implementations • 9 Jun 2023 • Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé III, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, Alexandra Sasha Luccioni, Alberto Lusoli, Margaret Mitchell, Jessica Newman, Marie-Therese Png, Andrew Strait, Apostol Vassilev

We move toward a standard approach in evaluating a generative AI system for any modality, in two overarching categories: what can be evaluated in a base system that has no predetermined application and what can be evaluated in society.

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

1 code implementation • 24 Apr 2023 • Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

We evaluate the implications of these changes on the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity.
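
A small illustration of the reproducibility concern: the same cached generations, scored by a black-box toxicity API at two points in time, can rank two systems differently. The scores below are invented purely for the example and do not come from any real API:

    import numpy as np

    # Hypothetical per-generation toxicity scores for two systems, as returned
    # by an API snapshot in 2021 and by the (silently updated) API in 2023.
    scores_2021 = {"model_A": [0.10, 0.30, 0.20], "model_B": [0.15, 0.25, 0.22]}
    scores_2023 = {"model_A": [0.18, 0.41, 0.33], "model_B": [0.16, 0.28, 0.26]}

    for snapshot, scores in [("2021 API", scores_2021), ("2023 API", scores_2023)]:
        ranking = sorted(scores, key=lambda m: np.mean(scores[m]))
        print(snapshot, "least-toxic ranking:", ranking)   # the ordering flips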

FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

no code implementations • 1 Mar 2023 • Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

Surprisingly, even with a simple homogeneous ensemble -- all the individual models share the same training set, architecture, and design choices -- we find compelling and powerful gains in worst-k and minority group performance, i.e., fairness naturally emerges from ensembling.

Data Augmentation, Fairness
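
A minimal sketch of the kind of comparison behind this finding: a homogeneous ensemble that differs only in random seed, evaluated on worst-group accuracy. The dataset, group definition, ensemble size, and classifier are all toy stand-ins for the real setup:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    group = (X[:, 0] > 1.0).astype(int)            # illustrative minority group
    Xtr, Xte, ytr, yte, _, gte = train_test_split(X, y, group, random_state=0)

    probs = []
    for seed in range(5):                          # same data, same architecture
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=seed)
        clf.fit(Xtr, ytr)
        probs.append(clf.predict_proba(Xte))

    def worst_group_acc(pred, y, g):
        return min((pred[g == k] == y[g == k]).mean() for k in np.unique(g))

    single = probs[0].argmax(1)
    ensemble = np.mean(probs, axis=0).argmax(1)    # average the predicted probabilities
    print("single model worst-group acc:", worst_group_acc(single, yte, gte))
    print("ensemble     worst-group acc:", worst_group_acc(ensemble, yte, gte))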

Large language models are not zero-shot communicators

1 code implementation • 26 Oct 2022 • Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.

Studying the impact of magnitude pruning on contrastive learning methods

1 code implementation • 1 Jul 2022 • Francesco Corti, Rahim Entezari, Sara Hooker, Davide Bacciu, Olga Saukh

We study the impact of different pruning techniques on the representation learned by deep neural networks trained with contrastive loss functions.

Contrastive Learning, Network Pruning
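
For reference, the basic magnitude-pruning operation studied here, using PyTorch's built-in pruning utilities: the smallest-magnitude weights are removed globally across layers. The tiny MLP and the 90% sparsity level are illustrative, not the encoders or ratios used in the paper:

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]

    # Remove the 90% smallest-magnitude weights, pooled across all listed layers.
    prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.9)

    for m, _ in to_prune:
        sparsity = (m.weight == 0).float().mean().item()
        print(f"{m}: {sparsity:.1%} zeros")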

Robust Distillation for Worst-class Performance

no code implementations • 13 Jun 2022 • Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods.

Knowledge Distillation
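
For context, the vanilla distillation objective such robust variants build on: a temperature-softened KL term between teacher and student plus the usual cross-entropy. The per-class weight vector below is a hedged stand-in for a worst-class reweighting, not the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T=2.0, alpha=0.5, class_weights=None):
        # KL between temperature-softened distributions, scaled by T^2 as usual.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * T * T
        # Standard (optionally class-weighted) cross-entropy on the hard labels.
        hard = F.cross_entropy(student_logits, labels, weight=class_weights)
        return alpha * soft + (1 - alpha) * hard

    s = torch.randn(8, 5, requires_grad=True)
    t = torch.randn(8, 5)
    y = torch.randint(0, 5, (8,))
    w = torch.tensor([1., 1., 1., 1., 3.])   # up-weight a hypothetical worst class
    print(distillation_loss(s, t, y, class_weights=w))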

When less is more: Simplifying inputs aids neural network understanding

no code implementations • 14 Jan 2022 • Robin Tibor Schirrmeister, Rosanne Liu, Sara Hooker, Tonio Ball

To answer these questions, we need a clear measure of input simplicity (or inversely, complexity), an optimization objective that correlates with simplification, and a framework to incorporate such objective into training and inference.

Dataset Condensation

A Tale Of Two Long Tails

1 code implementation • 27 Jul 2021 • Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions.

Data Augmentation, Vocal Bursts Valence Prediction

When does loss-based prioritization fail?

no code implementations • 16 Jul 2021 • Niel Teng Hu, Xinyu Hu, Rosanne Liu, Sara Hooker, Jason Yosinski

Each example is propagated forward and backward through the network the same number of times, independent of how much the example contributes to the learning protocol.
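
A minimal sketch of the loss-based prioritization schemes whose failure modes the paper studies: score a batch with per-example loss, then backpropagate only through the hardest fraction. The model, data, and keep-count are toy placeholders:

    import torch
    import torch.nn as nn

    model = nn.Linear(20, 3)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss(reduction="none")

    x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
    per_example = loss_fn(model(x), y)              # one loss per example
    keep = per_example.topk(k=16).indices           # prioritize the 16 hardest

    opt.zero_grad()
    loss_fn(model(x[keep]), y[keep]).mean().backward()
    opt.step()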

Randomness In Neural Network Training: Characterizing The Impact of Tooling

1 code implementation • 22 Jun 2021 • Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker

However, we also find that the cost of ensuring determinism varies dramatically between neural network architectures and hardware types, e.g., with overhead up to 746%, 241%, and 196% on a spectrum of widely used GPU accelerator architectures, relative to non-deterministic training.
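
The kind of tooling change whose cost is being measured, as a short sketch for PyTorch; these are the standard switches for forcing a deterministic run, and the resulting overhead depends heavily on the model and accelerator, which is the paper's point:

    import os, random
    import numpy as np
    import torch

    def make_deterministic(seed: int = 0):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"   # required on CUDA >= 10.2
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops

    make_deterministic(0)
    print(torch.rand(3))   # identical across runs with the same seed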

Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization

no code implementations • 2 Feb 2021 • Kale-ab Tessera, Sara Hooker, Benjamin Rosman

Based upon these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
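
A toy probe of what "gradient flow in sparse networks" means in practice: prune a layer heavily, run one backward pass, and inspect per-layer gradient norms. The tiny MLP and the 95% sparsity level are illustrative only:

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    prune.l1_unstructured(model[0], name="weight", amount=0.95)   # very sparse first layer

    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    nn.CrossEntropyLoss()(model(x), y).backward()

    for name, p in model.named_parameters():
        if p.grad is not None:
            print(f"{name}: grad norm = {p.grad.norm():.4f}")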

Characterising Bias in Compressed Models

no code implementations • 6 Oct 2020 • Sara Hooker, Nyalleng Moorosi, Gregory Clark, Samy Bengio, Emily Denton

However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE).

Fairness, Quantization
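
A hedged sketch of how such exemplars can be surfaced: flag the examples where a heavily compressed copy of the model disagrees with the dense model. Magnitude pruning stands in for the compression method, and the toy model and random inputs stand in for the real setup:

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    dense = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 4))
    compressed = copy.deepcopy(dense)
    for m in compressed:
        if isinstance(m, nn.Linear):
            prune.l1_unstructured(m, name="weight", amount=0.9)

    x = torch.randn(256, 20)
    with torch.no_grad():
        # Examples where compressed and dense predictions diverge.
        cie_mask = dense(x).argmax(1) != compressed(x).argmax(1)
    print(f"{cie_mask.float().mean():.1%} of examples are compression-identified")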

The Hardware Lottery

1 code implementation • 14 Sep 2020 • Sara Hooker

Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly.

Estimating Example Difficulty Using Variance of Gradients

1 code implementation • CVPR 2022 • Chirag Agarwal, Daniel D'souza, Sara Hooker

In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.

Out-of-Distribution Detection
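
A hedged sketch of the idea behind Variance of Gradients: for each example, take the gradient of the loss with respect to the input at several training checkpoints and score the example by the variance of those gradients. The toy model, snapshot schedule, and aggregation choices here are illustrative, not the paper's exact recipe:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(100, 10), torch.randint(0, 2, (100,))

    def input_grads():
        # Gradient of the loss w.r.t. the inputs at the current checkpoint.
        xg = x.clone().requires_grad_(True)
        nn.CrossEntropyLoss()(model(xg), y).backward()
        return xg.grad.detach()

    grads = []
    for step in range(30):
        opt.zero_grad()
        nn.CrossEntropyLoss()(model(x), y).backward()
        opt.step()
        if step % 10 == 9:                       # snapshot every 10 steps
            grads.append(input_grads())

    vog = torch.stack(grads).var(dim=0).mean(dim=1)   # one difficulty score per example
    print("most difficult examples:", vog.topk(5).indices.tolist())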

What Do Compressed Deep Neural Networks Forget?

2 code implementations • 13 Nov 2019 • Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome

However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques.

Fairness, Interpretability Techniques for Deep Learning +4

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning

no code implementations • 25 Sep 2019 • Sara Hooker, Yann Dauphin, Aaron Courville, Andrea Frome

Neural network pruning techniques have demonstrated it is possible to remove the majority of weights in a network with surprisingly little degradation to top-1 test set accuracy.

Network Pruning

The State of Sparsity in Deep Neural Networks

6 code implementations • 25 Feb 2019 • Trevor Gale, Erich Elsen, Sara Hooker

We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.

Model Compression, Sparse Learning
