1 code implementation • 12 Sep 2023 • Fraser Mince, Dzung Dinh, Jonas Kgomo, Neil Thompson, Sara Hooker
Collectively, our results reveal how costly it can be to stray from a narrow set of hardware-software combinations, and suggest that hardware specialization impedes innovation in machine learning research.
1 code implementation • 11 Sep 2023 • Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker
The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost.
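As a rough illustration of the idea, the sketch below implements a minimal MoE layer with top-1 routing in PyTorch: a gating network picks one expert MLP per token, so compute per token stays roughly constant as experts are added. The `TopOneMoE` class and its sizes are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopOneMoE(nn.Module):
    """Minimal Mixture-of-Experts layer with top-1 routing (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)   # routing probabilities
        expert_idx = gate_probs.argmax(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # weight each expert's output by its gate probability
                out[mask] = gate_probs[mask, i].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopOneMoE(d_model=64, d_hidden=256, n_experts=4)
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64])
```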
no code implementations • 8 Sep 2023 • Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker
In this work, we take a wider view and explore scalable estimates of data quality that can be used to systematically measure the quality of pretraining data.
no code implementations • 6 Jul 2023 • Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf
To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models.
no code implementations • 9 Jun 2023 • Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé III, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, Alexandra Sasha Luccioni, Alberto Lusoli, Margaret Mitchell, Jessica Newman, Marie-Therese Png, Andrew Strait, Apostol Vassilev
We move toward a standard approach for evaluating a generative AI system in any modality, across two overarching categories: what can be evaluated in a base system that has no predetermined application, and what can be evaluated in society.
no code implementations • 30 May 2023 • Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker
In this work, we ask "are quantization cliffs in performance solely a factor of scale?"
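For readers unfamiliar with the setting, the toy sketch below simulates symmetric per-tensor int8 weight quantization, the kind of post-training compression in which such performance cliffs are observed. The helper `fake_quantize_int8` is an illustrative assumption, not the paper's procedure.

```python
import torch

def fake_quantize_int8(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric per-tensor int8 weight quantization (toy illustration)."""
    scale = w.abs().max() / 127.0                  # map max magnitude to the int8 range
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return q * scale                               # dequantize back to float

w = torch.randn(512, 512)
w_q = fake_quantize_int8(w)
print((w - w_q).abs().max())  # worst-case rounding error introduced by quantization
```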
1 code implementation • 24 Apr 2023 • Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
We evaluate the implications of these changes on the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity.
no code implementations • 1 Mar 2023 • Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker
Surprisingly, even with a simple homogeneous ensemble -- all the individual models share the same training set, architecture, and design choices -- we find compelling gains in worst-k and minority-group performance, i.e., fairness naturally emerges from ensembling.
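As an illustration of the setup, the sketch below builds a homogeneous ensemble in PyTorch: every member shares the same architecture (and, in practice, the same training data), differing only in its random seed, and softmax outputs are averaged at inference. This is a minimal sketch, not the paper's full experimental protocol.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_member(seed: int) -> nn.Module:
    # Same architecture and hyperparameters for every member; only the seed differs.
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

ensemble = [make_member(seed) for seed in range(5)]
# In practice each member would also be trained on the same training set.

x = torch.randn(4, 20)
with torch.no_grad():
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in ensemble]).mean(dim=0)
print(probs.argmax(dim=-1))  # ensemble prediction per input
```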
no code implementations • 4 Nov 2022 • Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer
Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages.
1 code implementation • 26 Oct 2022 • Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette
We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.
no code implementations • 20 Sep 2022 • Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker
Modern machine learning research relies on relatively few carefully curated datasets.
no code implementations • 31 Aug 2022 • Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows.
1 code implementation • 1 Jul 2022 • Francesco Corti, Rahim Entezari, Sara Hooker, Davide Bacciu, Olga Saukh
We study the impact of different pruning techniques on the representation learned by deep neural networks trained with contrastive loss functions.
no code implementations • 13 Jun 2022 • Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon
We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods.
no code implementations • 14 Jan 2022 • Robin Tibor Schirrmeister, Rosanne Liu, Sara Hooker, Tonio Ball
To answer these questions, we need a clear measure of input simplicity (or inversely, complexity), an optimization objective that correlates with simplification, and a framework to incorporate such objective into training and inference.
1 code implementation • Findings (EMNLP) 2021 • Orevaoghene Ahia, Julia Kreutzer, Sara Hooker
However, evaluation of the trade-offs incurred by popular compression techniques has been centered on high-resource datasets.
1 code implementation • 27 Jul 2021 • Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker
As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions.
no code implementations • 16 Jul 2021 • Niel Teng Hu, Xinyu Hu, Rosanne Liu, Sara Hooker, Jason Yosinski
Each example is propagated forward and backward through the network the same number of times, regardless of how much it contributes to learning.
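This uniform treatment of examples is what prioritization schemes relax. As a hypothetical illustration, the sketch below backpropagates only the highest-loss fraction of each batch; the `prioritized_step` helper and the keep fraction are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def prioritized_step(x: torch.Tensor, y: torch.Tensor, keep_frac: float = 0.5) -> float:
    """Backpropagate only the highest-loss fraction of the batch (illustrative sketch)."""
    with torch.no_grad():
        per_example = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_frac * len(x)))
    idx = per_example.topk(k).indices          # hardest examples this step
    opt.zero_grad()
    loss = F.cross_entropy(model(x[idx]), y[idx])
    loss.backward()
    opt.step()
    return loss.item()

x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
print(prioritized_step(x, y))
```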
1 code implementation • 22 Jun 2021 • Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker
However, we also find that the cost of ensuring determinism varies dramatically between neural network architectures and hardware types, e.g., with overheads of up to 746%, 241%, and 196% across a spectrum of widely used GPU accelerator architectures, relative to non-deterministic training.
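For context, the sketch below shows one common way to request deterministic training in PyTorch; the choice of framework here is an assumption, not taken from the paper. Enabling flags like these is the kind of configuration whose runtime overhead such measurements capture.

```python
import os
import random
import numpy as np
import torch

def enable_determinism(seed: int = 0) -> None:
    """Request reproducible, deterministic training in PyTorch (illustrative sketch)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Required by some deterministic cuBLAS kernels on recent CUDA versions.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.benchmark = False      # disable non-deterministic autotuning
    torch.use_deterministic_algorithms(True)    # error out on non-deterministic ops

enable_determinism(seed=42)
```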
no code implementations • 2 Feb 2021 • Kale-ab Tessera, Sara Hooker, Benjamin Rosman
Based upon these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
no code implementations • 6 Oct 2020 • Sara Hooker, Nyalleng Moorosi, Gregory Clark, Samy Bengio, Emily Denton
However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE).
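As a rough sketch of how such exemplars can be surfaced, the snippet below flags inputs where the modal prediction of a set of compressed models disagrees with the uncompressed model. This is a simplified reading of the CIE idea; the `find_cie_candidates` helper and the stand-in models are assumptions, not the paper's exact protocol.

```python
import torch
import torch.nn as nn

def find_cie_candidates(dense: nn.Module, compressed: list,
                        x: torch.Tensor) -> torch.Tensor:
    """Indices of inputs where compressed models' modal prediction disagrees with the dense model."""
    with torch.no_grad():
        dense_pred = dense(x).argmax(dim=-1)
        comp_preds = torch.stack([m(x).argmax(dim=-1) for m in compressed])
    modal_pred = comp_preds.mode(dim=0).values   # most common prediction across compressed models
    return (modal_pred != dense_pred).nonzero(as_tuple=True)[0]

dense = nn.Linear(20, 5)
compressed = [nn.Linear(20, 5) for _ in range(3)]  # stand-ins for pruned/quantized variants
x = torch.randn(16, 20)
print(find_cie_candidates(dense, compressed, x))
```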
1 code implementation • 14 Sep 2020 • Sara Hooker
Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly.
1 code implementation • CVPR 2022 • Chirag Agarwal, Daniel D'souza, Sara Hooker
In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
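A minimal sketch of a VoG-style score follows, assuming gradients of the true-class logit with respect to the input are collected at several training checkpoints and their variance is averaged over input dimensions; the exact class choice and normalization used in the paper may differ.

```python
import torch
import torch.nn as nn

def vog_scores(checkpoints: list, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Variance-of-gradients score per example across training checkpoints (illustrative sketch)."""
    grads = []
    for model in checkpoints:
        inp = x.clone().requires_grad_(True)
        logits = model(inp)
        # gradient of the true-class logit w.r.t. the input
        logits.gather(1, y.unsqueeze(1)).sum().backward()
        grads.append(inp.grad.detach())
    grads = torch.stack(grads)               # (num_checkpoints, batch, features)
    return grads.var(dim=0).mean(dim=-1)     # variance over checkpoints, averaged over input dims

checkpoints = [nn.Linear(20, 5) for _ in range(4)]  # stand-ins for saved training checkpoints
x, y = torch.randn(8, 20), torch.randint(0, 5, (8,))
print(vog_scores(checkpoints, x, y))         # higher score ~ harder example
```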
no code implementations • 15 Apr 2020 • Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensbold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung
With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development.
2 code implementations • 13 Nov 2019 • Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome
However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques.
no code implementations • 25 Sep 2019 • Sara Hooker, Yann Dauphin, Aaron Courville, Andrea Frome
Neural network pruning techniques have demonstrated that it is possible to remove the majority of weights in a network with surprisingly little degradation in top-1 test-set accuracy.
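For concreteness, the sketch below applies global magnitude pruning with PyTorch's built-in pruning utility, zeroing the 90% of weights with the smallest magnitudes. This is a generic illustration of pruning, not the specific techniques studied in this line of work.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

# Globally zero out the 90% of weights with the smallest magnitudes.
params_to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(params_to_prune, pruning_method=prune.L1Unstructured, amount=0.9)

remaining = sum(int(m.weight.count_nonzero()) for m, _ in params_to_prune)
total = sum(m.weight.numel() for m, _ in params_to_prune)
print(f"remaining weights: {remaining}/{total}")  # ~10% of weights survive
```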
6 code implementations • 25 Feb 2019 • Trevor Gale, Erich Elsen, Sara Hooker
We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.
3 code implementations • NeurIPS 2019 • Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim
We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
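One common way to operationalize such a measure is a remove-and-retrain style evaluation: mask the features an importance estimator deems most important, retrain, and measure how much accuracy degrades. The sketch below shows only the masking step; the shapes, the `mask_top_features` helper, and the 30% fraction are illustrative assumptions rather than the paper's exact setup.

```python
import torch

def mask_top_features(images: torch.Tensor, importance: torch.Tensor,
                      fraction: float = 0.3) -> torch.Tensor:
    """Replace the top `fraction` most important pixels with the per-channel mean
    (the masking step of a remove-and-retrain evaluation; illustrative sketch)."""
    b, c, h, w = images.shape
    flat_imp = importance.reshape(b, -1)          # one importance score per pixel
    k = int(fraction * flat_imp.shape[1])
    top_idx = flat_imp.topk(k, dim=1).indices
    mask = torch.zeros_like(flat_imp).scatter_(1, top_idx, 1.0).bool()
    mask = mask.reshape(b, 1, h, w).expand(-1, c, -1, -1)
    channel_mean = images.mean(dim=(0, 2, 3)).view(1, c, 1, 1)
    return torch.where(mask, channel_mean.expand_as(images), images)

images = torch.rand(4, 3, 32, 32)
importance = torch.rand(4, 32, 32)        # e.g. a saliency map per image
degraded = mask_top_features(images, importance)
# A model retrained on `degraded` images then reveals how much accuracy is lost.
```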
1 code implementation • ICLR 2018 • Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim
Saliency methods aim to explain the predictions of deep neural networks.