Search Results for author: Dan Hendrycks

Found 52 papers, 35 papers with code

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

no code implementations • 18 Mar 2024 • Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li

While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected.

Ethics Fairness +1

Paper
Add Code

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

no code implementations • 5 Mar 2024 • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks

To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs.

Multiple-choice

Paper
Add Code

Uncovering Latent Human Wellbeing in Language Model Embeddings

no code implementations • 19 Feb 2024 • Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons

Do language models implicitly learn a concept of human wellbeing?

Ethics Language Modelling +1

Paper
Add Code

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

1 code implementation • 6 Feb 2024 • Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods.

137

Paper
Code

Can LLMs Follow Simple Rules?

1 code implementation • 6 Nov 2023 • Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner

As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner.

192

Paper
Code

Representation Engineering: A Top-Down Approach to AI Transparency

1 code implementation • 2 Oct 2023 • Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.

Ranked #3 on Question Answering on TruthfulQA

Question Answering

528

Paper
Code

Identifying and Mitigating the Security Risks of Generative AI

no code implementations • 28 Aug 2023 • Clark Barrett, Brad Boyd, Elie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang

However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.

Code Completion In-Context Learning +1

Paper
Add Code

AI Deception: A Survey of Examples, Risks, and Potential Solutions

no code implementations • 28 Aug 2023 • Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks

This paper argues that a range of current AI systems have learned how to deceive humans.

Paper
Add Code

An Overview of Catastrophic AI Risks

no code implementations • 21 Jun 2023 • Dan Hendrycks, Mantas Mazeika, Thomas Woodside

Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks.

Paper
Add Code

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

no code implementations • NeurIPS 2023 • Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly.

Adversarial Robustness Ethics +1

Paper
Add Code

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

1 code implementation • 6 Apr 2023 • Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, HANLIN ZHANG, Scott Emmons, Dan Hendrycks

And how do we measure these behaviors in general-purpose models such as GPT-4?

Decision Making Ethics

Paper
Code

Natural Selection Favors AIs over Humans

no code implementations • 28 Mar 2023 • Dan Hendrycks

Evolution endowed humans with high intelligence, which allowed us to become one of the most successful species on the planet.

Paper
Add Code

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

2 code implementations • 2 Jan 2023 • Steven H. Wang, Antoine Scardigli, Leonard Tang, Wei Chen, Dimitry Levkin, Anya Chen, Spencer Ball, Thomas Woodside, Oliver Zhang, Dan Hendrycks

Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets.

Reading Comprehension

Paper
Code

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

1 code implementation • 18 Oct 2022 • Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks

In experiments, we show how video models that are primarily trained to recognize actions and find contours of objects can be repurposed to understand human preferences and the emotional content of videos.

Video Understanding

Paper
Code

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

3 code implementations • 13 Oct 2022 • Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature.

Anomaly Detection Benchmarking +3

741

Paper
Code

Forecasting Future World Events with Neural Networks

1 code implementation • 30 Jun 2022 • Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks

We test language models on our forecasting task and find that performance is far below a human expert baseline.

Decision Making Language Modelling

173

Paper
Code

Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks

no code implementations • 17 Jun 2022 • Anthony M. Barrett, Dan Hendrycks, Jessica Newman, Brandie Nonnecke

In this document, we provide detailed actionable-guidance recommendations focused on identifying and managing risks of events with very high or catastrophic consequences, intended as a risk management practices resource for NIST for AI RMF version 1. 0 (released in January 2023), or for AI RMF users, or for other AI risk management guidance and standards as appropriate.

Management

Paper
Add Code

X-Risk Analysis for AI Research

no code implementations • 13 Jun 2022 • Dan Hendrycks, Mantas Mazeika

Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities.

Paper
Add Code

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

2,644

Paper
Code

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

2 code implementations • CVPR 2022 • Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy.

Adversarial Robustness Anomaly Detection +1

Paper
Code

Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines

no code implementations • 1 Dec 2021 • Jiachen Sun, Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Dan Hendrycks, Jihun Hamm, Z. Morley Mao

To alleviate this issue, we propose a novel data augmentation scheme, FourierMix, that produces augmentations to improve the spectral coverage of the training data.

Adversarial Robustness Benchmarking +1

Paper
Add Code

A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges

1 code implementation • 26 Oct 2021 • Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, Mohammad Sabokrou

To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection.

Anomaly Detection Novelty Detection +2

Paper
Code

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

1 code implementation • 25 Oct 2021 • Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong.

Paper
Code

Improving and Assessing Anomaly Detectors for Large-Scale Settings

no code implementations • 29 Sep 2021 • Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joseph Kwon, Mohammadreza Mostajabi, Jacob Steinhardt

We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.

Out-of-Distribution Detection Segmentation +1

Paper
Add Code

Unsolved Problems in ML Safety

no code implementations • 28 Sep 2021 • Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.

Paper
Add Code

VisDA-2021 Competition Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data

1 code implementation • 23 Jul 2021 • Dina Bashkirova, Dan Hendrycks, Donghyun Kim, Samarth Mishra, Kate Saenko, Kuniaki Saito, Piotr Teterwak, Ben Usman

Progress in machine learning is typically measured by training and testing a model on the same distribution of data, i. e., the same domain.

BIG-bench Machine Learning Universal Domain Adaptation +1

Paper
Code

Measuring Coding Challenge Competence With APPS

3 code implementations • 20 May 2021 • Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt

Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, so we find that machine learning models are now beginning to learn how to code.

Ranked #8 on Code Generation on APPS

BIG-bench Machine Learning Code Generation

3,290

Paper
Code

CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review

2 code implementations • 10 Mar 2021 • Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball

We address this bottleneck within the legal domain by introducing the Contract Understanding Atticus Dataset (CUAD), a new dataset for legal contract review.

357

Paper
Code

Measuring Mathematical Problem Solving With the MATH Dataset

4 code implementations • 5 Mar 2021 • Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Ranked #90 on Math Word Problem Solving on MATH

Math Math Word Problem Solving +1

699

Paper
Code

A Rigorous Evaluation of Real-World Distribution Shifts

no code implementations • 1 Jan 2021 • Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

Motivated by this, we introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000x more labeled data.

Data Augmentation

Paper
Add Code

How Multipurpose Are Language Models?

no code implementations • ICLR 2021 • Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.

Elementary Mathematics World Knowledge

Paper
Add Code

Towards Machine Ethics with Language Models

no code implementations • ICLR 2021 • Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

We show how to assess a language model’s knowledge of basic concepts of morality.

Ethics World Knowledge

Paper
Add Code

Measuring Massive Multitask Language Understanding

12 code implementations • 7 Sep 2020 • Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

Ranked #60 on Multi-task Language Understanding on MMLU

Elementary Mathematics Multi-task Language Understanding +1

5,631

Paper
Code

Aligning AI With Shared Human Values

2 code implementations • 5 Aug 2020 • Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

We show how to assess a language model's knowledge of basic concepts of morality.

Ranked #1 on Average on hendrycks2020ethics

Ethics reinforcement-learning +2

906

Paper
Code

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

1 code implementation • ICCV 2021 • Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

We find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work.

Ranked #29 on Domain Generalization on ImageNet-R

Data Augmentation Domain Generalization +1

237

Paper
Code

Pretrained Transformers Improve Out-of-Distribution Robustness

1 code implementation • ACL 2020 • Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song

Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions?

Paper
Code

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

15 code implementations • ICLR 2020 • Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan

We propose AugMix, a data processing technique that is simple to implement, adds limited computational overhead, and helps models withstand unforeseen corruptions.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Domain Generalization Image Classification +2

29,671

Paper
Code

Scaling Out-of-Distribution Detection for Real-World Settings

3 code implementations • 25 Nov 2019 • Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

Out-of-Distribution Detection Segmentation +2

151

Paper
Code

Testing Robustness Against Unforeseen Adversaries

3 code implementations • 21 Aug 2019 • Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

To narrow in on this discrepancy between research and reality we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks.

Adversarial Defense Adversarial Robustness

Paper
Code

Natural Adversarial Examples

3 code implementations • CVPR 2021 • Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song

We also curate an adversarial out-of-distribution detection dataset called ImageNet-O, which is the first out-of-distribution detection dataset created for ImageNet models.

Ranked #39 on Domain Generalization on ImageNet-A

Adversarial Attack Data Augmentation +2

570

Paper
Code

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

3 code implementations • NeurIPS 2019 • Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song

Self-supervision provides effective representations for downstream tasks without requiring labels.

Ranked #4 on Anomaly Detection on Anomaly Detection on Unlabeled ImageNet-30 vs CUB-200

Outlier Detection Out-of-Distribution Detection +2

263

Paper
Code

Transfer of Adversarial Robustness Between Perturbation Types

no code implementations • 3 May 2019 • Daniel Kang, Yi Sun, Tom Brown, Dan Hendrycks, Jacob Steinhardt

We study the transfer of adversarial robustness of deep neural networks between different perturbation types.

Adversarial Robustness

Paper
Add Code

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

13 code implementations • ICLR 2019 • Dan Hendrycks, Thomas Dietterich

Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations.

Ranked #40 on Domain Generalization on ImageNet-C

Adversarial Defense Benchmarking +1

969

Paper
Code

Using Pre-Training Can Improve Model Robustness and Uncertainty

1 code implementation • 28 Jan 2019 • Dan Hendrycks, Kimin Lee, Mantas Mazeika

He et al. (2018) have called into question the utility of pre-training by showing that training from scratch can often yield similar performance to pre-training.

Adversarial Robustness General Classification +1

Paper
Code

Deep Anomaly Detection with Outlier Exposure

9 code implementations • ICLR 2019 • Dan Hendrycks, Mantas Mazeika, Thomas Dietterich

We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.

Ranked #3 on Out-of-Distribution Detection on CIFAR-100 (using extra training data)

Anomaly Detection Out-of-Distribution Detection +1

531

Paper
Code

Open Category Detection with PAC Guarantees

1 code implementation • ICML 2018 • Si Liu, Risheek Garrepalli, Thomas G. Dietterich, Alan Fern, Dan Hendrycks

Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates.

Paper
Code

Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations

2 code implementations • ICLR 2019 • Dan Hendrycks, Thomas G. Dietterich

Then we propose a new dataset called Icons-50 which opens research on a new kind of robustness, surface variation robustness.

Adversarial Defense Benchmarking

372

Paper
Code

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

1 code implementation • NeurIPS 2018 • Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel

We utilize trusted data by proposing a loss correction technique that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers.

Data Poisoning

Paper
Code

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

14 code implementations • 7 Oct 2016 • Dan Hendrycks, Kevin Gimpel

We consider the two related problems of detecting if an example is misclassified or out-of-distribution.

Ranked #12 on Out-of-Distribution Detection on CIFAR-10 vs CIFAR-100

Anomaly Detection Automatic Speech Recognition +2

217

Paper
Code

Early Methods for Detecting Adversarial Images

1 code implementation • 1 Aug 2016 • Dan Hendrycks, Kevin Gimpel

Many machine learning classifiers are vulnerable to adversarial perturbations.

BIG-bench Machine Learning

Paper
Code

Adjusting for Dropout Variance in Batch Normalization and Weight Initialization

1 code implementation • 8 Jul 2016 • Dan Hendrycks, Kevin Gimpel

We show how to adjust for the variance introduced by dropout with corrections to weight initialization and Batch Normalization, yielding higher accuracy.

Data Augmentation

Paper
Code

Gaussian Error Linear Units (GELUs)

8 code implementations • 27 Jun 2016 • Dan Hendrycks, Kevin Gimpel

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function.

572

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.