no code implementations • ICML 2020 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

Dataset replication is a useful tool for assessing whether models have overfit to a specific validation set or the exact circumstances under which it was generated.

no code implementations • 20 Jun 2024 • Erik Jones, Anca Dragan, Jacob Steinhardt

In this work, we show that individually testing models for misuse is inadequate; adversaries can misuse combinations of models even when each individual model is safe.

no code implementations • 6 Jun 2024 • Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

We interpret the function of individual neurons in CLIP by automatically describing them using text.

no code implementations • 28 Feb 2024 • Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt

In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters.

1 code implementation • 9 Feb 2024 • Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt

Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents.

1 code implementation • CVPR 2024 • Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

To aid in this discovery process, we explore the task of automatically describing the differences between two $\textbf{sets}$ of images, which we term Set Difference Captioning.

1 code implementation • 26 Oct 2023 • Jiahai Feng, Jacob Steinhardt

To correctly use in-context information, language models (LMs) must bind entities to their attributes.

1 code implementation • 9 Oct 2023 • Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands.

1 code implementation • 18 Jul 2023 • Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations.

no code implementations • 17 Jul 2023 • Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input.

no code implementations • 29 Jun 2023 • Yongyi Yang, Jacob Steinhardt, Wei Hu

This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution.

1 code implementation • NeurIPS 2023 • Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt, Nika Haghtalab

As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy.

1 code implementation • NeurIPS 2023 • Shengbang Tong, Erik Jones, Jacob Steinhardt

Because CLIP is the backbone for most state-of-the-art multimodal systems, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others.

no code implementations • 13 Jun 2023 • Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt

In content recommender systems such as TikTok and YouTube, the platform's recommendation algorithm shapes content producer incentives.

2 code implementations • 14 Mar 2023 • Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

1 code implementation • 8 Mar 2023 • Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt

Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging.

no code implementations • 23 Feb 2023 • Kush Bhatia, Wenshuo Guo, Jacob Steinhardt

We specifically show that the well-studied problem of Gaussian process (GP) bandit optimization is a special case of our framework, and that our bounds either improve or are competitive with known regret guarantees for the Matérn kernel.

1 code implementation • 12 Jan 2023 • Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup.

1 code implementation • 7 Dec 2022 • Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect.

2 code implementations • 1 Nov 2022 • Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt

Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.

1 code implementation • 18 Oct 2022 • Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks

In experiments, we show how video models that are primarily trained to recognize actions and find contours of objects can be repurposed to understand human preferences and the emotional content of videos.

1 code implementation • 30 Jun 2022 • Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks

We test language models on our forecasting task and find that performance is far below a human expert baseline.

1 code implementation • 27 Jun 2022 • Jean-Stanislas Denain, Jacob Steinhardt

Model visualizations provide information that outputs alone might miss.

1 code implementation • NeurIPS 2023 • Meena Jagadeesan, Nikhil Garg, Jacob Steinhardt

Producers seek to create content that will be shown by the recommendation algorithm, which can impact both the diversity and quality of their content.

no code implementations • 11 Mar 2022 • Alexander Wei, Wei Hu, Jacob Steinhardt

On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings.
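
For readers unfamiliar with the estimator named above, a common textbook form of generalized cross-validation for ridge regression is sketched below. The symbols ($X$ for the $n \times d$ design matrix, $y$ for the responses, $\lambda$ for the ridge penalty) and the $n\lambda$ scaling of the penalty are assumptions made for illustration; the listed paper may use a different normalization.

```latex
\[
  \mathrm{GCV}(\lambda)
  = \frac{\tfrac{1}{n}\,\lVert (I - S_\lambda)\, y \rVert_2^2}
         {\bigl(1 - \tfrac{1}{n}\operatorname{tr}(S_\lambda)\bigr)^2},
  \qquad
  S_\lambda = X \,(X^\top X + n\lambda I)^{-1} X^\top .
\]
```

The numerator is the training error of the ridge fit, and the denominator inflates it according to the effective degrees of freedom $\operatorname{tr}(S_\lambda)$.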

no code implementations • 24 Feb 2022 • Erik Jones, Jacob Steinhardt

Large language models generate complex, open-ended outputs: instead of outputting a class label they write summaries, generate dialogue, or produce working code.

1 code implementation • 11 Feb 2022 • Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, Jacob Steinhardt

Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels.

1 code implementation • 28 Jan 2022 • Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt

We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier.

1 code implementation • ICLR 2022 • Alexander Pan, Kush Bhatia, Jacob Steinhardt

Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.

2 code implementations • CVPR 2022 • Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy.

no code implementations • 8 Dec 2021 • Alan Pham, Eunice Chan, Vikranth Srivatsa, Dhruba Ghosh, Yaoqing Yang, Yaodong Yu, Ruiqi Zhong, Joseph E. Gonzalez, Jacob Steinhardt

Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known.

no code implementations • NeurIPS 2021 • Frances Ding, Jean-Stanislas Denain, Jacob Steinhardt

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures.

1 code implementation • 25 Oct 2021 • Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong.

no code implementations • 29 Sep 2021 • Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joseph Kwon, Mohammadreza Mostajabi, Jacob Steinhardt

We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.
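
The maximum-logit detector mentioned above can be illustrated with a minimal sketch. It assumes a classifier that returns raw (pre-softmax) logits and uses the negative maximum logit as an anomaly score; the function names and the threshold value are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence


def max_logit_score(logits: Sequence[float]) -> float:
    """Anomaly score: negative of the largest raw logit.

    A low maximum logit means the classifier is not confident in any class,
    so the score is higher for inputs that look out-of-distribution.
    """
    return -max(logits)


def flag_out_of_distribution(
    batch_logits: Sequence[Sequence[float]], threshold: float
) -> List[bool]:
    """Flag each example whose anomaly score exceeds `threshold`.

    `threshold` is a hypothetical value; in practice it would be tuned
    on held-out in-distribution data.
    """
    return [max_logit_score(logits) > threshold for logits in batch_logits]


if __name__ == "__main__":
    # Made-up logits: the first example has one dominant class,
    # the second has uniformly small logits.
    batch = [[9.0, 0.1, -2.0], [0.3, 0.2, 0.1]]
    print(flag_out_of_distribution(batch, threshold=-1.0))
```

Using the raw logit rather than the softmax probability keeps the score from being squashed when many classes share probability mass, which is one plausible reading of why such a simple detector scales to large label spaces.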

no code implementations • 28 Sep 2021 • Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.

no code implementations • NeurIPS 2021 • Meena Jagadeesan, Alexander Wei, Yixin Wang, Michael I. Jordan, Jacob Steinhardt

Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data.

3 code implementations • 3 Aug 2021 • Frances Ding, Jean-Stanislas Denain, Jacob Steinhardt

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures.

3 code implementations • 20 May 2021 • Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt

Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, so we find that machine learning models are now beginning to learn how to code.

Ranked #9 on Code Generation on APPS

1 code implementation • Findings (ACL) 2021 • Ruiqi Zhong, Dhruba Ghosh, Dan Klein, Jacob Steinhardt

We develop statistically rigorous methods to address this, and after accounting for pretraining and finetuning noise, we find that our BERT-Large is worse than BERT-Mini on at least 1-4% of instances across MNLI, SST-2, and QQP, compared to the overall accuracy improvement of 2-10%.

no code implementations • 17 Apr 2021 • Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

This raises the interesting question of whether learning is even possible in our setup, given that obtaining a generalizable estimate of utility $u^*$ might not be possible from finitely many samples.

1 code implementation • 17 Mar 2021 • Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma

To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$).

1 code implementation • 13 Mar 2021 • Charlie Snell, Ruiqi Zhong, Dan Klein, Jacob Steinhardt

Our approximation explains why models sometimes attend to salient words, and inspires a toy example where a multi-head attention model can overcome the above hard training distribution by improving learning dynamics rather than expressiveness.

1 code implementation • CVPR 2021 • Collin Burns, Jacob Steinhardt

Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and test distribution.

4 code implementations • 5 Mar 2021 • Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Ranked #96 on Math Word Problem Solving on MATH

no code implementations • ICLR 2021 • Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.

no code implementations • ICLR 2021 • Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

We show how to assess a language model’s knowledge of basic concepts of morality.

no code implementations • 1 Jan 2021 • Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

Motivated by this, we introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000x more labeled data.

2 code implementations • NeurIPS 2020 • Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations and (2) requires only a fixed number of forward/backward passes through the network per iteration.

13 code implementations • 7 Sep 2020 • Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.

Ranked #74 on Multi-task Language Understanding on MMLU

no code implementations • 16 Aug 2020 • Charlie Snell, Ruiqi Zhong, Jacob Steinhardt, Dan Klein

If we ablate attention by fixing it to uniform, the output relevance still correlates with the attention of a normally trained model; but if we instead ablate output relevance, attention cannot be learned.

2 code implementations • 5 Aug 2020 • Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

We show how to assess a language model's knowledge of basic concepts of morality.

Ranked #1 on Average on hendrycks2020ethics

1 code implementation • ICCV 2021 • Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

We find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work.

Ranked #29 on Domain Generalization on ImageNet-R

no code implementations • 28 May 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".

1 code implementation • 19 May 2020 • Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality.

1 code implementation • ICML 2020 • Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma

We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network.
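
The quantities measured above can be written out for squared loss. The notation here is an assumption for illustration: $f_T$ is the network trained on a random training set $T$, $\bar f(x) = \mathbb{E}_T[f_T(x)]$ is the average prediction, and $(x, y)$ is a fixed test point; the paper's own definitions may treat label noise differently.

```latex
\[
  \mathbb{E}_T\bigl[(f_T(x) - y)^2\bigr]
  = \underbrace{\bigl(\bar f(x) - y\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_T\bigl[(f_T(x) - \bar f(x))^2\bigr]}_{\text{variance}},
  \qquad
  \bar f(x) = \mathbb{E}_T\bigl[f_T(x)\bigr].
\]
```

The cross term vanishes because $\mathbb{E}_T[f_T(x) - \bar f(x)] = 0$, so the risk splits exactly into the two parts whose dependence on width the abstract describes.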

no code implementations • 21 Jan 2020 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.

3 code implementations • 25 Nov 2019 • Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.

no code implementations • 19 Sep 2019 • Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.

3 code implementations • 21 Aug 2019 • Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

To narrow in on this discrepancy between research and reality, we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-$L_p$ attacks.

3 code implementations • CVPR 2021 • Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song

We also curate an adversarial out-of-distribution detection dataset called ImageNet-O, which is the first out-of-distribution detection dataset created for ImageNet models.

Ranked #39 on Domain Generalization on ImageNet-A

no code implementations • 3 May 2019 • Daniel Kang, Yi Sun, Tom Brown, Dan Hendrycks, Jacob Steinhardt

We study the transfer of adversarial robustness of deep neural networks between different perturbation types.

2 code implementations • 13 Nov 2018 • Kensen Shi, Jacob Steinhardt, Percy Liang

We present FrAngel, a new approach to component-based synthesis that can synthesize short Java functions with control structures when given a desired signature, a set of input-output examples, and a collection of libraries (without formal specifications).

2 code implementations • 2 Nov 2018 • Pang Wei Koh, Jacob Steinhardt, Percy Liang

In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.

3 code implementations • NeurIPS 2018 • Aditi Raghunathan, Jacob Steinhardt, Percy Liang

One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family.

no code implementations • 9 Jul 2018 • Zachary C. Lipton, Jacob Steinhardt

Collectively, machine learning (ML) researchers are engaged in the creation and dissemination of knowledge about data-driven algorithms.

1 code implementation • 7 Mar 2018 • Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers.

no code implementations • 20 Feb 2018 • Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, Hyrum Anderson, Heather Roff, Gregory C. Allen, Jacob Steinhardt, Carrick Flynn, Seán Ó hÉigeartaigh, Simon Beard, Haydn Belfield, Sebastian Farquhar, Clare Lyle, Rebecca Crootof, Owain Evans, Michael Page, Joanna Bryson, Roman Yampolskiy, Dario Amodei

This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats.

4 code implementations • ICLR 2018 • Aditi Raghunathan, Jacob Steinhardt, Percy Liang

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs.

no code implementations • 20 Nov 2017 • Pravesh K. Kothari, Jacob Steinhardt

As an immediate corollary, for any $\gamma > 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary Poincaré distributions in $\mathbb{R}^d$ in time $d^{O(1/\gamma)}$ so long as the means have separation $\Omega(k^{\gamma})$.

2 code implementations • NeurIPS 2017 • Jacob Steinhardt, Pang Wei Koh, Percy Liang

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model.

no code implementations • 17 Apr 2017 • Jacob Steinhardt

This matches the conjectured computational threshold for the classical planted clique problem, and thus raises the intriguing possibility that, once we require robustness, there is no computational-statistical gap for planted clique.

no code implementations • 15 Mar 2017 • Jacob Steinhardt, Moses Charikar, Gregory Valiant

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data.

no code implementations • 7 Nov 2016 • Moses Charikar, Jacob Steinhardt, Gregory Valiant

For example, given a dataset of $n$ points for which an unknown subset of $\alpha n$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-\alpha)n$ points, is it possible to return a list of $\operatorname{poly}(1/\alpha)$ answers, one of which is correct?

1 code implementation • 21 Jun 2016 • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society.

no code implementations • NeurIPS 2016 • Jacob Steinhardt, Percy Liang

We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test.

no code implementations • NeurIPS 2016 • Jacob Steinhardt, Gregory Valiant, Moses Charikar

We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers.

1 code implementation • NeurIPS 2015 • Jacob Steinhardt, Percy S. Liang

For weakly-supervised problems with deterministic constraints between the latent variables and observed output, learning necessitates performing inference over latent variables conditioned on the output, which can be intractable no matter how simple the model family is.

1 code implementation • 9 May 2015 • Tianlin Shi, Jacob Steinhardt, Percy Liang

In structured prediction, most inference algorithms allocate a homogeneous amount of computation to all parts of the output, which can be wasteful when different parts vary widely in terms of difficulty.

1 code implementation • 24 Feb 2015 • Jacob Steinhardt, Percy Liang

Markov Chain Monte Carlo (MCMC) algorithms are often used for approximate inference inside learning, but their slow mixing can be difficult to diagnose and the approximations can seriously degrade learning.

1 code implementation • 24 Feb 2015 • Jacob Steinhardt, Percy Liang

A classic tension exists between exact inference in a simple model and approximate inference in a complex model.

no code implementations • 13 Dec 2014 • Jacob Steinhardt, Stefan Wager, Percy Liang

We present a sparse analogue to stochastic gradient descent that is guaranteed to perform well under similar conditions to the lasso.
