Search Results for author: Jacob Steinhardt

Found 79 papers, 47 papers with code

Statistical Bias in Dataset Replication

no code implementations ICML 2020 Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

Dataset replication is a useful tool for assessing whether models have overfit to a specific validation set or the exact circumstances under which it was generated.

Approaching Human-Level Forecasting with Language Models

no code implementations 28 Feb 2024 Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt

In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters.

Decision Making Retrieval

Feedback Loops With Language Models Drive In-Context Reward Hacking

1 code implementation 9 Feb 2024 Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt

Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents.

Describing Differences in Image Sets with Natural Language

1 code implementation 5 Dec 2023 Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

To aid in this discovery process, we explore the task of automatically describing the differences between two $\textbf{sets}$ of images, which we term Set Difference Captioning.

Language Modelling

How do Language Models Bind Entities in Context?

no code implementations 26 Oct 2023 Jiahai Feng, Jacob Steinhardt

To correctly use in-context information, language models (LMs) must bind entities to their attributes.

Interpreting CLIP's Image Representation via Text-Based Decomposition

1 code implementation 9 Oct 2023 Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands.
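
The decomposition is easy to illustrate. Below is a minimal sketch of the idea (toy tensors and sizes, not the authors' code): if each attention head's contribution to the final image embedding is available as a summand, the full representation is their sum, and each summand can be scored against CLIP text embeddings to see what it writes toward.

    import torch
    import torch.nn.functional as F

    n_layers, n_heads, d = 12, 12, 512                 # hypothetical ViT-B-like sizes
    head_contribs = torch.randn(n_layers, n_heads, d)  # stand-in per-head contributions
    image_rep = head_contribs.sum(dim=(0, 1))          # the representation as a sum of summands

    text_emb = F.normalize(torch.randn(5, d), dim=-1)  # embeddings of 5 candidate text descriptions

    # Interpret one summand: which description does head (10, 3) write toward?
    summand = F.normalize(head_contribs[10, 3], dim=0)
    print(text_emb @ summand)                          # cosine similarity per description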

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

1 code implementation 18 Jul 2023 Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations.
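
Decoding from intermediate layers is simple to sketch. In the toy version below (random tensors stand in for a real model's residual stream, final LayerNorm, and unembedding), a prediction is read off at every layer; overthinking corresponds to early-layer predictions being correct while later layers flip them under incorrect demonstrations.

    import torch

    d_model, vocab, n_layers = 64, 100, 8
    ln_f = torch.nn.LayerNorm(d_model)                     # stand-in for the final LayerNorm
    unembed = torch.nn.Linear(d_model, vocab, bias=False)  # stand-in for the unembedding

    # hidden_states[i]: residual stream after layer i at the answer position (toy values)
    hidden_states = [torch.randn(d_model) for _ in range(n_layers)]

    # Decode a prediction from every intermediate layer.
    per_layer_preds = [unembed(ln_f(h)).argmax().item() for h in hidden_states]
    print(per_layer_preds)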

Few-Shot Learning

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

no code implementations 17 Jul 2023 Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input.

counterfactual

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations

no code implementations 29 Jun 2023 Yongyi Yang, Jacob Steinhardt, Wei Hu

This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution.

Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition

1 code implementation NeurIPS 2023 Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt, Nika Haghtalab

As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy.

Mass-Producing Failures of Multimodal Systems with Language Models

1 code implementation NeurIPS 2023 Shengbang Tong, Erik Jones, Jacob Steinhardt

Because CLIP is the backbone for most state-of-the-art multimodal systems, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others.

Language Modelling Self-Driving Cars

Incentivizing High-Quality Content in Online Recommender Systems

no code implementations 13 Jun 2023 Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt

For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content.

Recommendation Systems

Eliciting Latent Predictions from Transformers with the Tuned Lens

2 code implementations 14 Mar 2023 Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.
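
The iterative-inference view can be made concrete with a small probe. The sketch below (toy tensors and an assumed setup, not the released tuned-lens code) trains one affine "translator" for a layer so that, after the final LayerNorm and unembedding, its output distribution matches the model's own final prediction.

    import torch
    import torch.nn.functional as F

    d_model, vocab = 64, 100
    ln_f = torch.nn.LayerNorm(d_model)                     # stand-in final LayerNorm
    unembed = torch.nn.Linear(d_model, vocab, bias=False)  # stand-in unembedding
    for p in list(ln_f.parameters()) + list(unembed.parameters()):
        p.requires_grad_(False)                            # probe training leaves the model fixed

    translator = torch.nn.Linear(d_model, d_model)         # affine probe for one layer
    opt = torch.optim.Adam(translator.parameters(), lr=1e-3)

    h_layer = torch.randn(32, d_model)                     # toy batch of layer-l hidden states
    h_final = torch.randn(32, d_model)                     # matching final-layer hidden states

    for _ in range(200):
        pred = F.log_softmax(unembed(ln_f(translator(h_layer))), dim=-1)
        target = F.softmax(unembed(ln_f(h_final)), dim=-1)
        loss = F.kl_div(pred, target, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()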

Language Modelling

Automatically Auditing Large Language Models via Discrete Optimization

1 code implementation 8 Mar 2023 Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt

Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging.

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

no code implementations 23 Feb 2023 Kush Bhatia, Wenshuo Guo, Jacob Steinhardt

We specifically show that the well-studied problem of Gaussian process (GP) bandit optimization is a special case of our framework, and that our bounds either improve or are competitive with known regret guarantees for the Matérn kernel.

Progress measures for grokking via mechanistic interpretability

1 code implementation 12 Jan 2023 Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup.

Memorization

Discovering Latent Knowledge in Language Models Without Supervision

1 code implementation 7 Dec 2022 Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect.

Imitation Learning Language Modelling +2

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

3 code implementations 1 Nov 2022 Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt

Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.

Language Modelling

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

1 code implementation 18 Oct 2022 Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks

In experiments, we show how video models that are primarily trained to recognize actions and find contours of objects can be repurposed to understand human preferences and the emotional content of videos.

Video Understanding

Supply-Side Equilibria in Recommender Systems

1 code implementation NeurIPS 2023 Meena Jagadeesan, Nikhil Garg, Jacob Steinhardt

Producers seek to create content that will be shown by the recommendation algorithm, which can impact both the diversity and quality of their content.

Recommendation Systems

More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize

no code implementations 11 Mar 2022 Alexander Wei, Wei Hu, Jacob Steinhardt

On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings.
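
For reference, for a linear smoother with hat matrix $S_\lambda$ (so $\hat{y} = S_\lambda y$), the classical estimator is $\mathrm{GCV}(\lambda) = \frac{\frac{1}{n}\|(I - S_\lambda)y\|_2^2}{(1 - \mathrm{tr}(S_\lambda)/n)^2}$: training error inflated by an effective-degrees-of-freedom correction.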

regression

Capturing Failures of Large Language Models via Human Cognitive Biases

no code implementations 24 Feb 2022 Erik Jones, Jacob Steinhardt

Large language models generate complex, open-ended outputs: instead of outputting a class label they write summaries, generate dialogue, or produce working code.

Code Generation

Predicting Out-of-Distribution Error with the Projection Norm

1 code implementation 11 Feb 2022 Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, Jacob Steinhardt

Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels.
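
The two-step recipe is compact enough to sketch. Below is a toy version (stand-in linear models; the paper's choice of reference network and fine-tuning details differ): pseudo-label the unlabeled test set, fine-tune a fresh copy on those pseudo-labels, and use a distance between the resulting parameters and a reference model's as the score.

    import copy
    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(10, 3)               # stand-in trained classifier
    x_test = torch.randn(200, 10)                # unlabeled (possibly shifted) test inputs

    pseudo_labels = model(x_test).argmax(dim=-1)           # step 1: pseudo-label

    new_model = copy.deepcopy(model)                       # step 2: train a new model on them
    opt = torch.optim.SGD(new_model.parameters(), lr=0.1)
    for _ in range(50):
        loss = F.cross_entropy(new_model(x_test), pseudo_labels)
        opt.zero_grad(); loss.backward(); opt.step()

    # step 3: parameter distance as the out-of-distribution error proxy
    dist = sum(((p - q) ** 2).sum() for p, q in zip(new_model.parameters(), model.parameters())) ** 0.5
    print(float(dist))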

Pseudo Label text-classification +1

Describing Differences between Text Distributions with Natural Language

1 code implementation 28 Jan 2022 Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt

We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier.
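
The re-ranking step itself is simple to sketch; in the toy version below a substring check stands in for the learned verifier, and each candidate description is scored by the fraction of samples on which it holds.

    def rerank(descriptions, samples, verifier):
        scored = [(sum(verifier(d, s) for s in samples) / len(samples), d)
                  for d in descriptions]
        return [d for _, d in sorted(scored, reverse=True)]

    verifier = lambda d, s: d in s   # stand-in for a learned verifier
    samples = ["sports news today", "sports scores", "stock market dips"]
    print(rerank(["sports", "stock"], samples, verifier))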

Binary Classification Re-Ranking

The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

1 code implementation ICLR 2022 Alexander Pan, Kush Bhatia, Jacob Steinhardt

Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.

Anomaly Detection

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

2 code implementations CVPR 2022 Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy.

Adversarial Robustness Anomaly Detection +1

The Effect of Model Size on Worst-Group Generalization

no code implementations 8 Dec 2021 Alan Pham, Eunice Chan, Vikranth Srivatsa, Dhruba Ghosh, Yaoqing Yang, Yaodong Yu, Ruiqi Zhong, Joseph E. Gonzalez, Jacob Steinhardt

Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known.

Grounding Representation Similarity Through Statistical Testing

no code implementations NeurIPS 2021 Frances Ding, Jean-Stanislas Denain, Jacob Steinhardt

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures.

Specificity

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

1 code implementation 25 Oct 2021 Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong.

Improving and Assessing Anomaly Detectors for Large-Scale Settings

no code implementations 29 Sep 2021 Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joseph Kwon, Mohammadreza Mostajabi, Jacob Steinhardt

We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.
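
The detector fits in a few lines; a minimal sketch of the maximum-logit score on toy logits, negated so that larger values mean more anomalous:

    import torch

    def max_logit_score(logits: torch.Tensor) -> torch.Tensor:
        # Confident examples (large max logit) score low; likely anomalies score high.
        return -logits.max(dim=-1).values

    print(max_logit_score(torch.randn(4, 1000)))   # toy batch over 1000 classes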

Out-of-Distribution Detection Segmentation +1

Unsolved Problems in ML Safety

no code implementations 28 Sep 2021 Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.

Learning Equilibria in Matching Markets from Bandit Feedback

no code implementations NeurIPS 2021 Meena Jagadeesan, Alexander Wei, Yixin Wang, Michael I. Jordan, Jacob Steinhardt

Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data.

Grounding Representation Similarity with Statistical Testing

3 code implementations 3 Aug 2021 Frances Ding, Jean-Stanislas Denain, Jacob Steinhardt

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures.

Specificity

Measuring Coding Challenge Competence With APPS

3 code implementations 20 May 2021 Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt

Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, so we find that machine learning models are now beginning to learn how to code.

BIG-bench Machine Learning Code Generation

Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level

1 code implementation Findings (ACL) 2021 Ruiqi Zhong, Dhruba Ghosh, Dan Klein, Jacob Steinhardt

We develop statistically rigorous methods to address this, and after accounting for pretraining and finetuning noise, we find that our BERT-Large is worse than BERT-Mini on at least 1-4% of instances across MNLI, SST-2, and QQP, compared to the overall accuracy improvement of 2-10%.

QQP SST-2

Agnostic learning with unknown utilities

no code implementations 17 Apr 2021 Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

This raises an interesting question of whether learning is even possible in our setup, given that a generalizable estimate of the utility $u^*$ might not be obtainable from finitely many samples.

Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition

1 code implementation 17 Mar 2021 Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma

To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$).
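
For squared loss, writing $\bar{f}(x) = \mathbb{E}_{\mathcal{T}}[f_{\mathcal{T}}(x)]$ for the predictor averaged over training sets $\mathcal{T}$, the decomposition (label noise omitted) reads $\mathbb{E}_{\mathcal{T}}[(f_{\mathcal{T}}(x) - y)^2] = (\bar{f}(x) - y)^2 + \mathbb{E}_{\mathcal{T}}[(f_{\mathcal{T}}(x) - \bar{f}(x))^2]$, where the first term is the squared bias and the second the variance; the paper tracks both terms as the training radius $\varepsilon$ grows.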

Approximating How Single Head Attention Learns

1 code implementation 13 Mar 2021 Charlie Snell, Ruiqi Zhong, Dan Klein, Jacob Steinhardt

Our approximation explains why models sometimes attend to salient words, and inspires a toy example where a multi-head attention model can overcome the above hard training distribution by improving learning dynamics rather than expressiveness.

Limitations of Post-Hoc Feature Alignment for Robustness

1 code implementation CVPR 2021 Collin Burns, Jacob Steinhardt

Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and test distribution.

Unsupervised Domain Adaptation

Measuring Mathematical Problem Solving With the MATH Dataset

4 code implementations 5 Mar 2021 Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.

Math Math Word Problem Solving +1

A Rigorous Evaluation of Real-World Distribution Shifts

no code implementations 1 Jan 2021 Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

Motivated by this, we introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000x more labeled data.

Data Augmentation

How Multipurpose Are Language Models?

no code implementations ICLR 2021 Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.

Elementary Mathematics World Knowledge

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

2 code implementations NeurIPS 2020 Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, and (2) requires only a fixed number of forward/backward passes through the network per iteration.

Measuring Massive Multitask Language Understanding

12 code implementations 7 Sep 2020 Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.

Elementary Mathematics Multi-task Language Understanding +1

Understanding Attention Training via Output Relevance

no code implementations 16 Aug 2020 Charlie Snell, Ruiqi Zhong, Jacob Steinhardt, Dan Klein

If we ablate attention by fixing it to uniform, the output relevance still correlates with the attention of a normally trained model; but if we instead ablate output relevance, attention cannot be learned.
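
The uniform-attention ablation is easy to state in code; a toy single-head, unmasked sketch:

    import torch

    def uniform_attention(v: torch.Tensor) -> torch.Tensor:
        # v: (seq_len, d_model). Every position attends equally to all positions,
        # so each output is just the mean of the value vectors.
        n = v.shape[0]
        weights = torch.full((n, n), 1.0 / n)   # attention fixed to uniform
        return weights @ v

    print(uniform_attention(torch.randn(5, 8)).shape)   # torch.Size([5, 8])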

Translation

Robust estimation via generalized quasi-gradients

no code implementations 28 May 2020 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients".

regression

Identifying Statistical Bias in Dataset Replication

1 code implementation 19 May 2020 Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for a standard human-in-the-loop measure of data quality.

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

1 code implementation ICML 2020 Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma

We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network.
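
Empirically, bias and variance are estimated by training several networks on independent splits and comparing their predictions. A minimal mean-squared-error sketch on toy probabilities (simplified relative to the paper's estimator):

    import numpy as np

    def bias_variance(preds, y):
        # preds: (n_models, n_points, n_classes) probabilities from models trained
        # on independent splits; y: (n_points,) integer labels.
        mean_pred = preds.mean(axis=0)
        onehot = np.eye(preds.shape[-1])[y]
        bias2 = ((mean_pred - onehot) ** 2).sum(-1).mean()   # squared bias
        var = ((preds - mean_pred) ** 2).sum(-1).mean()      # variance
        return bias2, var

    preds = np.random.dirichlet(np.ones(10), size=(5, 100))  # toy: 5 models, 100 points
    y = np.random.randint(0, 10, size=100)
    print(bias_variance(preds, y))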

When does the Tukey median work?

no code implementations 21 Jan 2020 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions.

Scaling Out-of-Distribution Detection for Real-World Settings

3 code implementations 25 Nov 2019 Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.

Out-of-Distribution Detection Segmentation +2

Generalized Resilience and Robust Statistics

no code implementations 19 Sep 2019 Banghua Zhu, Jiantao Jiao, Jacob Steinhardt

This generalizes a property called resilience previously employed in the special case of mean estimation with outliers.

Testing Robustness Against Unforeseen Adversaries

3 code implementations 21 Aug 2019 Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

To narrow in on this discrepancy between research and reality we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks.

Adversarial Defense Adversarial Robustness

Natural Adversarial Examples

3 code implementations CVPR 2021 Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song

We also curate an adversarial out-of-distribution detection dataset called ImageNet-O, which is the first out-of-distribution detection dataset created for ImageNet models.

Adversarial Attack Data Augmentation +2

Transfer of Adversarial Robustness Between Perturbation Types

no code implementations 3 May 2019 Daniel Kang, Yi Sun, Tom Brown, Dan Hendrycks, Jacob Steinhardt

We study the transfer of adversarial robustness of deep neural networks between different perturbation types.

Adversarial Robustness

FrAngel: Component-Based Synthesis with Control Structures

2 code implementations 13 Nov 2018 Kensen Shi, Jacob Steinhardt, Percy Liang

We present FrAngel, a new approach to component-based synthesis that can synthesize short Java functions with control structures when given a desired signature, a set of input-output examples, and a collection of libraries (without formal specifications).
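
As a toy flavor of component-based synthesis (vastly simplified: FrAngel additionally synthesizes control structures and uses random search with simplification), one can enumerate compositions of library components and keep whichever matches the input-output examples:

    import itertools

    components = {"abs": abs, "neg": lambda x: -x, "inc": lambda x: x + 1}
    examples = [(3, 4), (-2, 3)]   # target behavior: f(x) = abs(x) + 1

    found = None
    for depth in range(1, 4):
        for names in itertools.product(components, repeat=depth):
            def f(x, names=names):
                for name in names:
                    x = components[name](x)
                return x
            if all(f(a) == b for a, b in examples):
                found = names
                break
        if found:
            break
    print(found)   # e.g. ('abs', 'inc')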

Programming Languages

Semidefinite relaxations for certifying robustness to adversarial examples

3 code implementations NeurIPS 2018 Aditi Raghunathan, Jacob Steinhardt, Percy Liang

One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family.

Stronger Data Poisoning Attacks Break Data Sanitization Defenses

2 code implementations 2 Nov 2018 Pang Wei Koh, Jacob Steinhardt, Percy Liang

In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.

Data Poisoning Sentiment Analysis +2

Troubling Trends in Machine Learning Scholarship

no code implementations 9 Jul 2018 Zachary C. Lipton, Jacob Steinhardt

Collectively, machine learning (ML) researchers are engaged in the creation and dissemination of knowledge about data-driven algorithms.

BIG-bench Machine Learning

Sever: A Robust Meta-Algorithm for Stochastic Optimization

1 code implementation 7 Mar 2018 Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers.

Stochastic Optimization

Certified Defenses against Adversarial Examples

4 code implementations ICLR 2018 Aditi Raghunathan, Jacob Steinhardt, Percy Liang

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs.

Adversarial Attack Adversarial Defense +1

Better Agnostic Clustering Via Relaxed Tensor Norms

no code implementations 20 Nov 2017 Pravesh K. Kothari, Jacob Steinhardt

As an immediate corollary, for any $\gamma > 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary Poincaré distributions in $\mathbb{R}^d$ in time $d^{O(1/\gamma)}$ so long as the means have separation $\Omega(k^{\gamma})$.

Clustering

Certified Defenses for Data Poisoning Attacks

2 code implementations NeurIPS 2017 Jacob Steinhardt, Pang Wei Koh, Percy Liang

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model.

Data Poisoning

Does robustness imply tractability? A lower bound for planted clique in the semi-random model

no code implementations 17 Apr 2017 Jacob Steinhardt

This matches the conjectured computational threshold for the classical planted clique problem, and thus raises the intriguing possibility that, once we require robustness, there is no computational-statistical gap for planted clique.

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

no code implementations 15 Mar 2017 Jacob Steinhardt, Moses Charikar, Gregory Valiant

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data.
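
Concretely, for mean estimation the criterion says a dataset $S$ with mean $\mu$ is $(\sigma, \epsilon)$-resilient if every subset $T \subseteq S$ with $|T| \ge (1-\epsilon)|S|$ satisfies $\|\mathrm{mean}(T) - \mu\| \le \sigma$; robust estimation then reduces to finding a large resilient subset of the corrupted data.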

Learning from Untrusted Data

no code implementations 7 Nov 2016 Moses Charikar, Jacob Steinhardt, Gregory Valiant

For example, given a dataset of $n$ points for which an unknown subset of $\alpha n$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-\alpha)n$ points, is it possible to return a list of $\operatorname{poly}(1/\alpha)$ answers, one of which is correct?

Stochastic Optimization

Concrete Problems in AI Safety

1 code implementation 21 Jun 2016 Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society.

BIG-bench Machine Learning Safe Exploration

Unsupervised Risk Estimation Using Only Conditional Independence Structure

no code implementations NeurIPS 2016 Jacob Steinhardt, Percy Liang

We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test.

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction

no code implementations NeurIPS 2016 Jacob Steinhardt, Gregory Valiant, Moses Charikar

We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers.

Learning with Relaxed Supervision

1 code implementation NeurIPS 2015 Jacob Steinhardt, Percy S. Liang

For weakly-supervised problems with deterministic constraints between the latent variables and observed output, learning necessitates performing inference over latent variables conditioned on the output, which can be intractable no matter how simple the model family is.

valid

Learning Where to Sample in Structured Prediction

1 code implementation 9 May 2015 Tianlin Shi, Jacob Steinhardt, Percy Liang

In structured prediction, most inference algorithms allocate a homogeneous amount of computation to all parts of the output, which can be wasteful when different parts vary widely in terms of difficulty.

Reinforcement Learning (RL) Structured Prediction

Reified Context Models

1 code implementation 24 Feb 2015 Jacob Steinhardt, Percy Liang

A classic tension exists between exact inference in a simple model and approximate inference in a complex model.

Learning Fast-Mixing Models for Structured Prediction

1 code implementation 24 Feb 2015 Jacob Steinhardt, Percy Liang

Markov Chain Monte Carlo (MCMC) algorithms are often used for approximate inference inside learning, but their slow mixing can be difficult to diagnose and the approximations can seriously degrade learning.

Structured Prediction

The Statistics of Streaming Sparse Regression

no code implementations 13 Dec 2014 Jacob Steinhardt, Stefan Wager, Percy Liang

We present a sparse analogue to stochastic gradient descent that is guaranteed to perform well under similar conditions to the lasso.
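
One plausible shape for such a sparse analogue (an $\ell_1$ proximal, i.e. soft-thresholding, step after each stochastic gradient update; shown for illustration only, as the paper's algorithm and analysis differ in detail):

    import numpy as np

    def soft_threshold(w, t):
        return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

    def sparse_sgd(stream, d, lam=0.1, lr=0.01):
        w = np.zeros(d)
        for x, y in stream:
            grad = (w @ x - y) * x                        # squared-loss gradient on one example
            w = soft_threshold(w - lr * grad, lr * lam)   # proximal step keeps w sparse
        return w

    rng = np.random.default_rng(0)
    w_true = np.zeros(50); w_true[:3] = 1.0
    stream = [(x, x @ w_true + 0.01 * rng.standard_normal())
              for x in rng.standard_normal((1000, 50))]
    print(np.nonzero(sparse_sgd(stream, 50))[0])          # recovered support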

regression
