Search Results for author: Chris Olah

Found 18 papers, 10 papers with code

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

4 code implementations • 14 Mar 2016 • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms.

BIG-bench Machine Learning Clustering +4

182,281

Paper
Code

Concrete Problems in AI Safety

1 code implementation • 21 Jun 2016 • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society.

BIG-bench Machine Learning Safe Exploration

Paper
Code

The Building Blocks of Interpretability

1 code implementation • Distill 2018 • Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, Alexander Mordvintsev

In this article, we treat existing interpretability methods as fundamental and composable building blocks for rich user interfaces.

4,612

Paper
Code

Differentiable Image Parameterizations

2 code implementations • Distill 2018 • Alexander Mordvintsev, Nicola Pezzotti, Ludwig Schubert, Chris Olah

Typically, we parameterize the input image as the RGB values of each pixel, but that isn’t the only way.

Image Generation

4,612

Paper
Code

Activation Atlas

1 code implementation • Distill 2019 • Shan Carter, Zan Armstrong, Ludwig Schubert, Ian Johnson, Chris Olah

By using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of features the network has learned which can reveal how the network typically represents some concepts.

General Classification Image Classification

4,612

Paper
Code

Feature Visualization

1 code implementation • Distill 2020 • Chris Olah, Alexander Mordvintsev, Ludwig Schubert

There is a growing sense that neural networks need to be interpretable to humans.

4,612

Paper
Code

Thread: Circuits

no code implementations • Distill 2020 • Nick Cammarata, Shan Carter, Gabriel Goh, Chris Olah, Michael Petrov, Ludwig Schubert, Chelsea Voss, Ben Egan, Swee Kiat Lim

To facilitate exploration of this direction, Distill is inviting a “thread” of short articles on circuits, interspersed with critical commentary by experts in adjacent fields.

Paper
Add Code

Curve Detectors

no code implementations • Distill 2020 • Nick Cammarata, Gabriel Goh, Shan Carter, Ludwig Schubert, Michael Petrov, Chris Olah

Every vision model we've explored in detail contains neurons which detect curves.

Paper
Add Code

High-Low Frequency Detectors

no code implementations • Distill 2021 • Ludwig Schubert, Chelsea Voss, Nick Cammarata, Gabriel Goh, Chris Olah

Yet, when systematically characterizing the early layers of InceptionV1, we found a full fifteen neurons of mixed3a that appear to detect a high frequency pattern on one side, and a low frequency pattern on the other.

Vocal Bursts Intensity Prediction

Paper
Add Code

Visualizing Weights

no code implementations • Distill 2021 • Chelsea Voss, Nick Cammarata, Gabriel Goh, Michael Petrov, Ludwig Schubert, Ben Egan, Swee Kiat Lim, Chris Olah

Trying to understand artificial neural networks also has a lot in common with neuroscience, which tries to understand biological neural networks.

Paper
Add Code

Weight Banding

no code implementations • Distill 2021 • Michael Petrov, Chelsea Voss, Ludwig Schubert, Nick Cammarata, Gabriel Goh, Chris Olah

Open up any ImageNet conv net and look at the weights in the last layer.

Paper
Add Code

Multimodal Neurons in Artificial Neural Networks

1 code implementation • Distill 2021 • Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, Chris Olah

It’s the fact that you plug visual information into the rich tapestry of memory that brings it to life."

301

Paper
Code

A General Language Assistant as a Laboratory for Alignment

1 code implementation • 1 Dec 2021 • Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Jared Kaplan

We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models.

Imitation Learning

3,951

Paper
Code

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

3 code implementations • 12 Apr 2022 • Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan

We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants.

Code Generation Out of Distribution (OOD) Detection +2

1,433

Paper
Code

Scaling Laws and Interpretability of Learning from Repeated Data

no code implementations • 21 May 2022 • Danny Hernandez, Tom Brown, Tom Conerly, Nova DasSarma, Dawn Drain, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Tom Henighan, Tristan Hume, Scott Johnston, Ben Mann, Chris Olah, Catherine Olsson, Dario Amodei, Nicholas Joseph, Jared Kaplan, Sam McCandlish

To do this, we train a family of models where most of the data is unique but a small fraction of it is repeated many times.

Memorization Sentence

Paper
Add Code

Language Models (Mostly) Know What They Know

no code implementations • 11 Jul 2022 • Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan

We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly.

Multiple-choice

Paper
Add Code

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

2 code implementations • 23 Aug 2022 • Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark

We provide our own analysis of the data and find a variety of harmful outputs, which range from offensive language to more subtly harmful non-violent unethical outputs.

Language Modelling

1,433

Paper
Code

In-context Learning and Induction Heads

no code implementations • 24 Sep 2022 • Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah

In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i. e. decreasing loss at increasing token indices).

In-Context Learning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.