Search Results for author: Dario Amodei

Found 30 papers, 21 papers with code

In-context Learning and Induction Heads

no code implementations24 Sep 2022 Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah

In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i. e. decreasing loss at increasing token indices).

In-Context Learning

Toy Models of Superposition

1 code implementation21 Sep 2022 Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Christopher Olah

Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging.

Learning to summarize with human feedback

1 code implementation NeurIPS 2020 Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano

We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.

Learning to summarize from human feedback

1 code implementation2 Sep 2020 Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.

Fine-Tuning Language Models from Human Preferences

7 code implementations18 Sep 2019 Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving

Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks.

Descriptive Language Modelling +1

Language Models are Unsupervised Multitask Learners

17 code implementations Preprint 2019 Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on taskspecific datasets.

 Ranked #1 on Language Modelling on enwik8 (using extra training data)

Common Sense Reasoning Coreference Resolution +10

An Empirical Model of Large-Batch Training

11 code implementations14 Dec 2018 Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team

In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing data efficiency.

Dota 2

Supervising strong learners by amplifying weak experts

3 code implementations19 Oct 2018 Paul Christiano, Buck Shlegeris, Dario Amodei

Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior.

Variational Option Discovery Algorithms

no code implementations26 Jul 2018 Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel

We explore methods for option discovery based on variational inference and make two algorithmic contributions.

Decoder Variational Inference

AI safety via debate

2 code implementations2 May 2018 Geoffrey Irving, Paul Christiano, Dario Amodei

To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences.

Deep reinforcement learning from human preferences

6 code implementations NeurIPS 2017 Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems.

Atari Games reinforcement-learning +1

Learning a Natural Language Interface with Neural Programmer

2 code implementations28 Nov 2016 Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei

The main experimental result in this paper is that a single Neural Programmer model achieves 34. 2% accuracy using only 10, 000 examples with weak supervision.

Natural Language Queries Program induction +1

Concrete Problems in AI Safety

1 code implementation21 Jun 2016 Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society.

BIG-bench Machine Learning Safe Exploration

Physical Principles for Scalable Neural Recording

no code implementations24 Jun 2013 Adam H. Marblestone, Bradley M. Zamft, Yael G. Maguire, Mikhail G. Shapiro, Thaddeus R. Cybulski, Joshua I. Glaser, Dario Amodei, P. Benjamin Stranges, Reza Kalhor, David A. Dalrymple, Dongjin Seo, Elad Alon, Michel M. Maharbiz, Jose M. Carmena, Jan M. Rabaey, Edward S. Boyden, George M. Church, Konrad P. Kording

Simultaneously measuring the activities of all neurons in a mammalian brain at millisecond resolution is a challenge beyond the limits of existing techniques in neuroscience.

Neurons and Cognition Biological Physics

Cannot find the paper you are looking for? You can Submit a new open access paper.