no code implementations • ICML 2020 • Heewoo Jun, Rewon Child, Mark Chen, John Schulman, Aditya Ramesh, Alec Radford, Ilya Sutskever
We present conditional augmentation (CondAugment), a simple and powerful method of regularizing generative models.
2 code implementations • Preprint 2023 • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.
4 code implementations • 2 Mar 2023 • Yang song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever
Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3. 55 on CIFAR-10 and 6. 20 on ImageNet 64x64 for one-step generation.
Ranked #9 on
Image Generation
on ImageNet 64x64
6 code implementations • Preprint 2022 • Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
Ranked #1 on
Speech Recognition
on Common Voice English
(using extra training data)
1 code implementation • 3 Feb 2022 • Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, Ilya Sutskever
We explore the use of expert iteration in the context of language modeling applied to formal mathematics.
Ranked #3 on
Automated Theorem Proving
on miniF2F-test
(using extra training data)
2 code implementations • 20 Dec 2021 • Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity.
Ranked #32 on
Text-to-Image Generation
on COCO
(using extra training data)
no code implementations • 11 Oct 2021 • Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind Neelakantan, Tao Xu, Stanislas Polu, Alex Ray, Pranav Shyam, Aditya Ramesh, Alec Radford, Ilya Sutskever
We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models.
13 code implementations • 7 Jul 2021 • Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.
Ranked #1 on
Multi-task Language Understanding
on BBH-alg
46 code implementations • 26 Feb 2021 • Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
Ranked #1 on
Zero-Shot Learning
on COCO-MLT
(using extra training data)
11 code implementations • 24 Feb 2021 • Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.
Ranked #44 on
Text-to-Image Generation
on COCO
(using extra training data)
no code implementations • 7 Sep 2020 • Stanislas Polu, Ilya Sutskever
We explore the application of transformer-based language models to automated theorem proving.
Ranked #1 on
Automated Theorem Proving
on Metamath set.mm
4 code implementations • ICML 2020 • Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images.
Ranked #16 on
Image Classification
on STL-10
(using extra training data)
Representation Learning
Self-Supervised Image Classification
40 code implementations • NeurIPS 2020 • Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do.
12 code implementations • Preprint 2020 • Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
We introduce Jukebox, a model that generates music with singing in the raw audio domain.
1 code implementation • 13 Dec 2019 • Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game.
3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
6 code implementations • Preprint 2019 • Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length.
15 code implementations • Preprint 2019 • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on taskspecific datasets.
Ranked #1 on
Language Modelling
on enwik8
(using extra training data)
no code implementations • NeurIPS 2018 • Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever
Results are presented on a new environment we call `Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning.
7 code implementations • ICLR 2019 • Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud
The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures.
Ranked #1 on
Density Estimation
on CIFAR-10
(NLL metric)
11 code implementations • Preprint 2018 • Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.
Ranked #3 on
Natural Language Inference
on SciTail
1 code implementation • ICLR 2019 • Daniel Huang, Prafulla Dhariwal, Dawn Song, Ilya Sutskever
In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.
7 code implementations • ICLR 2018 • Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever
We consider the problem of exploration in meta reinforcement learning.
no code implementations • ICLR 2018 • Dustin Tran, Yura Burda, Ilya Sutskever
We examine how learning from unaligned data can improve both the data efficiency of supervised tasks as well as enable alignments without any supervision.
2 code implementations • ICLR 2018 • Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch
In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself.
1 code implementation • ICLR 2018 • Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel
Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence.
no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly
This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.
2 code implementations • ICLR 2018 • Alec Radford, Rafal Jozefowicz, Ilya Sutskever
We explore the properties of byte-level recurrent language models.
Ranked #9 on
Subjectivity Analysis
on SUBJ
no code implementations • NeurIPS 2017 • Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration.
23 code implementations • 10 Mar 2017 • Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever
We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients.
Ranked #1 on
Atari Games
on Atari 2600 Pong
1 code implementation • 6 Mar 2017 • Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever
A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize.
1 code implementation • NeurIPS 2016 • Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
2 code implementations • NeurIPS 2016 • Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables.
Ranked #40 on
Image Generation
on CIFAR-10
(bits/dimension metric)
18 code implementations • 9 Nov 2016 • Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP.
no code implementations • 8 Nov 2016 • Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification.
1 code implementation • 2 Nov 2016 • Eric Price, Wojciech Zaremba, Ilya Sutskever
We find that these techniques increase the set of algorithmic problems that can be solved by the Neural GPU: we have been able to learn to perform all the arithmetic operations (and generalize to arbitrarily long numbers) when the arguments are given in the decimal representation (which, surprisingly, has not been possible before).
no code implementations • 3 Aug 2016 • Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever
Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition.
8 code implementations • 15 Jun 2016 • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables.
37 code implementations • NeurIPS 2016 • Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner.
Ranked #3 on
Image Generation
on Stanford Dogs
4 code implementations • 14 Mar 2016 • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms.
8 code implementations • 2 Mar 2016 • Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks.
5 code implementations • 25 Nov 2015 • Łukasz Kaiser, Ilya Sutskever
Unlike the NTM, the Neural GPU is highly parallel which makes it easier to train and efficient to run.
4 code implementations • 21 Nov 2015 • Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens
This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks.
no code implementations • 19 Nov 2015 • Ilya Sutskever, Rafal Jozefowicz, Karol Gregor, Danilo Rezende, Tim Lillicrap, Oriol Vinyals
Supervised learning is successful because it can be solved by the minimization of the training error cost function.
no code implementations • 19 Nov 2015 • Karol Kurach, Marcin Andrychowicz, Ilya Sutskever
In this paper, we propose and investigate a new neural network architecture called Neural Random Access Machine.
no code implementations • 19 Nov 2015 • Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser
This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the oneto-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation.
2 code implementations • 16 Nov 2015 • Shixiang Gu, Sergey Levine, Ilya Sutskever, andriy mnih
Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm.
no code implementations • 16 Nov 2015 • Arvind Neelakantan, Quoc V. Le, Ilya Sutskever
In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations.
no code implementations • 16 Nov 2015 • Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
1 code implementation • 4 May 2015 • Wojciech Zaremba, Ilya Sutskever
The capabilities of a model can be extended by providing it with proper Interfaces that interact with the world.
8 code implementations • NeurIPS 2015 • Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton
Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades.
Ranked #23 on
Constituency Parsing
on Penn Treebank
1 code implementation • 20 Dec 2014 • Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver
The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function.
5 code implementations • IJCNLP 2015 • Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba
Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2. 8 BLEU points over an equivalent NMT system that does not use this technique.
Ranked #40 on
Machine Translation
on WMT2014 English-French
6 code implementations • 17 Oct 2014 • Wojciech Zaremba, Ilya Sutskever
Recurrent Neural Networks (RNNs) with Long Short-Term Memory units (LSTM) are widely used because they are expressive and are easy to train.
69 code implementations • NeurIPS 2014 • Ilya Sutskever, Oriol Vinyals, Quoc V. Le
Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Ranked #4 on
Traffic Prediction
on PeMS-M
(using extra training data)
20 code implementations • 8 Sep 2014 • Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units.
Ranked #35 on
Language Modelling
on Penn Treebank (Word Level)
no code implementations • Journal of Machine Learning Research 2014 • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
The key idea is to randomly drop units (along with their connections) from the neural network during training.
11 code implementations • 21 Dec 2013 • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks.
no code implementations • 16 Dec 2013 • David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever
In addition, we see that the different combinations are in use when the model is applied to a dataset of speech monophones.
49 code implementations • NeurIPS 2013 • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
8 code implementations • 17 Sep 2013 • Tomas Mikolov, Quoc V. Le, Ilya Sutskever
Dictionaries and phrase tables are the basis of modern statistical machine translation systems.
no code implementations • Proceedings of the 30th International Conference on Machine Learning 2013 • Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum.
17 code implementations • NeurIPS 2012 • Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
We trained a large, deep convolutional neural network to classify the 1. 3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes.
Ranked #4 on
Graph Classification
on HIV-fMRI-77
no code implementations • NeurIPS 2012 • Kevin Swersky, Ilya Sutskever, Daniel Tarlow, Richard S. Zemel, Ruslan R. Salakhutdinov, Ryan P. Adams
The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features.
10 code implementations • 3 Jul 2012 • Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data.
Ranked #208 on
Image Classification
on CIFAR-10
no code implementations • NeurIPS 2009 • Ilya Sutskever, Joshua B. Tenenbaum, Ruslan R. Salakhutdinov
We consider the problem of learning probabilistic models for complex relational structures between various types of objects.
no code implementations • NeurIPS 2008 • Ilya Sutskever, Geoffrey E. Hinton
We describe a way of learning matrix representations of objects and relationships.
no code implementations • NeurIPS 2008 • Ilya Sutskever, Geoffrey E. Hinton, Graham W. Taylor
The Temporal Restricted Boltzmann Machine (TRBM) is a probabilistic model for sequences that is able to successfully model (i. e., generate nice-looking samples of) several very high dimensional sequences, such as motion capture data and the pixels of low resolution videos of balls bouncing in a box.