Search Results for author: Lukasz Kaiser

Found 18 papers, 14 papers with code

Training Verifiers to Solve Math Word Problems

3 code implementations • 27 Oct 2021 • Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

GSM8K Math +1

874

Paper
Code

Evaluating Large Language Models Trained on Code

13 code implementations • 7 Jul 2021 • Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.

Ranked #1 on Multi-task Language Understanding on BBH-alg

Code Generation Language Modelling +1

7,767

Paper
Code

Rethinking Attention with Performers

12 code implementations • ICLR 2021 • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.

Ranked #7 on Offline RL on D4RL

D4RL Image Generation +2

76,589

Paper
Code

Parallel Scheduled Sampling

no code implementations • 11 Jun 2019 • Daniel Duckworth, Arvind Neelakantan, Ben Goodrich, Lukasz Kaiser, Samy Bengio

Experimentally, we find the proposed technique leads to equivalent or better performance on image generation, summarization, dialog generation, and translation compared to teacher-forced training.

Image Generation Response Generation

Paper
Add Code

Sample Efficient Text Summarization Using a Single Pre-Trained Transformer

2 code implementations • 21 May 2019 • Urvashi Khandelwal, Kevin Clark, Dan Jurafsky, Lukasz Kaiser

Language model (LM) pre-training has resulted in impressive performance and sample efficiency on a variety of language understanding tasks.

Ranked #1 on Text Summarization on DUC 2004 Task 1 (ROUGE-2 metric)

Abstractive Text Summarization Language Modelling

14,881

Paper
Code

Model-Based Reinforcement Learning for Atari

2 code implementations • 1 Mar 2019 • Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.

Ranked #12 on Atari Games 100k on Atari 100k

Atari Games Atari Games 100k +4

14,881

Paper
Code

Area Attention

1 code implementation • ICLR 2019 • Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e. g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences.

Image Captioning Machine Translation +1

Paper
Code

Generating Wikipedia by Summarizing Long Sequences

4 code implementations • ICLR 2018 • Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

We show that generating English Wikipedia articles can be approached as a multi- document summarization of source documents.

Document Summarization Extractive Summarization +2

14,881

Paper
Code

Unsupervised Cipher Cracking Using Discrete GANs

1 code implementation • ICLR 2018 • Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser

This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext.

122

Paper
Code

Large Scale Multi-Domain Multi-Task Learning with MultiModel

no code implementations • ICLR 2018 • Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

We present a single model that yields good results on a number of problems spanning multiple domains.

Image Captioning Image Classification +4

Paper
Add Code

One Model To Learn Them All

1 code implementation • 16 Jun 2017 • Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

We present a single model that yields good results on a number of problems spanning multiple domains.

Image Captioning Image Classification +3

14,881

Paper
Code

Attention Is All You Need

567 code implementations • NeurIPS 2017 • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

Ranked #2 on Multimodal Machine Translation on Multi30K (BLUE (DE-EN) metric)

Abstractive Text Summarization Coreference Resolution +8

124,889

Paper
Code

Depthwise Separable Convolutions for Neural Machine Translation

2 code implementations • ICLR 2018 • Lukasz Kaiser, Aidan N. Gomez, Francois Chollet

In this work, we study how depthwise separable convolutions can be applied to neural machine translation.

Ranked #63 on Machine Translation on WMT2014 English-German

Machine Translation Translation

14,881

Paper
Code

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

4 code implementations • 14 Mar 2016 • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms.

BIG-bench Machine Learning Clustering +4

182,425

Paper
Code

Adding Gradient Noise Improves Learning for Very Deep Networks

4 code implementations • 21 Nov 2015 • Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks.

Question Answering

Paper
Code

Multi-task Sequence to Sequence Learning

no code implementations • 19 Nov 2015 • Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the oneto-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation.

Caption Generation Machine Translation +2

Paper
Add Code

Sentence Compression by Deletion with LSTMs

no code implementations • EMNLP 2015 • Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals

Ranked #4 on Sentence Compression on Google Dataset

Sentence Sentence Compression +1

Paper
Add Code

Grammar as a Foreign Language

8 code implementations • NeurIPS 2015 • Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton

Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades.

Ranked #23 on Constituency Parsing on Penn Treebank

Constituency Parsing

1,219

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.