Search Results for author: Łukasz Kaiser

Found 17 papers, 10 papers with code

Sparse is Enough in Scaling Transformers

no code implementations NeurIPS 2021 Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next-generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as we scale up the model size.

Text Summarization
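The core idea — activate only a fraction of the feed-forward weights per token — can be sketched as a toy layer in which a controller picks one block of hidden units for each token. This is a minimal, hypothetical simplification; the names and the argmax controller are illustrative, not the authors' code.

```python
import numpy as np

def sparse_ffn(x, W1, W2, controller, n_blocks):
    """Toy sparse feed-forward layer in the spirit of Scaling Transformers.

    For each token, a controller picks one block of hidden units; only that
    block's weights are used, so compute scales with d_ff / n_blocks rather
    than d_ff.  (Hypothetical simplification, not the paper's exact layer.)
    """
    d_ff = W1.shape[1]
    block = d_ff // n_blocks
    out = np.zeros((x.shape[0], W2.shape[1]))
    for t, token in enumerate(x):
        b = controller(token) % n_blocks          # which block is active
        lo, hi = b * block, (b + 1) * block
        h = np.maximum(token @ W1[:, lo:hi], 0)   # ReLU on the active slice only
        out[t] = h @ W2[lo:hi, :]
    return out

rng = np.random.default_rng(0)
d_model, d_ff, n_blocks = 8, 32, 4
x = rng.normal(size=(5, d_model))
W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))
controller = lambda tok: int(np.argmax(tok))      # stand-in for a learned controller
y = sparse_ffn(x, W1, W2, controller, n_blocks)
```

Per token, only `d_ff / n_blocks` hidden units are touched, which is what lets the layer scale in width without scaling the per-token cost.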

Q-Value Weighted Regression: Reinforcement Learning with Limited Data

no code implementations 12 Feb 2021 Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kańska

QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces.

Atari Games Continuous Control +2
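The AWR step that QWR builds on is easy to sketch: weight each sample by the exponentiated advantage, then fit the policy by weighted regression toward the actions taken. Below is a toy version with a linear policy class; function names and the clipping constant are illustrative assumptions.

```python
import numpy as np

def awr_weights(returns, values, beta=1.0, w_max=20.0):
    """Advantage-weighted regression sample weights (AWR, which QWR extends).

    Each sample is weighted by exp(advantage / beta), clipped for numerical
    stability; the policy is then fit by weighted regression on the actions.
    """
    adv = returns - values
    return np.minimum(np.exp(adv / beta), w_max)

def weighted_action_regression(states, actions, weights):
    """Weighted least-squares fit of a toy linear policy a ≈ s @ M."""
    sw = states * weights[:, None]
    M, *_ = np.linalg.lstsq(sw.T @ states, sw.T @ actions, rcond=None)
    return M

rng = np.random.default_rng(1)
states = rng.normal(size=(100, 3))
M_true = rng.normal(size=(3, 2))
actions = states @ M_true                       # synthetic "expert" actions
w = awr_weights(returns=np.ones(100), values=np.zeros(100))
M_hat = weighted_action_regression(states, actions, w)
```

QWR's change, per the abstract, is in how the regression targets are weighted (via Q-values); the regression machinery itself stays the same.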

Model Based Reinforcement Learning for Atari

no code implementations ICLR 2020 Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błażej Osiński, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.

Atari Games Model-based Reinforcement Learning +2
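The outer loop of a SimPLe-style algorithm can be written as a small skeleton: gather real experience, fit a world model, then improve the policy purely inside that model. This is a hypothetical simplification with callables standing in for the real components.

```python
def simulated_policy_learning(real_env_rollout, train_world_model,
                              train_policy_in_model, policy, n_iterations=3):
    """Skeleton of a SimPLe-style model-based RL loop (toy simplification).

    Alternates: (1) gather real experience with the current policy,
    (2) fit a world model to all experience so far,
    (3) improve the policy purely inside the learned model.
    """
    data = []
    for _ in range(n_iterations):
        data.extend(real_env_rollout(policy))
        model = train_world_model(data)
        policy = train_policy_in_model(model, policy)
    return policy, data

# Toy instantiation with counters in place of learning:
calls = {"rollout": 0, "model": 0, "policy": 0}
def rollout(p):
    calls["rollout"] += 1
    return [("s", "a", "r")]
def fit_model(d):
    calls["model"] += 1
    return {"n": len(d)}
def improve(m, p):
    calls["policy"] += 1
    return p + 1

final_policy, data = simulated_policy_learning(rollout, fit_model, improve, policy=0)
```

The sample-efficiency gain comes from step (3): most policy-gradient steps consume simulated rollouts, not real environment frames.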

Reformer: The Efficient Transformer

13 code implementations ICLR 2020 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences.

Image Generation Language Modelling +1
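Reformer's key trick for long sequences is locality-sensitive hashing: bucket similar query/key vectors so attention is computed within buckets instead of over all pairs. A minimal sketch of the angular-LSH hash (random projection, argmax over the concatenation of the projection and its negation):

```python
import numpy as np

def lsh_bucket(vecs, n_buckets, rng):
    """Angular LSH in the style of Reformer: project with a random matrix
    and take the argmax over [Rx; -Rx].  Vectors pointing in similar
    directions tend to land in the same bucket (toy, single hash round)."""
    R = rng.normal(size=(vecs.shape[-1], n_buckets // 2))
    proj = vecs @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

rng = np.random.default_rng(0)
v = rng.normal(size=(4,))
vecs = np.stack([v, 2.0 * v, -v])   # same direction, scaled, and opposite
buckets = lsh_bucket(vecs, n_buckets=8, rng=rng)
```

Because the hash depends only on direction, `v` and `2*v` share a bucket while `-v` lands in the complementary one; attending within buckets is what reduces the cost on long sequences.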

Forecasting Deep Learning Dynamics with Applications to Hyperparameter Tuning

no code implementations 25 Sep 2019 Piotr Kozakowski, Łukasz Kaiser, Afroz Mohiuddin

Concretely, we introduce a forecasting model that, given a hyperparameter schedule (e.g., learning rate, weight decay) and a history of training observations (such as loss and accuracy), predicts how the training will continue.

Language Modelling

Universal Transformers

7 code implementations ICLR 2019 Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times.

Language Modelling Learning to Execute +2
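The Universal Transformer's recurrence in depth — applying the *same* layer repeatedly, with a per-step embedding so the model knows which iteration it is in — can be sketched in a few lines. The layer and embedding below are toy stand-ins, not the paper's architecture.

```python
import numpy as np

def universal_transformer_encode(x, shared_layer, n_steps, step_embedding):
    """Toy Universal Transformer recurrence: one shared layer applied
    n_steps times over depth, with a step embedding added each iteration
    (hypothetical simplification; real models also use ACT to vary depth)."""
    h = x
    for t in range(n_steps):
        h = shared_layer(h + step_embedding(t, h.shape))
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 6)) * 0.1
layer = lambda h: np.tanh(h @ W)                      # stand-in for a transformer block
step_emb = lambda t, shape: np.full(shape, 0.01 * t)  # stand-in for a step embedding
x = rng.normal(size=(3, 6))
out = universal_transformer_encode(x, layer, n_steps=4, step_embedding=step_emb)
```

Weight sharing across depth is what makes the model "recurrent in depth" and lets the number of steps vary per input.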

Tensor2Tensor for Neural Machine Translation

15 code implementations WS 2018 Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

Machine Translation Translation

Fast Decoding in Sequence Models using Discrete Latent Variables

no code implementations ICML 2018 Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models.

Machine Translation Translation
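The speed-up described above comes from running the slow autoregressive loop only over a short sequence of discrete latents and then emitting all output tokens in parallel. A toy skeleton (the predictor and decoder are illustrative stand-ins):

```python
def two_stage_decode(predict_latent, decode_parallel, n_latents):
    """Toy skeleton of latent-variable fast decoding: sequential steps only
    over n_latents latent codes, then one parallel step for the full output,
    cutting sequential work by roughly the compression factor."""
    latents, steps = [], 0
    for _ in range(n_latents):          # sequential: n_latents steps
        latents.append(predict_latent(latents))
        steps += 1
    output = decode_parallel(latents)   # one parallel step for all tokens
    steps += 1
    return output, steps

predict = lambda prev: len(prev)                      # stand-in latent predictor
decode = lambda ls: [tok for z in ls for tok in [z] * 8]  # 8 tokens per latent
out, steps = two_stage_decode(predict, decode, n_latents=4)
```

Here 32 output tokens cost 5 sequential steps instead of 32 — the same shape of saving (one order of magnitude at realistic lengths) that the abstract reports.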

Image Transformer

no code implementations 15 Feb 2018 Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem.

Ranked #6 on Image Generation on ImageNet 32x32 (bpd metric)

Image Generation Image Super-Resolution
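Treating an image as a raster-scanned sequence makes full self-attention quadratic in the number of pixels; Image Transformer restricts each position to a local window. A toy 1D version of causal local attention over such a sequence (simplified; the paper uses 2D local blocks):

```python
import numpy as np

def local_causal_attention(q, k, v, window):
    """Toy local self-attention: position i attends only to positions j with
    j <= i and i - j < window, as in a 1D simplification of Image
    Transformer's local attention."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    blocked = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = np.where(blocked, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 4))
out, w = local_causal_attention(q, k, v, window=3)
```

Each row of `w` is a distribution over at most `window` past positions, so cost grows linearly in sequence length instead of quadratically.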

Discrete Autoencoders for Sequence Models

1 code implementation ICLR 2018 Łukasz Kaiser, Samy Bengio

We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space.

Language Modelling Machine Translation +1
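One simple way to force a sequence through a discrete bottleneck, in the spirit of this line of work, is a saturating sigmoid followed by binarization (with a straight-through gradient at training time). The exact constants below are an assumption for illustration, not necessarily the paper's discretization.

```python
import numpy as np

def saturating_sigmoid(x):
    """max(0, min(1, 1.2 * sigmoid(x) - 0.1)): saturates exactly at 0 and 1,
    which makes hard binarization less lossy (assumed variant)."""
    return np.clip(1.2 / (1.0 + np.exp(-x)) - 0.1, 0.0, 1.0)

def discretize(z):
    """Binarize the bottleneck activations.  At training time a
    straight-through estimator would pass gradients through this step;
    here we show the forward pass only."""
    return (saturating_sigmoid(z) > 0.5).astype(np.float64)

bits = discretize(np.array([-3.0, 0.0, 3.0]))
```

The autoencoder is then trained to reconstruct the sequence from these bits, so the discrete codes must carry the information the sequence model later conditions on.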

Can Active Memory Replace Attention?

2 code implementations NeurIPS 2016 Łukasz Kaiser, Samy Bengio

Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years.

Image Captioning Machine Translation +1

Machine Learning with Guarantees using Descriptive Complexity and SMT Solvers

no code implementations 9 Sep 2016 Charles Jordan, Łukasz Kaiser

There are many efficient approaches to machine learning that do not provide strong theoretical guarantees, and, separately, a beautiful general learning theory.

Board Games Learning Theory

Neural GPUs Learn Algorithms

5 code implementations 25 Nov 2015 Łukasz Kaiser, Ilya Sutskever

Unlike the NTM, the Neural GPU is highly parallel, which makes it easier to train and efficient to run.
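The parallelism comes from the Neural GPU's recurrent unit being a *convolutional* GRU: every tape cell is updated simultaneously at each step. A toy single-channel version with width-3 convolutions (a simplification of the paper's multi-channel CGRU):

```python
import numpy as np

def conv1d_same(s, kernel):
    """Width-3 'same' convolution over the tape (single channel, zero pad)."""
    padded = np.pad(s, 1)
    return kernel[0] * padded[:-2] + kernel[1] * padded[1:-1] + kernel[2] * padded[2:]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cgru_step(s, kU, kR, kW):
    """One convolutional GRU step (toy, one channel): update gate, reset
    gate, and candidate are all small convolutions, so the whole tape is
    updated in parallel each step."""
    u = sigmoid(conv1d_same(s, kU))            # update gate
    r = sigmoid(conv1d_same(s, kR))            # reset gate
    cand = np.tanh(conv1d_same(r * s, kW))     # candidate state
    return u * s + (1.0 - u) * cand

rng = np.random.default_rng(0)
s = rng.normal(size=(10,))
kU, kR, kW = rng.normal(size=(3, 3)) * 0.5
for _ in range(5):                             # recur in time, share weights
    s = cgru_step(s, kU, kR, kW)
```

Running the same cheap parallel update many times is how the model learns algorithms such as long binary addition, while staying easy to batch on accelerators.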
