567 code implementations • NeurIPS 2017 • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.
Ranked #2 on Multimodal Machine Translation on Multi30K (BLEU (DE-EN) metric)
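The paper's replacement is the Transformer, whose core operation is scaled dot-product attention. A minimal NumPy sketch of that operation (illustrative only, not the reference implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (len_q, len_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Toy usage: 4 query positions, 6 key/value positions, model dim 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)          # shape (4, 8)
```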
8 code implementations • NeurIPS 2019 • Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.
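The paper answers this by replacing spatial convolutions with local self-attention over pixel neighborhoods. A rough single-head NumPy sketch of one such layer, with hypothetical projection matrices w_q, w_k, w_v supplied by the caller:

```python
import numpy as np

def local_self_attention(x, w_q, w_k, w_v, k=3):
    """Single-head local self-attention over a k x k neighborhood per pixel.

    x: (H, W, C) feature map; w_q / w_k / w_v: (C, C) projection matrices.
    """
    H, W, C = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad the borders
    q = x @ w_q                                        # one query per pixel
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k].reshape(-1, C)  # the k*k neighbors
            keys, vals = patch @ w_k, patch @ w_v
            logits = keys @ q[i, j] / np.sqrt(C)
            a = np.exp(logits - logits.max()); a /= a.sum()
            out[i, j] = a @ vals                      # attention-weighted average
    return out

H, W, C = 6, 6, 4
rng = np.random.default_rng(0)
y = local_self_attention(rng.normal(size=(H, W, C)),
                         *[rng.normal(size=(C, C)) for _ in range(3)])
```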
3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
The key findings of this paper are as follows: (1) model shape, not just model size, matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.
13 code implementations • CVPR 2021 • Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.
Ranked #52 on Instance Segmentation on COCO minival
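BoTNet's central move is swapping the 3x3 spatial convolutions in the final ResNet bottleneck blocks for global multi-head self-attention. A hedged PyTorch sketch of such a block (omitting the relative position encodings the paper uses):

```python
import torch
import torch.nn as nn

class BottleneckSelfAttention(nn.Module):
    """Sketch of a BoT-style block: the 3x3 spatial convolution of a ResNet
    bottleneck is replaced by global self-attention over all pixel positions."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels // 4, 1)  # 1x1 down-projection
        self.attn = nn.MultiheadAttention(channels // 4, heads, batch_first=True)
        self.expand = nn.Conv2d(channels // 4, channels, 1)  # 1x1 up-projection

    def forward(self, x):
        b, c, h, w = x.shape
        z = self.reduce(x)
        seq = z.flatten(2).transpose(1, 2)       # (B, H*W, C/4): pixels as tokens
        z, _ = self.attn(seq, seq, seq)          # global self-attention over pixels
        z = z.transpose(1, 2).reshape(b, c // 4, h, w)
        return x + self.expand(z)                # residual connection

out = BottleneckSelfAttention(256)(torch.randn(2, 256, 14, 14))
```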
7 code implementations • CVPR 2021 • Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens
Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolutional models such as ResNet-50.
Ranked #212 on Image Classification on ImageNet
11 code implementations • NAACL 2018 • Peter Shaw, Jakob Uszkoreit, Ashish Vaswani
On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively.
Ranked #22 on Machine Translation on WMT2014 English-French
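The method adds learned relative-position embeddings to the attention logits, with pairwise offsets clipped to a maximum distance. A NumPy sketch under those assumptions, in the spirit of Shaw et al.:

```python
import numpy as np

def attention_with_relative_positions(Q, K, V, rel_k, max_dist=4):
    """Self-attention logits augmented with learned relative-position keys.

    rel_k: (2*max_dist+1, d) embeddings, one per clipped offset j - i.
    """
    n, d = Q.shape
    # Clip pairwise offsets j - i into [-max_dist, max_dist], then look them up.
    offsets = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                      -max_dist, max_dist) + max_dist       # (n, n) indices
    content = Q @ K.T                                       # content-content term
    position = np.einsum('id,ijd->ij', Q, rel_k[offsets])   # content-position term
    logits = (content + position) / np.sqrt(d)
    w = np.exp(logits - logits.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
out = attention_with_relative_positions(*rng.normal(size=(3, 5, 8)),
                                        rel_k=rng.normal(size=(9, 8)))
```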
14 code implementations • WS 2018 • Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
1 code implementation • 16 Jun 2017 • Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
We present a single model that yields good results on a number of problems spanning multiple domains.
31 code implementations • 4 Jun 2018 • Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, Razvan Pascanu
As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.
1 code implementation • NeurIPS 2018 • Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
We use Mesh-TensorFlow to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model.
Ranked #10 on Language Modelling on One Billion Word
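Mesh-TensorFlow expresses this by laying out named tensor dimensions over a mesh of processors. The sketch below emulates only the arithmetic of one such layout, splitting a Transformer feed-forward layer's hidden dimension across two shards in plain NumPy; it does not use the library's API:

```python
import numpy as np

# Emulate model parallelism for a feed-forward layer: each "device" (here, an
# array) holds one column-shard of W1 and the matching row-shard of W2.
d_model, d_hidden, n_dev = 8, 32, 2
rng = np.random.default_rng(0)
x = rng.normal(size=(4, d_model))                    # a batch of activations
W1 = rng.normal(size=(d_model, d_hidden))
W2 = rng.normal(size=(d_hidden, d_model))

W1_shards = np.split(W1, n_dev, axis=1)
W2_shards = np.split(W2, n_dev, axis=0)
partials = [np.maximum(x @ w1, 0) @ w2               # local ReLU FFN shard
            for w1, w2 in zip(W1_shards, W2_shards)]
y = sum(partials)                                    # the "all-reduce" step
assert np.allclose(y, np.maximum(x @ W1, 0) @ W2)    # matches the unsharded layer
```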
14 code implementations • ICCV 2019 • Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le
Convolutional networks have been the paradigm of choice in many computer vision applications.
Ranked #114 on Image Classification on CIFAR-100 (using extra training data)
12 code implementations • ICLR 2019 • Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck
This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length.
Ranked #3 on Music Modeling on JSB Chorales
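The paper removes that quadratic intermediate memory with a "skewing" trick that aligns the relative logits without materializing an (L, L, d) tensor of relative embeddings. A NumPy sketch:

```python
import numpy as np

def relative_logits(Q, E):
    """Memory-efficient relative attention logits via skewing.

    Q: (L, d) queries; E: (L, d) relative embeddings, row r encoding distance
    r - (L - 1), so the last row is distance 0. Returns (L, L) logits with
    entry [i, j] = q_i . e_{j-i+L-1} using only O(L*d) extra memory. Entries
    with j > i are padding artifacts, hidden anyway by the causal mask.
    """
    L = Q.shape[0]
    A = Q @ E.T                          # (L, L): every query vs every embedding
    B = np.pad(A, ((0, 0), (1, 0)))      # prepend a dummy column
    C = B.reshape(L + 1, L)              # reinterpret the flat buffer
    return C[1:]                         # drop the first row: logits now aligned

rng = np.random.default_rng(0)
L, d = 5, 3
Q, E = rng.normal(size=(L, d)), rng.normal(size=(L, d))
S = relative_logits(Q, E)
assert np.isclose(S[2, 1], Q[2] @ E[1 - 2 + L - 1])  # [i, j] = q_i . e_{j-i+L-1}
```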
2 code implementations • 12 Mar 2020 • Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
This work builds upon two lines of research: it combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains from approaches based on local, temporal sparse attention.
Ranked #5 on Image Generation on ImageNet 64x64 (Bits per dim metric)
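Concretely, queries and keys are routed to clusters (learned online in the paper; fixed centroids in this illustrative NumPy sketch), and attention is computed only within a shared cluster rather than over the full sequence:

```python
import numpy as np

def routed_attention(Q, K, V, centroids):
    """Content-based sparse attention sketch: each query attends only to keys
    assigned to the same centroid."""
    d = Q.shape[-1]
    q_cluster = np.argmax(Q @ centroids.T, axis=-1)   # nearest centroid per query
    k_cluster = np.argmax(K @ centroids.T, axis=-1)   # nearest centroid per key
    out = np.zeros_like(Q)
    for c in np.unique(q_cluster):
        qi, ki = np.where(q_cluster == c)[0], np.where(k_cluster == c)[0]
        if len(ki) == 0:
            continue                                  # empty cluster: output stays zero
        logits = Q[qi] @ K[ki].T / np.sqrt(d)
        w = np.exp(logits - logits.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        out[qi] = w @ V[ki]
    return out

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 16, 8))
out = routed_attention(Q, K, V, centroids=rng.normal(size=(4, 8)))
```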
2 code implementations • WS 2016 • Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight
In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model.
2 code implementations • 28 May 2018 • Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar
Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks.
1 code implementation • TACL 2016 • Ashish Vaswani, Kenji Sagae
Transition-based approaches based on local classification are attractive for dependency parsing due to their simplicity and speed, despite producing results slightly below the state-of-the-art.
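To make the setting concrete, here is a hypothetical arc-standard transition loop: parsing reduces to a sequence of local SHIFT / LEFT / RIGHT decisions over a stack and a buffer, with the local classifier as a stand-in callable:

```python
def arc_standard_parse(words, classify):
    """Greedy arc-standard parsing. `classify(stack, buffer)` is the local
    classifier, returning 'SHIFT', 'LEFT', or 'RIGHT' (a stand-in here)."""
    stack, buffer, arcs = [0], list(range(1, len(words))), []
    while buffer or len(stack) > 1:
        action = classify(stack, buffer)
        if action == 'SHIFT' and buffer:
            stack.append(buffer.pop(0))    # move the next word onto the stack
        elif action == 'LEFT' and len(stack) > 1:
            dep = stack.pop(-2)            # second item depends on the top
            arcs.append((stack[-1], dep))
        elif len(stack) > 1:               # 'RIGHT' (or fallback when stuck)
            dep = stack.pop()              # top depends on the second item
            arcs.append((stack[-1], dep))
        else:
            stack.append(buffer.pop(0))    # nothing else is legal: shift
    return arcs

# A trivial "classifier" for demonstration only: shift everything, then reduce.
print(arc_standard_parse("the cat sat".split(),
                         lambda s, b: 'SHIFT' if b else 'RIGHT'))
```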
no code implementations • 15 Feb 2018 • Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
Image generation has been successfully cast as an autoregressive sequence generation or transformation problem.
Ranked #3 on Density Estimation on CIFAR-10
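In this framing, pixels are generated one at a time in raster order, each conditioned on those generated so far. A toy sampling loop with a stand-in model:

```python
import numpy as np

def sample_image(next_pixel_logits, h=8, w=8, levels=256, seed=0):
    """Autoregressive image sampling: flatten the image to a sequence and draw
    each pixel from the model's distribution given the prefix. The argument
    `next_pixel_logits(prefix) -> (levels,)` stands in for a trained model."""
    rng = np.random.default_rng(seed)
    seq = []
    for _ in range(h * w):
        logits = next_pixel_logits(seq)
        p = np.exp(logits - logits.max()); p /= p.sum()
        seq.append(rng.choice(levels, p=p))            # sample one intensity
    return np.array(seq).reshape(h, w)                 # back to a 2D image

# Uniform stand-in "model": every intensity equally likely at every step.
img = sample_image(lambda seq: np.zeros(256))
```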
no code implementations • ICML 2018 • Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer
Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models.
no code implementations • ACL 2019 • Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge
We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.
no code implementations • EMNLP (MRQA) 2021 • Vidhisha Balachandran, Ashish Vaswani, Yulia Tsvetkov, Niki Parmar
Dense retrieval has been shown to be effective for retrieving relevant documents for Open Domain QA, surpassing popular sparse retrieval methods like BM25.
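Dense retrieval scores documents by inner products between learned embeddings rather than BM25's term statistics. A minimal sketch, assuming the query and document vectors have already been produced by some encoder:

```python
import numpy as np

def dense_retrieve(query_vec, doc_vecs, k=5):
    """Rank documents by inner product with the query embedding and return
    the indices of the top-k matches."""
    scores = doc_vecs @ query_vec          # one dot product per document
    return np.argsort(-scores)[:k]         # indices of the top-k documents

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 128))        # 1000 hypothetical doc embeddings
top = dense_retrieve(rng.normal(size=128), docs)
```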
no code implementations • ICLR 2022 • Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay
We further present suggestions to improve reporting of efficiency metrics.