Search Results for author: Niki Parmar

Found 19 papers, 12 papers with code

Attention Is All You Need

567 code implementations NeurIPS 2017 Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

Ranked #2 on Multimodal Machine Translation on Multi30K (BLEU (DE-EN) metric)

Abstractive Text Summarization Coreference Resolution +8
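
The mechanism this paper builds everything from is scaled dot-product attention. A minimal NumPy sketch of that single operation (one head, no masking; the shapes and random inputs below are illustrative assumptions, not the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values

# Toy example: 4 query positions attending over 6 key/value positions, d_k = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```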

One Model To Learn Them All

1 code implementation 16 Jun 2017 Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

We present a single model that yields good results on a number of problems spanning multiple domains.

Image Captioning Image Classification +3

Image Transformer

no code implementations 15 Feb 2018 Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem.

Density Estimation Image Generation +1
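
To illustrate what casting image generation as autoregressive sequence generation means, here is a hedged sketch that flattens a tiny image into a pixel sequence and samples one intensity at a time; the dummy_logits function is a placeholder for the paper's self-attention decoder, not its implementation:

```python
import numpy as np

H, W, LEVELS = 8, 8, 256          # tiny grayscale image, 256 intensity levels
rng = np.random.default_rng(0)

def dummy_logits(prefix):
    """Stand-in for the decoder: unnormalized scores for the next pixel's
    intensity, conditioned on all previously generated pixels."""
    return rng.normal(size=LEVELS)

def sample_image():
    pixels = []
    for _ in range(H * W):        # raster-scan order, one pixel at a time
        logits = dummy_logits(pixels)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        pixels.append(rng.choice(LEVELS, p=probs))
    return np.asarray(pixels).reshape(H, W)

print(sample_image().shape)       # (8, 8)
```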

Fast Decoding in Sequence Models using Discrete Latent Variables

no code implementations ICML 2018 Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models.

Machine Translation Translation

Tensor2Tensor for Neural Machine Translation

14 code implementations WS 2018 Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

Machine Translation Translation
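
For orientation, a hedged sketch of driving the standard Tensor2Tensor training workflow from Python. The flag names mirror the t2t-trainer command line documented with the library, but the specific problem, model, hyperparameter set, and step count here are illustrative assumptions:

```python
# Hypothetical driver script; assumes the tensor2tensor package (and its
# t2t-trainer entry point) is installed and on the PATH.
import subprocess

subprocess.run([
    "t2t-trainer",
    "--generate_data",                            # download and preprocess the dataset
    "--data_dir=./t2t_data",
    "--output_dir=./t2t_train/transformer_ende",
    "--problem=translate_ende_wmt32k",            # WMT English-German translation
    "--model=transformer",                        # the reference Transformer implementation
    "--hparams_set=transformer_base",
    "--train_steps=250000",
], check=True)
```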

Theory and Experiments on Vector Quantized Autoencoders

2 code implementations 28 May 2018 Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks.

Image Generation Knowledge Distillation +2
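
The discrete latent variables in a vector quantized autoencoder come from snapping each continuous encoder output to its nearest entry in a learned codebook. A minimal NumPy sketch of that assignment step, with a random codebook and random encoder outputs standing in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # 512 discrete codes, 64-dim embeddings
z_e = rng.normal(size=(10, 64))            # continuous encoder outputs for 10 positions

# Nearest-neighbour lookup: squared Euclidean distance to every codebook entry.
dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)               # discrete latent: one code index per position
z_q = codebook[codes]                      # quantized vectors passed on to the decoder

print(codes[:5], z_q.shape)                # five code indices, (10, 64)
```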

Weakly Supervised Grammatical Error Correction using Iterative Decoding

no code implementations 31 Oct 2018 Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar

We describe an approach to Grammatical Error Correction (GEC) that is effective at making use of models trained on large amounts of weakly supervised bitext.

Grammatical Error Correction
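
Iterative decoding here amounts to feeding the model's own correction back in as input until the output stops changing or an iteration budget is reached. A toy sketch with a placeholder correct_once function standing in for the trained correction model:

```python
def correct_once(sentence):
    """Placeholder for one pass of the trained correction model."""
    return sentence.replace("Their is", "There is")   # toy single-edit 'model'

def iterative_decode(sentence, max_rounds=5):
    for _ in range(max_rounds):
        corrected = correct_once(sentence)
        if corrected == sentence:      # fixed point reached: no further edits proposed
            break
        sentence = corrected
    return sentence

print(iterative_decode("Their is a problem ."))   # "There is a problem ."
```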

Towards a better understanding of Vector Quantized Autoencoders

no code implementations ICLR 2019 Aurko Roy, Ashish Vaswani, Niki Parmar, Arvind Neelakantan

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks.

Knowledge Distillation Machine Translation +1

Stand-Alone Self-Attention in Vision Models

8 code implementations NeurIPS 2019 Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens

The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions.

Object Detection
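
Concretely, using attention as a stand-alone primitive means replacing a convolution's fixed weighted sum over a local window with attention weights computed from the pixels in that window. A rough single-head NumPy sketch over k x k neighborhoods (the shapes and the omission of relative positional embeddings are simplifying assumptions):

```python
import numpy as np

def local_self_attention(x, Wq, Wk, Wv, k=3):
    """Each pixel attends to its k x k neighborhood instead of being convolved."""
    H, W, C = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            window = xp[i:i + k, j:j + k].reshape(k * k, C)   # local memory block
            q = x[i, j] @ Wq                                  # query from the center pixel
            keys, vals = window @ Wk, window @ Wv
            scores = keys @ q / np.sqrt(q.shape[-1])
            w = np.exp(scores - scores.max()); w /= w.sum()   # softmax over the window
            out[i, j] = w @ vals
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
Wq = Wk = Wv = rng.normal(size=(16, 16))
print(local_self_attention(x, Wq, Wk, Wv).shape)   # (8, 8, 16)
```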

High Resolution Medical Image Analysis with Spatial Partitioning

1 code implementation 6 Sep 2019 Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song

It is infeasible to train CNN models directly on such high resolution images, because neural activations of a single image do not fit in the memory of a single GPU/TPU, and naive data and model parallelism approaches do not work.

Vocal Bursts Intensity Prediction
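
Spatial partitioning sidesteps that memory limit by splitting the input volume along spatial dimensions across accelerators, with a small halo of overlap so convolutions at the cut still see their neighbors. A hedged sketch of the splitting step only; the halo size and two-way split are illustrative assumptions, and unlike the paper this does not distribute the computation:

```python
import numpy as np

def spatial_partition(volume, num_parts=2, halo=2):
    """Split a (D, H, W) volume along depth into overlapping chunks, one per device."""
    D = volume.shape[0]
    step = D // num_parts
    parts = []
    for p in range(num_parts):
        lo = max(0, p * step - halo)               # extend by a halo so border
        hi = min(D, (p + 1) * step + halo)         # convolutions see their neighbors
        parts.append(volume[lo:hi])
    return parts

volume = np.zeros((256, 512, 512), dtype=np.float32)    # e.g. a high-resolution CT scan
chunks = spatial_partition(volume)
print([c.shape for c in chunks])   # [(130, 512, 512), (130, 512, 512)]
```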

Conformer: Convolution-augmented Transformer for Speech Recognition

24 code implementations 16 May 2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs).

Ranked #12 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition (ASR) +1
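
The Conformer block sandwiches multi-head self-attention and a convolution module between two half-step feed-forward modules, each applied residually, with a final layer norm. A schematic sketch of that ordering using identity-like stand-ins for the actual sub-modules:

```python
import numpy as np

# Placeholder sub-modules; in the paper each is a full neural network layer.
def feed_forward(x):    return x            # stand-in for the feed-forward module
def self_attention(x):  return x            # stand-in for multi-head self-attention
def conv_module(x):     return x            # stand-in for the depthwise-conv module
def layer_norm(x):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-6)

def conformer_block(x):
    """Macaron-style ordering: half FFN -> self-attention -> convolution -> half FFN."""
    x = x + 0.5 * feed_forward(x)
    x = x + self_attention(x)
    x = x + conv_module(x)
    x = x + 0.5 * feed_forward(x)
    return layer_norm(x)

frames = np.random.default_rng(0).normal(size=(100, 256))   # 100 audio frames, 256-dim
print(conformer_block(frames).shape)                        # (100, 256)
```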

Bottleneck Transformers for Visual Recognition

13 code implementations CVPR 2021 Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.

Image Classification Instance Segmentation +3
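
A BoT block is a standard ResNet bottleneck in which the 3x3 spatial convolution is replaced by multi-head self-attention over the feature map, while the 1x1 projections and the residual connection stay. A schematic single-head sketch with stand-in layers, not the paper's implementation:

```python
import numpy as np

def conv1x1(x, out_ch):
    """Stand-in 1x1 convolution: a per-pixel linear projection."""
    rng = np.random.default_rng(out_ch)
    W = rng.normal(size=(x.shape[-1], out_ch)) / np.sqrt(x.shape[-1])
    return x @ W

def global_self_attention(x):
    """Stand-in for multi-head self-attention over all spatial positions."""
    H, W, C = x.shape
    flat = x.reshape(H * W, C)
    scores = flat @ flat.T / np.sqrt(C)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return (w @ flat).reshape(H, W, C)

def bot_block(x, mid_ch=64):
    """ResNet bottleneck with the 3x3 conv swapped for self-attention (BoT block)."""
    h = conv1x1(x, mid_ch)          # reduce channels
    h = global_self_attention(h)    # replaces the 3x3 spatial convolution
    h = conv1x1(h, x.shape[-1])     # restore channels
    return x + h                    # residual connection

x = np.random.default_rng(0).normal(size=(14, 14, 256))
print(bot_block(x).shape)           # (14, 14, 256)
```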

Scaling Local Self-Attention for Parameter Efficient Visual Backbones

7 code implementations CVPR 2021 Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens

Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolutional models such as ResNet-50.

Image Classification Instance Segmentation +4

Simple and Efficient ways to Improve REALM

no code implementations EMNLP (MRQA) 2021 Vidhisha Balachandran, Ashish Vaswani, Yulia Tsvetkov, Niki Parmar

Dense retrieval has been shown to be effective for retrieving relevant documents for Open Domain QA, surpassing popular sparse retrieval methods like BM25.

Retrieval
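
Dense retrieval encodes the query and every document into a shared vector space and ranks by inner product, rather than by sparse term overlap as in BM25. A minimal sketch with random embeddings standing in for a trained dual encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(1000, 128))     # stand-in for a trained document encoder
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

def retrieve(query_embedding, k=5):
    """Rank documents by inner product with the query embedding (dense retrieval)."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = doc_embeddings @ q
    return np.argsort(-scores)[:k]                # indices of the k best-scoring documents

query = rng.normal(size=128)                      # stand-in for the encoded question
print(retrieve(query))                            # top-5 document indices
```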

Decoder Denoising Pretraining for Semantic Segmentation

1 code implementation 23 May 2022 Emmanuel Brempong Asiedu, Simon Kornblith, Ting Chen, Niki Parmar, Matthias Minderer, Mohammad Norouzi

We propose a decoder pretraining approach based on denoising, which can be combined with supervised pretraining of the encoder.

Denoising Segmentation +1
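
The pretraining signal is straightforward: corrupt the input with Gaussian noise and train the decoder, on top of a separately pretrained encoder, to recover the clean image. A hedged sketch of the objective only, with identity stand-ins for the encoder and decoder networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):  return x                        # stand-in for the (pre-trained) encoder
def decoder(h):  return h                        # stand-in for the decoder being pretrained

def decoder_denoising_loss(image, noise_scale=0.2):
    """Corrupt the image, run encoder+decoder, score reconstruction of the clean image."""
    noise = noise_scale * rng.normal(size=image.shape)
    noisy = image + noise                        # additive Gaussian corruption
    recon = decoder(encoder(noisy))
    return float(((recon - image) ** 2).mean())  # mean squared reconstruction error

image = rng.normal(size=(64, 64, 3))
print(decoder_denoising_loss(image))             # scalar pretraining loss
```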
