Self-attention models have recently been shown to offer encouraging improvements in accuracy-parameter trade-offs over baseline convolutional models such as ResNet-50.
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.
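To make the mechanism concrete, the following is a minimal Python sketch of a uniform-sampling replay buffer, the data structure at the heart of experience replay. The class and method names (ReplayBuffer, add, sample) are our own illustrative choices and do not come from any particular RL library.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform-sampling replay buffer (illustrative sketch only)."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one transition from the agent's interaction with the environment.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between consecutive
        # transitions, the usual motivation for experience replay.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Toy usage: fill with dummy transitions, then draw a training batch.
buffer = ReplayBuffer(capacity=10_000)
for t in range(100):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(32)
print(len(buffer), len(batch))  # 100 32
```

Design choices such as buffer capacity, sampling strategy (uniform vs. prioritized), and the replay ratio are exactly the kinds of under-examined details such work investigates.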
Convolutional neural networks are among the most successful architectures in deep learning, a success at least partially attributable to the efficacy of spatial invariance as an inductive bias.
The natural question that arises is whether attention can serve as a stand-alone primitive for vision models rather than just an augmentation on top of convolutions.
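As a concrete illustration of attention as a stand-alone spatial primitive, here is a minimal NumPy sketch of single-head local self-attention over a k x k neighborhood. It omits details such as multiple heads and the relative position embeddings used in stand-alone attention work, and all function and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_self_attention(x, Wq, Wk, Wv, k=3):
    """Single-head local self-attention over a k x k neighborhood.

    x: (H, W, C) feature map; Wq, Wk, Wv: (C, D) projections.
    Returns an (H, W, D) output. A didactic sketch, not an optimized layer.
    """
    H, W, C = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad the borders
    q = x @ Wq                          # one query per center pixel
    out = np.empty((H, W, Wv.shape[1]))
    for i in range(H):
        for j in range(W):
            # Keys and values come from the local neighborhood, replacing the
            # fixed weights of a convolution with content-based weights.
            patch = xp[i:i + k, j:j + k].reshape(k * k, C)
            keys, vals = patch @ Wk, patch @ Wv
            logits = keys @ q[i, j] / np.sqrt(Wk.shape[1])
            attn = np.exp(logits - logits.max())
            attn /= attn.sum()          # softmax over the neighborhood
            out[i, j] = attn @ vals
    return out

C, D = 8, 8
x = rng.standard_normal((6, 6, C))
Wq, Wk, Wv = (rng.standard_normal((C, D)) for _ in range(3))
print(local_self_attention(x, Wq, Wk, Wv).shape)  # (6, 6, 8)
```

The key contrast with a convolution is visible in the inner loop: the aggregation weights are computed from the content of each neighborhood rather than learned as a fixed kernel shared across positions.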
In this work, we describe a method to speed up generation in convolutional autoregressive models.
This paper presents Fast Wavenet, an efficient implementation of the Wavenet generation process.
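The key idea behind such fast generation schemes is to cache intermediate activations in per-layer queues, so that each new sample reuses past computation instead of re-running the entire dilated convolution stack over the full history. Below is a toy Python sketch of that queue scheme, with the kernel-size-2 dilated "convolutions" reduced to dense matrix products; FastLayer, step, and the weight names are hypothetical, for illustration only.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

class FastLayer:
    """One dilated causal layer (kernel size 2) with an activation queue.

    Instead of recomputing over the whole history at every step, the input
    from `dilation` steps ago is cached in a FIFO queue and popped when needed.
    """
    def __init__(self, channels, dilation):
        self.w_old = rng.standard_normal((channels, channels)) * 0.1
        self.w_new = rng.standard_normal((channels, channels)) * 0.1
        # Queue pre-filled with zeros stands in for the silent past.
        self.queue = deque(np.zeros((dilation, channels)), maxlen=dilation)

    def step(self, x):
        old = self.queue.popleft()   # input from `dilation` steps ago
        out = np.tanh(old @ self.w_old + x @ self.w_new)
        self.queue.append(x)         # cache current input for later reuse
        return out

channels = 4
layers = [FastLayer(channels, 2 ** i) for i in range(4)]  # dilations 1,2,4,8

x = rng.standard_normal(channels)
for _ in range(5):                   # generate a few steps, one sample at a time
    h = x
    for layer in layers:
        h = layer.step(h)
    x = h                            # feed output back in (toy autoregression)
print(h.shape)  # (4,)
```

Because each layer's queue has length equal to its dilation, producing one new sample costs O(L) operations for L layers, rather than the naive cost of recomputing over a receptive field that grows like 2^L.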
We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.
Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.