306 papers with code • 20 benchmarks • 45 datasets
Several mechanisms to focus the attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years.
Ranked #49 on Machine Translation on WMT2014 English-French
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.
This paper introduces a convolutional recurrent network with attention for speech command recognition.
Ranked #11 on Keyword Spotting on Google Speech Commands
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers.
Ranked #3 on Image Captioning on COCO
In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base.
In this work we present Ludwig, a flexible, extensible, and easy-to-use toolbox that allows users to train deep learning models and use them to obtain predictions without writing code.
Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.
Ranked #3 on Image Retrieval with Multi-Modal Query on MIT-States
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.
Ranked #48 on Visual Question Answering on VQA v2 test-std