Image Captioning

306 papers with code • 20 benchmarks • 45 datasets

Greatest papers with code

Can Active Memory Replace Attention?

tensorflow/models NeurIPS 2016

Several mechanisms for focusing the attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years (a minimal attention sketch follows this entry).

Image Captioning • Machine Translation +1
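
For readers unfamiliar with the mechanisms the abstract refers to, here is a minimal NumPy sketch of soft, content-based attention over a memory of encoder states. It illustrates the generic attention idea only, not the active-memory alternative the paper proposes; the function name and dimensions are hypothetical.

```python
import numpy as np

def soft_attention(query, memory):
    """Soft attention: weight memory slots by their similarity to the query.

    query:  (d,)   -- current decoder/controller state
    memory: (n, d) -- n memory slots, e.g. encoder states
    Returns a context vector of shape (d,).
    """
    scores = memory @ query                  # dot-product similarity per slot
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                  # convex combination of slots

# Toy usage: 5 memory slots of dimension 8.
rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))
query = rng.normal(size=8)
print(soft_attention(query, memory).shape)   # (8,)
```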

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

tensorflow/models 21 Sep 2016

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.

Image Captioning • Translation

A neural attention model for speech command recognition

google-research/google-research 27 Aug 2018

This paper introduces a convolutional recurrent network with attention for speech command recognition (see the sketch after this entry).

Image Captioning
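
As a rough illustration of the architecture named in the abstract, the following PyTorch sketch builds a convolutional-recurrent classifier with attention pooling over time frames. The 40-bin spectrogram input, 12-class output, and layer sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvRNNAttention(nn.Module):
    """Convolutional front end, bidirectional GRU, and attention pooling."""

    def __init__(self, n_mels=40, n_classes=12, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(16 * n_mels, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # one score per time frame
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, spec):                       # spec: (batch, time, mels)
        x = self.conv(spec.unsqueeze(1))           # (batch, 16, time, mels)
        x = x.permute(0, 2, 1, 3).flatten(2)       # (batch, time, 16 * mels)
        h, _ = self.rnn(x)                         # (batch, time, 2 * hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention weights over frames
        context = (w * h).sum(dim=1)               # attention-pooled utterance vector
        return self.out(context)                   # class logits

logits = ConvRNNAttention()(torch.randn(2, 98, 40))
print(logits.shape)  # torch.Size([2, 12])
```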

One Model To Learn Them All

tensorflow/tensor2tensor 16 Jun 2017

We present a single model that yields good results on a number of problems spanning multiple domains.

Image Captioning • Image Classification +2

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

deepinsight/insightface 27 Jul 2016

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base.

Face Recognition • Image Captioning

Ludwig: a type-based declarative deep learning toolbox

uber/ludwig 17 Sep 2019

In this work we present Ludwig, a flexible, extensible, and easy-to-use toolbox that allows users to train deep learning models and obtain predictions from them without writing code (a minimal configuration sketch follows this entry).

Image Captioning • Image Classification +12
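
To make the declarative claim concrete, the sketch below specifies an image-to-text model as a Ludwig configuration dictionary rather than as model code. The column names and CSV path are hypothetical, and feature types, config keys, and return values differ across Ludwig versions, so check the Ludwig documentation before relying on this particular input/output combination.

```python
# Assumed data: captions.csv with an `image_path` column of image file paths
# and a `caption` column of reference captions.
from ludwig.api import LudwigModel

config = {
    "input_features": [{"name": "image_path", "type": "image"}],
    "output_features": [{"name": "caption", "type": "text"}],
}

model = LudwigModel(config)
model.train(dataset="captions.csv")        # the model architecture comes from the config
results = model.predict(dataset="captions.csv")
```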

Image Captioning

karpathy/neuraltalk2 13 May 2018

This paper discusses and demonstrates the outcomes of our experiments on image captioning.

General Classification • Image Captioning

Show and Tell: A Neural Image Caption Generator

karpathy/neuraltalk CVPR 2015

Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions (see the encoder-decoder sketch after this entry).

Image Captioning • Image Retrieval with Multi-Modal Query +2
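
As a rough sketch of the encoder-decoder recipe behind Show and Tell, the snippet below conditions an LSTM language model on a CNN image embedding by feeding the projected image feature as the first input step. This is a simplified stand-in rather than the authors' implementation: the CNN feature extractor is assumed to run upstream, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """CNN-feature-conditioned LSTM caption decoder (Show-and-Tell style)."""

    def __init__(self, feat_dim=2048, embed_dim=256, hidden=512, vocab=10000):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)   # image -> word embedding space
        self.embed = nn.Embedding(vocab, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feats, captions):
        # img_feats: (batch, feat_dim) pooled CNN features; captions: (batch, T) token ids.
        img = self.img_proj(img_feats).unsqueeze(1)      # image acts as the first "word"
        words = self.embed(captions)
        h, _ = self.lstm(torch.cat([img, words], dim=1))
        return self.out(h[:, 1:])                        # next-token logits per caption position

model = CaptionDecoder()
logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```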

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

facebookresearch/pythia CVPR 2018

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning (see the attention sketch after this entry).

Image Captioning • Visual Question Answering
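
To ground the notion of top-down attention described above, here is a minimal PyTorch sketch of additive attention over a set of precomputed bottom-up region features, where a query vector (for example a caption LSTM state or a question encoding) decides how much weight each region receives. The layer sizes and the choice of 36 regions are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TopDownAttention(nn.Module):
    """Additive attention over region features, driven by a top-down query."""

    def __init__(self, feat_dim=2048, query_dim=512, hidden=512):
        super().__init__()
        self.feat_fc = nn.Linear(feat_dim, hidden)
        self.query_fc = nn.Linear(query_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, query):
        # regions: (batch, k, feat_dim) bottom-up region features
        # query:   (batch, query_dim)   caption LSTM state or question encoding
        joint = torch.tanh(self.feat_fc(regions) + self.query_fc(query).unsqueeze(1))
        weights = torch.softmax(self.score(joint), dim=1)   # (batch, k, 1) region weights
        return (weights * regions).sum(dim=1)                # attended image feature

attn = TopDownAttention()
context = attn(torch.randn(2, 36, 2048), torch.randn(2, 512))
print(context.shape)  # torch.Size([2, 2048])
```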