Image Captioning

80 papers with code · Computer Vision

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Can Active Memory Replace Attention?

NeurIPS 2016 tensorflow/models

Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and learning algorithmic tasks, but it had probably the largest impact on neural machine translation.

IMAGE CAPTIONING · MACHINE TRANSLATION

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

21 Sep 2016 tensorflow/models

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.

IMAGE CAPTIONING

One Model To Learn Them All

16 Jun 2017 tensorflow/tensor2tensor

We present a single model that yields good results on a number of problems spanning multiple domains. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks.

IMAGE CAPTIONING · IMAGE CLASSIFICATION · MULTI-TASK LEARNING

Deep Visual-Semantic Alignments for Generating Image Descriptions

CVPR 2015 karpathy/neuraltalk

Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. We then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions.

IMAGE CAPTIONING
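
The alignment step in this paper scores an image against a sentence by comparing image-region vectors with word vectors in a shared embedding space. A minimal sketch of the simplified score, as we understand it, S = Σ_t max_i v_iᵀ s_t: each word s_t contributes its dot product with the best-matching region v_i. It assumes the embeddings are already computed (the paper obtains them from a CNN over detected regions and a bidirectional RNN over the sentence); the helper names are illustrative, not from neuraltalk.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def image_sentence_score(regions: List[Vector], words: List[Vector]) -> float:
    """Simplified image-sentence score: each word contributes its best region match."""
    return sum(max(dot(v, s) for v in regions) for s in words)


def align_words(regions: List[Vector], words: List[Vector]) -> List[int]:
    """Index of the highest-scoring region for each word (a hard alignment)."""
    return [max(range(len(regions)), key=lambda i: dot(regions[i], s)) for s in words]


# Two toy regions and two toy words in a 2-d embedding space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[0.9, 0.1], [0.2, 0.8]]
print(image_sentence_score(regions, words))
print(align_words(regions, words))  # [0, 1]
```

Taking the max over regions lets each word pick one region, which is what makes the inferred alignments usable as training targets for region-level descriptions.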

Show and Tell: A Neural Image Caption Generator

CVPR 2015 karpathy/neuraltalk

Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69.

IMAGE CAPTIONING · TEXT GENERATION
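
BLEU-1, the metric quoted above, is clipped unigram precision multiplied by a brevity penalty. The following is a minimal sentence-level sketch on a 0 to 1 scale (the paper reports 0 to 100). It assumes whitespace tokenization and the usual "closest reference length" rule for the penalty; the paper's exact evaluation protocol (tokenization, corpus-level aggregation) may differ.

```python
import math
from collections import Counter
from typing import List


def bleu1(candidate: List[str], references: List[List[str]]) -> float:
    """Sentence-level BLEU-1 on a 0-1 scale: clipped unigram precision x brevity penalty."""
    c = len(candidate)
    if c == 0 or not references:
        return 0.0
    # Clip each candidate word count by its maximum count in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for word, n in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], n)
    clipped = sum(min(n, max_ref_counts[word]) for word, n in Counter(candidate).items())
    precision = clipped / c
    # Effective reference length: the reference length closest to c (ties -> shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * precision


refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(bleu1("the the the the the the the".split(), refs))  # 2/7, about 0.2857
print(round(100 * bleu1("a cat on the mat".split(), refs), 1))
```

Clipping is what stops a degenerate caption that repeats a common word from scoring high precision; the brevity penalty stops very short captions from doing the same.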

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

7 Oct 2016 facebookresearch/fairseq-py

Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.

IMAGE CAPTIONING · MACHINE TRANSLATION · TIME SERIES
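
Diverse Beam Search splits the beam budget into groups that are decoded one after another at each time step; each later group ranks its candidates with an extra penalty for tokens that earlier groups already chose at that step. The sketch below is one reading of that idea with a Hamming-style penalty, not the fairseq implementation. The `log_probs` callable, the `toy_model` function and all parameter names are hypothetical, and keeping the unpenalized log-probability as each beam's running score is a design assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical model interface: maps a decoded prefix to next-token log-probabilities.
LogProbFn = Callable[[Prefix], Dict[str, float]]


def diverse_beam_search(log_probs: LogProbFn, groups: int, beam_per_group: int,
                        steps: int, diversity_strength: float) -> List[List[Tuple[Prefix, float]]]:
    """Group-wise beam search with a Hamming-style diversity penalty between groups."""
    beams: List[List[Tuple[Prefix, float]]] = [[((), 0.0)] for _ in range(groups)]
    for _ in range(steps):
        used_this_step: Counter = Counter()  # tokens chosen by earlier groups at this step
        for g in range(groups):
            candidates = []
            for prefix, score in beams[g]:
                for token, lp in log_probs(prefix).items():
                    # Rank by log-prob minus lambda * (times earlier groups picked this token).
                    ranked = score + lp - diversity_strength * used_this_step[token]
                    candidates.append((ranked, prefix + (token,), score + lp))
            candidates.sort(key=lambda cand: cand[0], reverse=True)
            top = candidates[:beam_per_group]
            # Keep the unpenalized log-probability as each beam's running score.
            beams[g] = [(new_prefix, true_score) for _, new_prefix, true_score in top]
            used_this_step.update(new_prefix[-1] for _, new_prefix, _ in top)
    return beams


def toy_model(prefix: Prefix) -> Dict[str, float]:
    # Hypothetical fixed next-token distribution that ignores the prefix.
    return {"a": -0.5, "b": -1.2, "c": -2.3}


print(diverse_beam_search(toy_model, groups=2, beam_per_group=1, steps=1, diversity_strength=0.0))
# [[(('a',), -0.5)], [(('a',), -0.5)]]
print(diverse_beam_search(toy_model, groups=2, beam_per_group=1, steps=1, diversity_strength=10.0))
# [[(('a',), -0.5)], [(('b',), -1.2)]]
```

With no penalty both groups collapse onto the same output, which is the redundancy plain beam search suffers from; a large penalty pushes the second group to a different token.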

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

27 Jul 2016 deepinsight/insightface

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis.

FACE RECOGNITION · IMAGE CAPTIONING

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 Feb 2015 kelvinxu/arctic-captions

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound.

IMAGE CAPTIONING
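
The deterministic variant of attention in this paper computes a relevance score for each annotation vector a_i (features from locations in a convolutional feature map) given the decoder's previous hidden state, normalizes the scores with a softmax into weights α_i, and feeds the weighted sum Σ_i α_i a_i to the decoder as a context vector. A minimal sketch follows; the dot-product score is a stand-in assumption (the paper learns the scoring function with a small network), and it requires the hidden state to have the same size as the annotations.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(annotations: List[List[float]], hidden: List[float]) -> Tuple[List[float], List[float]]:
    """Deterministic soft attention: weights over annotation vectors and their weighted sum."""
    # Stand-in relevance score e_i = a_i . h (assumption; the paper uses a learned scorer).
    scores = [sum(a * h for a, h in zip(ann, hidden)) for ann in annotations]
    weights = softmax(scores)  # alpha_i: non-negative, summing to 1
    dim = len(annotations[0])
    context = [sum(w * ann[d] for w, ann in zip(weights, annotations)) for d in range(dim)]
    return weights, context


annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = soft_attention(annotations, hidden=[2.0, 0.0])
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Because every step is differentiable, this version trains with ordinary backpropagation; the stochastic "hard" variant mentioned in the abstract instead samples one location and needs the variational lower bound.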

Recurrent Neural Network Regularization

8 Sep 2014 wojzaremba/lstm

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs.

IMAGE CAPTIONING · LANGUAGE MODELLING · MACHINE TRANSLATION · SPEECH RECOGNITION
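
The key idea of this paper is to apply dropout only to the non-recurrent connections of a stacked LSTM (input to the first layer, layer to layer, last layer to output) and never to the hidden state carried from one time step to the next. The sketch below shows only that wiring. The `toy_cell` (an elementwise tanh with no weights) is a hypothetical stand-in for a real LSTM cell, and inverted dropout (rescaling survivors by 1/(1-p)) is an assumed convention.

```python
import math
import random
from typing import List


def dropout(x: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]


def toy_cell(x: List[float], h_prev: List[float]) -> List[float]:
    """Hypothetical stand-in for an LSTM cell: h_t = tanh(x_t + h_{t-1}), elementwise."""
    return [math.tanh(a + b) for a, b in zip(x, h_prev)]


def run_stacked(inputs: List[List[float]], num_layers: int, p: float, seed: int = 0) -> List[List[float]]:
    """Stacked recurrent net with dropout only on non-recurrent (layer-to-layer) connections."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    state = [[0.0] * dim for _ in range(num_layers)]  # h^l_{t-1} for each layer l
    outputs = []
    for x_t in inputs:
        below = x_t
        for layer in range(num_layers):
            # Dropout hits the input coming from the layer below,
            # while the recurrent state from step t-1 passes through unchanged.
            state[layer] = toy_cell(dropout(below, p, rng), state[layer])
            below = state[layer]
        outputs.append(dropout(below, p, rng))  # last non-recurrent connection, to the output
    return outputs


print(run_stacked([[0.5, -0.5], [0.25, 0.0]], num_layers=2, p=0.5, seed=1))
```

Leaving the recurrent path untouched means the memory carried across time steps is not corrupted at every step, which is the paper's explanation for why naive dropout hurts RNNs.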

SPICE: Semantic Propositional Image Caption Evaluation

29 Jul 2016 tylin/coco-caption

There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging.

IMAGE CAPTIONING
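
SPICE parses the candidate caption and the reference captions into scene graphs, turns each graph into a set of semantic propositions (objects, object-attribute pairs, and subject-relation-object triples), and scores the candidate by the F-score between its propositions and those of the references. The sketch below covers only the final scoring step. It assumes the tuples have already been extracted (the scene-graph parser is omitted) and uses exact matching, whereas SPICE also accepts WordNet synonyms as matches.

```python
from typing import Set, Tuple

# A semantic proposition: (object,), (object, attribute) or (subject, relation, object).
Proposition = Tuple[str, ...]


def spice_f1(candidate: Set[Proposition], references: Set[Proposition]) -> float:
    """F1 between candidate tuples and the union of reference tuples (exact matching only)."""
    matched = len(candidate & references)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(references)
    return 2 * precision * recall / (precision + recall)


# Hand-written tuples for "a brown dog on the grass" vs. a reference "a black dog on the grass".
cand = {("dog",), ("grass",), ("dog", "brown"), ("dog", "on", "grass")}
ref = {("dog",), ("grass",), ("dog", "black"), ("dog", "on", "grass")}
print(spice_f1(cand, ref))  # 0.75
```

Scoring propositions rather than n-grams is what lets SPICE penalize a caption that gets an attribute wrong ("brown" vs. "black") even when most of its words overlap with the reference.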