Meshed-Memory Transformer for Image Captioning

Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding. Their applicability to multi-modal contexts like image captioning, however, is still largely under-explored...

PDF Abstract CVPR 2020

Results from the Paper


TASK              DATASET  MODEL           METRIC  VALUE  GLOBAL RANK
Image Captioning  COCO     M2 Transformer  CIDEr   131.2  #1

Methods used in the Paper


METHOD                        TYPE
Residual Connection           Skip Connections
BPE                           Subword Segmentation
Dense Connections             Feedforward Networks
Label Smoothing               Regularization
ReLU                          Activation Functions
Adam                          Stochastic Optimization
Softmax                       Output Functions
Dropout                       Regularization
Multi-Head Attention          Attention Modules
Layer Normalization           Normalization
Scaled Dot-Product Attention  Attention Mechanisms
Transformer                   Transformers
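Several of the components above feed into the Transformer's core operation, scaled dot-product attention, which computes softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of that formula (the function name, shapes, and toy inputs are illustrative assumptions, not the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (illustrative sketch)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # (n_q, n_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return weights @ V                                      # (n_q, d_v) attended values

# toy example: 2 queries attend over 3 key/value pairs
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

The 1/√d_k scaling keeps the logits from growing with the key dimension, which would otherwise saturate the softmax; multi-head attention runs several such operations in parallel on projected subspaces.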