End-to-End Dense Video Captioning with Masked Transformer

Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events...

CVPR 2018

Results from the Paper


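TASK              DATASET   MODEL  METRIC NAME  METRIC VALUE  GLOBAL RANK
Video Captioning  YouCook2  Zhou   BLEU-3       7.53          # 3
Video Captioning  YouCook2  Zhou   BLEU-4       4.38          # 3
Video Captioning  YouCook2  Zhou   METEOR       11.55         # 4
Video Captioning  YouCook2  Zhou   ROUGE-L      27.44         # 4
Video Captioning  YouCook2  Zhou   CIDEr        0.38          # 4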

Methods used in the Paper


METHOD                        TYPE
Residual Connection           Skip Connections
BPE                           Subword Segmentation
Dense Connections             Feedforward Networks
Label Smoothing               Regularization
ReLU                          Activation Functions
Adam                          Stochastic Optimization
Softmax                       Output Functions
Dropout                       Regularization
Multi-Head Attention          Attention Modules
Layer Normalization           Normalization
Scaled Dot-Product Attention  Attention Mechanisms
Transformer                   Transformers
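
For readers unfamiliar with the attention entries listed above, the following is a minimal pure-Python sketch of generic scaled dot-product attention with an optional mask. It is an illustration only, not the paper's implementation: the function names, the list-of-lists representation, and the mask convention (True means "may attend") are all assumptions made for this example, and the mask shown is a generic attention mask, not necessarily the masking scheme the paper uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, mask=None):
    """Return one output vector per query.

    queries, keys: lists of d_k-dimensional vectors.
    values: list of d_v-dimensional vectors, same length as keys.
    mask: optional list of lists of bools; mask[i][j] == False blocks
          query i from attending to key j (hypothetical convention).
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if mask is not None:
            # A large negative score drives the softmax weight to ~0.
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(d_v)]
        )
    return outputs
```

With two identical keys and no mask, the output is the plain average of the two value vectors; masking out the second key makes the output equal to the first value vector.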