Multi-modal Dense Video Captioning

17 Mar 2020 · Vladimir Iashin · Esa Rahtu

Dense video captioning is the task of localizing interesting events in an untrimmed video and producing a textual description (caption) for each localized event. Most previous work in dense video captioning relies solely on visual information and completely ignores the audio track...

TASK                    DATASET               MODEL  METRIC NAME  METRIC VALUE  GLOBAL RANK
Dense Video Captioning  ActivityNet Captions  MDVC   METEOR       7.31          # 3
Dense Video Captioning  ActivityNet Captions  MDVC   BLEU-3       2.6           # 2
Dense Video Captioning  ActivityNet Captions  MDVC   BLEU-4       1.07          # 2

Methods used in the Paper