Video Description

26 papers with code • 0 benchmarks • 7 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams
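As a rough illustration of the dense captioning output described above, the sketch below represents each detected event as a time span plus a sentence and joins the sentences in temporal order. The `EventCaption` class, the `to_story` helper, and the sample events are hypothetical names and made-up data for illustration only; they are not taken from the cited paper or from any of the repositories listed on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventCaption:
    """One temporally localized event and its sentence (hypothetical structure)."""
    start_sec: float
    end_sec: float
    sentence: str


def to_story(events: List[EventCaption]) -> str:
    """Order events by start time and join their sentences into one description."""
    ordered = sorted(events, key=lambda e: (e.start_sec, e.end_sec))
    return " ".join(e.sentence for e in ordered)


# Made-up example events, deliberately given out of temporal order.
events = [
    EventCaption(12.0, 20.5, "She pours the batter into a pan."),
    EventCaption(0.0, 11.5, "A woman mixes batter in a bowl."),
]
print(to_story(events))
```

A real system would predict both the time spans and the sentences from the video; this sketch only shows the shape of the result.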

Benchmarks

These leaderboards are used to track progress in Video Description

No evaluation results yet.

Most implemented papers

Memory-augmented Attention Modelling for Videos

rasoolfa/videocap • 7 Nov 2016

We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts.

Egocentric Video Description based on Temporally-Linked Sequences

MarcBS/TMA • 7 Apr 2017

We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences.

Predicting Visual Features from Text for Image and Video Caption Retrieval

danieljf24/w2vv • 5 Sep 2017

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.

Adversarial Inference for Multi-Sentence Video Description

jamespark3922/adv-inf • CVPR 2019

Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

facebookresearch/vizseq • IJCNLP 2019

Automatic evaluation of text generation tasks (e.g., machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE.

Delving Deeper into the Decoder for Video Captioning

WingsBrokenAngel/delving-deeper-into-the-decoder-for-video-captioning • 16 Jan 2020

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence.

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020 • ECCV 2020

With the arising concerns for the AI systems provided with direct access to abundant sensitive information, researchers seek to develop more reliable AI with implicit information sources.

Identity-Aware Multi-Sentence Video Description

jamespark3922/lsmdc-fillin • ECCV 2020

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

Learn to Understand Negation in Video Retrieval

ruc-aimc-lab/nt2vr • 30 Apr 2022

We propose a learning based method for training a negation-aware video retrieval model.

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

cannylab/vdtk • 12 May 2022

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.
