Video Description

17 papers with code • 0 benchmarks • 6 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams
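The output of a dense video captioning system, as described above, can be thought of as a list of temporally localized events, each paired with a sentence. A minimal sketch of such a data structure (the timestamps and captions are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class DescribedEvent:
    start_s: float  # event start time, in seconds
    end_s: float    # event end time, in seconds
    caption: str    # natural-language description of the event

# Hypothetical output for a short cooking video (all values illustrative):
events = [
    DescribedEvent(0.0, 12.5, "A man washes vegetables at the sink."),
    DescribedEvent(12.5, 30.0, "He chops the vegetables on a cutting board."),
    DescribedEvent(30.0, 41.2, "He adds them to a pan and stirs."),
]

# Because the events are ordered in time, concatenating their captions
# yields the kind of coherent multi-sentence story the task asks for.
story = " ".join(e.caption for e in events)
```

This also makes the relationship to dense image region captioning concrete: spatial bounding boxes are replaced by temporal segments.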

Greatest papers with code

VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

facebookresearch/vizseq IJCNLP 2019

Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE.

Image Captioning Machine Translation +3
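The clipped n-gram precision at the heart of BLEU, one of the metrics such toolkits visualize, can be sketched in a few lines of pure Python (the example sentences are invented):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: the fraction of candidate n-grams that
    appear in the reference, with each n-gram's count clipped so a word
    repeated in the candidate cannot be credited more often than it
    occurs in the reference."""
    cand_ngrams = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    if not cand_ngrams:
        return 0.0
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum(min(count, ref_counts[g]) for g, count in Counter(cand_ngrams).items())
    return clipped / len(cand_ngrams)

cand = "a man is slicing a tomato".split()
ref = "a man slices a tomato in the kitchen".split()
print(round(ngram_precision(cand, ref, 1), 3))  # → 0.667
print(round(ngram_precision(cand, ref, 2), 3))  # → 0.4
```

Full BLEU additionally combines precisions for n = 1..4 with a brevity penalty, and ROUGE is recall-oriented; this sketch only shows the shared n-gram-matching core that makes both metrics task-specific.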

Describing Videos by Exploiting Temporal Structure

yaoli/arctic-capgen-vid ICCV 2015

In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions.

Action Recognition Video Description

Grounded Video Description

facebookresearch/grounded-video-description CVPR 2019

Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.

Video Description

TGIF: A New Dataset and Benchmark on Animated GIF Description

raingo/TGIF-Release CVPR 2016

The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips.

Image Captioning Machine Translation +2

Predicting Visual Features from Text for Image and Video Caption Retrieval

danieljf24/w2vv 5 Sep 2017

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.

Video Description

Video Description using Bidirectional Recurrent Neural Networks

lvapeab/ABiViRNet 12 Apr 2016

Although traditionally used in the machine translation field, the encoder-decoder framework has been recently applied for the generation of video and image descriptions.

Text Generation Video Captioning +1

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge 1 Jun 2018

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.

Video Description Visual Dialog

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

eric-xw/Video-guided-Machine-Translation ICCV 2019

We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.

Machine Translation Video Captioning +1

Delving Deeper into the Decoder for Video Captioning

WingsBrokenAngel/delving-deeper-into-the-decoder-for-video-captioning 16 Jan 2020

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence.

Video Captioning Video Description