Video Description

25 papers with code • 0 benchmarks • 7 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models

cmooncs/semeval-2024_multimodal_ecpe 5 Mar 2024

However, the complexities of these diverse modalities pose challenges for developing an efficient multimodal emotion cause analysis (ECA) system.

1

FunQA: Towards Surprising Video Comprehension

jingkang50/funqa 26 Jun 2023

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

87

MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian

willyfh/msvd-indonesian 20 Jun 2023

Since the availability of the pretraining resources with Indonesian sentences is relatively limited, the applicability of those approaches to our dataset is still questionable.

3

Fine-grained Audible Video Description

opennlplab/favdbench CVPR 2023

We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD).

68
27 Mar 2023

Thinking Hallucination for Video Captioning

nasib-ullah/THVC 28 Sep 2022

In video captioning, there are two kinds of hallucination: object and action hallucination.

11

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

cannylab/vdtk 12 May 2022

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

10

Learn to Understand Negation in Video Retrieval

ruc-aimc-lab/nt2vr 30 Apr 2022

We propose a learning-based method for training a negation-aware video retrieval model.

4

Identity-Aware Multi-Sentence Video Description

jamespark3922/lsmdc-fillin ECCV 2020

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

13
22 Aug 2020

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020 ECCV 2020

With rising concerns about AI systems that are given direct access to abundant sensitive information, researchers seek to develop more reliable AI with implicit information sources.

5
18 Aug 2020

Delving Deeper into the Decoder for Video Captioning

WingsBrokenAngel/delving-deeper-into-the-decoder-for-video-captioning 16 Jan 2020

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence.

37