Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

2 Nov 2023  ·  Jiwan Chung, Youngjae Yu

Large language models such as GPT-3 have demonstrated an impressive capability to adapt to new tasks without requiring task-specific training data. This capability has been particularly effective in settings such as narrative question answering, where the diversity of tasks is immense but the available supervision data is scarce. In this work, we investigate whether such language models can extend their zero-shot reasoning abilities to long multimodal narratives in multimedia content such as drama, movies, and animation, where the story plays an essential role. We propose Long Story Short, a framework for narrative video QA that first summarizes the narrative of the video into a short plot and then searches for the parts of the video relevant to the question. We further propose CLIPCheck to enhance visual matching. Our model outperforms state-of-the-art supervised models by a large margin, highlighting the potential of zero-shot QA for long videos.
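The summarize-then-search idea can be illustrated with a minimal sketch. Everything below is a stand-in for the paper's actual components: the "summary" is a simple per-scene plot list rather than an LLM-generated one, and retrieval uses word overlap in place of language-model reasoning or CLIP-based matching.

```python
def summarize(captions):
    # Stand-in plot summary: one line per scene.
    # (The paper uses an LLM to compress scene descriptions into a plot.)
    return [f"Scene {i}: {c}" for i, c in enumerate(captions)]

def search(plot, question):
    # Stand-in search: pick the plot line with the largest word overlap
    # with the question. (The paper searches with an LLM, refined by
    # CLIP-based visual matching, i.e. CLIPCheck.)
    q_words = set(question.lower().replace("?", "").split())
    return max(plot, key=lambda line: len(q_words & set(line.lower().split())))

def answer(question, captions):
    plot = summarize(captions)       # 1) summarize the long video into a plot
    return search(plot, question)    # 2) search for the question-relevant part

captions = [
    "Alice enters the kitchen and pours coffee",
    "Bob argues with Alice about the trip",
    "Alice leaves the house carrying a suitcase",
]
print(answer("Why does Alice carry a suitcase?", captions))
```

A real system would then feed the retrieved scene (and its raw frames) back to the model to produce the final answer; this sketch only shows the two-stage summarize-then-search control flow.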


Datasets

DramaQA · MovieQA
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Video Question Answering (Level 4) | DramaQA | Long Story Short | Accuracy | 79.28 | #1 |
| Video Question Answering (Level 3) | DramaQA | Long Story Short | Accuracy | 75.78 | #1 |
| Video Story QA | MovieQA | Long Story Short | Accuracy | 51.49 | #1 |
