VLEP (Video-and-Language Event Prediction)

Introduced by Lei et al. in "What is More Likely to Happen Next? Video-and-Language Future Event Prediction"

VLEP contains 28,726 future event prediction examples (along with their rationales) drawn from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. Each example (see Figure 1) consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two candidate natural language Future Events (along with Rationales), all written by people. The clips average 6.1 seconds in length and come from event-rich sources.
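
The structure of an example described above can be sketched as a small Python record, together with a helper that scores binary predictions. This is only an illustrative sketch: the class name, every field name, and the `label` field (the index of the more likely future event) are assumptions made for this example and are not taken from the released annotation files.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VlepExample:
    """One future event prediction example (all field names are hypothetical)."""

    clip_id: str                    # identifier of the premise video clip
    dialogue: str                   # dialogue that accompanies the clip
    premise_summary: str            # text summary of the premise event
    future_events: Tuple[str, str]  # the two candidate future events
    rationales: Tuple[str, str]     # one rationale per candidate event
    label: int                      # assumed: index (0 or 1) of the more likely event

    def __post_init__(self) -> None:
        # There are exactly two candidates, so the label must be 0 or 1.
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")


def accuracy(examples: Sequence[VlepExample], predictions: Sequence[int]) -> float:
    """Return the fraction of examples whose predicted index matches the label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(1 for ex, pred in zip(examples, predictions) if ex.label == pred)
    return correct / len(examples)
```

Because each example offers two candidates, a predictor that always picks the same index would be expected to land near 50% under this metric if the labels are balanced, which is itself an assumption here.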

Source: What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Benchmarks

Task	Dataset Variant	Best Model
Video Question Answering	VLEP	LLaMA-VQA

Dataset Loaders

jayleicn/VideoLanguageFuturePred

Tasks

Video Question Answering

Similar Datasets

TVQA

STAR Benchmark

How2QA

Violin
