VLEP (Video-and-Language Event Prediction)

Introduced by Lei et al. in What is More Likely to Happen Next? Video-and-Language Future Event Prediction

VLEP contains 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. Each example (see Figure 1) consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two potential natural language Future Events (along with Rationales) written by people. These clips are on average 6.1 seconds long and are harvested from diverse event-rich sources, i.e., TV show and YouTube Lifestyle Vlog videos.

Source: What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Papers


Paper Code Results Date Stars

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages