MUSES (MUlti-Shot EventS)

Introduced by Liu et al. in Multi-shot Temporal Event Localization: a Benchmark

MUSES is a large-scale dataset for temporal event (action) localization. It focuses on localizing multi-shot events, that is, events captured across multiple camera shots. Such events are common in edited videos such as TV shows and movies.

What’s included in MUSES:

  • 3,697 videos of TV and movie dramas
  • 716 hours of video in total
  • 25 event categories
  • 652k shots
  • 31,477 annotated event instances
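
To get a feel for the scale of these figures, the short sketch below derives rough per-video averages from the counts listed above. The constant and function names are illustrative only and are not part of any MUSES tooling, and the shot count is the rounded "652k" figure, so the results are approximations.

```python
# Headline statistics as listed above; names here are illustrative only.
NUM_VIDEOS = 3697
TOTAL_HOURS = 716
NUM_SHOTS = 652_000  # the source gives a rounded "652k", so this is approximate
NUM_INSTANCES = 31_477


def per_video_averages():
    """Return rough per-video averages derived from the headline counts."""
    return {
        "minutes_per_video": TOTAL_HOURS * 60 / NUM_VIDEOS,
        "shots_per_video": NUM_SHOTS / NUM_VIDEOS,
        "instances_per_video": NUM_INSTANCES / NUM_VIDEOS,
    }


if __name__ == "__main__":
    for name, value in per_video_averages().items():
        print(f"{name}: {value:.1f}")
```

Under these figures, an average video runs a little under 12 minutes and contains roughly 176 shots and between 8 and 9 annotated event instances. These are plain averages over the whole dataset and say nothing about how shots or events are distributed across individual videos.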

