MUSES is a large-scale dataset for temporal event (action) localization. It focuses on localizing multi-shot events, that is, events captured across multiple camera shots. Such events are common in edited videos such as TV shows and movies.
What’s included in MUSES: