Audio-Visual Event Localization
7 papers with code • 0 benchmarks • 1 dataset
Most implemented papers
Audio-Visual Event Localization in Unconstrained Videos
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.
Dual-modality seq2seq network for audio-visual event localization
Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).
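To make the segment-level formulation concrete, here is a minimal PyTorch sketch of per-segment audio-visual fusion and classification. The module name, feature dimensions, and the 28-classes-plus-background label space (as in the AVE dataset) are illustrative assumptions, not the architecture of any particular paper.

```python
# Minimal sketch of segment-level audio-visual event localization.
# All names and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class SegmentAVEClassifier(nn.Module):  # hypothetical module, for illustration
    def __init__(self, audio_dim=128, visual_dim=512, hidden=256, num_classes=29):
        super().__init__()
        # Project each modality into a shared space, then classify the
        # fused features per segment (e.g., 28 event classes + background).
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (batch, T, audio_dim); visual_feats: (batch, T, visual_dim)
        a = torch.relu(self.audio_proj(audio_feats))
        v = torch.relu(self.visual_proj(visual_feats))
        fused = torch.cat([a, v], dim=-1)  # (batch, T, 2 * hidden)
        return self.classifier(fused)      # per-segment event logits

# A segment counts as an event only when it is both audible and visible;
# segments where the two cues disagree fall into the background class.
model = SegmentAVEClassifier()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 10, 512))
print(logits.shape)  # torch.Size([2, 10, 29])
```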
Positive Sample Propagation along the Audio-Visual Event Line
To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.
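As a rough illustration of what such a loss can look like, the sketch below pushes the cosine similarity between per-segment audio and visual features toward 1 on positive segments. This is one plausible formulation under that reading of the abstract, not the exact loss from the paper.

```python
# Illustrative audio-visual pair similarity loss (an assumption, not the
# paper's exact formulation): positive segments should yield highly
# correlated audio and visual features.
import torch
import torch.nn.functional as F

def pair_similarity_loss(audio_feats, visual_feats, positive_mask):
    # audio_feats, visual_feats: (batch, T, dim) per-segment features.
    # positive_mask: (batch, T) bool, True where the segment is a
    # positive audio-visual pair (event both audible and visible).
    sim = F.cosine_similarity(audio_feats, visual_feats, dim=-1)  # (batch, T)
    loss = (1.0 - sim)[positive_mask]  # penalize low similarity on positives
    return loss.mean() if loss.numel() > 0 else sim.new_zeros(())

a = torch.randn(2, 10, 256)
v = torch.randn(2, 10, 256)
mask = torch.rand(2, 10) > 0.5
print(pair_similarity_loss(a, v, mask))
```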
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Recognizing and localizing events in videos is a fundamental task for video understanding.
Cross-Modal Background Suppression for Audio-Visual Event Localization
Audio-Visual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video.
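One way to see how the dense setting differs from single-event localization: each segment may carry several overlapping events, so per-segment prediction becomes multi-label. The sketch below uses independent per-class sigmoids instead of a single softmax; the head, class count, and threshold are illustrative assumptions, not the benchmark's baseline.

```python
# Illustrative multi-label per-segment prediction for dense audio-visual
# event localization; class count and threshold are assumptions.
import torch
import torch.nn as nn

num_classes = 35                          # illustrative label space
head = nn.Linear(512, num_classes)        # hypothetical per-segment head
segment_feats = torch.randn(2, 60, 512)   # (batch, T, dim) fused features

logits = head(segment_feats)              # (batch, T, num_classes)

# Training: binary cross-entropy against multi-hot segment labels, since
# several events may be active in the same segment.
targets = (torch.rand(2, 60, num_classes) > 0.9).float()
loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)

# Inference: threshold each class independently, then merge consecutive
# positive segments of the same class into (onset, offset, class) events.
preds = torch.sigmoid(logits) > 0.5
print(loss.item(), preds.shape)  # scalar loss, torch.Size([2, 60, 35])
```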
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Audio-visual learning has been a major pillar of multi-modal machine learning, where the community mostly focused on its modality-aligned setting, i.e., the audio and visual modalities are both assumed to signal the prediction target.