audio-visual event localization

7 papers with code • 0 benchmarks • 1 dataset



Most implemented papers

Audio-Visual Event Localization in Unconstrained Videos

YapengTian/AVE-ECCV18 ECCV 2018

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.

Dual-modality seq2seq network for audio-visual event localization

YapengTian/AVE-ECCV18 20 Feb 2019

Audio-visual event localization requires one to identify the event that is both visible and audible in a video (either at a frame or video level).
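The "both visible and audible" requirement can be made concrete with a minimal sketch. It assumes each fixed-length video segment carries a separate audio label and visual label, with `None` meaning background; the helpers `av_event_labels` and `segment_accuracy` are hypothetical names for illustration and are not taken from the AVE-ECCV18 code.

```python
from typing import List, Optional

Label = Optional[str]  # None marks a background segment


def av_event_labels(audio: List[Label], visual: List[Label]) -> List[Label]:
    """Hypothetical helper: a segment is an audio-visual event only if the
    same class is both audible and visible there; otherwise it is background."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual label sequences must have the same length")
    return [a if a is not None and a == v else None for a, v in zip(audio, visual)]


def segment_accuracy(pred: List[Label], target: List[Label]) -> float:
    """Hypothetical frame-level metric: fraction of segments whose label matches."""
    if len(pred) != len(target) or not target:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == t for p, t in zip(pred, target)) / len(target)


# A dog heard in segments 0-1 but only seen in segment 0; a car heard and seen in segment 3.
print(av_event_labels(["dog", "dog", None, "car"], ["dog", None, None, "car"]))
```

Segment 1 becomes background because the dog is heard but not seen, which is exactly what separates this task from audio-only or visual-only event detection.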

Positive Sample Propagation along the Audio-Visual Event Line

jasongief/PSP_CVPR_2021 CVPR 2021

To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.
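The idea of a pair similarity loss can be illustrated with a simplified sketch. It assumes per-segment audio and visual feature matrices of shape (T, D) and a boolean mask of positive (event) segments, and it pulls the cosine similarity of each audio-visual pair toward 1 for positives and toward 0 for background. This is a hypothetical stand-in written for illustration, not the exact loss from the PSP paper or the `jasongief/PSP_CVPR_2021` code; `pair_similarity_loss` is an assumed name.

```python
import numpy as np


def pair_similarity_loss(audio_feats: np.ndarray,
                         visual_feats: np.ndarray,
                         is_positive: np.ndarray) -> float:
    """Hypothetical audio-visual pair similarity loss (illustrative only).

    audio_feats, visual_feats: (T, D) arrays of per-segment features.
    is_positive: (T,) boolean array, True where the segment holds the event.
    """
    # L2-normalise each segment's feature vector; the floor avoids division by zero
    a = audio_feats / np.maximum(np.linalg.norm(audio_feats, axis=1, keepdims=True), 1e-8)
    v = visual_feats / np.maximum(np.linalg.norm(visual_feats, axis=1, keepdims=True), 1e-8)
    # Cosine similarity between the audio and visual features of each segment: shape (T,)
    cos = np.sum(a * v, axis=1)
    # Target 1 for positive pairs, 0 for background pairs; squared error averaged over segments
    target = is_positive.astype(np.float64)
    return float(np.mean((cos - target) ** 2))


feats = np.array([[1.0, 0.0], [0.0, 1.0]])
print(pair_similarity_loss(feats, feats, np.array([True, True])))
```

NumPy is used here only to show the arithmetic; in training, the same expression would be written in an autodiff framework so the gradient flows back into both feature encoders.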

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

JustinYuu/MM_Pyramid 24 Nov 2021

Recognizing and localizing events in videos is a fundamental task for video understanding.

Cross-Modal Background Suppression for Audio-Visual Event Localization

marmot-xy/cmbs CVPR 2022

Audiovisual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

ttgeng233/UnAV CVPR 2023

To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video.
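Dense localization in untrimmed videos produces labeled (start, end) intervals rather than per-segment labels, so predictions are usually matched to ground truth by temporal IoU (tIoU) and scored with average precision at one or more tIoU thresholds. The sketch below, assuming intervals in seconds and hypothetical helper names `temporal_iou` and `count_true_positives`, shows the greedy one-to-one matching step that such a metric builds on; it is not the UnAV evaluation code.

```python
from typing import List, Tuple

Pred = Tuple[str, float, float, float]  # (label, start, end, confidence score)
GT = Tuple[str, float, float]           # (label, start, end)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection-over-union of two time intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds: List[Pred], gts: List[GT], thr: float = 0.5) -> int:
    """Greedy matching: highest-scoring predictions claim the best unmatched
    ground-truth event of the same class with tIoU >= thr."""
    matched = set()
    tp = 0
    for label, s, e, _ in sorted(preds, key=lambda p: p[3], reverse=True):
        best, best_iou = None, thr
        for i, (g_label, g_s, g_e) in enumerate(gts):
            if i in matched or g_label != label:
                continue
            iou = temporal_iou((s, e), (g_s, g_e))
            if iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            matched.add(best)
            tp += 1
    return tp


preds = [("dog", 0, 10, 0.9), ("dog", 0, 10, 0.8), ("car", 20, 30, 0.7)]
gts = [("dog", 1, 10), ("car", 0, 5)]
print(count_true_positives(preds, gts))
```

The duplicate "dog" prediction is not rewarded because its ground-truth event is already matched, and the "car" prediction misses because it does not overlap the true car interval; one-to-one matching is what lets a metric like this penalize redundant detections of the same event.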

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

franklin905/valor 27 May 2023

Audio-visual learning has been a major pillar of multi-modal machine learning, where the community has mostly focused on its modality-aligned setting, i.e., the audio and visual modalities are both assumed to signal the prediction target.