Audio-Visual Event Localization in Unconstrained Videos

YapengTian/AVE-ECCV18 ECCV 2018

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.

Dual-modality seq2seq network for audio-visual event localization

YapengTian/AVE-ECCV18 20 Feb 2019

Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).

Positive Sample Propagation along the Audio-Visual Event Line

jasongief/PSP_CVPR_2021 CVPR 2021

To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

JustinYuu/MM_Pyramid 24 Nov 2021

Recognizing and localizing events in videos is a fundamental task for video understanding.

Cross-Modal Background Suppression for Audio-Visual Event Localization

marmot-xy/cmbs CVPR 2022

Audiovisual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.

ActionFormer: Localizing Moments of Actions with Transformers

happyharrycn/actionformer_release 16 Feb 2022

Self-attention based Transformer models have demonstrated impressive results for image classification and object detection, and more recently for video understanding.

Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization

bravo5542/vscg 11 Oct 2022

In contrast to existing methods, we propose a novel video-level semantic consistency guidance network for the AVE localization task.

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

ttgeng233/UnAV CVPR 2023

To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video.

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

ttgeng233/UniAV 4 Apr 2024

Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

jasongief/VPLAN 3 Jun 2024

The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos.