Audio-Visual Active Speaker Detection

10 papers with code • 2 benchmarks • 3 datasets

Determine if and when each visible person in the video is speaking.

Most implemented papers

Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection

TaoRuijie/TalkNet_ASD 14 Jul 2021

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

sra2/spell 15 Jul 2022

Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

github-zbx/ava_datasets 5 Jan 2019

The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible.

Active Speakers in Context

fuankarion/active-speakers-context CVPR 2020

Current methods for active speak er detection focus on modeling short-term audiovisual information from a single speaker.

MAAS: Multi-modal Assignation for Active Speaker Detection

fuankarion/maas ICCV 2021

Active speaker detection requires a solid integration of multi-modal cues.

How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild

okankop/ASDNet ICCV 2021

Successful active speaker detection requires a three-stage pipeline: (i) audio-visual encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference speaker and the background speakers within each frame, and (iii) temporal modeling for the reference speaker.

End-to-End Active Speaker Detection

fuankarion/end-to-end-asd 27 Mar 2022

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation.

Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection

rash1993/movie-asd 1 Dec 2022

Active speaker detection in videos addresses associating a source face, visible in the video frames, with the underlying speech in the audio modality.

A Light Weight Model for Active Speaker Detection

junhua-liao/light-asd CVPR 2023

Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94. 1% vs. 94. 2%), while the resource costs are significantly lower than the state-of-the-art method, especially in model parameters (1. 0M vs. 22. 5M, about 23x) and FLOPs (0. 6G vs. 2. 6G, about 4x).