MAAS: Multi-modal Assignation for Active Speaker Detection

Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling their temporal progression. Despite the inherently multi-modal nature of the task, current methods still focus on modeling and fusing short-term audiovisual features for individual speakers, often at the frame level. In this paper we present a novel approach to active speaker detection that directly addresses the multi-modal nature of the problem and provides a straightforward strategy in which independent visual features from potential speakers in the scene are assigned to a previously detected speech event. Our experiments show that a small graph data structure built from a single frame suffices to approximate an instantaneous audio-visual assignment problem. Moreover, the temporal extension of this initial graph achieves a new state of the art on the AVA-ActiveSpeaker dataset with a mAP of 88.8%.
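To make the assignment idea more concrete, the sketch below is a minimal, hypothetical construction of such a graph, not the authors' released code: one audio node for the detected speech event, one visual node per candidate face in the frame, edges linking each face to the audio node, and a simple temporal extension that connects corresponding nodes across consecutive frames. The function names, feature dimensions, and edge layout are assumptions for illustration.

```python
import torch

def build_frame_graph(audio_feat, face_feats):
    """Hypothetical per-frame graph: one audio node plus one node per candidate face.

    audio_feat : (D,) embedding of the detected speech event
    face_feats : (N, D) embeddings of the N candidate speakers in the frame
    Returns node features (N+1, D) and an undirected edge_index (2, 2N).
    """
    x = torch.cat([audio_feat.unsqueeze(0), face_feats], dim=0)  # node 0 is the audio node
    n_faces = face_feats.size(0)
    face_ids = torch.arange(1, n_faces + 1)
    audio_ids = torch.zeros(n_faces, dtype=torch.long)
    # Connect every visual node to the audio node in both directions.
    edge_index = torch.cat(
        [torch.stack([face_ids, audio_ids]), torch.stack([audio_ids, face_ids])],
        dim=1,
    )
    return x, edge_index

def link_across_time(graphs):
    """Assumed temporal extension: stack the per-frame graphs and add edges
    between corresponding nodes in consecutive frames."""
    xs, edges, offset = [], [], 0
    for x, e in graphs:
        xs.append(x)
        edges.append(e + offset)
        offset += x.size(0)
    offset = 0
    for (x_a, _), (x_b, _) in zip(graphs[:-1], graphs[1:]):
        n = min(x_a.size(0), x_b.size(0))
        src = torch.arange(n) + offset
        dst = torch.arange(n) + offset + x_a.size(0)
        edges.append(torch.stack([src, dst]))
        edges.append(torch.stack([dst, src]))
        offset += x_a.size(0)
    return torch.cat(xs, dim=0), torch.cat(edges, dim=1)
```

A graph neural network run over this structure can then score each visual node, which amounts to deciding which candidate face, if any, should be assigned to the detected speech event.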

ICCV 2021: PDF | Abstract
| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Audio-Visual Active Speaker Detection | AVA-ActiveSpeaker | MAAS-TAN | validation mean average precision | 88.8% | #13 |
| Audio-Visual Active Speaker Detection | AVA-ActiveSpeaker | MAAS-LAN | validation mean average precision | 85.1% | #16 |

Methods


No methods listed for this paper.