TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Audio-Visual Active Speaker Detection	AVA-ActiveSpeaker	Extended UniCon	validation mean average precision	93.6%	# 6

ICTCAS-UCAS-TAL Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2021

The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 · Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan

This report briefly describes our method for the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2021. Our solution, the Extended Unified Context Network (Extended UniCon), is based on a novel Unified Context Network (UniCon) designed for robust ASD, which combines multiple types of contextual information to optimize all candidates jointly. We propose a few changes to the original UniCon in terms of audio features, temporal modeling architecture, and loss function design. Together, our best model ensemble sets a new state of the art of 93.4% mAP on the AVA-ActiveSpeaker test set without any form of pretraining, and currently ranks first on the ActivityNet challenge leaderboard.
