VPCD (Video Person-Clustering Dataset)

Introduced by Brown et al. in "Face, Body, Voice: Video Person-Clustering with Multiple Modalities"

VPCD contains multi-modal annotations (face, body, and voice) for all primary and secondary characters in a diverse set of TV shows and movies, and is used to evaluate multi-modal person clustering. For each annotated character it provides body tracks, face tracks whenever the face is visible, and voice tracks whenever the character speaks, together with their associated features.

In total, the dataset comprises more than 30,000 face and body tracks covering over 300 characters from more than 23 hours of video.
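The sketch below shows one possible way to organise these per-character annotations in Python for a clustering evaluation. The `Track` and `Character` classes, their fields, and the `tracks_by_modality` helper are illustrative assumptions for this page, not the dataset's actual release format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class Track:
    """A single annotated track with its pre-extracted feature vector."""
    modality: str        # "face", "body", or "voice"
    start_frame: int
    end_frame: int
    feature: np.ndarray  # embedding for the track (e.g. averaged over frames)


@dataclass
class Character:
    """All annotations belonging to one primary or secondary character."""
    name: str
    body_tracks: List[Track] = field(default_factory=list)   # every annotated character has body tracks
    face_tracks: List[Track] = field(default_factory=list)   # present only when the face is visible
    voice_tracks: List[Track] = field(default_factory=list)  # present only when the character speaks


def tracks_by_modality(
    characters: List[Character],
) -> Dict[str, Tuple[np.ndarray, List[str]]]:
    """Group track features and their ground-truth identities per modality,
    the form a person-clustering evaluation typically consumes."""
    grouped: Dict[str, Tuple[List[np.ndarray], List[str]]] = {}
    for ch in characters:
        for tr in ch.body_tracks + ch.face_tracks + ch.voice_tracks:
            feats, labels = grouped.setdefault(tr.modality, ([], []))
            feats.append(tr.feature)
            labels.append(ch.name)
    # Features are stacked per modality, assuming each modality uses a fixed embedding size.
    return {m: (np.stack(f), l) for m, (f, l) in grouped.items()}
```

A clustering method can then be run separately on each modality's feature matrix, or on a fusion of them, and scored against the ground-truth character labels.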
