SPEECH-COCO

Introduced by Havard et al. in SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set

SPEECH-COCO contains 616,767 spoken captions (more than 600 hours of audio) paired with images. The captions were generated using text-to-speech (TTS) synthesis.

Source: SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set

Tasks

Speech Recognition
Intent Detection
Fairness

Similar Datasets

HandNet

Unite the People

IPN Hand

License

CC BY 4.0

Modalities

Speech
