2 dataset results for Text Generation AND Images AND Chinese

COCO-CN

COCO-CN is a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags. The new dataset can be used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.

20 PAPERS • 3 BENCHMARKS

PTVD

PTVD is a plot-oriented multimodal dataset in the TV domain. It is also the first non-English dataset of its kind. Additionally, PTVD contains more than 26 million bullet screen comments (BSCs), which support large-scale pre-training.

1 PAPER • NO BENCHMARKS YET
