CelebV-Text comprises 70,000 in-the-wild face video clips with diverse visual content, each paired with 20 texts generated using the proposed semi-automatic text generation strategy.
4 PAPERS • NO BENCHMARKS YET
Hugging Face Datasets (New!) | Website | Github Repository | arXiv e-Print
1 PAPER • NO BENCHMARKS YET