6 PAPERS • 1 BENCHMARK
MTTN is a large-scale derived and synthesized dataset built on real prompts and indexed with popular image-text datasets such as MS-COCO and Flickr. It consists of over 2.4M sentences divided across 5 stages, which combine into over 12M pairs, and has a vocabulary of more than 300 thousand unique words, giving it an abundance of variation.
1 PAPER • NO BENCHMARKS YET