LAION-400M is a dataset of 400 million CLIP-filtered image-text pairs, together with their CLIP embeddings and kNN indices that enable efficient similarity search.
133 PAPERS • 1 BENCHMARK
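The similarity search the kNN indices enable can be illustrated with a minimal brute-force sketch. This is not the dataset's actual index format (LAION ships prebuilt approximate-NN indices); the arrays below are hypothetical stand-ins for CLIP vectors, and only the scoring idea — L2-normalize, then take dot products — is shown.

```python
import numpy as np

# Hypothetical toy vectors standing in for precomputed CLIP embeddings.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(1000, 512)).astype(np.float32)
query_embedding = rng.normal(size=(512,)).astype(np.float32)

def normalize(x):
    # CLIP similarity is cosine similarity: unit-normalize first.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

db = normalize(image_embeddings)
q = normalize(query_embedding)

scores = db @ q                   # cosine similarity of query vs. each image
top_k = np.argsort(-scores)[:5]   # indices of the 5 nearest neighbours
```

At 400M scale this exhaustive scan is impractical, which is why the released kNN indices use approximate nearest-neighbour search instead.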
Crisscrossed Captions (CxC) contains 247,315 human-labeled annotations, including positive and negative associations between image pairs, caption pairs, and image-caption pairs.
21 PAPERS • 3 BENCHMARKS
The data used in:
- "Radio Galaxy Zoo EMU: Towards a Semantic Radio Galaxy Morphology Taxonomy" (Bowles et al., submitted)
- "A New Task: Deriving Semantic Class Targets for the Physical Sciences" (Bowles et al. 2022: https://arxiv.org/abs/2210.14760), accepted at the Fifth Workshop on Machine Learning and the Physical Sciences, Neural Information Processing Systems 2022.
1 PAPER • NO BENCHMARKS YET