Image-text Classification
6 papers with code • 0 benchmarks • 1 dataset
Most implemented papers
Context-Aware Compilation of DNN Training Pipelines across Edge and Cloud
Experimental results show that our system not only adapts well to varying contexts but also draws on them, delivering a practical and efficient solution to edge-cloud model training.
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark.
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations.
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its ability to extract generic representations from medical images and texts.
UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning
Multimodal learning aims to imitate how human beings acquire complementary information from multiple modalities for various downstream tasks.
GIST: Generating Image-Specific Text for Fine-grained Object Classification
We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification.