GLAMI-1M: A Multilingual Image-Text Fashion Dataset

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents image-text classification baselines showing that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, which uses both visual and textual features, achieves 69.7% top-1 accuracy. Experiments with a modified Imagen model show that the dataset is also suitable for image generation conditioned on text. The dataset, source code, and model checkpoints are published at https://github.com/glami/glami-1m.
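To make the image-text-label structure concrete, below is a minimal sketch of a PyTorch Dataset over one split. The CSV layout and column names ("image_file", "name", "description", "category_name") are assumptions for illustration only; the actual schema and loading utilities are provided in the linked repository.

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class Glami1MDataset(Dataset):
    """Minimal wrapper pairing a product image with its (multilingual)
    description and a category label. Column names are hypothetical."""

    def __init__(self, csv_path, image_dir, transform=None):
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transform
        # Map the 191 category names to contiguous integer labels.
        self.classes = sorted(self.df["category_name"].unique())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(f"{self.image_dir}/{row['image_file']}").convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        # Item name + description, in one of the 13 source languages.
        text = f"{row['name']} {row['description']}"
        label = self.class_to_idx[row["category_name"]]
        return image, text, label
```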

Published at BMVC 2022.

Datasets


Introduced in the paper: GLAMI-1M

Results from the Paper


Ranked #1 on Multilingual Image-Text Classification on GLAMI-1M (using extra training data).

Task: Multilingual Image-Text Classification on GLAMI-1M

Model                         Top-1 Accuracy (%)   Top-5 Accuracy (%)   Global Rank
EmbraceNet (image+text)       69.7                 94.0                 #1
CLIP (zero-shot image+text)   32.3                 74.5                 #2
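The best baseline fuses image and text features with an EmbraceNet-style layer. The sketch below illustrates the core fusion idea: project each modality to a shared width, then stochastically pick, per feature position, which modality contributes it. It is a simplified illustration, not the paper's configuration; the backbones, feature dimensions, and training details here are assumptions.

```python
import torch
import torch.nn as nn


class EmbraceLayer(nn.Module):
    """Simplified EmbraceNet-style fusion with uniform modality selection."""

    def __init__(self, input_dims, embrace_dim):
        super().__init__()
        # One docking layer per modality, projecting to a shared width.
        self.docking = nn.ModuleList([nn.Linear(d, embrace_dim) for d in input_dims])

    def forward(self, features):
        # features: list of per-modality tensors, each of shape (batch, dim_i)
        docked = torch.stack(
            [torch.relu(dock(f)) for dock, f in zip(self.docking, features)], dim=1
        )  # (batch, n_modalities, embrace_dim)
        b, m, c = docked.shape
        probs = torch.full((b, m), 1.0 / m, device=docked.device)
        # Sample, per feature position, the modality that provides it.
        chosen = torch.multinomial(probs, c, replacement=True)  # (batch, embrace_dim)
        mask = torch.zeros_like(docked).scatter_(1, chosen.unsqueeze(1), 1.0)
        return (docked * mask).sum(dim=1)  # (batch, embrace_dim)


class ImageTextClassifier(nn.Module):
    """Fuses precomputed image and text embeddings, classifies into 191 categories."""

    def __init__(self, image_dim=2048, text_dim=768, embrace_dim=512, num_classes=191):
        super().__init__()
        self.embrace = EmbraceLayer([image_dim, text_dim], embrace_dim)
        self.head = nn.Linear(embrace_dim, num_classes)

    def forward(self, image_feat, text_feat):
        return self.head(self.embrace([image_feat, text_feat]))


logits = ImageTextClassifier()(torch.randn(4, 2048), torch.randn(4, 768))  # (4, 191)
```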
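The second baseline is zero-shot CLIP. As a rough illustration of zero-shot classification against prompted class names, the snippet below uses the public openai/clip-vit-base-patch32 checkpoint via Hugging Face Transformers. The checkpoint, prompt template, image path, and class-name subset are placeholders; the paper's setup (in particular how the 13 languages and the item text enter the query) may differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder subset of the 191 GLAMI-1M category names.
class_names = ["women's dresses", "men's sneakers", "handbags"]
image = Image.open("product.jpg")

inputs = processor(
    text=[f"a photo of {c}" for c in class_names],
    images=image,
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # image-to-text similarities
predicted = class_names[logits_per_image.softmax(dim=-1).argmax().item()]
print(predicted)
```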
