Automatic Spatially-aware Fashion Concept Discovery

This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. We first fine-tune GoogLeNet by jointly modeling clothing images and their corresponding descriptions in a visual-semantic embedding space. Then, for each attribute (word), we generate its spatially-aware representation by combining its semantic word vector with its spatial representation derived from the convolutional maps of the fine-tuned network. The resulting spatially-aware representations are further used to cluster attributes into multiple groups that form spatially-aware concepts (e.g., the neckline concept might consist of attributes like v-neck, round-neck, etc.). Finally, we decompose the visual-semantic embedding space into multiple concept-specific subspaces, which facilitates structured browsing and attribute-feedback product retrieval by exploiting multimodal linguistic regularities. We conducted extensive experiments on our newly collected Fashion200K dataset, and results on clustering quality evaluation and the attribute-feedback product retrieval task demonstrate the effectiveness of our automatically discovered spatially-aware concepts.
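
The sketch below is a minimal illustration (not the authors' code) of the pipeline the abstract describes: each attribute receives a spatially-aware representation by combining its word vector with a spatial signature derived from convolutional activation maps, and the attributes are then clustered into spatially-aware concepts. All variable names, tensor shapes, the concatenation rule, and the use of k-means are assumptions made for illustration.

```python
# Toy sketch of spatially-aware concept discovery (assumptions throughout):
# - word vectors and per-attribute activation maps are random placeholders;
#   in practice they would come from the fine-tuned visual-semantic model.
# - the combination rule (L2-normalize, then concatenate) and the k-means
#   clustering step are illustrative choices, not the paper's exact method.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

attributes = ["v-neck", "round-neck", "halter", "maxi", "mini", "midi"]
word_dim, map_size = 300, 7          # word-vector size, conv-map spatial size

# Placeholder semantic word vectors for each attribute.
word_vecs = {a: rng.normal(size=word_dim) for a in attributes}

# Placeholder spatial activation maps, e.g. conv maps averaged over images
# whose descriptions mention the attribute (here: random 7x7 maps).
activation_maps = {a: rng.random((map_size, map_size)) for a in attributes}


def spatially_aware_representation(attr):
    """Concatenate the attribute's word vector with its flattened,
    L2-normalized spatial activation map."""
    w = word_vecs[attr] / np.linalg.norm(word_vecs[attr])
    s = activation_maps[attr].ravel()
    s = s / (np.linalg.norm(s) + 1e-8)
    return np.concatenate([w, s])


# Cluster attributes into spatially-aware concepts (e.g. neckline, length).
X = np.stack([spatially_aware_representation(a) for a in attributes])
n_concepts = 2                       # chosen arbitrarily for this toy example
labels = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit_predict(X)

for c in range(n_concepts):
    members = [a for a, lab in zip(attributes, labels) if lab == c]
    print(f"concept {c}: {members}")
```

Attributes that co-occur in similar image regions (e.g. necklines at the top of the garment) should end up in the same cluster; each discovered concept can then be given its own subspace of the visual-semantic embedding, which is what enables the attribute-feedback retrieval described above.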

PDF | ICCV 2017 Abstract

Datasets


Fashion200k (introduced in this paper)

Results from Other Papers


| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| Image Retrieval with Multi-Modal Query | Fashion200k | FashionConcept | Recall@1 | 6.3 | #8 |
| Image Retrieval with Multi-Modal Query | Fashion200k | FashionConcept | Recall@10 | 19.9 | #8 |
| Image Retrieval with Multi-Modal Query | Fashion200k | FashionConcept | Recall@50 | 38.3 | #8 |

Methods


No methods listed for this paper.