SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection

7 Oct 2023 · Pengfei Zhou, Weiqing Min, Yang Zhang, Jiajun Song, Ying Jin, Shuqiang Jiang

Food detection is becoming a fundamental task in food computing that supports various multimedia applications, including food recommendation and dietary monitoring. To deal with real-world scenarios, food detection needs to localize and recognize novel food objects that are not seen during training, demanding Zero-Shot Detection (ZSD). However, the complexity of semantic attributes and intra-class feature diversity poses challenges for ZSD methods in distinguishing fine-grained food classes. To tackle this, we propose the Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food Detection (ZSFD). SeeDS consists of two modules: a Semantic Separable Synthesizing Module (S$^3$M) and a Region Feature Denoising Diffusion Model (RFDDM). The S$^3$M learns the disentangled semantic representation for complex food attributes from ingredients and cuisines, and synthesizes discriminative food features via enhanced semantic information. The RFDDM utilizes a novel diffusion model to generate diversified region features and enhances ZSFD via fine-grained synthesized features. Extensive experiments show the state-of-the-art ZSFD performance of our proposed method on two food datasets, ZSFooD and UECFOOD-256. Moreover, SeeDS also maintains effectiveness on general ZSD datasets, PASCAL VOC and MS COCO. The code and dataset can be found at

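| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Generalized Zero-Shot Object Detection | MS-COCO | SeeDS | HM(mAP) | 26.8 | #1 |
| Generalized Zero-Shot Object Detection | MS-COCO | SeeDS | HM(Recall) | 61 | #3 |
| Zero-Shot Object Detection | MS-COCO | SeeDS | mAP | 20.6 | #1 |
| Zero-Shot Object Detection | MS-COCO | SeeDS | Recall | 64 | #2 |
| Zero-Shot Object Detection | PASCAL VOC'07 | SeeDS | mAP | 68.9 | #1 |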
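
The HM(mAP) and HM(Recall) entries in the generalized setting are not defined on this page. A minimal sketch follows, assuming HM denotes the harmonic mean of the seen-class and unseen-class scores, which is the usual convention in generalized zero-shot evaluation. The function name and the input values are hypothetical and are not taken from the paper.

```python
def harmonic_mean(seen: float, unseen: float) -> float:
    """Harmonic mean of a seen-class score and an unseen-class score.

    Assumption: this mirrors the common generalized zero-shot convention
    HM = 2 * S * U / (S + U); the page itself does not spell it out.
    """
    if seen < 0 or unseen < 0:
        raise ValueError("scores must be non-negative")
    total = seen + unseen
    if total == 0:
        # Both scores are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2.0 * seen * unseen / total


# Hypothetical scores for illustration only (not results from the paper).
example_seen_map = 60.0
example_unseen_map = 30.0
print(harmonic_mean(example_seen_map, example_unseen_map))
```

Under this definition the score is dominated by the weaker of the two inputs, so a detector cannot rank well by performing strongly on seen classes alone.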