Vision-based Food Nutrition Estimation via RGB-D Fusion Network

With the development of deep learning, vision-based food nutrition estimation is gradually entering the public view owing to its advantages in accuracy and efficiency. In this paper, we designed an RGB-D fusion network that integrated multimodal feature fusion (MMFF) and multi-scale fusion for vision-based nutrition assessment. MMFF performed effective feature fusion through a balanced feature pyramid and a convolutional block attention module (CBAM). Multi-scale fusion combined features of different resolutions through a feature pyramid network (FPN). Both enhanced the feature representation and improved the performance of the model. Compared with state-of-the-art methods, the mean percentage mean absolute error (PMAE) of our method reached 18.5%. The PMAE of calories and mass reached 15.0% and 10.8% via the RGB-D fusion network, improvements of 3.8% and 8.1%, respectively. Furthermore, this study visualized the estimation results of four nutrients and verified the validity of the method. This research contributed to the development of automated food nutrient analysis (code and models can be found at http://123.57.42.89/codes/RGB-DNet/nutrition.html).
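To make the fusion idea concrete, below is a minimal PyTorch sketch of CBAM-refined RGB-D feature fusion of the kind the abstract describes. All module names, channel sizes, and wiring here are illustrative assumptions, not the authors' released implementation (see the project URL above); the balanced-feature-pyramid and FPN multi-scale stages are omitted for brevity.

```python
# Illustrative sketch only: fuse an RGB feature map with a depth feature map,
# then refine the result with a Convolutional Block Attention Module (CBAM).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CBAM(nn.Module):
    """CBAM: channel attention followed by spatial attention."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True)[0]], dim=1
        )
        return x * torch.sigmoid(self.spatial(pooled))


class RGBDFusion(nn.Module):
    """Concatenate RGB and depth features, reduce channels, refine with CBAM."""

    def __init__(self, channels=256):  # channel width is an assumption
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.cbam = CBAM(channels)

    def forward(self, rgb_feat, depth_feat):
        fused = self.reduce(torch.cat([rgb_feat, depth_feat], dim=1))
        return self.cbam(fused)


if __name__ == "__main__":
    rgb = torch.randn(1, 256, 32, 32)    # feature map from the RGB branch
    depth = torch.randn(1, 256, 32, 32)  # feature map from the depth branch
    out = RGBDFusion()(rgb, depth)
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```

In this sketch the attention weights let the network suppress uninformative regions of either modality before the fused features feed the downstream nutrient regression heads.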
