Language Model Pre-Training

K3M is a multi-modal pre-training method for e-commerce product data that introduces a knowledge modality to correct noise in, and supplement missing information from, the image and text modalities. The modal-encoding layer extracts features from each modality. The modal-interaction layer models the interactions among modalities: an initial-interactive feature-fusion model preserves the independence of the image and text modalities, while a structure aggregation module fuses information from the image, text, and knowledge modalities. K3M is pre-trained with three tasks: masked object modeling (MOM), masked language modeling (MLM), and link prediction modeling (LPM).
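The two ideas above can be sketched minimally: a fusion step that aggregates the three modality features into one product representation, and a joint objective that sums the three task losses. This is an illustrative sketch only, not K3M's actual implementation: the real encoders are transformer-based and the aggregation operates over a product knowledge graph, and the function names, the element-wise averaging, and the loss weights here are all hypothetical.

```python
def fuse_modalities(image_feat, text_feat, knowledge_feat):
    """Aggregate image, text, and knowledge features into a single product
    representation (a toy stand-in for K3M's structure aggregation module:
    element-wise averaging over equal-length feature vectors)."""
    assert len(image_feat) == len(text_feat) == len(knowledge_feat)
    return [(i + t + k) / 3.0
            for i, t, k in zip(image_feat, text_feat, knowledge_feat)]

def joint_loss(l_mom, l_mlm, l_lpm, weights=(1.0, 1.0, 1.0)):
    """Total pre-training objective as a weighted sum of the three task
    losses: masked object modeling (MOM), masked language modeling (MLM),
    and link prediction modeling (LPM). Equal weights are an assumption."""
    w_mom, w_mlm, w_lpm = weights
    return w_mom * l_mom + w_mlm * l_mlm + w_lpm * l_lpm

fused = fuse_modalities([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])  # -> [3.0, 4.0]
total = joint_loss(0.5, 0.3, 0.2)  # -> 1.0
```

In the actual model, fusion is learned rather than a fixed average; the point of the sketch is only the data flow: three per-modality features in, one fused representation out, with all three masked/prediction losses optimized jointly.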

Source: Knowledge Perceived Multi-modal Pretraining in E-commerce

Papers



Tasks


Task Papers Share
Language Modelling 1 50.00%
Link Prediction 1 50.00%

Components


Component Type

Categories