Language Model Pre-Training

K3M is a multi-modal pre-training method for e-commerce product data that introduces a knowledge modality to correct noise in, and supplement missing information from, the image and text modalities. The modal-encoding layer extracts features from each modality. The modal-interaction layer models the interactions among modalities: an initial-interactive feature-fusion model preserves the independence of the image and text modalities, while a structure aggregation module fuses information from the image, text, and knowledge modalities. K3M is pre-trained with three tasks: masked object modeling (MOM), masked language modeling (MLM), and link prediction modeling (LPM).
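The two ideas above can be sketched minimally: a fusion step that aggregates the three modality features into one product representation, and a joint objective that sums the three task losses. This is an illustrative sketch only, not K3M's actual implementation: the real encoders are transformer-based and the aggregation operates over a product knowledge graph, and the function names, the element-wise averaging, and the loss weights here are all hypothetical.

```python
def fuse_modalities(image_feat, text_feat, knowledge_feat):
    """Aggregate image, text, and knowledge features into a single product
    representation (a toy stand-in for K3M's structure aggregation module:
    element-wise averaging over equal-length feature vectors)."""
    assert len(image_feat) == len(text_feat) == len(knowledge_feat)
    return [(i + t + k) / 3.0
            for i, t, k in zip(image_feat, text_feat, knowledge_feat)]

def joint_loss(l_mom, l_mlm, l_lpm, weights=(1.0, 1.0, 1.0)):
    """Total pre-training objective as a weighted sum of the three task
    losses: masked object modeling (MOM), masked language modeling (MLM),
    and link prediction modeling (LPM). Equal weights are an assumption."""
    w_mom, w_mlm, w_lpm = weights
    return w_mom * l_mom + w_mlm * l_mlm + w_lpm * l_lpm

fused = fuse_modalities([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])  # -> [3.0, 4.0]
total = joint_loss(0.5, 0.3, 0.2)  # -> 1.0
```

In the actual model, fusion is learned rather than a fixed average; the point of the sketch is only the data flow: three per-modality features in, one fused representation out, with all three masked/prediction losses optimized jointly.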

Source: Knowledge Perceived Multi-modal Pretraining in E-commerce

Papers



Tasks


Task Papers Share
Language Modelling 1 50.00%
Link Prediction 1 50.00%

Components


Component Type

Categories