LightGT: A Light Graph Transformer for Multimedia Recommendation
Multimedia recommendation methods aim to discover user preferences from multi-modal information to enhance collaborative filtering (CF) based recommender systems. Nevertheless, they seldom consider the impact of feature extraction on user preference modeling and user-item interaction prediction, even though the extracted features contain excessive information irrelevant to recommendation. To capture the informative features from the extracted ones, we resort to the Transformer model to establish correlations between the items historically interacted with by the same user. Considering the Transformer's challenges in effectiveness and efficiency, we propose a novel Transformer-based recommendation model, termed the Light Graph Transformer model (LightGT). In it, we develop a modal-specific embedding and a layer-wise position encoder for effective similarity measurement, and present a light self-attention block to improve the efficiency of self-attention scoring. Based on these designs, we can effectively and efficiently learn user preferences from off-the-shelf item features to predict user-item interactions. Through extensive experiments on the Movielens, Tiktok and Kwai datasets, we demonstrate that LightGT significantly outperforms state-of-the-art baselines while requiring less time. Our code is publicly available at: https://github.com/Liuwq-bit/LightGT.
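The abstract relies on Transformer self-attention to relate the items one user has interacted with. As a point of reference, the sketch below shows plain single-head scaled dot-product self-attention over a user's item feature vectors. It is an assumption-labeled illustration of the vanilla mechanism, not LightGT's light self-attention block, modal-specific embedding, or layer-wise position encoder. The helper names (`project`, `self_attention`) and the toy features are hypothetical. Scoring every item against every other item costs O(n²) per user for n interacted items, a general property of vanilla self-attention that plausibly contributes to the efficiency challenge the abstract mentions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(W: Matrix, x: Vector) -> Vector:
    # Multiply a (d_out x d_in) weight matrix by a d_in feature vector.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(items: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Matrix:
    """Vanilla single-head scaled dot-product self-attention over the feature
    vectors of the items one user has interacted with (NOT LightGT's light block)."""
    Q = [project(Wq, x) for x in items]
    K = [project(Wk, x) for x in items]
    V = [project(Wv, x) for x in items]
    scale = math.sqrt(len(K[0]))
    outputs = []
    for q in Q:
        # Score this item against every item in the history: O(n) per query, O(n^2) overall.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        # Each output is an attention-weighted mix of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs


# Toy usage: two items with hypothetical 2-d features and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
history = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(history, identity, identity, identity)
print(out)
```

With identity projections, each item attends most strongly to itself, and every output row is a convex combination of the value vectors.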
Results from the Paper
Ranked #1 on Multi-Media Recommendation on Kwai (Recall@10 metric)
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Multi-Media Recommendation | Kwai | LightGT | Recall@10 | 0.0546 | #1 |
| Multi-Media Recommendation | Kwai | LightGT | nDCG@10 | 0.0441 | #1 |
| Multi-Media Recommendation | MovieLens | LightGT | Recall@10 | 0.2650 | #1 |
| Multi-Media Recommendation | MovieLens | LightGT | nDCG@10 | 0.1771 | #1 |
| Multi-Media Recommendation | Tiktok | LightGT | Recall@10 | 0.1213 | #1 |
| Multi-Media Recommendation | Tiktok | LightGT | nDCG@10 | 0.0751 | #1 |