Multimodal Recommendation

31 papers with code • 5 benchmarks • 6 datasets

The multimodal recommendation task involves developing systems that leverage and integrate multiple types of data, such as text, images, audio, and user interactions, to predict and suggest items that align with a user's preferences. Unlike traditional approaches that rely on a single data modality, multimodal recommendation draws on diverse sources to build richer and more nuanced representations of both users and items. This integration lets the system capture complex relationships and attributes across data types, improving the accuracy and relevance of its recommendations. The primary goal is to provide personalized suggestions by effectively merging and processing heterogeneous data, so that users are better matched with items they are likely to engage with or find valuable.
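
As a minimal, hypothetical sketch of this idea (made-up dimensions, a simple concatenate-then-project fusion, and a dot-product scorer; not drawn from any specific paper listed below), the snippet fuses pre-extracted text and image features with an ID embedding into an item representation and scores it against a user embedding.

```python
import torch
import torch.nn as nn

class SimpleMultimodalRecommender(nn.Module):
    """Toy multimodal recommender: fuse per-item text/image features
    with an ID embedding, then score items against users by dot product."""

    def __init__(self, n_users, n_items, txt_dim, img_dim, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)   # project text features
        self.img_proj = nn.Linear(img_dim, dim)   # project image features
        self.fuse = nn.Linear(3 * dim, dim)       # concat-then-project fusion

    def item_repr(self, item_ids, txt_feat, img_feat):
        parts = [self.item_emb(item_ids),
                 self.txt_proj(txt_feat),
                 self.img_proj(img_feat)]
        return self.fuse(torch.cat(parts, dim=-1))

    def score(self, user_ids, item_ids, txt_feat, img_feat):
        u = self.user_emb(user_ids)
        i = self.item_repr(item_ids, txt_feat, img_feat)
        return (u * i).sum(-1)  # higher score = stronger predicted preference

# Hypothetical shapes: 100 users, 50 items, 384-d text, 512-d image features.
model = SimpleMultimodalRecommender(100, 50, txt_dim=384, img_dim=512)
scores = model.score(torch.tensor([0, 1]), torch.tensor([3, 7]),
                     torch.randn(2, 384), torch.randn(2, 512))
print(scores.shape)  # torch.Size([2])
```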

Libraries

Use these libraries to find Multimodal Recommendation models and implementations

Most implemented papers

A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation

enoche/freedom 13 Nov 2022

Based on this finding, we propose a simple yet effective model, dubbed FREEDOM, that FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation.
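
The two ingredients named in the abstract can be sketched roughly as follows (a simplified illustration with hypothetical tensors, not the authors' implementation): an item-item kNN graph built once from multimodal features and then frozen, plus random dropping of user-item edges during training as a crude stand-in for the degree-sensitive edge pruning described in the paper.

```python
import torch

def frozen_knn_item_graph(item_feats, k=10):
    """Build a kNN item-item graph once from multimodal item features.
    In FREEDOM-style models this graph is frozen (not updated during training)."""
    feats = torch.nn.functional.normalize(item_feats, dim=-1)
    sim = feats @ feats.t()                        # cosine similarity
    topk = sim.topk(k + 1, dim=-1).indices[:, 1:]  # drop self-similarity
    adj = torch.zeros_like(sim)
    adj.scatter_(1, topk, 1.0)
    return adj  # binary adjacency: row i -> its k most similar items

def denoise_interactions(edge_index, drop_rate=0.2):
    """Randomly drop a fraction of user-item edges each training step
    (a simplified form of denoising the interaction graph)."""
    keep = torch.rand(edge_index.size(1)) >= drop_rate
    return edge_index[:, keep]

# Hypothetical example: 6 items with 8-d fused features, 5 user-item interactions.
item_feats = torch.randn(6, 8)
item_graph = frozen_knn_item_graph(item_feats, k=2)
edges = torch.tensor([[0, 0, 1, 2, 3],   # user ids
                      [1, 4, 2, 5, 0]])  # item ids
print(item_graph.shape, denoise_interactions(edges).shape)
```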

A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions

enoche/mmrec 9 Feb 2023

Recommendation systems have become popular and effective tools that help users discover items of interest by modeling user preferences and item properties based on implicit interactions (e.g., purchasing and clicking).

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

weiyinwei/mmgcn ACM International Conference on Multimedia 2019

Existing works on multimedia recommendation largely exploit multi-modal contents to enrich item representations, while less effort is made to leverage the information interchange between users and items to enhance user representations and further capture users' fine-grained preferences on different modalities.
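
A rough sketch of the per-modality message-passing idea (toy shapes, a single mean-pooling round for one modality; not the authors' code): each modality's features are propagated over the user-item interaction graph separately, so users receive modality-specific representations that can later be combined.

```python
import torch

def propagate_modality(interaction, user_emb, item_feat):
    """One round of mean-pooling message passing on the user-item bipartite graph
    for a single modality: users aggregate features of items they interacted with,
    items aggregate embeddings of users who interacted with them."""
    deg_u = interaction.sum(1, keepdim=True).clamp(min=1)  # interactions per user
    deg_i = interaction.sum(0, keepdim=True).clamp(min=1)  # interactions per item
    user_msg = (interaction @ item_feat) / deg_u           # item -> user messages
    item_msg = (interaction.t() @ user_emb) / deg_i.t()    # user -> item messages
    return user_msg, item_msg

# Hypothetical toy graph: 3 users x 4 items, 16-d visual features for this modality.
interaction = torch.tensor([[1., 0., 1., 0.],
                            [0., 1., 0., 0.],
                            [1., 1., 0., 1.]])
user_emb = torch.randn(3, 16)
visual_feat = torch.randn(4, 16)
u_vis, i_vis = propagate_modality(interaction, user_emb, visual_feat)
print(u_vis.shape, i_vis.shape)  # torch.Size([3, 16]) torch.Size([4, 16])
```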

Mining Latent Structures for Multimedia Recommendation

CRIPAC-DIG/LATTICE 19 Apr 2021

To be specific, in the proposed LATTICE model, we devise a novel modality-aware structure learning layer, which learns item-item structures for each modality and aggregates multiple modalities to obtain latent item graphs.
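
The snippet below is a simplified illustration of that layer (hypothetical feature sizes, fixed rather than learned modality weights; not the paper's implementation): compute a kNN item-item graph per modality and combine the graphs into a single latent item graph.

```python
import torch

def modality_item_graph(feats, k=5):
    """kNN item-item graph from one modality's features (cosine similarity)."""
    feats = torch.nn.functional.normalize(feats, dim=-1)
    sim = feats @ feats.t()
    topk_val, topk_idx = sim.topk(k + 1, dim=-1)
    graph = torch.zeros_like(sim)
    graph.scatter_(1, topk_idx, topk_val)   # keep only the top-k similarities
    return graph

def aggregate_modalities(graphs, weights):
    """Weighted sum of per-modality item graphs into one latent item graph."""
    weights = torch.softmax(torch.tensor(weights, dtype=torch.float), dim=0)
    return sum(w * g for w, g in zip(weights, graphs))

# Hypothetical features: 6 items with 32-d visual and 24-d textual embeddings.
visual = torch.randn(6, 32)
textual = torch.randn(6, 24)
latent_graph = aggregate_modalities(
    [modality_item_graph(visual, k=2), modality_item_graph(textual, k=2)],
    weights=[0.6, 0.4],  # learnable in LATTICE; fixed here for illustration
)
print(latent_graph.shape)  # torch.Size([6, 6])
```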

Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation

hongyurain/DRAGON 28 Jan 2023

On top of the finding, we propose a model that enhances the dyadic relations by learning Dual RepresentAtions of both users and items via constructing homogeneous Graphs for multimOdal recommeNdation.
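
A toy sketch of the two homogeneous graphs this description refers to (hypothetical data, simple top-k heuristics; not the authors' construction): a user-user graph from co-interaction counts and an item-item graph from multimodal feature similarity.

```python
import torch

def user_user_graph(interaction, k=3):
    """Homogeneous user-user graph: connect users who interacted with similar items."""
    co = interaction @ interaction.t()            # co-interaction counts
    co.fill_diagonal_(0)
    idx = co.topk(min(k, co.size(0) - 1), dim=-1).indices
    adj = torch.zeros_like(co)
    adj.scatter_(1, idx, 1.0)
    return adj

def item_item_graph(item_feats, k=3):
    """Homogeneous item-item graph from multimodal feature similarity."""
    f = torch.nn.functional.normalize(item_feats, dim=-1)
    sim = f @ f.t()
    sim.fill_diagonal_(0)
    idx = sim.topk(min(k, sim.size(0) - 1), dim=-1).indices
    adj = torch.zeros_like(sim)
    adj.scatter_(1, idx, 1.0)
    return adj

# Hypothetical toy data: 4 users x 5 items, and 16-d fused item features.
interaction = torch.tensor([[1., 1., 0., 0., 1.],
                            [1., 0., 1., 0., 0.],
                            [0., 1., 0., 1., 1.],
                            [0., 0., 1., 1., 0.]])
print(user_user_graph(interaction, k=2).shape,
      item_item_graph(torch.randn(5, 16), k=2).shape)
```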

MMRec: Simplifying Multimodal Recommendation

enoche/mmrec 2 Feb 2023

This paper presents an open-source toolbox, MMRec for multimodal recommendation.

Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark

LYX0501/SURE 26 May 2023

Existing multimodal task-oriented dialog data fails to demonstrate the diverse expressions of users' subjective preferences and recommendation acts that arise in real-life shopping scenarios.

Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation

sisinflab/ducho 29 Jun 2023

Motivated by the outlined aspects, we propose Ducho, a unified framework for the extraction of multimodal features in recommendation.

LightGT: A Light Graph Transformer for Multimedia Recommendation

Liuwq-bit/LightGT SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023

Considering its challenges in effectiveness and efficiency, we propose a novel Transformer-based recommendation model, termed Light Graph Transformer (LightGT).

Semantic-Guided Feature Distillation for Multimodal Recommendation

huilinchenjn/sgfd 6 Aug 2023

The teacher model first extracts rich modality features from the generic modality feature by considering both the semantic information of items and the complementary information of multiple modalities.
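
The general teacher-student feature-distillation pattern underlying this description can be sketched as follows (hypothetical network sizes; the actual SGFD method adds semantic guidance and combines this term with the recommendation loss): a smaller student is trained to match the richer features produced by a frozen teacher from the same generic modality input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher/student: the teacher maps a generic 128-d modality feature
# to a richer 64-d representation; the lightweight student mimics that output.
teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
student = nn.Sequential(nn.Linear(128, 64))

generic_feat = torch.randn(32, 128)       # generic per-item modality features

with torch.no_grad():
    target = teacher(generic_feat)        # teacher output treated as fixed

pred = student(generic_feat)
distill_loss = F.mse_loss(pred, target)   # feature-level distillation term
# In SGFD this term would be combined with the recommendation loss and
# semantic guidance; only the distillation component is shown here.
print(distill_loss.item())
```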