1 code implementation • 5 Jun 2024 • Tingjia Shen, Hao Wang, Jiaqing Zhang, Sirui Zhao, Liangyue Li, Zulong Chen, Defu Lian, Enhong Chen
To this end, we propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach and domain grounding on LLM simultaneously.
no code implementations • 31 May 2024 • Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei LI, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun
With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including GPT-4 series and Gemini 1. 5 Pro, as well as open-source image models like InternVL-Chat-V1. 5 and video models like LLaVA-NeXT-Video.
1 code implementation • 28 May 2024 • Mingjia Yin, Hao Wang, Wei Guo, Yong liu, Suojuan Zhang, Sirui Zhao, Defu Lian, Enhong Chen
The sequential recommender (SR) system is a crucial component of modern recommender systems, as it aims to capture the evolving preferences of users.
no code implementations • 21 May 2024 • Mingjia Yin, Hao Wang, Wei Guo, Yong liu, Zhi Li, Sirui Zhao, Zhen Wang, Defu Lian, Enhong Chen
Cross-domain sequential recommendation (CDSR) aims to uncover and transfer users' sequential preferences across multiple recommendation domains.
2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun
They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.
1 code implementation • 6 Nov 2023 • Mingjia Yin, Hao Wang, Xiang Xu, Likang Wu, Sirui Zhao, Wei Guo, Yong liu, Ruiming Tang, Defu Lian, Enhong Chen
To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems.
1 code implementation • 24 Oct 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen
Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content.
1 code implementation • 26 Jun 2023 • Chao Zhang, Shiwei Wu, Sirui Zhao, Tong Xu, Enhong Chen
In this paper, we present a solution for enhancing video alignment to improve multi-step inference.
1 code implementation • 23 Jun 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks.
1 code implementation • 16 Mar 2023 • Shukang Yin, Shiwei Wu, Tong Xu, Shifeng Liu, Sirui Zhao, Enhong Chen
Automatic Micro-Expression (ME) spotting in long videos is a crucial step in ME analysis but also a challenging task due to the short duration and low intensity of MEs.
no code implementations • 3 Jan 2023 • Sirui Zhao, Huaying Tang, Xinglong Mao, Shifeng Liu, Hanqing Tao, Hao Wang, Tong Xu, Enhong Chen
To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7, 526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years.