no code implementations • 28 Mar 2025 • Ziye Chen, Yiqun Duan, Riheng Zhu, Zhenbang Sun, Mingming Gong
To overcome these limitations, we propose an agent-centric personalized clustering framework that leverages multi-modal large language models (MLLMs) as agents to comprehensively traverse a relational graph to search for clusters based on user interests.
no code implementations • 28 Jan 2025 • Yun Li, Zhe Liu, Yajing Kong, Guangrui Li, Jiyuan Zhang, Chao Bian, Feng Liu, Lina Yao, Zhenbang Sun
Using STE, we systematically compare implicit and explicit temporal modeling across dimensions such as overall performance, token compression effectiveness, and temporal-specific understanding.
no code implementations • 22 Nov 2024 • Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu
To address this, we introduce AbilityLens, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics.
no code implementations • 4 Oct 2024 • Sicheng Yu, Chengkai Jin, Huanyu Wang, Zhenghao Chen, Sheng Jin, Zhongrong Zuo, Xiaolei Xu, Zhenbang Sun, Bingni Zhang, Jiawei Wu, Hao Zhang, Qianru Sun
Video Large Language Models (Video-LLMs) have made remarkable progress in video understanding tasks.
no code implementations • 22 Sep 2024 • Minyi Zhao, Jie Wang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Shuigeng Zhou
may change model output and make the output hallucinate again.
no code implementations • 8 Sep 2023 • Hongyu Hu, Tiancheng Lin, Jie Wang, Zhenbang Sun, Yi Xu
To achieve this, we introduce a pre-trained LLM to generate context descriptions, and we encourage the prompts to learn from the LLM's knowledge by aligning them with these descriptions as well as with local image features.
no code implementations • 5 Sep 2023 • Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun
Research on Large Vision-Language Models (LVLMs) has advanced significantly thanks to the success of Large Language Models (LLMs).
1 code implementation • 26 May 2022 • Minghao Xu, Yuanfan Guo, Xuanyu Zhu, Jiawen Li, Zhenbang Sun, Jian Tang, Yi Xu, Bingbing Ni
This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.
2 code implementations • CVPR 2022 • Yuanfan Guo, Minghao Xu, Jiawen Li, Bingbing Ni, Xuanyu Zhu, Zhenbang Sun, Yi Xu
In this framework, a set of hierarchical prototypes is constructed and dynamically updated to represent the hierarchical semantic structures underlying the data in the latent space.
1 code implementation • ICCV 2021 • Minghao Xu, Hang Wang, Bingbing Ni, Riheng Zhu, Zhenbang Sun, Changhu Wang
To tackle this practical problem, we propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.