Search Results for author: Zhenbang Sun

Found 10 papers, 3 papers with code

Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs

no code implementations · 28 Mar 2025 · Ziye Chen, Yiqun Duan, Riheng Zhu, Zhenbang Sun, Mingming Gong

To overcome these limitations, we propose an agent-centric personalized clustering framework that leverages multi-modal large language models (MLLMs) as agents to comprehensively traverse a relational graph to search for clusters based on user interests.

Clustering

Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding

no code implementations · 28 Jan 2025 · Yun Li, Zhe Liu, Yajing Kong, Guangrui Li, Jiyuan Zhang, Chao Bian, Feng Liu, Lina Yao, Zhenbang Sun

Using STE, we systematically compare implicit and explicit temporal modeling across dimensions such as overall performance, token compression effectiveness, and temporal-specific understanding.

Decoder, Video Understanding

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

no code implementations · 22 Nov 2024 · Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu

To address this, we introduce AbilityLens, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics.

Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment

no code implementations · 8 Sep 2023 · Hongyu Hu, Tiancheng Lin, Jie Wang, Zhenbang Sun, Yi Xu

To achieve this, we introduce a pre-trained LLM to generate context descriptions, and we encourage the prompts to learn from the LLM's knowledge through alignment, in addition to aligning the prompts with local image features.

Language Modeling, +2 more

CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning

no code implementations · 5 Sep 2023 · Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun

Research on Large Vision-Language Models (LVLMs) has advanced significantly thanks to the success of Large Language Models (LLMs).

Hallucination

HIRL: A General Framework for Hierarchical Image Representation Learning

1 code implementation · 26 May 2022 · Minghao Xu, Yuanfan Guo, Xuanyu Zhu, Jiawen Li, Zhenbang Sun, Jian Tang, Yi Xu, Bingbing Ni

This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.

Image Clustering, Representation Learning, +3 more

HCSC: Hierarchical Contrastive Selective Coding

2 code implementations · CVPR 2022 · Yuanfan Guo, Minghao Xu, Jiawen Li, Bingbing Ni, Xuanyu Zhu, Zhenbang Sun, Yi Xu

In this framework, a set of hierarchical prototypes is constructed and dynamically updated to represent the hierarchical semantic structures underlying the data in the latent space.

Contrastive Learning, Representation Learning