Search Results for author: Zikang Liu

Found 9 papers, 5 papers with code

Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models

no code implementations · 17 Feb 2025 · Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen

Despite this success, because visual instructions require images as input, they leave a gap in inheriting the task-solving capabilities of the backbone LLMs and make collecting a large-scale dataset costly.

visual instruction following · Visual Reasoning

VRoPE: Rotary Position Embedding for Video Large Language Models

1 code implementation · 17 Feb 2025 · Zikang Liu, Longteng Guo, Yepeng Tang, Junxian Cai, Kai Ma, Xi Chen, Jing Liu

Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate spatiotemporal structure of video frames.

Position · Video Understanding
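For context on the entry above: this listing does not describe the paper's VRoPE design, so the snippet below is only a minimal sketch of standard 1-D RoPE as used in text LLMs, not the paper's video extension. The function name, the base value of 10000, and the toy dimensions are illustrative assumptions.

```python
# Minimal sketch of standard 1-D Rotary Position Embedding (RoPE), for context only.
# NOT the paper's VRoPE method; names, base, and dimensions are illustrative.
import numpy as np

def rope_rotate(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to a single query/key vector x of even dimension d."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE expects an even embedding dimension"
    theta = base ** (-np.arange(0, d, 2) / d)      # one frequency per 2-D pair
    angles = position * theta                      # rotation angle for each pair
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                      # split vector into (even, odd) pairs
    rotated = np.empty_like(x)
    rotated[0::2] = x1 * cos - x2 * sin            # 2-D rotation of each pair
    rotated[1::2] = x1 * sin + x2 * cos
    return rotated

# RoPE's relative-position property: the attention logit between a rotated
# query and key depends on the position offset (here 9 - 5), not absolute positions.
q = rope_rotate(np.random.randn(64), position=5)
k = rope_rotate(np.random.randn(64), position=9)
score = q @ k
```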

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

2 code implementations · 3 Jan 2025 · Yifan Du, Zikang Liu, YiFan Li, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, WeiPeng Chen, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen

Moreover, it seems that such textual reasoning data can be even more effective than visual reasoning data in eliciting the slow-thinking capacities of MLLMs.

Language Modeling · Language Modelling · +1

Less is More: High-value Data Selection for Visual Instruction Tuning

no code implementations · 14 Mar 2024 · Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen

To investigate this issue, we conduct a series of empirical studies, which reveal significant redundancy within visual instruction datasets and show that greatly reducing the number of instructions for several tasks does not even affect performance.

Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study

1 code implementation · 16 Jul 2023 · Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen

Different from previous studies focused on overall performance, this work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models.

In-Context Learning · Instruction Following · +1
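For context on the entry above: the paper's quantization setup (bit-widths, calibration scheme) is not described in this listing, so the snippet below is only a toy sketch of symmetric per-tensor int8 weight quantization to illustrate the general idea. Function names and shapes are illustrative assumptions.

```python
# Toy sketch of symmetric per-tensor int8 weight quantization, for context only.
# The paper studies quantized LLMs more broadly; this only shows a basic round-trip.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 using a single symmetric scale."""
    scale = np.max(np.abs(w)) / 127.0 if np.any(w) else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```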

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations · 22 May 2023 · Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

 Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering · Text Retrieval · +5

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner

1 code implementation · 19 May 2023 · Zikang Liu, Sihan Chen, Longteng Guo, Handong Li, Xingjian He, Jing Liu

In this paper, we propose a novel method called Joint QA and DC GEneration (JADE), which utilizes a pre-trained multimodal model and easily-crawled image-text pairs to automatically generate and filter large-scale VQA and dense captioning datasets.

Dense Captioning · Image Captioning · +4

A Survey of Vision-Language Pre-Trained Models

no code implementations · 18 Feb 2022 · Yifan Du, Zikang Liu, Junyi Li, Wayne Xin Zhao

In this paper, we review the recent progress in Vision-Language Pre-Trained Models (VL-PTMs).

Survey
