1 code implementation • 30 Mar 2024 • Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li
In this paper, we investigate a straightforward yet unexplored question: Can we feed all spatial-temporal tokens into the LLM, thus delegating the task of video sequence modeling to the LLMs?
1 code implementation • 25 Nov 2023 • Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li
Large-scale image-language pretrained models, e.g., CLIP, have demonstrated remarkable proficiency in acquiring general multi-modal knowledge from web-scale image-text data.
1 code implementation • 27 Sep 2023 • Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
Without bells and whistles, BT-Adapter achieves state-of-the-art zero-shot results on various video tasks while using thousands of fewer GPU hours.
Ranked #5 on Zero-Shot Video Retrieval on LSMDC
Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding), +6 more
1 code implementation • ICLR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Thomas H. Li
Visual attention does not always capture the essential object representation desired for robust predictions.
Ranked #1 on Multi-Label Image Classification on MSCOCO
Multi-Label Classification, Multi-Label Image Classification, +1 more
1 code implementation • CVPR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li
In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transfer, which is the key to extending image-text pretrained models to the video domain.
Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)
1 code implementation • CVPR 2022 • Ruyang Liu, Hao Liu, Ge Li, Haodi Hou, TingHao Yu, Tao Yang
Contextual bias, a common problem in the visual world, means that recognition may depend on co-occurrence context rather than on the objects themselves; it is even more severe in multi-label tasks due to the presence of multiple targets and the absence of location annotations.
Ranked #9 on Multi-Label Classification on MS-COCO