Search Results for author: Ruyang Liu

Found 6 papers, 6 papers with code

ST-LLM: Large Language Models Are Effective Temporal Learners

1 code implementation • 30 Mar 2024 • Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li

In this paper, we investigate a straightforward yet unexplored question: can we feed all spatial-temporal tokens into the LLM, thus delegating the task of video sequence modeling to the LLM itself?

Reading Comprehension • Video Understanding
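As a rough illustration of the question this paper poses, the sketch below flattens all spatial-temporal patch tokens from a video encoder into one long sequence, projects them into the language model's embedding space, and prepends them to the text embeddings, leaving all temporal modeling to the LLM. Every module name and dimension here is an assumption for illustration; this is not the actual ST-LLM implementation.

```python
import torch
import torch.nn as nn

# Sketch: hand ALL spatial-temporal tokens to the LLM and let it do the
# video sequence modeling. Dimensions are illustrative assumptions.
class VideoTokensToLLM(nn.Module):
    def __init__(self, vis_dim=1024, llm_dim=4096):
        super().__init__()
        # Project visual tokens into the LLM embedding space.
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, video_tokens, text_embeds):
        # video_tokens: (B, T, N, vis_dim) -- T frames, N patches per frame
        B, T, N, D = video_tokens.shape
        # Flatten time and space into one token sequence: (B, T*N, vis_dim)
        vis_embeds = self.proj(video_tokens.reshape(B, T * N, D))
        # Prepend visual tokens to the text embeddings; the concatenated
        # sequence would then be fed to a causal LLM (omitted here).
        return torch.cat([vis_embeds, text_embeds], dim=1)

tokens = torch.randn(2, 8, 196, 1024)   # 8 frames, 14x14 patches each
text = torch.randn(2, 32, 4096)         # 32 text-token embeddings
print(VideoTokensToLLM()(tokens, text).shape)  # torch.Size([2, 1600, 4096])
```

Note that flattening every patch token yields T × N visual tokens (1,568 in this toy case), so sequence length, and with it LLM cost, grows quickly with the number of frames.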

Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding

1 code implementation • 25 Nov 2023 • Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li

Large-scale image-language pretrained models, e.g., CLIP, have demonstrated remarkable proficiency in acquiring general multi-modal knowledge through web-scale image-text data.

Video Understanding
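For context on what "adapting" means here, the sketch below shows the naive transfer baseline that adapter methods improve on: encode each frame independently with the image encoder, mean-pool the frame features into one video embedding, and score it against text embeddings by cosine similarity. The encoders are passed in as assumed stand-ins for CLIP's; this is a common baseline, not Mug-STAN's mechanism.

```python
import torch
import torch.nn.functional as F

# Baseline sketch: reuse an image-language model for video by mean-pooling
# per-frame features. `image_encoder`/`text_encoder` are assumed stand-ins.
def video_text_similarity(frames, text_tokens, image_encoder, text_encoder):
    B, T = frames.shape[:2]
    frame_feats = image_encoder(frames.flatten(0, 1))    # (B*T, D)
    video_feat = frame_feats.view(B, T, -1).mean(dim=1)  # temporal mean pool
    text_feat = text_encoder(text_tokens)
    # Cosine similarity between every video and every caption
    video_feat = F.normalize(video_feat, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)
    return video_feat @ text_feat.t()                    # (B, num_texts)

# Shape check with dummy linear encoders (not real CLIP)
frames = torch.randn(2, 8, 3, 32, 32)
texts = torch.randn(5, 77)
W_img, W_txt = torch.randn(3 * 32 * 32, 512), torch.randn(77, 512)
img_enc = lambda x: x.flatten(1) @ W_img
txt_enc = lambda t: t @ W_txt
print(video_text_similarity(frames, texts, img_enc, txt_enc).shape)  # (2, 5)
```

Mean pooling ignores temporal order entirely, which is precisely the gap that video-adaptation work of this kind targets.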

Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

1 code implementation • CVPR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li

In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key to extending image-text pretrained models to the video domain.

Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Representation Learning • Retrieval +3
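The generic recipe being revisited can be sketched as follows: keep per-frame features from a frozen image backbone and add a lightweight temporal module on top. A single transformer encoder layer over the frame axis is shown below as one common choice; the paper compares and improves on designs of this kind rather than prescribing exactly this one, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch: a lightweight temporal head over frozen per-frame features.
class TemporalHead(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, frame_feats):
        # frame_feats: (B, T, D) embeddings from an image encoder like CLIP
        out = self.temporal(frame_feats)  # self-attention across frames
        return out.mean(dim=1)            # pooled video-level embedding

feats = torch.randn(4, 16, 512)    # 16 frames of CLIP-like features
print(TemporalHead()(feats).shape)  # torch.Size([4, 512])
```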

Contextual Debiasing for Visual Recognition With Causal Mechanisms

1 code implementation • CVPR 2022 • Ruyang Liu, Hao Liu, Ge Li, Haodi Hou, TingHao Yu, Tao Yang

Contextual bias is a common problem in the visual world: recognition may depend on co-occurring context rather than on the objects themselves. The problem is even more severe in multi-label tasks, where there are multiple targets and no location annotations.

Causal Inference • Counterfactual +2
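As one hedged illustration of debiasing in this causal spirit, a counterfactual comparison can separate the context's contribution from the object's: predict once on the full image and once on a counterfactual input with the object evidence suppressed, then keep the difference. The masking scheme and the model below are assumptions; the paper's causal mechanism is more involved than this sketch.

```python
import torch

# Sketch: counterfactual-style debiasing. Subtract logits computed from a
# context-only (object-masked) input from the factual logits, keeping the
# effect attributable to the object itself. Not the paper's exact method.
def debiased_logits(model, image, object_mask):
    logits_factual = model(image)                      # object + context
    logits_context = model(image * (1 - object_mask))  # context only
    return logits_factual - logits_context

# Dummy check with a fixed linear "model" on flattened images
W = torch.randn(3 * 32 * 32, 10)
model = lambda x: x.flatten(1) @ W
img = torch.randn(2, 3, 32, 32)
mask = torch.zeros_like(img)
mask[:, :, 8:24, 8:24] = 1.0        # assumed object bounding box
print(debiased_logits(model, img, mask).shape)  # torch.Size([2, 10])
```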
