1 code implementation • 30 Mar 2024 • Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li
In this paper, we investigate a straightforward yet unexplored question: Can we feed all spatial-temporal tokens into the LLM, thus delegating the task of video sequence modeling to the LLMs?
1 code implementation • 25 Nov 2023 • Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li
Large-scale image-language pretrained models, e.g., CLIP, have demonstrated remarkable proficiency in acquiring general multi-modal knowledge from web-scale image-text data.
1 code implementation • 27 Sep 2023 • Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
Without bells and whistles, BT-Adapter achieves state-of-the-art zero-shot results on various video tasks while using thousands of fewer GPU hours.
Ranked #5 on Zero-Shot Video Retrieval on LSMDC
Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding), +6 more
1 code implementation • ICLR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Thomas H. Li
Visual attention does not always capture the essential object representation desired for robust predictions.
Ranked #1 on Multi-Label Image Classification on MSCOCO
Multi-Label Classification, Multi-Label Image Classification, +1 more
1 code implementation • CVPR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li
In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transfer, which is the key to extending image-text pretrained models to the video domain.
Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)
1 code implementation • CVPR 2022 • Ruyang Liu, Hao Liu, Ge Li, Haodi Hou, TingHao Yu, Tao Yang
Contextual bias, a common problem in the visual world, means that recognition may depend on co-occurrence context rather than on the objects themselves; it is even more severe in multi-label tasks due to the presence of multiple targets and the absence of location annotations.
Ranked #9 on Multi-Label Classification on MS-COCO