no code implementations • 16 Oct 2023 • Qianli Ma, Haotian Zhou, Tingkai Liu, Jianbo Yuan, PengFei Liu, Yang You, Hongxia Yang
Recent years have seen considerable advancements in multi-step reasoning with Large Language Models (LLMs).
no code implementations • 16 Oct 2023 • Haotian Zhou, Tingkai Liu, Qianli Ma, Jianbo Yuan, PengFei Liu, Yang You, Hongxia Yang
In this paper, we introduce a new dimension in SFT data selection: learnability.
no code implementations • 8 Oct 2023 • Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.
1 code implementation • 8 Oct 2023 • Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang
Finally, we benchmarked a wide range of current video-language models on DeVAn, and we aim for DeVAn to serve as a useful evaluation set in the age of large language models and complex multi-modal tasks.
1 code implementation • 5 Oct 2023 • Yiren Jian, Tingkai Liu, Yunzhe Tao, Chunhui Zhang, Soroush Vosoughi, Hongxia Yang
Our experimental findings demonstrate that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.