Search Results for author: Shuhuai Ren

Found 19 papers, 16 papers with code

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

1 code implementation • 28 Mar 2024 • Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou

Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries.

Data Augmentation, Video Understanding

TempCompass: Do Video LLMs Really Understand Videos?

1 code implementation • 1 Mar 2024 • Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

Motivated by these two problems, we propose the TempCompass benchmark, which introduces a diversity of temporal aspects and task formats.

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

1 code implementation • 21 Feb 2024 • Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang

To address this, we introduce Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction tuning examples in multimodal embodied environments.

Autonomous Driving, Decision Making

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

1 code implementation • 4 Dec 2023 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

Dense Captioning, Highlight Detection +5

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

1 code implementation • 29 Oct 2023 • Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou

TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding.

 Ranked #1 on Video Retrieval on Condensed Movies (using extra training data)

Language Modelling, Retrieval +2
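As a rough illustration of the token-budget arithmetic behind the 75% figure above: halving the tokens along both the temporal and spatial axes leaves a quarter of them. The pooling below is a naive stand-in, not TESTA's learned aggregation; the tensor shapes are assumptions chosen for illustration.

```python
# Illustrative only: averaging adjacent pairs along both the temporal and
# spatial axes keeps 25% of the visual tokens (a 75% reduction). TESTA's
# actual aggregation is adaptive; this just shows the arithmetic of the claim.
import torch

def naive_temporal_spatial_pool(tokens):
    # tokens: (batch, frames, patches, dim), frames and patches assumed even
    b, t, p, d = tokens.shape
    tokens = tokens.reshape(b, t // 2, 2, p, d).mean(dim=2)       # merge adjacent frames
    tokens = tokens.reshape(b, t // 2, p // 2, 2, d).mean(dim=3)  # merge adjacent patches
    return tokens  # (batch, t/2, p/2, dim): one quarter of the original tokens

x = torch.randn(2, 16, 196, 768)
y = naive_temporal_spatial_pool(x)  # shape (2, 8, 98, 768)
```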

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

no code implementations • 7 Jun 2023 • Lei Li, Yuwei Yin, Shicheng Li, Liang Chen, Peiyi Wang, Shuhuai Ren, Mukai Li, Yazheng Yang, Jingjing Xu, Xu Sun, Lingpeng Kong, Qi Liu

To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions.

World Knowledge

Delving into the Openness of CLIP

1 code implementation • 4 Jun 2022 • Shuhuai Ren, Lei Li, Xuancheng Ren, Guangxiang Zhao, Xu Sun

However, evaluating the openness of CLIP-like models is challenging, as the models are open to arbitrary vocabulary in theory, but their accuracy varies in practice.

Image Classification, Text Matching
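A minimal sketch of the kind of openness probe the abstract alludes to, assuming OpenAI's open-source clip package: zero-shot accuracy is recomputed as the candidate vocabulary grows, so one can observe how performance varies even though the model accepts arbitrary class names. The prompt template, class lists, and images are placeholders, not the paper's evaluation protocol.

```python
# Hedged sketch: measure zero-shot accuracy under a growing vocabulary.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zero_shot_accuracy(images, labels, class_names):
    # Encode one prompt per candidate class name.
    prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        text_feats = model.encode_text(prompts)
        text_feats /= text_feats.norm(dim=-1, keepdim=True)
        correct = 0
        for image, label in zip(images, labels):
            img_feat = model.encode_image(preprocess(image).unsqueeze(0).to(device))
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
            pred = (img_feat @ text_feats.T).argmax(dim=-1).item()
            correct += int(pred == label)
    return correct / len(images)

# Accuracy on the same images typically shifts as distractor classes are added:
# zero_shot_accuracy(images, labels, base_classes)
# zero_shot_accuracy(images, labels, base_classes + extra_classes)
```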

Dynamic Knowledge Distillation for Pre-trained Language Models

1 code implementation • EMNLP 2021 • Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

Knowledge distillation (KD) has been proven effective for compressing large-scale pre-trained language models.

Knowledge Distillation
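For context, here is a minimal sketch of the standard temperature-scaled distillation objective that work in this line builds on. This is the generic KD loss, not the paper's dynamic weighting scheme; the hyperparameter values are illustrative.

```python
# Generic temperature-scaled knowledge distillation loss (reference sketch only).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes stable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: the usual cross-entropy on the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```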

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

1 code implementation • EMNLP 2021 • Shuhuai Ren, Jinchao Zhang, Lei Li, Xu Sun, Jie Zhou

Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-resource or class-imbalanced situations.

Bayesian Optimization, Data Augmentation +2
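To make the idea of a compositional augmentation policy concrete, here is a hedged sketch in which a policy is a list of (operation, probability, magnitude) triples applied in sequence. The operations below are generic stand-ins; the paper's contribution is searching this policy space with Bayesian optimization rather than hand-picking one.

```python
# Illustrative composition of simple text augmentation operations (stand-ins).
import random

def random_swap(words, n=1):
    words = list(words)
    for _ in range(int(n)):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_delete(words, p=0.1):
    kept = [w for w in words if random.random() > p]
    return kept or list(words)  # never return an empty sentence

def apply_policy(sentence, policy):
    # policy: list of (operation, probability, magnitude) triples,
    # mirroring the compositional form of an augmentation policy.
    words = sentence.split()
    for op, prob, mag in policy:
        if random.random() < prob:
            words = op(words, mag)
    return " ".join(words)

policy = [(random_swap, 0.5, 1), (random_delete, 0.3, 0.1)]
print(apply_policy("data augmentation enriches low resource training sets", policy))
```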

Learning Relation Alignment for Calibrated Cross-modal Retrieval

1 code implementation • ACL 2021 • Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang

To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, neglecting the matching between linguistic relations among the words and visual relations among the regions.

Cross-Modal Retrieval, Image-to-Text Retrieval +4

DCA: Diversified Co-Attention towards Informative Live Video Commenting

no code implementations • 7 Nov 2019 • Zhihan Zhang, Zhiyi Yin, Shuhuai Ren, Xinhang Li, Shicheng Li

In this paper, we aim to collect diversified information from video and text for informative comment generation.

Comment Generation, Metric Learning

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

1 code implementation • ACL 2019 • Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che

Experiments on three popular datasets using convolutional as well as LSTM models show that PWWS reduces classification accuracy the most while keeping a very low word substitution rate.

Adversarial Attack, General Classification +5
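As a hedged sketch of the attack family PWWS belongs to: rank candidate synonym substitutions by the product of a word's saliency and the probability drop its best synonym causes, then substitute greedily until the prediction flips. The predict_proba and get_synonyms callables below are assumed placeholders, and the scoring is a simplification of the paper's exact formulation.

```python
# Simplified sketch of a saliency-weighted synonym-substitution attack
# (PWWS-style). `predict_proba` returns a list of class probabilities for a
# string; `get_synonyms` returns candidate replacements. Both are placeholders.

def word_saliency(words, true_label, predict_proba):
    """Probability drop on the true class when each word is removed."""
    base = predict_proba(" ".join(words))[true_label]
    return [
        base - predict_proba(" ".join(words[:i] + words[i + 1:]))[true_label]
        for i in range(len(words))
    ]

def pwws_style_attack(words, true_label, predict_proba, get_synonyms):
    base = predict_proba(" ".join(words))[true_label]
    saliency = word_saliency(words, true_label, predict_proba)
    scored = []
    for i, word in enumerate(words):
        best_syn, best_drop = None, 0.0
        for syn in get_synonyms(word):
            trial = words[:i] + [syn] + words[i + 1:]
            drop = base - predict_proba(" ".join(trial))[true_label]
            if drop > best_drop:
                best_syn, best_drop = syn, drop
        if best_syn is not None:
            # Score = probability drop of the best synonym, weighted by saliency.
            scored.append((best_drop * saliency[i], i, best_syn))
    adversarial = list(words)
    for _, i, syn in sorted(scored, reverse=True):  # most promising first
        adversarial[i] = syn
        probs = predict_proba(" ".join(adversarial))
        if probs.index(max(probs)) != true_label:
            break  # stop as soon as the predicted label flips
    return " ".join(adversarial)
```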
