Search Results for author: Shuhuai Ren

Found 19 papers, 16 papers with code

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

1 code implementation • 28 Mar 2024 • Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou

Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries.

Data Augmentation, Video Understanding

TempCompass: Do Video LLMs Really Understand Videos?

1 code implementation • 1 Mar 2024 • Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

Motivated by these two problems, we propose the TempCompass benchmark, which introduces a diversity of temporal aspects and task formats.

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

1 code implementation • 21 Feb 2024 • Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang

To address this, we introduce Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction tuning examples in multimodal embodied environments.

Autonomous Driving, Decision Making

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

1 code implementation • 4 Dec 2023 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

Dense Captioning, Highlight Detection +5

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

1 code implementation • 29 Oct 2023 • Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou

TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding.

 Ranked #1 on Video Retrieval on Condensed Movies (using extra training data)

Language Modelling, Retrieval +2
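As a rough illustration of the token-budget arithmetic behind the 75% figure above: halving the tokens along both the temporal and spatial axes leaves a quarter of them. The pooling below is a naive stand-in, not TESTA's learned aggregation; the tensor shapes are assumptions chosen for illustration.

```python
# Illustrative only: averaging adjacent pairs along both the temporal and
# spatial axes keeps 25% of the visual tokens (a 75% reduction). TESTA's
# actual aggregation is adaptive; this just shows the arithmetic of the claim.
import torch

def naive_temporal_spatial_pool(tokens):
    # tokens: (batch, frames, patches, dim), frames and patches assumed even
    b, t, p, d = tokens.shape
    tokens = tokens.reshape(b, t // 2, 2, p, d).mean(dim=2)       # merge adjacent frames
    tokens = tokens.reshape(b, t // 2, p // 2, 2, d).mean(dim=3)  # merge adjacent patches
    return tokens  # (batch, t/2, p/2, dim): one quarter of the original tokens

x = torch.randn(2, 16, 196, 768)
y = naive_temporal_spatial_pool(x)  # shape (2, 8, 98, 768)
```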

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

no code implementations • 7 Jun 2023 • Lei Li, Yuwei Yin, Shicheng Li, Liang Chen, Peiyi Wang, Shuhuai Ren, Mukai Li, Yazheng Yang, Jingjing Xu, Xu Sun, Lingpeng Kong, Qi Liu

To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions.

World Knowledge

Delving into the Openness of CLIP

1 code implementation • 4 Jun 2022 • Shuhuai Ren, Lei Li, Xuancheng Ren, Guangxiang Zhao, Xu Sun

However, evaluating the openness of CLIP-like models is challenging, as the models are open to arbitrary vocabulary in theory, but their accuracy varies in practice.

Image Classification, Text Matching
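A minimal sketch of the kind of openness probe the abstract alludes to, assuming OpenAI's open-source clip package: zero-shot accuracy is recomputed as the candidate vocabulary grows, so one can observe how performance varies even though the model accepts arbitrary class names. The prompt template, class lists, and images are placeholders, not the paper's evaluation protocol.

```python
# Hedged sketch: measure zero-shot accuracy under a growing vocabulary.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zero_shot_accuracy(images, labels, class_names):
    # Encode one prompt per candidate class name.
    prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        text_feats = model.encode_text(prompts)
        text_feats /= text_feats.norm(dim=-1, keepdim=True)
        correct = 0
        for image, label in zip(images, labels):
            img_feat = model.encode_image(preprocess(image).unsqueeze(0).to(device))
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
            pred = (img_feat @ text_feats.T).argmax(dim=-1).item()
            correct += int(pred == label)
    return correct / len(images)

# Accuracy on the same images typically shifts as distractor classes are added:
# zero_shot_accuracy(images, labels, base_classes)
# zero_shot_accuracy(images, labels, base_classes + extra_classes)
```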

Dynamic Knowledge Distillation for Pre-trained Language Models

1 code implementation • EMNLP 2021 • Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

Knowledge distillation (KD) has been proven effective for compressing large-scale pre-trained language models.

Knowledge Distillation
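For context, here is a minimal sketch of the standard temperature-scaled distillation objective that work in this line builds on. This is the generic KD loss, not the paper's dynamic weighting scheme; the hyperparameter values are illustrative.

```python
# Generic temperature-scaled knowledge distillation loss (reference sketch only).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes stable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: the usual cross-entropy on the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```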

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

1 code implementation • EMNLP 2021 • Shuhuai Ren, Jinchao Zhang, Lei Li, Xu Sun, Jie Zhou

Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-resource or class-imbalanced situations.

Bayesian Optimization, Data Augmentation +2
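To make the idea of a compositional augmentation policy concrete, here is a hedged sketch in which a policy is a list of (operation, probability, magnitude) triples applied in sequence. The operations below are generic stand-ins; the paper's contribution is searching this policy space with Bayesian optimization rather than hand-picking one.

```python
# Illustrative composition of simple text augmentation operations (stand-ins).
import random

def random_swap(words, n=1):
    words = list(words)
    for _ in range(int(n)):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_delete(words, p=0.1):
    kept = [w for w in words if random.random() > p]
    return kept or list(words)  # never return an empty sentence

def apply_policy(sentence, policy):
    # policy: list of (operation, probability, magnitude) triples,
    # mirroring the compositional form of an augmentation policy.
    words = sentence.split()
    for op, prob, mag in policy:
        if random.random() < prob:
            words = op(words, mag)
    return " ".join(words)

policy = [(random_swap, 0.5, 1), (random_delete, 0.3, 0.1)]
print(apply_policy("data augmentation enriches low resource training sets", policy))
```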

Learning Relation Alignment for Calibrated Cross-modal Retrieval

1 code implementation • ACL 2021 • Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang

To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, neglecting the matching between linguistic relations among the words and visual relations among the regions.

Cross-Modal Retrieval, Image-to-Text Retrieval +4

DCA: Diversified Co-Attention towards Informative Live Video Commenting

no code implementations • 7 Nov 2019 • Zhihan Zhang, Zhiyi Yin, Shuhuai Ren, Xinhang Li, Shicheng Li

In this paper, we aim to collect diversified information from video and text for informative comment generation.

Comment Generation, Metric Learning

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

1 code implementation • ACL 2019 • Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che

Experiments on three popular datasets using convolutional as well as LSTM models show that PWWS reduces classification accuracy the most while keeping a very low word substitution rate.

Adversarial Attack, General Classification +5
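As a hedged sketch of the attack family PWWS belongs to: rank candidate synonym substitutions by the product of a word's saliency and the probability drop its best synonym causes, then substitute greedily until the prediction flips. The predict_proba and get_synonyms callables below are assumed placeholders, and the scoring is a simplification of the paper's exact formulation.

```python
# Simplified sketch of a saliency-weighted synonym-substitution attack
# (PWWS-style). `predict_proba` returns a list of class probabilities for a
# string; `get_synonyms` returns candidate replacements. Both are placeholders.

def word_saliency(words, true_label, predict_proba):
    """Probability drop on the true class when each word is removed."""
    base = predict_proba(" ".join(words))[true_label]
    return [
        base - predict_proba(" ".join(words[:i] + words[i + 1:]))[true_label]
        for i in range(len(words))
    ]

def pwws_style_attack(words, true_label, predict_proba, get_synonyms):
    base = predict_proba(" ".join(words))[true_label]
    saliency = word_saliency(words, true_label, predict_proba)
    scored = []
    for i, word in enumerate(words):
        best_syn, best_drop = None, 0.0
        for syn in get_synonyms(word):
            trial = words[:i] + [syn] + words[i + 1:]
            drop = base - predict_proba(" ".join(trial))[true_label]
            if drop > best_drop:
                best_syn, best_drop = syn, drop
        if best_syn is not None:
            # Score = probability drop of the best synonym, weighted by saliency.
            scored.append((best_drop * saliency[i], i, best_syn))
    adversarial = list(words)
    for _, i, syn in sorted(scored, reverse=True):  # most promising first
        adversarial[i] = syn
        probs = predict_proba(" ".join(adversarial))
        if probs.index(max(probs)) != true_label:
            break  # stop as soon as the predicted label flips
    return " ".join(adversarial)
```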
