Search Results for author: Shicheng Li

Found 21 papers, 11 papers with code

Rethinking Denoised Auto-Encoding in Language Pre-Training

no code implementations EMNLP 2021 Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu sun, Songfang Huang, Fei Huang

Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing.

Natural Language Understanding Sentence

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

2 code implementations24 Apr 2025 Linli Yao, Yicheng Li, Yuancheng Wei, Lei LI, Shuhuai Ren, Yuanxin Liu, Kun Ouyang, Lean Wang, Shicheng Li, Sida Li, Lingpeng Kong, Qi Liu, Yuanxing Zhang, Xu sun

Remarkably, our experiments demonstrate that DTD achieves an 82. 8% reduction in video tokens while maintaining 98% performance on StreamingBench, revealing that over 80% of visual content in streaming videos is naturally redundant without requiring language guidance.

MME Video MME +1

QAMA: Quantum annealing multi-head attention operator with classical deep learning framework

no code implementations15 Apr 2025 Peng Du, Shuolei Wang, Shicheng Li, Jinjing Shi

As large language models scale up, the conventional attention mechanism faces critical challenges of exponential growth in memory consumption and energy costs.

Computational Efficiency Specificity

TEMPLE:Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment

1 code implementation21 Mar 2025 Shicheng Li, Lei LI, Kun Ouyang, Shuhuai Ren, Yuanxin Liu, Yuanxing Zhang, Fuzheng Zhang, Lingpeng Kong, Qi Liu, Xu sun

We further analyze the transferability of DPO data across architectures and the role of difficulty scheduling in optimization.

Scheduling

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

no code implementations16 Dec 2024 Kun Ouyang, Yuanxin Liu, Shicheng Li, Yi Liu, Hao Zhou, Fandong Meng, Jie zhou, Xu sun

To provide a comprehensive evaluation, PunchBench incorporates diverse question formats and image-captions from various domains.

Benchmarking Image Captioning +1

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

3 code implementations11 Jul 2024 Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li

The mathematical capabilities of Multi-modal Large Language Models (MLLMs) remain under-explored with three areas to be improved: visual encoding of math diagrams, diagram-language alignment, and chain-of-thought (CoT) reasoning.

Contrastive Learning Language Modelling +4

TempCompass: Do Video LLMs Really Understand Videos?

1 code implementation1 Mar 2024 Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei LI, Sishuo Chen, Xu sun, Lu Hou

Motivated by these two problems, we propose the \textbf{TempCompass} benchmark, which introduces a diversity of temporal aspects and task formats.

Diversity

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

2 code implementations CVPR 2024 Shuhuai Ren, Linli Yao, Shicheng Li, Xu sun, Lu Hou

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

Ranked #2 on Video-Text Retrieval on Test-of-Time (using extra training data)

Dense Captioning Highlight Detection +9

RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

no code implementations14 Nov 2023 Yi Liu, Lianzhe Huang, Shicheng Li, Sishuo Chen, Hao Zhou, Fandong Meng, Jie zhou, Xu sun

Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases.

counterfactual Knowledge Graphs +2

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

1 code implementation29 Oct 2023 Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu sun, Lu Hou

TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding.

 Ranked #1 on Video Retrieval on Condensed Movies (using extra training data)

Form Language Modelling +3

An Iterative Approach to Data-Driven Inference for Decoding Oscillator Network Structures

no code implementations24 Oct 2023 Shicheng Li, Bharat Singhal, Jr-Shin Li

In complex networks, interactions between multiple agents give rise to an array of intricate global dynamics, ranging from synchronization to cluster formations.

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

no code implementations7 Jun 2023 Lei LI, Yuwei Yin, Shicheng Li, Liang Chen, Peiyi Wang, Shuhuai Ren, Mukai Li, Yazheng Yang, Jingjing Xu, Xu sun, Lingpeng Kong, Qi Liu

To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions.

World Knowledge

CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations

no code implementations13 Oct 2020 Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu sun

Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing.

Natural Language Understanding Sentence

DCA: Diversified Co-Attention towards Informative Live Video Commenting

no code implementations7 Nov 2019 Zhihan Zhang, Zhiyi Yin, Shuhuai Ren, Xinhang Li, Shicheng Li

In this paper, we aim to collect diversified information from video and text for informative comment generation.

Comment Generation Metric Learning

Cannot find the paper you are looking for? You can Submit a new open access paper.