Search Results for author: Tianshuo Peng

Found 7 papers, 6 papers with code

Video-R1: Reinforcing Video Reasoning in MLLMs

1 code implementation • 27 Mar 2025 • Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Benyou Wang, Xiangyu Yue

However, directly applying RL training with the GRPO algorithm to video reasoning presents two primary challenges: (i) a lack of temporal modeling for video reasoning, and (ii) the scarcity of high-quality video-reasoning data.

MVBench, Reinforcement Learning (RL)

HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

1 code implementation • 20 Feb 2025 • Yilei Jiang, Xinyan Gao, Tianshuo Peng, Yingshui Tan, Xiaoyong Zhu, Bo Zheng, Xiangyu Yue

The integration of additional modalities increases the susceptibility of large vision-language models (LVLMs) to safety risks, such as jailbreak attacks, compared to their language-only counterparts.

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

2 code implementations • 16 Dec 2024 • Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang

Given the significant differences between geometric diagram-symbol and natural image-text, we introduce unimodal pre-training to develop a diagram encoder and symbol decoder, enhancing the understanding of geometric images and corpora.

Geometry Problem Solving

Chimera: Improving Generalist Model with Domain-Specific Experts

no code implementations • 8 Dec 2024 • Tianshuo Peng, Mingsheng Li, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Conghui He, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue

This results in a versatile model that excels across the chart, table, math, and document domains, achieving state-of-the-art performance on multi-modal reasoning and visual content extraction tasks, both of which are challenging tasks for assessing existing LMMs.

Math model

Multi-modal Auto-regressive Modeling via Visual Words

1 code implementation • 12 Mar 2024 • Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du

Large Language Models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated text corpora, demonstrate powerful perceptual and reasoning capabilities.

Visual Question Answering

A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis

1 code implementation • 13 Dec 2023 • Tianshuo Peng, Zuchao Li, Ping Wang, Lefei Zhang, Hai Zhao

However, previous methods still have certain limitations: (i) They ignore the difference in the focus of visual information between different analysis targets (aspect or sentiment).

Aspect-Based Sentiment Analysis, Sentiment Analysis

FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction

1 code implementation • 19 Jun 2023 • Tianshuo Peng, Zuchao Li, Lefei Zhang, Bo Du, Hai Zhao

To address these deficiencies, we propose the Fuzzy Span Universal Information Extraction (FSUIE) framework.

UIE
