Search Results for author: Linzhuang Sun

Found 17 papers, 10 papers with code

MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification

1 code implementation 19 Feb 2025 Linzhuang Sun, Hao Liang, Jingxuan Wei, Bihui Yu, Tianpeng Li, Fan Yang, Zenan Zhou, Wentao Zhang

Finally, our approach achieves strong performance when combining MM-Reasoner and MM-Verifier, reaching an accuracy of 65.3 on MathVista, surpassing GPT-4o (63.8) with 12 rollouts.

Multimodal Reasoning

Synth-Empathy: Towards High-Quality Synthetic Empathy Data

1 code implementation 31 Jul 2024 Hao Liang, Linzhuang Sun, Jingxuan Wei, Xijie Huang, Linkun Sun, Bihui Yu, Conghui He, Wentao Zhang

In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capabilities has become a crucial prerequisite.

Diversity

SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

1 code implementation 30 Jul 2024 Zheng Liu, Hao Liang, Xijie Huang, Wentao Xiong, Qinhan Yu, Linzhuang Sun, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

Crucially, our method's reliance on purely generated data ensures the preservation of privacy, achieving SoTA performance with just 100k data points (only 18% of the official dataset size).

Caption Generation, Question Answering

KeyVideoLLM: Towards Large-scale Video Keyframe Selection

no code implementations 3 Jul 2024 Hao Liang, Jiapeng Li, Tianyi Bai, Xijie Huang, Linzhuang Sun, Zhengren Wang, Conghui He, Bin Cui, Chong Chen, Wentao Zhang

Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important.

Data Compression, Management +3

Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data

no code implementations 2 Jul 2024 Linzhuang Sun, Hao Liang, Jingxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang

By integrating sensibility and rationality data with a MoE structure, we achieve even higher performance, demonstrating the effectiveness of our Efficient-Empathy algorithm.

Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning

no code implementations 31 May 2024 Cheng Tan, Jingxuan Wei, Linzhuang Sun, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases.

Answer Generation, Multimodal Reasoning +2

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

no code implementations 23 Apr 2024 Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo

To substantiate our hypothesis, we systematically analyze the performance of distillation methods by varying the model size of student models, the complexity of text, and the difficulty of decoding procedure.

Knowledge Distillation, Machine Translation +1

Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory

no code implementations 14 Dec 2023 Linzhuang Sun, Yao Dong, Nan Xu, Jingxuan Wei, Bihui Yu, Yin Luo

However, the rationality information within the conversation is restricted, and previous methods of extending knowledge are subject to semantic conflict and single-role view.

Attribute, Empathetic Response Generation +2

Unraveling Key Factors of Knowledge Distillation

no code implementations 14 Dec 2023 Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo

Knowledge distillation, a technique for model compression and performance enhancement, has gained significant traction in Neural Machine Translation (NMT).

Knowledge Distillation, Machine Translation +3

Brain-inspired Computing Based on Deep Learning for Human-computer Interaction: A Review

1 code implementation 12 Dec 2023 Bihui Yu, Sibo Zhang, Lili Zhou, Jingxuan Wei, Linzhuang Sun, Liping Bu

Focusing on the application scenarios of decoding text and speech from brain signals in human-computer interaction, this paper presents a comprehensive review of the brain-inspired computing models based on deep learning (DL), tracking its evolution, application value, challenges and potential research trends.

Deep Learning

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

1 code implementation 23 Nov 2023 Cheng Tan, Jingxuan Wei, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Ruifeng Guo, Bihui Yu, Stan Z. Li

Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning.

Multimodal Reasoning, Science Question Answering +1

A Survey on Image-text Multimodal Models

1 code implementation 23 Sep 2023 Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Bihui Yu, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu

With the significant advancements of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), the development of image-text multimodal models has garnered widespread attention.

Survey

Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework

1 code implementation 24 Jul 2023 Jingxuan Wei, Cheng Tan, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence, especially when tackling complex tasks.

Contrastive Learning, Multimodal Reasoning +2
