Search Results for author: Zhiheng Xi

Found 30 papers, 21 papers with code

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

1 code implementation · 19 Mar 2025 · Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance in visual-language tasks.

Better Process Supervision with Bi-directional Rewarding Signals

no code implementations · 6 Mar 2025 · Wenxiang Chen, Wei He, Zhiheng Xi, Honglin Guo, Boyang Hong, Jiazheng Zhang, Rui Zheng, Nijun Li, Tao Gui, Yun Li, Qi Zhang, Xuanjing Huang

Process supervision, i.e., evaluating each step, is critical for complex large language model (LLM) reasoning and test-time searching with increased inference compute.

Large Language Model Math +1

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

1 code implementation · 20 Jan 2025 · Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, Jiecao Chen

To further explore the scalability of this self-improvement paradigm, we investigate iterative refinement of both error correction capabilities and dataset construction.

Language Modelling

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

no code implementations · 5 Jan 2025 · Junjie Ye, Zhengyin Du, Xuesong Yao, Weijian Lin, Yufei Xu, Zehui Chen, Zaiyuan Wang, Sining Zhu, Zhiheng Xi, Siyu Yuan, Tao Gui, Qi Zhang, Xuanjing Huang, Jiecao Chen

Effective evaluation of multi-hop tool use is critical for analyzing the understanding, reasoning, and function-calling capabilities of large language models (LLMs).

Code Generation

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

1 code implementation · 1 Nov 2024 · Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang

Self-improvement methods enable large language models (LLMs) to generate solutions themselves and iteratively train on filtered, high-quality rationales.

Multi-Programming Language Sandbox for LLMs

1 code implementation · 30 Oct 2024 · Shihan Dou, Jiazheng Zhang, Jianxiang Zang, Yunbo Tao, Weikang Zhou, Haoxiang Jia, Shichun Liu, Yuming Yang, Zhiheng Xi, Shenxi Wu, Shaoqing Zhang, Muling Wu, Changze Lv, Limao Xiong, WenYu Zhan, Lin Zhang, Rongxiang Weng, Jingang Wang, Xunliang Cai, Yueming Wu, Ming Wen, Rui Zheng, Tao Ji, Yixin Cao, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs).

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

2 code implementations · 24 Oct 2024 · Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang

Specifically, we employ text-based synthesizing techniques to construct chart-plotting code and produce ReachQA, a dataset containing 3k reasoning-intensive charts and 20k Q&A pairs to enhance both recognition and reasoning abilities.

Multimodal Reasoning Visual Reasoning
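
The ReachQA snippet above rests on one mechanism: because every chart is rendered from synthesized plotting code, the underlying data are known exactly, so reasoning-heavy question-answer pairs can be generated programmatically alongside the image. Below is a minimal sketch of that pipeline; the function name, the fixed bar-chart template, and the hard-coded question are illustrative assumptions (in the paper an LLM writes the plotting code), not details taken from the paper.

```python
# Hedged sketch of code-driven chart synthesis: render a chart from generated
# data, then derive a Q&A pair whose answer is computable from that data.
import random
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def make_chart_example(example_id: int, out_dir: str = ".") -> dict:
    """Synthesize one (chart image, question, answer) triple from generated data."""
    random.seed(example_id)
    categories = ["A", "B", "C", "D"]
    values = [random.randint(10, 100) for _ in categories]

    # "Chart-plotting code": here a fixed template; the paper has an LLM write it.
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(categories, values)
    ax.set_title(f"Synthetic sales by region #{example_id}")
    ax.set_ylabel("Units sold")
    image_path = f"{out_dir}/chart_{example_id}.png"
    fig.savefig(image_path)
    plt.close(fig)

    # A reasoning-style question whose answer follows from the known data.
    top = categories[values.index(max(values))]
    question = "Which region sold the most units, and how many more than the least?"
    answer = f"{top}, by {max(values) - min(values)} units"
    return {"image": image_path, "question": question, "answer": answer}

if __name__ == "__main__":
    print(make_chart_example(0))
```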

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

no code implementations · 15 Oct 2024 · Shuo Li, Tao Ji, Xiaoran Fan, Linsheng Lu, Leyi Yang, Yuming Yang, Zhiheng Xi, Rui Zheng, Yuran Wang, Xiaohui Zhao, Tao Gui, Qi Zhang, Xuanjing Huang

Our findings indicate that the ability to prevent sycophancy is predominantly observed in higher layers of the model.

Hallucination

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

1 code implementation · 13 Oct 2024 · Enyu Zhou, Guodong Zheng, Binghai Wang, Zhiheng Xi, Shihan Dou, Rong Bao, Wei Shen, Limao Xiong, Jessica Fan, Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

However, the current evaluation of RMs may not directly correspond to their alignment performance due to the limited distribution of evaluation data and evaluation methods that are not closely related to alignment objectives.

Benchmarking

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data

no code implementations · 27 Aug 2024 · Han Xia, Songyang Gao, Qiming Ge, Zhiheng Xi, Qi Zhang, Xuanjing Huang

Reinforcement Learning from Human Feedback (RLHF) has proven effective in aligning large language models with human intentions, yet it often relies on complex methodologies like Proximal Policy Optimization (PPO) that require extensive hyper-parameter tuning and present challenges in sample efficiency and stability.

Reinforcement Learning

Toward Optimal LLM Alignments Using Two-Player Games

1 code implementation · 16 Jun 2024 · Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu

We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.

Reinforcement Learning

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

1 code implementation · 1 Apr 2024 · Wei He, Shichun Liu, Jun Zhao, Yiwen Ding, Yi Lu, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang

The generated demos strategically interpolate between existing demos and the given query, transforming the query from OOD to ID.

In-Context Learning Math

Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals

no code implementations · 24 Mar 2024 · Rui Zheng, Yuhao Zhou, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang

We first show empirically that the features of clean signals and of adversarial perturbations are redundant and span low-dimensional linear subspaces with minimal overlap, and that classical low-dimensional subspace projection can suppress perturbation features that fall outside the subspace of clean signals.

Adversarial Defense
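
The Subspace Defense snippet above describes a concrete operation: learn a low-dimensional basis from clean features and project incoming features onto it, discarding perturbation components that lie outside that subspace. A minimal sketch using plain SVD follows; the function names and the synthetic data are assumptions for illustration, not the paper's actual defense module.

```python
# Hedged sketch: project features onto a low-dimensional "clean" subspace so
# that perturbation energy outside the subspace is removed.
import numpy as np

def fit_clean_subspace(clean_feats: np.ndarray, k: int) -> np.ndarray:
    """Return an orthonormal basis (d x k) spanning the top-k directions of the clean features."""
    _, _, vt = np.linalg.svd(clean_feats, full_matrices=False)
    return vt[:k].T  # shape (d, k)

def project_onto_subspace(feats: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project features onto the clean subspace: x -> B B^T x."""
    return feats @ basis @ basis.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 64))   # rank-8 clean features
    basis = fit_clean_subspace(clean, k=8)
    perturbed = clean[:5] + rng.normal(scale=0.5, size=(5, 64))    # noisy "adversarial" features
    denoised = project_onto_subspace(perturbed, basis)
    print("residual norm before/after projection:",
          np.linalg.norm(perturbed - clean[:5]), np.linalg.norm(denoised - clean[:5]))
```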

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions

no code implementations · 26 Feb 2024 · Yuansen Zhang, Xiao Wang, Zhiheng Xi, Han Xia, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, drawing inspiration from recent works showing that LLMs are sensitive to the design of instructions, we utilize instructions in code style, which are more structured and less ambiguous, to replace typical natural language instructions.

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

1 code implementation · 8 Feb 2024 · Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.

GSM8K reinforcement-learning +1
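
The R$^3$ snippet above can be read as one training loop: slide the rollout start point backwards along a gold demonstration, so that a plain outcome reward behaves almost like step-level supervision. A minimal sketch under that reading follows; `rollout`, `outcome_reward`, and `update` are hypothetical stand-ins for the policy, the answer checker, and the RL step, not the paper's implementation.

```python
# Hedged sketch of the reverse-curriculum idea: start generation from
# progressively shorter prefixes of a gold solution, rewarding only the outcome.
from typing import Callable, List

def reverse_curriculum_stages(gold_steps: List[str]) -> List[str]:
    """Prompts that reveal fewer and fewer gold steps: almost-complete first, none last."""
    prefixes = []
    for keep in range(len(gold_steps) - 1, -1, -1):  # keep n-1 steps, then n-2, ..., then 0
        prefixes.append("\n".join(gold_steps[:keep]))
    return prefixes

def train_reverse_curriculum(question: str,
                             gold_steps: List[str],
                             rollout: Callable[[str], str],
                             outcome_reward: Callable[[str], float],
                             update: Callable[[str, str, float], None],
                             rollouts_per_stage: int = 4) -> None:
    """Outcome-supervised RL whose start states walk backwards through the demonstration."""
    for prefix in reverse_curriculum_stages(gold_steps):
        prompt = question + "\n" + prefix if prefix else question
        for _ in range(rollouts_per_stage):
            completion = rollout(prompt)          # policy finishes the solution
            reward = outcome_reward(completion)   # only final-answer correctness
            update(prompt, completion, reward)    # e.g. a PPO/REINFORCE step
```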

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

1 code implementation · 15 Dec 2023 · Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, ShiLiang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks.

Language Modelling Mixture-of-Experts +2

RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms

no code implementations · 17 Oct 2023 · Enyu Zhou, Rui Zheng, Zhiheng Xi, Songyang Gao, Xiaoran Fan, Zichu Fei, Jingting Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors.

Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement

1 code implementation · 23 May 2023 · Zhiheng Xi, Senjie Jin, Yuhao Zhou, Rui Zheng, Songyang Gao, Tao Gui, Qi Zhang, Xuanjing Huang

To enhance the multi-step reasoning capabilities of large language models, researchers have extensively explored prompting methods, notably the Chain-of-Thought (CoT) method which explicitly elicits human-like rationales.

GSM8K

Efficient Adversarial Training with Robust Early-Bird Tickets

1 code implementation · 14 Nov 2022 · Zhiheng Xi, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Adversarial training is one of the most powerful methods to improve the robustness of pre-trained language models (PLMs).
