Search Results for author: Tian Liang

Found 20 papers, 14 papers with code

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

no code implementations • 20 May 2025 • Mengru Wang, Xingyu Chen, Yue Wang, Zhiwei He, Jiahao Xu, Tian Liang, Qiuzhi Liu, Yunzhi Yao, Wenxuan Wang, Ruotian Ma, Haitao Mi, Ningyu Zhang, Zhaopeng Tu, Xiaolong Li, Dong Yu

Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes.

All Domain Generalization +2

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

1 code implementation • 19 May 2025 • Xiaoyuan Liu, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, Dong Yu

Large Language Models (LLMs) show great promise in complex reasoning, with Reinforcement Learning with Verifiable Rewards (RLVR) being a key enhancement strategy.

Mathematical Reasoning

Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique

no code implementations • 21 Mar 2025 • Yansi Li, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Qiuzhi Liu, Rui Wang, Zhuosheng Zhang, Zhaopeng Tu, Haitao Mi, Dong Yu

Traditional inference time scaling methods utilize scalar reward signals from process reward models to evaluate candidate reasoning steps, but these scalar rewards lack the nuanced qualitative information essential for understanding and justifying each step.

Decision Making

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

no code implementations • 30 Jan 2025 • Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu

To address underthinking, we propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path.

All

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

no code implementations • 30 Dec 2024 • Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu

The remarkable performance of models like the OpenAI o1 can be attributed to their ability to emulate human-like long-time thinking during inference.

GSM8K

Teaching LLMs to Refine with Tools

no code implementations • 22 Dec 2024 • Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu

We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

1 code implementation • 29 Nov 2024 • Zicheng Lin, Tian Liang, Jiahao Xu, Qiuzhi Lin, Xing Wang, Ruilin Luo, Chufan Shi, Siheng Li, Yujiu Yang, Zhaopeng Tu

Our results underscore the potential of leveraging critical tokens to reduce errors in reasoning tasks, advancing the development of AI systems capable of robust logical deduction.

GSM8K Math +1

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

1 code implementation • 27 Nov 2024 • Ziyin Zhang, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Rui Wang, Zhaopeng Tu

Speculative Decoding (SD) has become an important technique in accelerating the inference speed of large language models.

8k

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

2 code implementations • 12 Jul 2024 • Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence.

Position

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

1 code implementation • 22 Feb 2024 • Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang

Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning.

Benchmarking

Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model

no code implementations • CVPR 2024 • Tian Liang, Jing Huang, Ming Kong, Luyuan Chen, Qiang Zhu

Its core innovation is a novel modality-bridging method that allows a set of modality-specific queries to be input as soft prompts into a frozen pre-trained language model.

Language Modeling Language Modelling

Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models

1 code implementation • 31 Oct 2023 • Tian Liang, Zhiwei He, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi, Xing Wang

Ideally, an advanced agent should possess the ability to accurately describe a given word using an aggressive description while concurrently maximizing confusion in the conservative description, enhancing its participation in the game.

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

1 code implementation • 30 May 2023 • Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, Zhaopeng Tu

To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.

Arithmetic Reasoning Machine Translation

Exploring Human-Like Translation Strategy with Large Language Models

2 code implementations • 6 May 2023 • Zhiwei He, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Yujiu Yang, Rui Wang, Zhaopeng Tu, Shuming Shi, Xing Wang

Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process which might take preparatory steps to ensure high-quality translation.

Hallucination Machine Translation +2

ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback

1 code implementation • 5 Apr 2023 • Wenxiang Jiao, Jen-tse Huang, Wenxuan Wang, Zhiwei He, Tian Liang, Xing Wang, Shuming Shi, Zhaopeng Tu

Therefore, we propose ParroT, a framework to enhance and regulate the translation abilities during chat based on open-source LLMs (e.g., LLaMA), human-written translation and feedback data.

Instruction Following Machine Translation +1
