no code implementations • 19 May 2025 • Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong
Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining.
1 code implementation • 15 May 2025 • Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning.
1 code implementation • 8 May 2025 • Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong
Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT).
1 code implementation • 5 May 2025 • Jiarui Yao, Yifan Hao, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang
Chain-of-thought (CoT) reasoning in large language models (LLMs) can be formalized as a latent variable problem, where the model needs to generate intermediate reasoning steps.
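In symbols (notation mine, not necessarily the paper's), with z the unobserved chain of intermediate reasoning steps, this latent-variable view reads:

```latex
p_\theta(y \mid x) \;=\; \sum_{z} p_\theta(z \mid x)\, p_\theta(y \mid x, z)
```

so improving reasoning amounts to learning a good distribution over the latent chains z, not just over final answers y.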
1 code implementation • 15 Apr 2025 • Wei Xiong, Jiarui Yao, Yuhui Xu, Bo Pang, Lei Wang, Doyen Sahoo, Junnan Li, Nan Jiang, Tong Zhang, Caiming Xiong, Hanze Dong
In this work, we revisit GRPO from a REINFORCE-like algorithm perspective and analyze its core components.
no code implementations • 20 Feb 2025 • Yuhui Xu, Hanze Dong, Lei Wang, Caiming Xiong, Junnan Li
Reward models (RMs) play a crucial role in aligning large language models (LLMs) with human preferences and enhancing reasoning quality.
no code implementations • 6 Feb 2025 • Bo Pang, Hanze Dong, Jiacheng Xu, Silvio Savarese, Yingbo Zhou, Caiming Xiong
This paper introduces a novel approach that enables LLMs' LongCoT capabilities without distillation from o1-like models or expensive human annotations: we bootstrap LongCoT (BOLT) from a standard instruct model.
no code implementations • 31 Jan 2025 • Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs).
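A rough sketch of the idea as the abstract describes it (all names and the single scalar acceptance threshold are placeholders of mine, not the paper's interface): a cheap draft model proposes each step, and the expensive target model is consulted only when the reward model is unconvinced.

```python
def rsd_generate(prompt, draft_model, target_model, reward_model,
                 threshold=0.7, max_steps=64):
    """Hypothetical sketch of reward-guided speculative decoding:
    accept cheap draft steps when the reward model scores them highly,
    otherwise fall back to the expensive target model."""
    steps = []
    for _ in range(max_steps):
        candidate = draft_model.next_step(prompt, steps)      # cheap proposal
        if reward_model.score(prompt, steps + [candidate]) >= threshold:
            step = candidate                                  # accept the draft
        else:
            step = target_model.next_step(prompt, steps)      # costly fallback
        steps.append(step)
        if step.endswith("<eos>"):                            # assumed stop marker
            break
    return steps
```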
2 code implementations • 20 Dec 2024 • Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu
While Direct Preference Optimization (DPO) has shown promise in aligning LLMs with human preferences, it is less suitable for multi-step reasoning tasks because (1) DPO relies on paired preference data, which is not readily available for multi-step reasoning tasks, and (2) it treats all tokens uniformly, making it ineffective for credit assignment in multi-step reasoning tasks, which often come with sparse reward.
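For reference, the standard DPO objective over preference pairs (y_w preferred to y_l) makes both limitations visible: it requires pairs, and each response enters only through its total log-probability ratio, with no per-token credit assignment:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
  \log \sigma\!\Bigl(
    \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \Bigr)\right]
```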
1 code implementation • 15 Dec 2024 • Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang
Our theoretical analysis shows that the optimal reward model can be derived from sampling under the initial policy.
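The identity presumably behind this claim (my reconstruction from standard KL-regularized RL, not a quote from the paper): if the policy π* maximizes expected reward minus β times the KL divergence to the initial policy π₀, then

```latex
r(x, y) \;=\; \beta \log \frac{\pi^*(y \mid x)}{\pi_0(y \mid x)} \;+\; \beta \log Z(x),
```

so the reward is recoverable, up to a prompt-dependent constant, from likelihood ratios against π₀.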
1 code implementation • 10 Oct 2024 • Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo
To mitigate hallucination and laziness in reasoning tasks, we propose Automatic Curriculum Expert Iteration (Auto-CEI) to enhance LLM reasoning and align responses to the model's capabilities: assertively answering within its limits and declining when tasks exceed them.
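A very rough sketch of what one expert-iteration round with such a curriculum could look like (every name and the refusal rule are guesses of mine at the shape of the method, not the paper's implementation):

```python
def expert_iteration_round(model, problems, verifier, accept_refusals):
    """Hypothetical round: keep verified solutions; keep an honest
    refusal only when the curriculum currently permits it and the
    model failed every attempt; then imitate the filtered data."""
    kept = []
    for problem in problems:
        attempts = [model.generate(problem) for _ in range(8)]
        solved = [a for a in attempts if verifier.check(problem, a)]
        if solved:
            kept.append((problem, solved[0]))     # assertive, correct answer
        elif accept_refusals:
            kept.append((problem, "I cannot solve this reliably."))
    model.finetune(kept)                          # supervised on filtered data
    return model
```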
no code implementations • 7 Oct 2024 • Lei Wang, Shan Dong, Yuhui Xu, Hanze Dong, Yalu Wang, Amrita Saha, Ee-Peng Lim, Caiming Xiong, Doyen Sahoo
Although some recent benchmarks have been developed to evaluate the long-context capabilities of LLMs, there is a lack of benchmarks evaluating the mathematical reasoning abilities of LLMs over long contexts, which is crucial for LLMs' application in real-world scenarios.
1 code implementation • 22 Aug 2024 • Kashun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza
We then propose a new method, Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way.
no code implementations • 30 Jul 2024 • Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
Large Language Models (LLMs) have revolutionized the field of natural language processing, achieving unprecedented performance across a variety of applications.
no code implementations • 27 May 2024 • Xunpeng Huang, Difan Zou, Yi-An Ma, Hanze Dong, Tong Zhang
Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems.
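The canonical update in question, with ĝ a minibatch (stochastic) estimate of the score of the target π and ξ_k fresh Gaussian noise:

```latex
\theta_{k+1} = \theta_k + \eta\, \hat{g}(\theta_k) + \sqrt{2\eta}\,\xi_k,
\qquad \hat{g}(\theta) \approx \nabla \log \pi(\theta), \quad \xi_k \sim \mathcal{N}(0, I)
```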
no code implementations • 26 May 2024 • Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang
To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs.
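Concretely, for a forward SDE dx = f(x,t) dt + g(t) dw, the two objects such samplers discretize are the reverse-time SDE and its equivalent probability-flow ODE:

```latex
\mathrm{d}x = \bigl[f(x,t) - g(t)^2\, \nabla_x \log p_t(x)\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}
\qquad \text{(reverse SDE)}

\frac{\mathrm{d}x}{\mathrm{d}t} = f(x,t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x)
\qquad \text{(probability-flow ODE)}
```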
3 code implementations • 13 May 2024 • Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.
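A schematic of one round of that workflow (the method names here are placeholders of mine, not the released code):

```python
def online_iterative_rlhf(policy, reward_model, prompts, n_iters=3, k=8):
    """Sketch of online iterative RLHF: each round samples fresh
    on-policy responses, labels best-vs-worst pairs with the reward
    model, and updates the policy on the new preferences."""
    for _ in range(n_iters):
        pairs = []
        for x in prompts:
            ys = [policy.generate(x) for _ in range(k)]   # on-policy samples
            ys.sort(key=lambda y: reward_model.score(x, y))
            pairs.append((x, ys[-1], ys[0]))              # (prompt, chosen, rejected)
        policy = policy.preference_update(pairs)          # e.g., a DPO step
    return policy
```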
no code implementations • 10 Mar 2024 • Xunpeng Huang, Hanze Dong, Difan Zou, Tong Zhang
Along this line, Freund et al. (2022) suggest that a modified Langevin algorithm with prior diffusion converges dimension-independently for strongly log-concave target distributions.
1 code implementation • 11 Feb 2024 • Chenlu Ye, Wei Xiong, Yuheng Zhang, Hanze Dong, Nan Jiang, Tong Zhang
We investigate Reinforcement Learning from Human Feedback (RLHF) in the context of a general preference oracle.
no code implementations • 12 Jan 2024 • Xunpeng Huang, Difan Zou, Hanze Dong, Yi-An Ma, Tong Zhang
Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution into the standard Gaussian, utilizing non-parametric score estimation.
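One way to make the non-parametric score concrete for an Ornstein-Uhlenbeck forward process (my reconstruction via Tweedie's formula, not a quote from the paper):

```latex
x_t \mid x_0 \sim \mathcal{N}\!\bigl(e^{-t} x_0,\; (1 - e^{-2t}) I\bigr)
\;\Longrightarrow\;
\nabla_x \log p_t(x) = \frac{e^{-t}\, \mathbb{E}[x_0 \mid x_t = x] - x}{1 - e^{-2t}},
```

with the conditional expectation approximated by (self-normalized) Monte Carlo rather than by a trained score network.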
1 code implementation • 5 Jan 2024 • Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang
The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs.
3 code implementations • 18 Dec 2023 • Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
We investigate its behavior in three distinct settings (offline, online, and hybrid) and propose efficient algorithms with finite-sample theoretical guarantees.
no code implementations • 29 Sep 2023 • Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang
Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance.
1 code implementation • 12 Sep 2023 • Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang
Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different alignment-forgetting trade-offs, we propose Heterogeneous Model Averaging (HMA) to heterogeneously search for combination ratios across model layers.
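In symbols (notation mine), HMA-style merging assigns each layer l its own interpolation weight rather than a single global ratio:

```latex
\theta^{\mathrm{merged}}_l = \alpha_l\, \theta^{\mathrm{aligned}}_l + (1 - \alpha_l)\, \theta^{\mathrm{finetuned}}_l,
\qquad \alpha_l \in [0, 1] \text{ chosen per layer}
```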
no code implementations • 5 Jul 2023 • Xunpeng Huang, Hanze Dong, Yifan Hao, Yi-An Ma, Tong Zhang
We propose a Monte Carlo sampler derived from the reverse diffusion process.
1 code implementation • 21 Jun 2023 • Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang
As the number of available foundation models and specialized tasks keeps growing, the job of training scientific language models becomes highly nontrivial.
1 code implementation • 23 May 2023 • Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang
Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines.
1 code implementation • 13 Apr 2023 • Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discards those that exhibit undesired behavior, and subsequently enhances the model by fine-tuning on these filtered samples.
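A minimal sketch of that sample-filter-finetune loop (function names are placeholders, and a fixed keep fraction stands in for whatever selection rule the paper actually uses):

```python
def reward_ranked_finetune(model, reward_model, prompts, k=8, keep_frac=0.1):
    """Sketch: oversample k responses per prompt, keep only the
    highest-reward ones, and fine-tune on the filtered set."""
    scored = []
    for x in prompts:
        for _ in range(k):
            y = model.generate(x)
            scored.append((reward_model.score(x, y), x, y))
    scored.sort(key=lambda t: t[0], reverse=True)            # best first
    kept = scored[: max(1, int(len(scored) * keep_frac))]    # drop undesired samples
    model.finetune([(x, y) for _, x, y in kept])             # supervised fine-tuning
    return model
```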
no code implementations • 2 Mar 2023 • Shihong Ding, Hanze Dong, Cong Fang, Zhouchen Lin, Tong Zhang
To circumvent this difficulty, we examine the problem of identifying a mixed Nash equilibrium, where strategies are randomized and characterized by probability distributions over continuous domains.
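In symbols (notation mine), the object of study is the saddle point over probability measures rather than over pure strategies:

```latex
\min_{p \in \mathcal{P}(\mathcal{X})} \; \max_{q \in \mathcal{P}(\mathcal{Y})} \;
\mathbb{E}_{x \sim p,\, y \sim q}\bigl[f(x, y)\bigr]
```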
1 code implementation • 3 Jan 2023 • Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, Xiangyang Xue, Leonid Sigal
Despite significant progress in object categorization in recent years, a number of important challenges remain: mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels.
no code implementations • 25 Nov 2022 • Hanze Dong, Xi Wang, Yong Lin, Tong Zhang
With the popularity of Stein variational gradient descent (SVGD), particle-based VI algorithms have focused on the properties of functions in a Reproducing Kernel Hilbert Space (RKHS) used to approximate the gradient flow.
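For context, the SVGD update being referenced: the RKHS kernel k simultaneously drives particles toward high density and repels them from one another:

```latex
x_i \leftarrow x_i + \epsilon\, \hat{\phi}^*(x_i), \qquad
\hat{\phi}^*(x) = \frac{1}{n} \sum_{j=1}^{n}
  \bigl[k(x_j, x)\, \nabla_{x_j} \log p(x_j) + \nabla_{x_j} k(x_j, x)\bigr]
```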
1 code implementation • 21 Nov 2022 • Hanze Dong, Shizhe Diao, Weizhong Zhang, Tong Zhang
The resulting method is significantly more powerful than the standard normalizing-flow approach for generating data distributions with multiple modes.
no code implementations • 29 Sep 2022 • Songtao Liu, Rex Ying, Hanze Dong, Lu Lin, Jinghui Chen, Dinghao Wu
However, the analysis of the implicit denoising effect in graph neural networks remains open.
no code implementations • CVPR 2022 • Yong Lin, Hanze Dong, Hao Wang, Tong Zhang
Generalization under distributional shift is an open challenge for machine learning.
1 code implementation • 8 Sep 2021 • Songtao Liu, Rex Ying, Hanze Dong, Lanqing Li, Tingyang Xu, Yu Rong, Peilin Zhao, Junzhou Huang, Dinghao Wu
To address this, we propose a simple and efficient data augmentation strategy, local augmentation, to learn the distribution of the node features of the neighbors conditioned on the central node's feature and enhance GNN's expressive power with generated features.
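A small sketch of that augmentation step (the generator interface is a placeholder of mine for a trained conditional model, e.g. a CVAE, not the paper's code):

```python
import torch

def local_augmentation(x, generator, n_samples=2):
    """Sketch: draw extra neighbor-feature vectors conditioned on each
    central node's feature and concatenate them as additional GNN
    inputs. `generator` is a hypothetical trained conditional model."""
    draws = []
    for _ in range(n_samples):
        z = torch.randn(x.size(0), generator.latent_dim)   # per-node latent
        draws.append(generator.decode(z, cond=x))          # p(neighbor | center)
    return torch.cat([x] + draws, dim=-1)                  # widened node features
```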
1 code implementation • 27 Dec 2020 • Cong Fang, Hanze Dong, Tong Zhang
Deep learning has received considerable empirical successes in recent years.
1 code implementation • 6 Oct 2020 • Xinwei Shen, Furui Liu, Hanze Dong, Qing Lian, Zhitang Chen, Tong Zhang
This paper proposes a Disentangled gEnerative cAusal Representation (DEAR) learning method under appropriate supervised information.
no code implementations • 11 Nov 2019 • Songtao Liu, Lingwei Chen, Hanze Dong, ZiHao Wang, Dinghao Wu, Zengfeng Huang
Graph Convolutional Network (GCN) has been recognized as one of the most effective graph models for semi-supervised learning, but it extracts only first-order or low-order neighborhood information through information propagation, which causes a performance drop-off for deeper structures.
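The layer-wise propagation rule in question, which mixes only immediate (first-order) neighborhoods per layer:

```latex
H^{(l+1)} = \sigma\!\bigl(\tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)} W^{(l)}\bigr),
\qquad \tilde{A} = A + I
```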
no code implementations • 25 Oct 2019 • Cong Fang, Hanze Dong, Tong Zhang
Recently, over-parameterized neural networks have been extensively analyzed in the literature.
no code implementations • ICLR 2019 • Hanze Dong, Yanwei Fu, Sung Ju Hwang, Leonid Sigal, Xiangyang Xue
This paper studies the problem of Generalized Zero-shot Learning (G-ZSL), whose goal is to classify instances belonging to both seen and unseen classes at the test time.
no code implementations • 28 May 2017 • Yanwei Fu, Hanze Dong, Yu-feng Ma, Zhengjun Zhang, Xiangyang Xue
To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual feature to semantic space.