Search Results for author: Weihan Cao

Found 2 papers, 2 papers with code

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

1 code implementation • 10 Feb 2025 • Chengqi Lyu, Songyang Gao, Yuzhe Gu, Wenwei Zhang, Jianfei Gao, Kuikun Liu, Ziyi Wang, Shuaibin Li, Qian Zhao, Haian Huang, Weihan Cao, Jiangning Liu, Hongwei Liu, Junnan Liu, Songyang Zhang, Dahua Lin, Kai Chen

To alleviate the long-existing difficulties brought by sparse rewards in RL, which are even exacerbated by the partial correctness of the long chain of thought for reasoning tasks, we further apply a token-level reward model to sample important tokens in reasoning trajectories for learning.

Math, Mathematical Reasoning, +1

PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient

1 code implementation • 5 Jul 2022 • Weihan Cao, Yifan Zhang, Jianfei Gao, Anda Cheng, Ke Cheng, Jian Cheng

First, the difference in feature magnitude between the teacher and the student could enforce overly strict constraints on the student.
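The magnitude issue mentioned above can be illustrated with a small sketch. It is not the paper's implementation: the feature values are made up, the helper names `pearson` and `mse` are hypothetical, and `1 - r` is used here only as one plausible correlation-based loss. The point is that a student whose features follow the teacher's pattern at a different scale and offset is still penalized by a mean-squared-error loss, while the Pearson correlation coefficient ignores that difference.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mse(xs, ys):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical flattened feature activations (illustrative values only).
teacher = [0.5, 2.0, -1.0, 3.5, 0.0]
# Student with the same relative pattern but a smaller magnitude and an offset.
student = [0.1 * t + 0.2 for t in teacher]

mse_loss = mse(teacher, student)                  # sensitive to magnitude
pearson_loss = 1.0 - pearson(teacher, student)    # invariant to scale and shift

print(f"MSE loss:     {mse_loss:.4f}")
print(f"Pearson loss: {pearson_loss:.4f}")
```

In this toy case the MSE loss stays well above zero even though the student reproduces the teacher's pattern exactly, whereas the correlation-based loss is zero up to floating-point error.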

Knowledge Distillation, object-detection, +1
