1 code implementation • 18 May 2025 • Jingyue Gao, Runji Lin, Keming Lu, Bowen Yu, Junyang Lin, Jianyu Chen
These results demonstrate MARGE's effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data.
1 code implementation • 15 May 2025 • Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin
Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling.
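As a rough illustration of the kind of law described here, one can fit a saturating power law L(D) = a·D^(−α) + c to (dataset size, test loss) measurements. The sketch below is a minimal fit with SciPy using made-up numbers, not the paper's data or code.

```python
# Minimal sketch: fit a saturating power law, loss = a * D**(-alpha) + c,
# to hypothetical (dataset size, test loss) pairs. Numbers are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def power_law(d, a, alpha, c):
    # Loss falls polynomially in data size toward an irreducible floor c.
    return a * d ** (-alpha) + c

sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])        # preference pairs (hypothetical)
losses = np.array([0.62, 0.55, 0.49, 0.45, 0.42])  # held-out test loss (hypothetical)

(a, alpha, c), _ = curve_fit(power_law, sizes, losses, p0=(1.0, 0.2, 0.3))
print(f"fitted exponent alpha={alpha:.3f}, loss floor c={c:.3f}")
```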
1 code implementation • 24 Feb 2025 • Qin Zhu, Fei Huang, Runyu Peng, Keming Lu, Bowen Yu, Qinyuan Cheng, Xipeng Qiu, Xuanjing Huang, Junyang Lin
While logical reasoning evaluation of Large Language Models (LLMs) has attracted significant attention, existing benchmarks predominantly rely on multiple-choice formats that are vulnerable to random guessing, leading to overestimated performance and substantial performance fluctuations.
6 code implementations • 19 Dec 2024 • Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, TianHao Li, Tianyi Tang, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu
In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio.
Ranked #7 on GPQA
1 code implementation • 9 Dec 2024 • Chujie Zheng, Zhenru Zhang, Beichen Zhang, Runji Lin, Keming Lu, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin
We conduct an extensive evaluation on ProcessBench, involving two types of models: process reward models (PRMs) and critic models; for the latter, we prompt general language models to critique each solution step by step.
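For concreteness, here is a minimal sketch of the critic setup, assuming a generic chat-completion callable `chat` and an illustrative prompt template (not the benchmark's official wording):

```python
# Minimal sketch: prompt a general LLM to critique a solution step by step and
# report the first erroneous step. `chat` is any text-in/text-out client.
def build_critic_prompt(problem: str, steps: list[str]) -> str:
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return (
        "You are checking a math solution.\n"
        f"Problem: {problem}\n"
        f"Solution:\n{numbered}\n\n"
        "Critique each step in order. Reply with only the index of the first "
        "incorrect step, or -1 if every step is correct."
    )

def first_error_index(chat, problem: str, steps: list[str]) -> int:
    reply = chat(build_critic_prompt(problem, steps))
    return int(reply.strip().split()[0])  # expects e.g. "3" or "-1"
```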
1 code implementation • 22 Oct 2024 • Hao Xiang, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun, Jingren Zhou, Junyang Lin
The key to automated alignment lies in providing learnable and accurate preference signals for preference learning without human annotation.
no code implementations • 17 Oct 2024 • Qiaoyu Tang, Le Yu, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun
Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks; its effects are fully reflected by delta parameters (i.e., the disparity between post-trained and pre-trained parameters).
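A minimal sketch of the delta-parameter notion, assuming two PyTorch models with identical architectures (model loading omitted):

```python
# Minimal sketch: delta parameters are the element-wise difference between
# post-trained and pre-trained weights; post-training is fully characterized
# by adding these deltas back onto the pre-trained model.
import torch

def delta_parameters(pretrained: torch.nn.Module,
                     posttrained: torch.nn.Module) -> dict[str, torch.Tensor]:
    base = pretrained.state_dict()
    tuned = posttrained.state_dict()
    return {name: tuned[name] - base[name] for name in base}
```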
no code implementations • 18 Sep 2024 • An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang
This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT).
Ranked #2 on Math Word Problem Solving on MATH (using extra training data)
2 code implementations • 18 Sep 2024 • Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, Kai Dang, Yang Fan, Yichang Zhang, An Yang, Rui Men, Fei Huang, Bo Zheng, Yibo Miao, Shanghaoran Quan, Yunlong Feng, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.
1 code implementation • 4 Sep 2024 • Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang
Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.
no code implementations • 20 Aug 2024 • Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou
Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses and unreliable reasoning.
6 code implementations • 15 Jul 2024 • An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, TianHao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Ranked #3 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 20 Jun 2024 • Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang
Recently, mathematical verifiers have achieved success in mathematical reasoning tasks by validating the correctness of solutions generated by policy models.
1 code implementation • 19 Jun 2024 • Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou
AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness.
Ranked #1 on Instruction Following on IFEval
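A minimal sketch of AutoIF's code-verification step as described in this entry, assuming the LLM emits a `check(response) -> bool` function plus assert-style unit tests; this is illustrative, not the released implementation, and a real pipeline would sandbox the `exec` calls:

```python
# Minimal sketch: execute an LLM-written checker against a response, but only
# trust it if it also passes its own LLM-written unit tests.
def verify_response(checker_src: str, test_cases: list[str], response: str) -> bool:
    scope: dict = {}
    try:
        exec(checker_src, scope)          # should define check(response) -> bool
        check = scope["check"]
        for case in test_cases:           # unit tests validate the checker itself
            exec(case, {"check": check})  # e.g. "assert check('hello world')"
        return bool(check(response))
    except Exception:
        return False                      # broken checker or failed test: discard
```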
2 code implementations • 4 Jun 2024 • Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Yucheng Li, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Junjie Hu, Wen Xiao
Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage.
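A minimal sketch of the underlying idea, assuming accumulated per-token attention scores are available and using an illustrative linear budget schedule rather than the paper's exact settings:

```python
# Minimal sketch: keep a per-layer KV budget that shrinks from lower to higher
# layers (the "pyramid"), retaining the tokens with the highest accumulated
# attention. keys/values: [seq_len, dim]; attn_scores: [seq_len].
import torch

def compress_kv(keys, values, attn_scores, layer, num_layers, base_budget):
    frac = 1.0 - layer / max(num_layers - 1, 1)   # illustrative schedule
    budget = max(1, int(base_budget * frac))
    keep = torch.topk(attn_scores, min(budget, attn_scores.numel())).indices
    keep, _ = torch.sort(keep)                    # preserve positional order
    return keys[keep], values[keep]
```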
1 code implementation • 3 Jun 2024 • Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu
Alignment is the most critical step in building large language models (LLMs) that meet human needs.
1 code implementation • 28 May 2024 • Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou
Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF).
1 code implementation • 23 Jan 2024 • Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou
Nevertheless, we posit that LLMs inherently harbor role-play capabilities, owing to the extensive knowledge of characters and potential dialogues ingrained in their vast training corpora.
no code implementations • 15 Nov 2023 • Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
Zooter is computationally efficient at inference, as it introduces only the minor overhead of a routing function compared with reward-model ranking methods.
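A minimal sketch of such a routing function, with a placeholder query encoder and expert set; the point is that only the selected model ever generates:

```python
# Minimal sketch: score the query once with a lightweight router, then dispatch
# to a single expert LLM, instead of sampling from all experts and reranking.
import torch

def route_and_generate(query: str, encoder, router_head, experts: dict):
    with torch.no_grad():
        h = encoder(query)                     # cheap query embedding
        probs = torch.softmax(router_head(h), dim=-1)
    best = list(experts)[int(probs.argmax())]
    return experts[best](query)                # only the chosen model generates
```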
no code implementations • 15 Nov 2023 • Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou
Large language models (LLMs) exhibit exceptional performance on language tasks, yet their auto-regressive inference is limited by high computational requirements and sub-optimal due to exposure bias.
1 code implementation • 14 Nov 2023 • Shengguang Wu, Keming Lu, Benfeng Xu, Junyang Lin, Qi Su, Chang Zhou
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets, as the model selects new data points most distinct from any existing ones according to its current embedding space.
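This is essentially farthest-point (max-min) sampling; a minimal sketch over precomputed embeddings:

```python
# Minimal sketch: greedily pick the point farthest in embedding space from
# everything chosen so far, maximizing diversity of the selected subset.
import numpy as np

def diverse_subset(embeddings: np.ndarray, k: int) -> list[int]:
    chosen = [0]                                   # arbitrary seed point
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())                   # most distinct from all chosen
        chosen.append(idx)
        new_d = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        dist = np.minimum(dist, new_d)             # distance to nearest chosen point
    return chosen
```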
1 code implementation • 9 Oct 2023 • Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou
In this paper, we investigate such data augmentation in math reasoning, aiming to answer: (1) what strategies of data augmentation are more effective; (2) what is the scaling relationship between the amount of augmented data and model performance; and (3) can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks?
Ranked #60 on Arithmetic Reasoning on GSM8K (using extra training data)
2 code implementations • 9 Oct 2023 • Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou
We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies.
2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
Ranked #3 on Multi-Label Text Classification on CC3M-TagMask
1 code implementation • 14 Aug 2023 • Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, Jingren Zhou
Based on this observation, we propose a data selector based on InsTag to select 6K diverse and complex samples from open-source datasets and fine-tune models on InsTag-selected data.
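A minimal sketch of a tag-based selector in this spirit, assuming each sample carries a set of InsTag-style tags; the paper's exact scoring may differ:

```python
# Minimal sketch: favor complex samples (many tags) while growing overall tag
# coverage (diversity). Each sample: {"text": str, "tags": set_of_strings}.
def select_by_tags(samples: list[dict], k: int) -> list[dict]:
    pool = sorted(samples, key=lambda s: len(s["tags"]), reverse=True)  # complexity
    seen, picked = set(), []
    for s in pool:
        if len(picked) == k:
            break
        if s["tags"] - seen:       # contributes at least one unseen tag
            picked.append(s)
            seen |= s["tags"]
    return picked
```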
1 code implementation • 3 Aug 2023 • Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu, Chuanqi Tan, Chang Zhou, Jingren Zhou
We find with augmented samples containing more distinct reasoning paths, RFT improves mathematical reasoning performance more for LLMs.
Ranked #111 on Arithmetic Reasoning on GSM8K (using extra training data)
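A minimal sketch of harvesting distinct reasoning paths for the RFT procedure described in this entry, using a crude regex over equations as the distinctness criterion; this criterion is an assumption for illustration, not the paper's exact definition:

```python
# Minimal sketch: among correct sampled solutions, keep one per distinct chain
# of equations for each question (rejection sampling + path deduplication).
import re

def distinct_correct_paths(samples, is_correct):
    seen, kept = set(), []
    for question, solution in samples:
        if not is_correct(question, solution):
            continue                              # rejection step
        path = tuple(re.findall(r"[0-9\.\+\-\*/() ]+=[0-9\. ]+", solution))
        key = (question, path)
        if key not in seen:                       # new reasoning path
            seen.add(key)
            kept.append((question, solution))
    return kept
```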
1 code implementation • 24 May 2023 • Keming Lu, Xiaoman Pan, Kaiqiang Song, Hongming Zhang, Dong Yu, Jianshu Chen
In particular, we construct INSTRUCTOPENWIKI, a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.
1 code implementation • 18 Mar 2023 • Hongyi Yuan, Keming Lu, Zheng Yuan
Biomedical entity linking (EL) consists of named entity recognition (NER) and named entity disambiguation (NED).
1 code implementation • 21 Dec 2022 • Keming Lu, I-Hung Hsu, Wenxuan Zhou, Mingyu Derek Ma, Muhao Chen
Relation Extraction (RE) has been extended to cross-document scenarios because many relations are not simply described in a single document.
no code implementations • 19 Oct 2022 • Yucong Lin, Hongming Xiao, Jiani Liu, Zichao Lin, Keming Lu, Feifei Wang, Wei Wei
Recently, knowledge-enhanced methods leveraging auxiliary knowledge graphs have emerged in relation extraction, surpassing traditional text-based approaches.
1 code implementation • 19 May 2022 • Keming Lu, I-Hung Hsu, Wenxuan Zhou, Mingyu Derek Ma, Muhao Chen
Considering that summarization tasks aim at acquiring concise expressions of synoptical information from a longer context, these tasks naturally align with the objective of RE, i.e., extracting a kind of synoptical information that describes the relation of entity mentions.
Ranked #7 on Relation Extraction on TACRED
no code implementations • 18 Mar 2022 • Sheng Yu, Zheng Yuan, Jun Xia, Shengxuan Luo, Huaiyuan Ying, Sihang Zeng, Jingyi Ren, Hongyi Yuan, Zhengyun Zhao, Yucong Lin, Keming Lu, Jing Wang, Yutao Xie, Heung-Yeung Shum
For decades, these knowledge graphs have been developed via expert curation; however, this method can no longer keep up with today's AI development, and a transition to algorithmically generated BioMedKGs is necessary.
1 code implementation • 16 Mar 2022 • Yucong Lin, Keming Lu, Sheng Yu, Tianxi Cai, Marinka Zitnik
On a dataset annotated by human experts, REMAP improves text-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with text information.
no code implementations • 8 Sep 2020 • Yucong Lin, Keming Lu, Yulin Chen, Chuan Hong, Sheng Yu
In this paper, we present Hi-RES, a framework for high-throughput relation extraction algorithm development.