no code implementations • 2 Jun 2025 • Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin
By examining token entropy patterns in Chain-of-Thought (CoT) reasoning, we observe that only a small fraction of tokens exhibit high entropy, and these tokens act as critical forks that steer the model toward diverse reasoning pathways.
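The snippet above rests on per-token entropy, the Shannon entropy of the model's next-token distribution at each generated position. The sketch below computes it from raw logits and returns the positions with the highest entropy. It is a minimal illustration, not the authors' code. The `top_fraction` cutoff and the toy logits are hypothetical assumptions made only for this example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution at one position."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(per_token_logits: Sequence[Sequence[float]],
                           top_fraction: float = 0.2) -> List[int]:
    """Return indices of the `top_fraction` highest-entropy positions.

    `top_fraction` is a hypothetical knob for this sketch, not a value
    taken from the paper.
    """
    entropies = [token_entropy(row) for row in per_token_logits]
    k = max(1, int(len(entropies) * top_fraction))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


# Toy logits over a 4-token vocabulary for 5 generated positions (synthetic).
logits = [
    [10.0, 0.0, 0.0, 0.0],  # confident: low entropy
    [1.0, 1.0, 1.0, 1.0],   # uniform: maximal entropy, a "fork"
    [8.0, 1.0, 0.0, 0.0],
    [9.0, 0.0, 0.5, 0.0],
    [2.0, 1.9, 0.1, 0.0],   # two plausible continuations
]
print(high_entropy_positions(logits, top_fraction=0.4))  # → [1, 4]
```

Most positions are near-deterministic, and only the few positions where several continuations are plausible stand out, which is the kind of sparse "fork" pattern the snippet describes.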
1 code implementation • 15 May 2025 • Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin
Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling.
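A power law of the form loss ≈ a · N^(-b) becomes a straight line after taking logarithms (log loss = log a - b · log N), so a common way to check for one is ordinary least squares in log-log space. The sketch below shows only that fitting step. The data are synthetic and noise-free, generated from a hypothetical law; they are not results from the paper, and the paper's actual functional form may differ.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Fit loss ~= a * size**(-b) by least squares on (log size, log loss)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Slope of the log-log line is -b; intercept is log(a).
    return math.exp(intercept), -slope


# Synthetic data from a hypothetical law: loss = 5 * N**-0.1
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * size ** -0.1 for size in sizes]
a, b = fit_power_law(sizes, losses)
print(round(a, 3), round(b, 3))  # → 5.0 0.1
```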
4 code implementations • 14 May 2025 • An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Lianghao Deng, Mei Li, Mingfeng Xue, Mingze Li, Pei Zhang, Peng Wang, Qin Zhu, Rui Men, Ruize Gao, Shixuan Liu, Shuang Luo, TianHao Li, Tianyi Tang, Wenbiao Yin, Xingzhang Ren, Xinyu Wang, Xinyu Zhang, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yinger Zhang, Yu Wan, Yuqiong Liu, Zekun Wang, Zeyu Cui, Zhenru Zhang, Zhipeng Zhou, Zihan Qiu
In this work, we present Qwen3, the latest version of the Qwen model family.
1 code implementation • 10 May 2025 • Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin
Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention.
no code implementations • 26 Jan 2025 • An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang, Yong Li, Zhiying Xu, Zipeng Zhang
By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context.
no code implementations • 19 Jan 2025 • Zhangzhang Jiang, Zhiqiang Yuan, Chunhui Li, Le Yu, Wei Fan
Both numerical simulations and experimental validation results are provided to demonstrate the effectiveness and robustness of the proposed SIC algorithm for the MA.
1 code implementation • 5 Jan 2025 • Haozhen Zhang, Haodong Yue, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang
With the growing significance of network security, the classification of encrypted traffic has emerged as an urgent challenge.
6 code implementations • 19 Dec 2024 • Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, TianHao Li, Tianyi Tang, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu
In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio.
Ranked #7 on GPQA
no code implementations • 17 Oct 2024 • Qiaoyu Tang, Le Yu, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun
Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks, whose effects are fully reflected by delta parameters (i.e., the disparity between post-trained and pre-trained parameters).
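As the snippet defines them, delta parameters are simply the element-wise difference between post-trained and pre-trained weights, so the post-trained model can be recovered exactly as pre-trained plus delta. The sketch below illustrates that identity. The flat `name -> list of floats` parameter layout and the example values are hypothetical simplifications standing in for real tensors.

```python
from typing import Dict, List

# Hypothetical flat layout for this sketch: parameter name -> list of values.
Params = Dict[str, List[float]]


def delta_parameters(post: Params, pre: Params) -> Params:
    """Delta = post-trained minus pre-trained, element-wise per parameter."""
    return {name: [p - q for p, q in zip(post[name], pre[name])] for name in pre}


def apply_delta(pre: Params, delta: Params) -> Params:
    """Recover the post-trained weights as pre-trained + delta."""
    return {name: [q + d for q, d in zip(pre[name], delta[name])] for name in pre}


pre = {"layer.weight": [0.50, -0.20, 1.00]}
post = {"layer.weight": [0.51, -0.20, 0.99]}
delta = delta_parameters(post, pre)
restored = apply_delta(pre, delta)
print(delta, restored)
```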
1 code implementation • 12 Feb 2024 • Haozhen Zhang, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang
In particular, we utilize supervised contrastive learning to enhance the packet-level and flow-level representations and perform graph data augmentation on the byte-level traffic graph so that the fine-grained semantic-invariant characteristics between bytes can be captured through contrastive learning.
2 code implementations • 6 Nov 2023 • Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li
We experiment with encoder- and decoder-based LMs, showing that: (1) SFT delta parameter value ranges are typically small (within 0.002) with extreme redundancy, and DARE can effortlessly eliminate 90% or even 99% of them; (2) DARE can merge multiple task-specific LMs into one LM with diverse capabilities.
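DARE (Drop And REscale) randomly zeroes each delta parameter with probability p and divides the survivors by 1 - p, so every entry keeps its expected value even when 90% or more are dropped. The sketch below shows that operation on plain Python lists, plus a simple merge that adds several DARE-processed deltas onto a shared base. Summing the deltas (task-arithmetic style) is an assumption of this sketch; the paper treats DARE as a plug-in for existing merging methods, and real use would operate on model tensors rather than lists.

```python
import random
from typing import List, Optional


def dare(delta: List[float], drop_rate: float,
         rng: Optional[random.Random] = None) -> List[float]:
    """Zero each delta with probability `drop_rate`, then rescale the
    survivors by 1 / (1 - drop_rate) to preserve each entry's expectation."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * scale for d in delta]


def merge(base: List[float], task_deltas: List[List[float]],
          drop_rate: float, seed: int = 0) -> List[float]:
    """Add each task's DARE-processed delta onto the shared base weights
    (simple summation, assumed here for illustration)."""
    rng = random.Random(seed)
    merged = list(base)
    for delta in task_deltas:
        sparse = dare(delta, drop_rate, rng)
        merged = [m + s for m, s in zip(merged, sparse)]
    return merged


rng = random.Random(42)
delta = [0.001] * 10_000                 # small SFT-style deltas (synthetic)
sparse = dare(delta, drop_rate=0.9, rng=rng)
kept = sum(1 for x in sparse if x != 0.0)
print(kept / len(sparse))                # roughly 0.1 of the entries survive
print(sum(sparse) / sum(delta))          # roughly 1.0, since survivors are scaled by 10
```

The rescaling step is what lets such aggressive dropping leave the model's behavior largely intact: the sparse delta matches the original one in expectation rather than shrinking toward zero.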
no code implementations • 20 Oct 2023 • Juepeng Zheng, Shuai Yuan, Weijia Li, Haohuan Fu, Le Yu
… (2) traditional machine learning methods (such as random forest, decision tree, etc.); …
1 code implementation • 19 Oct 2023 • Tao Zou, Le Yu, Yifei HUANG, Leilei Sun, Bowen Du
In many real-world scenarios (e.g., academic networks, social platforms), different types of entities are not only associated with texts but also connected by various relationships, which can be abstracted as Text-Attributed Heterogeneous Graphs (TAHGs).
no code implementations • 25 Sep 2023 • Duleep Rathgamage Don, Ying Xie, Le Yu, Simon Hughes, Yun Zhu
This paper proposes a novel method to improve the accuracy of product search in e-commerce by utilizing a cluster language model.
1 code implementation • 22 Aug 2023 • Zihang Liu, Le Yu, Tongyu Zhu, Leilei Sun
Spatial-temporal data modeling aims to mine the underlying spatial relationships and temporal dependencies of objects in a system.
1 code implementation • 10 Aug 2023 • Tao Zou, Le Yu, Junchen Ye, Leilei Sun, Bowen Du, Deqing Wang
Finally, we combine the contextual information of patent texts, which contains the semantics of IPC codes, with assignees' sequential preferences to make predictions.
1 code implementation • 4 Aug 2023 • Tao Zou, Le Yu, Leilei Sun, Bowen Du, Deqing Wang, Fuzhen Zhuang
Finally, the patent application trend is predicted by aggregating the representations of the target company and classification codes from static, dynamic, and hierarchical perspectives.
1 code implementation • 31 Jul 2023 • Haozhen Zhang, Le Yu, Xi Xiao, Qing Li, Francesco Mercaldo, Xiapu Luo, Qixu Liu
Encrypted traffic classification is receiving widespread attention from researchers and industrial companies.
1 code implementation • 24 Jul 2023 • Le Yu
In this paper, we conduct an empirical evaluation of Temporal Graph Benchmark (TGB) by extending our Dynamic Graph Library (DyGLib) to TGB.
2 code implementations • NeurIPS 2023 • Le Yu, Leilei Sun, Bowen Du, Weifeng Lv
We propose DyGFormer, a new Transformer-based architecture for dynamic graph learning.
1 code implementation • 4 Dec 2022 • Kaifa Zhao, Le Yu, Shiyao Zhou, Jing Li, Xiapu Luo, Yat Fei Aemon Chiu, Yutong Liu
Privacy protection has attracted great attention, both at the legal level and in terms of user awareness.
1 code implementation • 31 May 2022 • Le Yu, Leilei Sun, Bowen Du, Tongyu Zhu, Weifeng Lv
In recent years, several methods have been designed to additionally use labels as input.
Ranked #20 on Node Property Prediction on ogbn-mag
1 code implementation • 12 Apr 2022 • Le Yu, Zihang Liu, Leilei Sun, Bowen Du, Chuanren Liu, Weifeng Lv
Previous studies for temporal sets prediction mainly focus on the modelling of elements and implicitly represent each user's preference based on his/her interacted elements.
1 code implementation • 24 May 2021 • Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Weifeng Lv, Hui Xiong
Moreover, a semantic fusing module is presented to aggregate relation-aware node representations into a compact representation with the learned relation representations.
Ranked #23 on Node Property Prediction on ogbn-mag
1 code implementation • 29 Dec 2020 • Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Weifeng Lv, Hui Xiong
Representation learning on heterogeneous graphs aims to obtain low-dimensional node representations that could preserve both node attributes and relation information.
Ranked #25 on Node Property Prediction on ogbn-mag
1 code implementation • 26 Aug 2020 • Juepeng Zheng, Haohuan Fu, Weijia Li, Wenzhao Wu, Yi Zhao, Runmin Dong, Le Yu
In this paper, we propose a novel domain-adaptive oil palm tree detection method, i.e., a Multi-level Attention Domain Adaptation Network (MADAN), to achieve cross-regional oil palm tree counting and detection.
2 code implementations • 20 Jun 2020 • Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Hui Xiong, Weifeng Lv
Given a sequence of sets, where each set contains an arbitrary number of elements, the problem of temporal sets prediction aims to predict the elements in the subsequent set.
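To make the problem setting concrete: the input is an ordered history of sets, and the output is a predicted next set. The sketch below is a deliberately naive frequency baseline, not the authors' method. It scores each element by a recency-weighted count of its past appearances and returns the top k. The `decay` weighting, the fixed `k`, and the shopping-basket data are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set


def predict_next_set(history: Sequence[Set[Hashable]], k: int,
                     decay: float = 0.8) -> List[Hashable]:
    """Score each element by a decayed count of its past occurrences
    (more recent sets weigh more) and return the k highest-scoring ones."""
    scores: Dict[Hashable, float] = defaultdict(float)
    n = len(history)
    for t, current in enumerate(history):
        weight = decay ** (n - 1 - t)  # the most recent set gets weight 1
        for element in current:
            scores[element] += weight
    # Break score ties deterministically by the element's string form.
    ranked = sorted(scores, key=lambda e: (-scores[e], str(e)))
    return ranked[:k]


history = [{"milk", "bread"}, {"milk", "eggs", "apples"}, {"bread", "milk"}]
print(predict_next_set(history, k=2))  # → ['milk', 'bread']
```

A baseline like this ignores interactions between elements within and across sets, which is precisely the kind of structure that learned temporal sets prediction models aim to capture.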