Search Results for author: Le Yu

Found 27 papers, 21 papers with code

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

no code implementations • 2 Jun 2025 • Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin

By examining token entropy patterns in Chain-of-Thought (CoT) reasoning, we observe that only a small fraction of tokens exhibit high entropy, and these tokens act as critical forks that steer the model toward diverse reasoning pathways.

WorldPM: Scaling Human Preference Modeling

1 code implementation • 15 May 2025 • Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin

Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling.

Language Modeling Language Modelling

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

1 code implementation • 10 May 2025 • Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin

Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention.

Attribute Mixture-of-Experts +1

Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

no code implementations • 19 Jan 2025 • Zhangzhang Jiang, Zhiqiang Yuan, Chunhui Li, Le Yu, Wei Fan

Both numerical simulations and experimental validation results are provided to demonstrate the effectiveness and robustness of the proposed SIC algorithm for the MA.

parameter estimation

A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

no code implementations • 17 Oct 2024 • Qiaoyu Tang, Le Yu, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun

Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks, whose effects are fully reflected by delta parameters (i.e., the disparity between post-trained and pre-trained parameters).

Quantization

One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning

1 code implementation • 12 Feb 2024 • Haozhen Zhang, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang

In particular, we utilize supervised contrastive learning to enhance the packet-level and flow-level representations and perform graph data augmentation on the byte-level traffic graph so that the fine-grained semantic-invariant characteristics between bytes can be captured through contrastive learning.

Classification Contrastive Learning +3

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

2 code implementations • 6 Nov 2023 • Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li

We experiment with encoder- and decoder-based LMs, showing that: (1) SFT delta parameter value ranges are typically small (within 0.002) with extreme redundancy, and DARE can effortlessly eliminate 90% or even 99% of them; (2) DARE can merge multiple task-specific LMs into one LM with diverse capabilities.

Decoder GSM8K +1

Pretraining Language Models with Text-Attributed Heterogeneous Graphs

1 code implementation • 19 Oct 2023 • Tao Zou, Le Yu, Yifei Huang, Leilei Sun, Bowen Du

In many real-world scenarios (e.g., academic networks, social platforms), different types of entities are not only associated with texts but also connected by various relationships, which can be abstracted as Text-Attributed Heterogeneous Graphs (TAHGs).

Graph Neural Network Link Prediction +2

A Simple Framework for Multi-mode Spatial-Temporal Data Modeling

1 code implementation • 22 Aug 2023 • Zihang Liu, Le Yu, Tongyu Zhu, Leilei Sun

Spatial-temporal data modeling aims to mine the underlying spatial relationships and temporal dependencies of objects in a system.

Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification

1 code implementation • 10 Aug 2023 • Tao Zou, Le Yu, Junchen Ye, Leilei Sun, Bowen Du, Deqing Wang

Finally, we combine the contextual information of patent texts, which contains the semantics of IPC codes, with assignees' sequential preferences to make predictions.

Classification Patent classification

Event-based Dynamic Graph Representation Learning for Patent Application Trend Prediction

1 code implementation • 4 Aug 2023 • Tao Zou, Le Yu, Leilei Sun, Bowen Du, Deqing Wang, Fuzhen Zhuang

Finally, the patent application trend is predicted by aggregating the representations of the target company and classification codes from static, dynamic, and hierarchical perspectives.

Classification Graph Learning +2

An Empirical Evaluation of Temporal Graph Benchmark

1 code implementation • 24 Jul 2023 • Le Yu

In this paper, we conduct an empirical evaluation of Temporal Graph Benchmark (TGB) by extending our Dynamic Graph Library (DyGLib) to TGB.

Graph Learning

Continuous-Time User Preference Modelling for Temporal Sets Prediction

1 code implementation • 12 Apr 2022 • Le Yu, Zihang Liu, Leilei Sun, Bowen Du, Chuanren Liu, Weifeng Lv

Previous studies for temporal sets prediction mainly focus on the modelling of elements and implicitly represent each user's preference based on his/her interacted elements.

Prediction

Heterogeneous Graph Representation Learning with Relation Awareness

1 code implementation • 24 May 2021 • Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Weifeng Lv, Hui Xiong

Moreover, a semantic fusing module is presented to aggregate relation-aware node representations into a compact representation with the learned relation representations.

Graph Learning Graph Neural Network +5

Hybrid Micro/Macro Level Convolution for Heterogeneous Graph Learning

1 code implementation • 29 Dec 2020 • Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Weifeng Lv, Hui Xiong

Representation learning on heterogeneous graphs aims to obtain low-dimensional node representations that could preserve both node attributes and relation information.

Graph Learning Node Property Prediction +1

Cross-regional oil palm tree counting and detection via multi-level attention domain adaptation network

1 code implementation • 26 Aug 2020 • Juepeng Zheng, Haohuan Fu, Weijia Li, Wenzhao Wu, Yi Zhao, Runmin Dong, Le Yu

In this paper, we propose a novel domain adaptive oil palm tree detection method, i.e., a Multi-level Attention Domain Adaptation Network (MADAN) to reap cross-regional oil palm tree counting and detection.

Domain Adaptation

Predicting Temporal Sets with Deep Neural Networks

2 code implementations • 20 Jun 2020 • Le Yu, Leilei Sun, Bowen Du, Chuanren Liu, Hui Xiong, Weifeng Lv

Given a sequence of sets, where each set contains an arbitrary number of elements, the problem of temporal sets prediction aims to predict the elements in the subsequent set.

Prediction Time Series Analysis
