1 code implementation • 12 Jun 2025 • Lianghong Guo, Yanlin Wang, Caihua Li, Pengyu Yang, Jiachi Chen, Wei Tao, Yingtian Zou, Duyu Tang, Zibin Zheng
Constructing large-scale datasets for the GitHub issue resolution task is crucial for both training and evaluating the software engineering capabilities of Large Language Models (LLMs).
no code implementations • 28 Apr 2025 • Kang Yang, XinJun Mao, Shangwen Wang, Yanlin Wang, Tanghaoran Zhang, Bo Lin, Yihao Qin, Zhang Zhang, Yao Lu, Kamal Al-Sabahi
Pre-trained code models rely heavily on high-quality pre-training data, particularly human-written reference comments that bridge code and natural language.
no code implementations • 11 Apr 2025 • Yanlin Wang, Kefeng Duan, Dewu Zheng, Ensheng Shi, Fengji Zhang, Yanli Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Hongyu Zhang, Qianxiang Wang, Zibin Zheng
(1) A quantitative analysis of the research landscape, including publication trends, venues, and the explored domains; (2) A novel taxonomy of context types used in code intelligence; (3) A task-oriented analysis investigating context integration strategies across diverse code intelligence tasks; (4) A critical evaluation of evaluation methodologies for context-aware methods.
1 code implementation • 21 Mar 2025 • Linxi Liang, Jing Gong, Mingwei Liu, Chong Wang, Guangsheng Ou, Yanlin Wang, Xin Peng, Zibin Zheng
To address this gap, we present RustEvo, a novel framework for constructing dynamic benchmarks that evaluate the ability of LLMs to adapt to evolving Rust APIs.
2 code implementations • 24 Dec 2024 • Dewu Zheng, Yanlin Wang, Ensheng Shi, Xilin Liu, Yuchi Ma, Hongyu Zhang, Zibin Zheng
With the rapid advancement of large language models (LLMs), extensive research has been conducted to investigate the code generation capabilities of LLMs.
no code implementations • 23 Dec 2024 • Yanli Wang, Yanlin Wang, Suiquan Wang, Daya Guo, Jiachi Chen, John Grundy, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, Zibin Zheng
However, even with this improvement, the Success@1 score of the best-performing LLM is only 21%, which may not meet the need for reliable automatic repository-level code translation.
1 code implementation • 30 Sep 2024 • Ziyao Zhang, Yanlin Wang, Chong Wang, Jiachi Chen, Zibin Zheng
In this paper, we conduct an empirical study of the phenomena, mechanisms, and mitigation of LLM hallucinations within more practical and complex development contexts, focusing on the repository-level code generation scenario.
1 code implementation • 23 Sep 2024 • Jiachi Chen, Qingyuan Zhong, Yanlin Wang, Kaiwen Ning, Yongkun Liu, Zenan Xu, Zhe Zhao, Ting Chen, Zibin Zheng
Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and the possibility of being abused by malicious developers to create malicious code.
1 code implementation • 13 Sep 2024 • Yanlin Wang, Wanjun Zhong, Yanxian Huang, Ensheng Shi, Min Yang, Jiachi Chen, Hui Li, Yuchi Ma, Qianxiang Wang, Zibin Zheng
In recent years, Large Language Models (LLMs) have achieved remarkable success and have been widely used in various downstream tasks, especially in the tasks of the software engineering (SE) field.
no code implementations • 7 Aug 2024 • Mingyu Zhao, Xingyu Huang, Ziyu Lyu, Yanlin Wang, Lixin Cui, Lu Bai
Based on the intrinsic properties of graphs, we design three probes to systematically investigate the graph representation learning process from different perspectives, respectively the node-wise level, the path-wise level, and the structural level.
no code implementations • 29 Jun 2024 • Yanlin Wang, Tianyue Jiang, Mingwei Liu, Jiachi Chen, Zibin Zheng
In this paper, we empirically analyze the differences in coding style between the code generated by mainstream Code LLMs and the code written by human developers, and summarize a taxonomy of coding style inconsistencies.
1 code implementation • 17 Jun 2024 • Jing Gong, Yanghui Wu, Linxi Liang, Yanlin Wang, Jiachi Chen, Mingwei Liu, Zibin Zheng
Semantic code search, retrieving code that matches a given natural language query, is an important task to improve productivity in software engineering.
no code implementations • 26 Mar 2024 • Wei Tao, Yucheng Zhou, Yanlin Wang, Wenqiang Zhang, Hongyu Zhang, Yu Cheng
To overcome this challenge, we empirically study why LLMs fail to resolve GitHub issues and analyze the major contributing factors.
no code implementations • 28 Jan 2024 • Jianqiao Lu, Wanjun Zhong, YuFei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu
With the teacher's guidance, the student learns to iteratively refine its answer with feedback, and forms a robust and comprehensive understanding of the posed questions.
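A rough sketch of such a teacher-guided refinement loop is below; `student_model` and `teacher_model` are hypothetical callables standing in for LLM queries, and the prompts are illustrative rather than the paper's exact interaction protocol.

```python
def refine_with_teacher(question, student_model, teacher_model, max_rounds=3):
    """Illustrative teacher-student refinement loop (placeholder callables).

    The student drafts an answer, the teacher returns natural-language
    feedback, and the student revises until the teacher is satisfied.
    """
    answer = student_model(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        feedback = teacher_model(
            f"Question: {question}\nStudent answer: {answer}\n"
            "Point out mistakes, or reply 'OK' if the answer is correct."
        )
        if feedback.strip() == "OK":
            break
        answer = student_model(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Teacher feedback: {feedback}\nRevised answer:"
        )
    return answer
```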
1 code implementation • 16 Jan 2024 • Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang, Wenqiang Zhang
However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not.
no code implementations • 25 Nov 2023 • Sheng Zhang, Hui Li, Yanlin Wang, Zhao Wei, Yong Xiu, Juhong Wang, Rongong Ji
To mitigate biases, we develop a general debiasing framework that employs reranking to calibrate search results.
1 code implementation • 1 Oct 2023 • Jianpeng Zhou, Wanjun Zhong, Yanlin Wang, Jiahai Wang
These methods employ consistent models, sample sizes, prompting methods and levels of problem decomposition, regardless of the problem complexity.
1 code implementation • 25 Aug 2023 • Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.
no code implementations • 18 Jul 2023 • Menghan Wang, Jinming Yang, Yuchen Guo, Yuming Shen, Mengying Zhu, Yanlin Wang
Inspired by recent advances on differentiable sorting, in this paper, we propose a novel multi-task framework that leverages orders of user behaviors to predict user post-click conversion in an end-to-end approach.
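As a hedged illustration of how differentiable sorting can expose order information to end-to-end training, the sketch below uses a SoftSort-style relaxation that maps behavior scores to a soft permutation matrix. This generic operator is an assumption for illustration, not necessarily the relaxation used in the paper.

```python
import torch
import torch.nn.functional as F

def soft_sort(scores, tau=1.0):
    """SoftSort-style relaxation: returns a row-stochastic matrix P such
    that P @ scores approximates the scores sorted in descending order.
    It is differentiable w.r.t. scores, so it can sit inside a trained model.
    """
    sorted_scores, _ = torch.sort(scores, descending=True)                  # (n,)
    pairwise = (sorted_scores.unsqueeze(1) - scores.unsqueeze(0)).abs()     # (n, n)
    return F.softmax(-pairwise / tau, dim=-1)

scores = torch.tensor([0.2, 1.5, -0.3], requires_grad=True)
P = soft_sort(scores, tau=0.1)
approx_sorted = P @ scores        # close to [1.5, 0.2, -0.3]
approx_sorted.sum().backward()    # gradients flow back to the behavior scores
```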
no code implementations • 3 Jul 2023 • Ruimin Ma, Ruitao Xie, Yanlin Wang, Jintao Meng, Yanjie Wei, Wenhui Xi, Yi Pan
Few studies have conducted machine classification of ASD for participants below five years old, and those that have report only mediocre predictive accuracy.
1 code implementation • 17 May 2023 • Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, Yanlin Wang
To mimic anthropomorphic behaviors and selectively preserve memory, MemoryBank incorporates a memory updating mechanism inspired by the Ebbinghaus Forgetting Curve, which allows the AI to forget and reinforce memories based on the time elapsed and their relative significance, thereby offering a human-like memory mechanism.
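A minimal sketch of an Ebbinghaus-style forgetting curve with recall-based reinforcement is shown below; the retention function, decay unit (days), and thresholds are illustrative assumptions rather than MemoryBank's exact update rule.

```python
import math
import time

class MemoryItem:
    """Illustrative memory record; field names are hypothetical."""
    def __init__(self, content, significance=1.0):
        self.content = content
        self.significance = significance      # higher -> slower forgetting
        self.last_recalled = time.time()

    def retention(self, now=None):
        """Ebbinghaus-style retention R = exp(-t / S), where the memory
        strength S grows with significance and successful recalls."""
        now = now or time.time()
        elapsed_days = (now - self.last_recalled) / 86400
        return math.exp(-elapsed_days / self.significance)

    def reinforce(self, boost=1.0):
        """Recalling a memory strengthens it and resets the clock."""
        self.significance += boost
        self.last_recalled = time.time()

def prune(memories, threshold=0.1):
    """Forget memories whose retention has decayed below a threshold."""
    return [m for m in memories if m.retention() >= threshold]
```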
3 code implementations • 13 Apr 2023 • Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan
Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam.
1 code implementation • 11 Apr 2023 • Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun
Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.
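As a rough illustration of the probing methodology behind such layer-wise findings, the sketch below trains a lightweight classifier on frozen hidden states from a chosen layer; the model name, layer indices, and toy property are assumptions for illustration, not the paper's actual setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Frozen pre-trained code model; "microsoft/codebert-base" is just an example.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_hidden_states=True)
model.eval()

def layer_embedding(code, layer):
    """Mean-pooled hidden state of a given layer for one code snippet."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Probe: can layer k's representation predict a surface property
# (here, a toy label such as "contains a loop")?
snippets = ["for i in range(10): s += i", "return a + b"]
labels = [1, 0]
for layer in (1, 6, 12):
    feats = [layer_embedding(c, layer) for c in snippets]
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(layer, probe.score(feats, labels))
```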
1 code implementation • 21 Oct 2022 • Haochen Li, Chunyan Miao, Cyril Leung, Yanxian Huang, Yuan Huang, Hongyu Zhang, Yanlin Wang
In this paper, we explore augmentation methods that augment data (both code and query) at the representation level, which requires no additional data processing or training, and on this basis we propose a general formulation of representation-level augmentation that unifies existing methods.
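A small sketch of what representation-level augmentation can look like in practice is given below, using linear interpolation and random masking of encoder outputs as two instances of the general form; the operations and parameters are illustrative, not the paper's full formulation.

```python
import torch

def linear_interpolation(rep_a, rep_b, alpha=0.8):
    """Mixup-style augmentation: combine two in-batch representations."""
    return alpha * rep_a + (1 - alpha) * rep_b

def random_masking(rep, keep_prob=0.9):
    """Drop random dimensions of a representation vector."""
    mask = (torch.rand_like(rep) < keep_prob).float()
    return rep * mask

# Example: augment a batch of code/query embeddings before the training
# loss, without touching the raw data or retraining the encoder.
batch = torch.randn(32, 768)                   # encoder outputs
shuffled = batch[torch.randperm(batch.size(0))]
augmented = linear_interpolation(batch, shuffled)
masked = random_masking(batch)
```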
no code implementations • 4 Oct 2022 • Lunyiu Nie, Jiuding Sun, Yanlin Wang, Lun Du, Lei Hou, Juanzi Li, Shi Han, Dongmei Zhang, Jidong Zhai
The recent prevalence of pretrained language models (PLMs) has dramatically shifted the paradigm of semantic parsing, where the mapping from natural language utterances to structured logical forms is now formulated as a Seq2Seq task.
no code implementations • 17 Sep 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang
Graph Neural Networks (GNNs) are popular machine learning methods for modeling graph data.
Ranked #7 on Node Classification on Squirrel
1 code implementation • 15 Aug 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang
Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors.
no code implementations • 9 Jun 2022 • Ruimin Ma, Yanlin Wang, Yanjie Wei, Yi Pan
Accurate diagnosis of autism spectrum disorder (ASD) from neuroimaging data has significant clinical implications, yet extracting useful information from neuroimaging data for ASD detection remains challenging.
no code implementations • 18 Apr 2022 • Ruixuan Liu, Yanlin Wang, Yang Cao, Lingjuan Lyu, Weike Pan, Yun Chen, Hong Chen
Collecting and training on sensitive personal data raises severe privacy concerns in personalized recommendation systems, and federated learning can potentially alleviate the problem by training models over decentralized user data. However, a theoretically private solution covering both the training and serving stages of federated recommendation is essential but still lacking. Furthermore, naively applying differential privacy (DP) to the two stages of federated recommendation would fail to achieve a satisfactory trade-off between privacy and utility, due to the high-dimensional nature of model gradients and hidden representations. In this work, we propose a federated news recommendation method that achieves better utility in model training and online serving under a DP guarantee. We first clarify the DP definition over behavior data for each round in the life cycle of federated recommendation systems. Next, we propose a privacy-preserving online serving mechanism under this definition, based on the idea of decomposing user embeddings with public basic vectors and perturbing the lower-dimensional combination coefficients.
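A minimal sketch of the serving-stage idea described above, assuming a Laplace mechanism over the combination coefficients; the basis construction, noise distribution, and parameters are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def private_coefficients(user_emb, basis, epsilon=1.0, sensitivity=1.0):
    """Project a private user embedding onto public basis vectors and
    release only noisy low-dimensional combination coefficients.

    user_emb : (d,)   private embedding
    basis    : (k, d) public basis vectors, with k << d
    """
    coeffs = basis @ user_emb                                      # (k,) coefficients
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=coeffs.shape)
    return coeffs + noise

def reconstruct(noisy_coeffs, basis):
    """Server-side reconstruction from the public basis and noisy coefficients."""
    return basis.T @ noisy_coeffs

d, k = 256, 16
basis = np.linalg.qr(np.random.randn(d, k))[0].T    # orthonormal public basis (k, d)
user_emb = np.random.randn(d)
served = reconstruct(private_coefficients(user_emb, basis), basis)
```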
no code implementations • 7 Apr 2022 • Fangyu Zhang, Yanjie Wei, Jin Liu, Yanlin Wang, Wenhui Xi, Yi Pan
This paper introduces a classification framework to aid ASD diagnosis based on rs-fMRI.
no code implementations • 7 Apr 2022 • Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
However, there is still a lot of room for improvement in using contrastive learning for code search.
no code implementations • ACL 2022 • Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Michael R. Lyu
Code search retrieves reusable code snippets from a source code corpus based on natural language queries.
2 code implementations • ACL 2022 • Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin
Furthermore, we propose to utilize multi-modal contents to learn representation of code fragment with contrastive learning, and then align representations among programming languages using a cross-modal generation task.
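A generic sketch of the cross-language contrastive alignment idea: an InfoNCE-style loss that treats parallel code fragments in two languages as positive pairs and the other in-batch fragments as negatives. The formulation and hyperparameters here are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """Contrastive loss over a batch: each anchor's positive is the
    parallel fragment at the same index; all other rows act as negatives.

    anchor, positive: (batch, dim) representations, e.g. encodings of the
    same function written in two programming languages.
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature     # (batch, batch) similarities
    targets = torch.arange(anchor.size(0))
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
```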
2 code implementations • 5 Mar 2022 • Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
Furthermore, RACE can boost the performance of existing Seq2Seq models in commit message generation.
no code implementations • 16 Feb 2022 • Ruixuan Liu, Fangzhao Wu, Chuhan Wu, Yanlin Wang, Lingjuan Lyu, Hong Chen, Xing Xie
In this way, all the clients can participate in the model learning in FL, and the final model can be sufficiently large and powerful.
no code implementations • 10 Feb 2022 • Chuhan Wu, Fangzhao Wu, Tao Qi, Yanlin Wang, Yuqing Yang, Yongfeng Huang, Xing Xie
To solve the game, we propose a platform negotiation method that simulates the bargaining among platforms and locally optimizes their policies via gradient descent.
1 code implementation • 15 Jul 2021 • Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun
To achieve a profound understanding of how far we are from solving this problem and provide suggestions for future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.
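To make the sensitivity to BLEU configuration concrete, the snippet below scores the same candidate summary with NLTK's sentence-level BLEU under different smoothing settings; these settings are illustrative and are not claimed to match the six variants studied in the paper.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["returns", "the", "maximum", "value", "in", "the", "list"]
candidate = ["return", "the", "max", "value", "of", "a", "list"]

smoother = SmoothingFunction()
variants = {
    "no smoothing":       None,
    "smoothing method2":  smoother.method2,
    "smoothing method4":  smoother.method4,
}
# The same hypothesis can receive noticeably different scores depending
# solely on the smoothing choice, which complicates cross-paper comparison.
for name, fn in variants.items():
    score = sentence_bleu([reference], candidate, smoothing_function=fn)
    print(f"{name}: {score:.3f}")
```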
1 code implementation • 12 Jul 2021 • Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang
We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.
1 code implementation • 10 Jul 2021 • Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang
On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents.
no code implementations • 17 Mar 2021 • Yanlin Wang, Hui Li
Code completion has become an essential component of integrated development environments.