no code implementations • 16 May 2025 • Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo
Additionally, we show that role-specific instruction tuning in the multi-agent setting with a small dataset (50 sample pairs) further improves the performance of VulTrial by 139.89% and 118.30%.
1 code implementation • 10 Apr 2025 • Keyu Liang, Zhongxin Liu, Chao Liu, Zhiyuan Wan, David Lo, Xiaohu Yang
In this work, we propose to break the query-code matching process of code search into two simpler tasks: query-comment matching and code-code matching.
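The decomposition above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate snippet comes with a comment and that the query comes with an associated code snippet (`query_code`, a hypothetical input), and it swaps in plain token-overlap (Jaccard) similarity for whatever matching model the paper actually uses.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity between two strings (illustrative stand-in)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def two_stage_score(query, query_code, cand_comment, cand_code):
    """Sum of a query-comment match and a code-code match.

    query_code is a hypothetical code snippet associated with the query;
    where it comes from is outside the scope of this sketch.
    """
    return jaccard(query, cand_comment) + jaccard(query_code, cand_code)


def search(query, query_code, corpus):
    """Return the (comment, code) pair with the highest combined score."""
    return max(corpus, key=lambda pair: two_stage_score(query, query_code, *pair))


corpus = [
    ("sort a list of numbers", "def sort_list(xs): return sorted(xs)"),
    ("read a text file", "def read_file(p): return open(p).read()"),
]
best = search("read file contents", "open(path).read()", corpus)
print(best[0])
```

Each sub-task compares like with like (natural language against natural language, code against code), which is the point of the split.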
1 code implementation • 7 Apr 2025 • Martin Weyssow, Chengran Yang, Junkai Chen, Yikun Li, Huihui Huang, Ratnadira Widyasari, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, David Lo
Through RLAIF, R2Vul enables LLMs to produce structured, security-aware reasoning that is actionable and reliable while explicitly learning to distinguish valid assessments from misleading ones.
no code implementations • 20 Feb 2025 • Junwei Zhang, Xing Hu, Shan Gao, Xin Xia, David Lo, Shanping Li
To evaluate its effectiveness, we apply CleanTest on two widely-used test generation datasets, i.e., Methods2Test and Atlas.
no code implementations • 20 Feb 2025 • Ye Liu, Yuqing Niu, Chengyan Ma, Ruidong Han, Wei Ma, Yi Li, Debin Gao, David Lo
Smart contracts are highly susceptible to manipulation attacks due to the leakage of sensitive information.
no code implementations • 10 Feb 2025 • Bo Gao, YuAn Wang, Qingsong Wei, Yong Liu, Rick Siow Mong Goh, David Lo
Decentralized finance (DeFi) applications depend on accurate price oracles to ensure secure transactions, yet these oracles are highly vulnerable to manipulation, enabling attackers to exploit smart contract vulnerabilities for unfair asset valuation and financial gain.
no code implementations • 10 Feb 2025 • Xin Zhou, Martin Weyssow, Ratnadira Widyasari, Ting Zhang, Junda He, Yunbo Lyu, Jianming Chang, Beiqi Zhang, Dan Huang, David Lo
To address the data leakage, we introduce LessLeak-Bench, a new benchmark that removes leaked samples from the 83 SE benchmarks, enabling more reliable LLM evaluations in future research.
no code implementations • 27 Jan 2025 • Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, David Lo
We evaluate seven gender bias detectors and find that none fully capture the actual level of bias in T2I models, with some detectors overestimating bias by up to 26.95%.
1 code implementation • 7 Jan 2025 • Yindu Su, Huike Zou, Lin Sun, Ting Zhang, Haiyang Yang, Liyu Chen, David Lo, Qingheng Zhang, Shuguang Han, Jufeng Chen
Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms.
no code implementations • 14 Nov 2024 • Alexandra González, Xavier Franch, David Lo, Silverio Martínez-Fernández
Aims: We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
no code implementations • 14 Nov 2024 • Joel Castaño, Rafael Cabañas, Antonio Salmerón, David Lo, Silverio Martínez-Fernández
While previous studies have examined aspects of models hosted on platforms like HF, a comprehensive longitudinal study of how these models change remains underexplored.
1 code implementation • 13 Sep 2024 • Mouxiang Chen, Zhongxin Liu, He Tao, Yusu Hong, David Lo, Xin Xia, Jianling Sun
Our proposed approximated optimal strategy B4 significantly surpasses existing heuristics in selecting code solutions generated by large language models (LLMs) with LLM-generated tests, achieving a relative performance improvement of up to 50% over the strongest heuristic and 246% over random selection in the most challenging scenarios.
no code implementations • 2 Aug 2024 • Zhensu Sun, Haotian Zhu, Bowen Xu, Xiaoning Du, Li Li, David Lo
Inspired by their remarkable capabilities in understanding and generating code, we propose to deal with the runtime errors in a real-time manner using LLMs.
no code implementations • 23 Jul 2024 • Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, David Lo
There are two popular lines of work to address automated vulnerability detection.
3 code implementations • 22 Jun 2024 • Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro von Werra
In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions.
Ranked #1 on Code Generation on BigCodeBench-Instruct
1 code implementation • 25 Apr 2024 • Zhensu Sun, Xiaoning Du, Zhou Yang, Li Li, David Lo
To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar.
no code implementations • 17 Mar 2024 • Xin Zhou, DongGyun Han, David Lo
In addition, our experimental results confirm that the simple model and complex model are complementary to each other.
no code implementations • 26 Jan 2024 • Sicong Cao, Xiaobing Sun, Ratnadira Widyasari, David Lo, Xiaoxue Wu, Lili Bo, Jiale Zhang, Bin Li, Wei Liu, Di Wu, Yixin Chen
The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE).
no code implementations • 8 Jan 2024 • Dat Nguyen, Hieu M. Vu, Cong-Thanh Le, Bach Le, David Lo, ThanhVu Nguyen, Corina Pasareanu
To tackle the challenge of varying input structures in GNNs, GNNInfer first identifies a set of representative influential structures that contribute significantly towards the prediction of a GNN.
no code implementations • 8 Jan 2024 • Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim
This is achieved by generating code in response to a given question using different variants.
no code implementations • 8 Sep 2023 • David Lo
For decades, much software engineering research has been dedicated to devising automated solutions aimed at enhancing developer productivity and elevating software quality.
1 code implementation • 21 Aug 2023 • Martin Weyssow, Xin Zhou, Kisub Kim, David Lo, Houari Sahraoui
In this paper, we deliver a comprehensive study of PEFT techniques for LLMs in the context of automated code generation.
1 code implementation • 21 Aug 2023 • Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, Haoyu Wang
Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages.
1 code implementation • 31 May 2023 • Terry Yue Zhuo, Zhou Yang, Zhensu Sun, YuFei Wang, Li Li, Xiaoning Du, Zhenchang Xing, David Lo
This paper fills this gap by conducting a comprehensive and integrative survey of data augmentation for source code, wherein we systematically compile and encapsulate existing literature to provide a comprehensive overview of the field.
2 code implementations • 23 May 2023 • Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Ratnadira Widyasari, Chengran Yang, Zhipeng Zhao, Bowen Xu, Jiayuan Zhou, Xin Xia, Ahmed E. Hassan, Xuan-Bach D. Le, David Lo
To address these challenges and boost the effectiveness of prior works, we propose MiDas (Multi-Granularity Detector for Vulnerability Fixes).
no code implementations • 6 May 2023 • Martin Weyssow, Xin Zhou, Kisub Kim, David Lo, Houari Sahraoui
We demonstrate that the most commonly used fine-tuning technique from prior work is not robust enough to handle the dynamic nature of APIs, leading to the loss of previously acquired knowledge, i.e., catastrophic forgetting.
no code implementations • 8 Mar 2023 • Aftab Hussain, Md Rafiqul Islam Rabin, Bowen Xu, David Lo, Mohammad Amin Alipour
In this paper, we explore the impact of an unsupervised feature enrichment approach based on variable roles on the performance of neural models of code.
no code implementations • 16 Feb 2023 • Zichong Wang, Yang Zhou, Israat Haque, David Lo, Wenbin Zhang
The increasing use of Machine Learning (ML) software can lead to unfair and unethical decisions, thus fairness bugs in software are becoming a growing concern.
1 code implementation • 14 Feb 2023 • Roman Belaire, Pradeep Varakantham, Thanh Nguyen, David Lo
We demonstrate that our approaches provide a significant improvement in performance across a wide variety of benchmarks against leading approaches for robust Deep RL.
1 code implementation • 11 Feb 2023 • Daniel Hao Xian Yuen, Andrew Yong Chen Pang, Zhou Yang, Chun Yong Chong, Mei Kuan Lim, David Lo
To address these limitations, our tool incorporates two novel features: (1) a text transformation module to boost the number of generated test cases and uncover more errors in ASR systems and (2) a phonetic analysis module to identify the phonemes on which the ASR system tends to produce errors.
Automatic Speech Recognition (ASR)
1 code implementation • 3 Jan 2023 • Thanh Le-Cong, Duc-Minh Luong, Xuan Bach D. Le, David Lo, Nhat-Hoa Tran, Bui Quang-Huy, Quyet-Thang Huynh
In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a trained model from labeled patches to assess patch correctness based on program syntax.
1 code implementation • 12 Dec 2022 • Tiezhu Sun, Kevin Allix, Kisub Kim, Xin Zhou, Dongsun Kim, David Lo, Tegawendé F. Bissyandé, Jacques Klein
Central to applying ML to software artifacts (like source or executable code) is converting them into forms suitable for learning.
1 code implementation • 7 Oct 2022 • Chen Gong, Zhou Yang, Yunpeng Bai, Junda He, Jieke Shi, Kecen Li, Arunesh Sinha, Bowen Xu, Xinwen Hou, David Lo, Tianhao Wang
Our experiments conducted on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack.
1 code implementation • 13 Sep 2022 • Zhensu Sun, Xiaoning Du, Fu Song, Shangwen Wang, Mingze Ni, Li Li, David Lo
To fill this significant gap, we first investigate the prompts of unhelpful code completions, called "low-return prompts".
1 code implementation • 7 Sep 2022 • Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, Huynh Quyet Thang
Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph.
1 code implementation • 7 Sep 2022 • Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Xuan-Bach D. Le, David Lo
The open-source software (OSS) vulnerability management process is increasingly important, as the number of discovered OSS vulnerabilities keeps growing over time.
1 code implementation • 21 May 2022 • Rahul Yedida, Hong Jin Kang, Huy Tu, Xueqi Yang, David Lo, Tim Menzies
Automatically generated static code warnings suffer from a large number of false alarms.
no code implementations • 5 Apr 2022 • Rishab Sharma, Fuxiang Chen, Fatemeh Fard, David Lo
When identifiers' embeddings are used in CodeBERT, a code-based PLM, the performance is improved by 21-24% in the F1-score of clone detection.
no code implementations • 5 Apr 2022 • Fuxiang Chen, Fatemeh Fard, David Lo, Timofey Bryksin
Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another; e.g., Ruby and Java code have very different structures.
no code implementations • 5 Apr 2022 • Mohammad Abdul Hadi, Imam Nur Bani Yusuf, Ferdian Thung, Kien Gia Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo
We have also identified two different tokenization approaches that can contribute to a significant boost in PTMs' performance for the API sequence generation task.
no code implementations • 2 Mar 2022 • Jiri Gesi, SiQi Liu, Jiawei Li, Iftekhar Ahmed, Nachiappan Nagappan, David Lo, Eduardo Santana de Almeida, Pavneet Singh Kochhar, Lingfeng Bao
We found that our newly identified code smells are prevalent and impactful on the maintenance of DL systems from the developer's perspective.
no code implementations • 14 Nov 2021 • Kien Luong, Mohammad Hadi, Ferdian Thung, Fatemeh Fard, David Lo
Leveraging this observation, we develop FACOS, a context-specific algorithm to capture the semantic and syntactic information of the paragraphs and code snippets in a discussion.
no code implementations • 22 Feb 2021 • Zhiyuan Wan, Xin Xia, David Lo, Jiachi Chen, Xiapu Luo, Xiaohu Yang
Given numerous research efforts in addressing the security issues of smart contracts, we wondered how software practitioners build security into smart contracts in practice.
Software Engineering
1 code implementation • 14 Dec 2020 • Stefanus Agus Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, Gilles Muller
Usages of deprecated APIs in Android apps need to be updated to ensure the apps' compatibility with the old and new versions of Android OS.
Software Engineering
no code implementations • 10 Nov 2020 • Stefanus Agus Haryono, Ferdian Thung, David Lo, Julia Lawall, Lingxiao Jiang
In this paper, we built a tool to automate these updates.
Software Engineering
1 code implementation • 6 Oct 2020 • Yuanrui Fan, Xin Xia, David Lo, Ahmed E. Hassan, Shanping Li
Hence, in this study, we perform an empirical study on academic AI repositories to highlight good software engineering practices of popular academic AI repositories for AI researchers.
Software Engineering
no code implementations • 23 Aug 2020 • Cuiyun Gao, Jichuan Zeng, Zhiyuan Wen, David Lo, Xin Xia, Irwin King, Michael R. Lyu
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT in identifying emerging app issues, improving the state-of-the-art method by 22.3% in terms of F1-score.
no code implementations • 25 Jun 2020 • Chao Liu, Cuiyun Gao, Xin Xia, David Lo, John Grundy, Xiaohu Yang
Experimental results show the importance of replicability and reproducibility: the reported performance of a DL model could not be replicated when the optimization process was unstable.
no code implementations • 29 May 2020 • Chao Liu, Xin Xia, David Lo, Zhiwei Liu, Ahmed E. Hassan, Shanping Li
CodeMatcher first collects metadata for query words to identify irrelevant/noisy ones, then iteratively performs fuzzy search with important query words on the codebase that is indexed by the Elasticsearch tool, and finally reranks a set of returned candidate code according to how the tokens in the candidate code snippet sequentially matched the important words in a query.
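The final reranking step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not CodeMatcher's actual code: the stopword list, the tokenizer, and the scoring function `sequential_match_score` are hypothetical, and the Elasticsearch-backed fuzzy search is not reproduced; the candidates are simply passed in as a list of strings.

```python
import re

# Hypothetical stand-in for the metadata-based filtering of noisy query words.
STOPWORDS = {"how", "to", "a", "an", "the", "in", "of", "is"}


def tokenize(text):
    """Lowercase alphanumeric tokens, in order of appearance."""
    return re.findall(r"[a-z0-9]+", text.lower())


def important_words(query):
    """Keep query words that are not in the (assumed) stopword list."""
    return [w for w in tokenize(query) if w not in STOPWORDS]


def sequential_match_score(query_words, code_tokens):
    """Fraction of query words that can be matched, in order, in the code tokens."""
    if not query_words:
        return 0.0
    pos = 0
    matched = 0
    for word in query_words:
        # Search only at or after the position of the previous match.
        for i in range(pos, len(code_tokens)):
            if code_tokens[i] == word:
                matched += 1
                pos = i + 1
                break
    return matched / len(query_words)


def rerank(query, candidates):
    """Sort candidate snippets by how well they sequentially match the query."""
    words = important_words(query)
    return sorted(
        candidates,
        key=lambda code: sequential_match_score(words, tokenize(code)),
        reverse=True,
    )


candidates = [
    "def write_file(path): pass",
    "def read_file(path): return open(path).read()",
]
ranked = rerank("how to read a file", candidates)
print(ranked[0])
```

Scoring by ordered matches rewards candidates whose tokens follow the word order of the query, rather than ones that merely contain the same words somewhere.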
1 code implementation • 20 May 2020 • Zhipeng Gao, Xin Xia, John Grundy, David Lo, Yuan-Fang Li
Stack Overflow has been heavily used by software developers as a popular way to seek programming-related information from peers via the internet.
Software Engineering
no code implementations • 11 May 2020 • Roy Ka-Wei Lee, Thong Hoang, Richard J. Oentaryo, David Lo
The Act step then recommends to the user which activities to perform on the identified set of items.
1 code implementation • 10 Feb 2020 • Cuiyun Gao, Jichuan Zeng, Xin Xia, David Lo, Michael R. Lyu, Irwin King
Previous studies showed that replying to a user review usually has a positive effect on the rating that is given by the user to the app.
1 code implementation • 20 Jan 2020 • Zhipeng Gao, Lingxiao Jiang, Xin Xia, David Lo, John Grundy
However, many bugs and vulnerabilities have been identified in many contracts, which raises serious concerns about smart contract security, not to mention that the blockchain systems on which the smart contracts are built can themselves be buggy.
Software Engineering
1 code implementation • 12 Dec 2019 • Xiao Liang Yu, Omar Al-Bataineh, David Lo, Abhik Roychoudhury
Our approach can be used to optimise the overall security and reliability of smart contracts against malicious attackers.
Software Engineering Cryptography and Security 68N15 D.1.2
no code implementations • 27 Oct 2019 • Vinoj Jayasundara, Nghi Duy Quoc Bui, Lingxiao Jiang, David Lo
Program comprehension is a fundamental task in software development and maintenance processes.
1 code implementation • 16 Sep 2019 • Zhongxin Liu, Xin Xia, Christoph Treude, David Lo, Shanping Li
We build a dataset with over 41K PRs and evaluate our approach on this dataset through ROUGE and a human evaluation.
Software Engineering
1 code implementation • 22 Aug 2019 • Zhipeng Gao, Vinoj Jayasundara, Lingxiao Jiang, Xin Xia, David Lo, John Grundy
In addition to its use by individual developers, SmartEmbed can also be applied to large-scale studies of smart contracts.
Software Engineering
no code implementations • 3 May 2019 • Amirreza Shirani, Bowen Xu, David Lo, Thamar Solorio, Amin Alipour
The proposed dataset Stack Overflow is a useful resource to develop novel solutions, specifically data-hungry neural network models, for the prediction of relatedness in technical community question-answering forums.
1 code implementation • 16 Feb 2019 • Thong Hoang, Julia Lawall, Richard J. Oentaryo, Yuan Tian, David Lo
This work proposes PatchNet, an automated tool based on hierarchical deep learning for classifying patches by extracting features from commit messages and code changes.
no code implementations • 27 Feb 2018 • Thong Hoang, Richard J. Oentaryo, Tien-Duy B. Le, David Lo
To help the developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been devised.
no code implementations • 1 May 2017 • Ferdian Thung, Richard J. Oentaryo, David Lo, Yuan Tian
In this light, we propose a new, automated approach called WebAPIRec that takes as input a project profile and outputs a ranked list of web APIs that can be used to implement the project.
no code implementations • 24 Jun 2016 • Richard J. Oentaryo, Ee-Peng Lim, Freddy Chong Tat Chua, Jia-Wei Low, David Lo
The abundance of user-generated data in social media has incentivized the development of methods to infer the latent attributes of users, which are crucially useful for personalization, advertising and recommendation.
no code implementations • 10 Jun 2016 • Daoyuan Li, Li Li, Dongsun Kim, Tegawendé F. Bissyandé, David Lo, Yves Le Traon
One single code change can significantly influence a wide range of software systems and their users.
Software Engineering
no code implementations • 27th IEEE/ACM International Conference on Automated Software Engineering 2013 • Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, David Lo, Chengnian Sun
Detecting duplicate bug reports helps reduce triaging efforts and save time for developers in fixing the same issues.