Search Results for author: Nghi D. Q. Bui

Found 25 papers, 15 papers with code

SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs

no code implementations • 20 Apr 2025 • Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui

Large language models (LLMs) are transforming automated program repair (APR) through agent-based approaches that localize bugs, generate patches, and verify fixes.

Program Repair

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

1 code implementation • 9 Sep 2024 • Huy Nhat Phan, Tien N. Nguyen, Phong X. Nguyen, Nghi D. Q. Bui

Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks.

Fault localization, GitHub issue resolution

XMainframe: A Large Language Model for Mainframe Modernization

1 code implementation • 5 Aug 2024 • Anh T. V. Dau, Hieu Trung Dao, Anh Tuan Nguyen, Hieu Trung Tran, Phong X. Nguyen, Nghi D. Q. Bui

To this end, we introduce XMainframe, a state-of-the-art large language model (LLM) specifically designed with knowledge of mainframe legacy systems and COBOL codebases.

Code Summarization, Language Modeling, +5

On the Impacts of Contexts on Repository-Level Code Generation

1 code implementation • 17 Jun 2024 • Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui

CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored.

Code Generation

Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals

no code implementations • 21 Mar 2024 • Khanh Nghiem, Anh Minh Nguyen, Nghi D. Q. Bui

As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants.

RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion

1 code implementation • 10 Mar 2024 • Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui

Furthermore, RepoHyper leverages an Expand and Refine retrieval method, comprising a graph expansion step and a link prediction algorithm applied to the Repo-level Semantic Graph (RSG), enabling the effective retrieval and prioritization of relevant code snippets.

Code Completion, Link Prediction, +1

Functional Overlap Reranking for Neural Code Generation

2 code implementations • 16 Oct 2023 • Hung Quoc To, Minh Huynh Nguyen, Nghi D. Q. Bui

We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation, focusing on modeling the relationships between clusters of solutions.

Code Generation

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

1 code implementation • 31 May 2023 • Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

2 code implementations • 13 May 2023 • Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.

Arithmetic Reasoning, Code Completion, +6

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

1 code implementation • 9 May 2023 • Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui

We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code.

Code Generation, Code Search, +1

Class based Influence Functions for Error Detection

1 code implementation • 2 May 2023 • Thang Nguyen-Duc, Hoang Thanh-Tung, Quan Hung Tran, Dang Huu-Tien, Hieu Ngoc Nguyen, Anh T. V. Dau, Nghi D. Q. Bui

Influence functions (IFs) are a powerful tool for detecting anomalous examples in large-scale datasets.

Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5

no code implementations • 27 Nov 2022 • Nghi D. Q. Bui, Yue Wang, Steven Hoi

Specifically, we propose three objectives to adapt the generic CodeT5 for debugging: a bug detection objective to determine whether a given code snippet is buggy or not, a bug localization objective to identify the buggy lines, and a program repair objective to translate the buggy code to its fixed version.

Bug fixing, Language Modeling, +2

HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations

no code implementations • 31 May 2022 • Minh Huynh Nguyen, Nghi D. Q. Bui, Truong Son Hy, Long Tran-Thanh, Tien N. Nguyen

We propose a novel method for code summarization utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet.

Clone Detection, Code Classification, +2

Towards Using Data-Influence Methods to Detect Noisy Samples in Source Code Corpora

no code implementations • 25 May 2022 • Anh T. V. Dau, Thang Nguyen-Duc, Hoang Thanh-Tung, Nghi D. Q. Bui

Despite the recent trend of developing and applying neural source code models to software engineering tasks, the quality of such models is insufficient for real-world use.

Code Classification, Representation Learning

Energy-bounded Learning for Robust Models of Code

no code implementations • 20 Dec 2021 • Nghi D. Q. Bui, Yijun Yu

In programming, learning code representations has a variety of applications, including code classification, code search, comment generation, bug prediction, and so on.

Code Classification, Code Search, +2

InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees

no code implementations • 13 Dec 2020 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

We trained an InferCode model instance on a large set of Java code, using the Tree-based CNN as the encoder, and applied it to downstream unsupervised tasks such as code clustering, code clone detection, and cross-language code search. The model can also be reused under a transfer learning scheme, continuing to train the model weights for supervised tasks such as code classification and method name prediction.

Clone Detection, Code Classification, +7

TreeCaps: Tree-Based Capsule Networks for Source Code Processing

no code implementations • 5 Sep 2020 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

Although syntax trees are precisely defined according to the language grammar and easier to construct and process than graphs, previous tree-based learning techniques have not been able to learn semantic information from trees to achieve better accuracy than graph-based techniques.

On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations

1 code implementation • 31 Jul 2020 • Md Rafiqul Islam Rabin, Nghi D. Q. Bui, Ke Wang, Yijun Yu, Lingxiao Jiang, Mohammad Amin Alipour

With the prevalence of publicly available source code repositories to train deep neural network models, neural program models can do well in source code analysis tasks such as predicting method names in given programs that cannot be easily done by traditional program analysis techniques.

Method name prediction

SAR: Learning Cross-Language API Mappings with Little Knowledge

no code implementations • 10 Jun 2019 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

However, all these approaches still require a large amount of manual effort in preparing parallel program corpora, ranging from pairs of APIs to manually identified code in different languages that is considered functionally equivalent.

Domain Adaptation, Translation

Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

1 code implementation • 13 Mar 2018 • Nghi D. Q. Bui, Lingxiao Jiang

Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores.

Translation, Word Embeddings

Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks

1 code implementation • 17 Oct 2017 • Nghi D. Q. Bui, Lingxiao Jiang, Yijun Yu

It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language.

Binary Classification, C++ code, +2
